Prepare Your Data

When you connect your database to GoodData, the schema of your database is automatically used to generate a physical data model (PDM) of your data. The PDM describes the tables of your database and represents how the actual data is organized and stored in the database.

The logical data model (LDM) is then mapped to and built on top of the PDM. The LDM is what ultimately determines your analytical options when you build visualizations and dashboards.

To ensure the PDM is generated successfuly, we recommend you consider:

Supported Data Types

Make sure that the columns that you want to include in the model have the supported data types assigned. The data types are validated using the JDBC data types. To find out how the JDBC data types are mapped to the data types in your database, refer to documentation for your specific database.

GoodData supports the following JDBC data types:

BIGINTBITBOOLEAN
CHARCLOBDATE
DECIMALDOUBLEFLOAT
INTEGERLONGNVARCHARLONGVARCHAR
NCHARNCLOBNUMERIC
NVARCHARREALROWID
SMALLINTSQLXMLSTRING
TIMESTAMP_WITH_TIMEZONETIMESTAMPTINYINT
VARCHAR

Columns with an unsupported data type (for example, when the UUID database data type is mapped to the OTHER JDBC data type) are skipped and not included in the PDM.

Recommended Naming Conventions

You do not need to follow a specific database naming conventions to be able to connect your data and create physical and logical data models. However implementing the following naming conventions to name objects in your database is going to make the process of creating the logical data model much easier:

EntityName FormatExample
TabletablePrefix__baseNamegd_table__customer
ViewviewPrefix__baseNamegd_view__customer
LabellabelPrefix__attributeBaseName__labelBaseNamels__country_id__country_name
FactfactPrefix__baseNamef__sales_amount
GraingrainPrefix__baseNamegr__id_customer_pk
ReferencereferencePrefix__tableBaseName__referenceBaseNamer__customer__surname
Workspace Data FilterwdfPrefix__baseNamewdf__region

Make note of the following:

  • Ensuring that Label, Fact, Grain and Reference entities follow the recommended naming conventions should save you from having to do additional configuration when creating the logical data model.

  • Workspace Data Filter is used to filter data in workspaces. To learn more, go to Workspace Data Filters. Attributes and labels are not generated for columns that have been filtered using the workspace data filter.

  • The prefixes (tablePrefix, labelPrefix, and so on) are not case-sensitive.

Separators

In the table above, a double underscore (__) is used as a separator between prefixes and entity names, or between separate sections of reference names and label names. Note that in an API request for generating the LDM you can define any string as a separator.

We recommend that the string you want to use as a separator be as unique as possible. Consider the situation when you use an underscore (_) as a separator and you have a column named referencePrefix_table_1_fk_column. This column name cannot be split into three sections as it would be required (prefix, referencedTable, referencedColumn).

Naming Conventions vs. Referential Integrity

Do not apply the naming conventions to the columns that are used as or are included in a primary key or a foreign key in your database. Such columns will be interpreted as grains and references based on referential integrity (see Default Configuration).

How Is PDM Transformed to LDM

When a declarative definition of the LDM is generated, certain rules are applied that affect how LDM entities are created based on the PDM. Understanding these rules will help you adjust your database and make the process of generating the LDM smoother and more accurate.

Default Configuration

By default, entities for the LDM are derived from the PDM based on data types assigned to table columns and referential integrity in the database.

LDM EntityExpected Database Entity
DatasetTable, view
AttributeCHAR-like, INT columns
FactNUMERIC-like columns
Date datasetDATE, TIMESTAMP (TZ) columns
GrainPRIMARY KEY
ReferenceFOREIGN KEY

The default configuration has the following limitations:

  • It always reads the whole PDM; you cannot select just a part of the PDM to read.
  • It cannot detect advanced LDM entities such as attribute labels.
  • It cannot generate facts from columns with the INT data type.
  • It cannot generate grains and references if referential integrity is not maintained in the database.
  • It cannot generate proper LDM entities (such as facts, labels, and so on) from columns with incorrectly defined data types.

To overcome these limitations, use the naming conventions in your database to make sure that PDM entities are identified and interpreted correctly regardless of the column data types and referential integrity.

Entity Names in the Generated LDM

In the generated LDM, entity names and descriptions are generated based on the corresponding column names. To make an entity name more human-readable, the following happens when the entity name is injected into the LDM:

  • The prefix is removed from the name.
  • The first character is upper-cased.
  • Any non-alphanumeric character is replaced by a space.
Column Name in PDMEntity ID in LDMEntity Name in LDM
gd_table__customercustomerCustomer
customer.f__amountcustomer.amountAmount
customer.ls__customer_key__customer_namecustomer.customer_nameCustomer name
customer.region_name (attribute)attr.customer.region_nameRegion name

You can create a PDM manually by following instruction in the Advanced Modelling Use Cases article.