Dataset

A dataset is a logical object that represents a set of related facts, attributes, and attribute labels.

Datasets are the basic organizational units of a logical data model (LDM).

You can think of a dataset as a database table with a primary key and foreign keys. A dataset’s primary key (the so-called “grain”) defines the cardinality of the dataset. The primary key must be defined through either an attribute or a referenced dataset.

Facts, attributes, and attribute labels related to a particular dataset are automatically tagged with a tag object set to the identifier of the dataset.
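To make the table analogy concrete, here is a minimal Python sketch of a dataset with a grain and automatic tagging. The `Dataset` class, its field names, and the `order_lines` identifier are hypothetical and only illustrate the concepts described above; they are not part of any product API.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical in-memory stand-in for an LDM dataset (illustration only)."""

    identifier: str
    grain: list[str]  # attribute(s) that form the primary key
    facts: list[str] = field(default_factory=list)
    attributes: list[str] = field(default_factory=list)

    def tags(self) -> dict[str, str]:
        """Map every fact and attribute to a tag equal to the dataset identifier."""
        return {name: self.identifier for name in self.facts + self.attributes}


# Assumed example: an Order Lines dataset keyed by Order Line ID.
order_lines = Dataset(
    identifier="order_lines",
    grain=["order_line_id"],
    facts=["quantity"],
    attributes=["order_line_id"],
)

tagged = order_lines.tags()
```

In this sketch, `tagged` maps both `quantity` and `order_line_id` to the tag `order_lines`, mirroring how objects belonging to a dataset carry its identifier.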

Relationships between Datasets

Datasets are associated with each other through relationships. A relationship is a one-directional mapping between two datasets through a single primary key. This primary key works like a database primary key: it is the field in the originating dataset whose values uniquely identify each record in that dataset.

When a relationship is created between two datasets, a foreign key field is inserted into the target dataset. This foreign key is populated by references to the primary key values in the dataset at the other end of the relationship.
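The effect of creating a relationship can be sketched in a few lines of Python. The dictionary layout and the `add_relationship` helper are assumptions made for illustration; they only show that the foreign key lands in the target dataset and points at the primary key of the originating dataset.

```python
def add_relationship(source: dict, target: dict) -> None:
    """Add a foreign key to `target` that references the primary key of `source`."""
    target["foreign_keys"].append(
        {"references": source["id"], "column": source["primary_key"]}
    )


# Hypothetical dataset descriptions (not a real API payload).
customers = {"id": "customers", "primary_key": "customer_id", "foreign_keys": []}
order_lines = {"id": "order_lines", "primary_key": "order_line_id", "foreign_keys": []}

# Customers is the originating dataset, Order Lines is the target.
add_relationship(source=customers, target=order_lines)
```

After the call, only `order_lines` has changed: it now holds a `customer_id` foreign key that references `customers`, while `customers` itself has no foreign keys.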

Relationships are important because they determine which data you can slice by which attributes when you build your own metrics using MAQL - Analytical Query Language.

Example: Datasets in the LDM

[Figure: Primary Key Model]

  • Fact datasets
    • Order Lines (primary key Order Line ID)
    • Campaign / Channels (primary key Campaign Channel ID)
  • Attribute datasets
    • Customers (primary key Customer ID)
    • Products (primary key Product ID)
    • Campaigns (primary key Campaign ID)
  • Date datasets
    • Date

The direction of the arrow determines which dataset’s data can be analyzed (sliced) by the data from the other dataset. For example, in the LDM above, the relationship between the Customers and Order Lines datasets allows you to slice Quantity by Customer Name.
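Slicing Quantity by Customer Name amounts to following the foreign key from each order line to its customer and summing per customer. The sketch below uses made-up sample rows and a hypothetical `quantity_by_customer_name` helper; it illustrates the idea in plain Python and does not show how the platform evaluates a metric.

```python
from collections import defaultdict

# Assumed sample data: customer_id -> customer name.
customers = {1: "Alice", 2: "Bob"}

# Each order line carries a customer_id foreign key.
order_lines = [
    {"order_line_id": 10, "customer_id": 1, "quantity": 3},
    {"order_line_id": 11, "customer_id": 1, "quantity": 2},
    {"order_line_id": 12, "customer_id": 2, "quantity": 4},
]


def quantity_by_customer_name(lines, names):
    """Sum the quantity fact, grouped by the name of the referenced customer."""
    totals = defaultdict(int)
    for line in lines:
        totals[names[line["customer_id"]]] += line["quantity"]
    return dict(totals)


result = quantity_by_customer_name(order_lines, customers)
# result == {"Alice": 5, "Bob": 4}
```

The lookup only works in one direction: every order line references exactly one customer, so the fact stored on the order line can be grouped by customer attributes.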

Date Datasets

You can also add a Date dataset to your LDM to manage time-based data. The Date dataset enables aggregation at various levels of time granularity such as day, week, month, and so on. For more information about the supported levels of granularity, see Date Dataset.
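As a rough illustration of aggregating at different levels of time granularity, the following Python sketch groups the same sample rows by day, week, or month. The `aggregate` function, the sample data, and the use of ISO week numbering are assumptions made for this example; the weeks used by an actual Date dataset may be defined differently.

```python
from collections import defaultdict
from datetime import date

# Hypothetical grouping keys per granularity; weeks follow ISO numbering here.
KEYS = {
    "day": lambda d: d.isoformat(),
    "week": lambda d: f"{d.isocalendar()[0]}-W{d.isocalendar()[1]:02d}",
    "month": lambda d: f"{d.year}-{d.month:02d}",
}


def aggregate(rows, granularity):
    """Sum the values of (date, value) rows at the requested granularity."""
    totals = defaultdict(int)
    for day, value in rows:
        totals[KEYS[granularity](day)] += value
    return dict(totals)


# Assumed sample rows: (order date, quantity).
rows = [(date(2024, 1, 1), 3), (date(2024, 1, 3), 2), (date(2024, 2, 5), 4)]

by_month = aggregate(rows, "month")  # {"2024-01": 5, "2024-02": 4}
by_week = aggregate(rows, "week")    # {"2024-W01": 5, "2024-W06": 4}
by_day = aggregate(rows, "day")      # one entry per distinct date
```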