APPROXIMATE_COUNT

Use APPROXIMATE_COUNT to return the approximate number of unique values for an attribute. Because the returned value is a statistical estimate and not an exact figure, this function can increase the processing speed when analyzing large datasets. You can use APPROXIMATE_COUNT anywhere you can use COUNT. The smaller the amount of data to process, the more likely that the returned results of APPROXIMATE_COUNT will match the results of COUNT.

Note

APPROXIMATE_COUNT is available only for workspaces that use Vertica Data Sources.
APPROXIMATE_COUNT uses the Vertica default behavior for APPROXIMATE_COUNT_DISTINCT. It is not possible to use modifiers.
GoodData.CN defaults to COUNT when APPROXIMATE_COUNT is unavailable in a workspace.

Syntax

APPROXIMATE_COUNT uses the following syntax:

Form	Syntax	Example
single parameter	`SELECT APPROXIMATE_COUNT(attribute)`	`SELECT APPROXIMATE_COUNT({attribute/order_id})`
two parameters	`SELECT APPROXIMATE_COUNT(attribute, primary_key)`	`SELECT APPROXIMATE_COUNT({attribute/order_id}, {attribute/campaign_id})`
with USING	`SELECT APPROXIMATE_COUNT(attribute) USING primary_key`	`SELECT APPROXIMATE_COUNT({attribute/order_id}) USING {attribute/campaign_id}`

Single-Parameter Version

The single-parameter version of APPROXIMATE_COUNT dynamically gets the context of where to count from the insight it is used in.

Two-Parameter Version

In the two-parameter version, the context where to count the attribute is determined explicitly by the second parameter - the primary key of the dataset.

The primary key is connection point between datasets. It connects the APPROXIMATE_COUNT function’s first parameter to the dataset in which the count is to take place.

Specifying APPROXIMATE_COUNT Context Resolution with USING

You deploy the USING keyword in logical data models with ambiguous connection points.

The context for computation of APPROXIMATE_COUNT may be ambiguous if there are multiple fact tables which relate to a counted attribute. Imagine a model with fact datasets Purchase_Fact and Sales_Fact that are both connected to the Store and Product attributes.

model for approximate_count example

If you build an insight that displays the count of products per store with SELECT APPROXIMATE_COUNT({attribute/product}) and slice it with Store, it would not compute because it is ambiguous if the insight displays the approximate number of unique products that have been purchased by store or the approximate number of unique products that have been sold by the store.

In a metric, USING provides a hint for which context should be used. For example the insight will show the approximate number of unique purchased products per store if the Purchase_Fact attribute is placed into the USING clause SELECT APPROXIMATE_COUNT({attribute/product}) USING {attribute/purchase_fact}.

The attribute in the USING clause does not need to be from the actual fact table, it can also be another attribute which uniquely determines the correct context (e.g. use of the Purchase Date attribute in the USING clause directs the use of the Purchase_Fact dataset to join Product with Store because Sales_Fact does not directly relate to Purchase Date).

AVG