Dremio
Disclaimer
Support for the Dremio Data Source Manager (DSM) is in beta. Beta features are available for users to test and provide feedback. Their implementation is not finalized, and their behavior or interface may change in the future.
Important
Do not use beta features in your production environment.
Known limitations
- Date arithmetic:
  - Not all period-over-period functionality works due to incorrectly translated TIMESTAMPADD
- Functions:
  - Some WINDOW frames are not supported
  - GREATEST and LEAST functions do not work correctly if a second parameter is used
    - Example: GREATEST(column_name, 5000)
  - Rarely, if a report works with empty dimensionality, Dremio incorrectly adds the clause OFFSET 0 ROWS FETCH NEXT 0 ROWS ONLY, which causes the report to return 0 rows.
- Only UTF-8 characters are supported
- Rarely, a duplicate column alias may be generated, resulting in the corresponding report failing to execute (internal error)
- When caching is enabled, the report may fail with an internal error
  - The caching mechanism creates a table for each pre-aggregation and uses it in outer aggregations
  - Dremio does not register a table in its catalog if the populating SELECT statement returns 0 rows
Deployment
You can run Dremio (OSS version) in a Docker container. The image for Dremio is available on Docker Hub.
Note
GoodData uses driver version 19.1.0-202111160130570172-0ee00450.
The following example demonstrates how to start GoodData.CN with Dremio using Minio to serve as S3 storage:
version: '3.7'
services:
  gooddata-cn-ce:
    image: gooddata/gooddata-cn-ce:2.0.1
    ports:
      - "3000:3000"
      - "5432:5432"
    volumes:
      - gooddata-cn-ce-data:/data
    environment:
      LICENSE_AND_PRIVACY_POLICY_ACCEPTED: "YES"
  dremio:
    image: dremio/dremio-oss:17.0.0
    ports:
      - '9047:9047'
      - '31011:31010'
      - '45678:45678'
    volumes:
      # DB drivers
      - ./db-drivers/VERTICA/vertica-jdbc-10.0.1-2.jar:/opt/dremio/jars/3rdparty/vertica-jdbc-10.0.1-2.jar
      - ./db-drivers/SNOWFLAKE/snowflake-jdbc-3.12.9.jar:/opt/dremio/jars/3rdparty/snowflake-jdbc-3.12.9.jar
      # DB plugins
      - ./db-drivers/DREMIO/dremio-verticaarp-plugin.jar:/opt/dremio/jars/dremio-verticaarp-plugin.jar
      - ./db-drivers/DREMIO/dremio-snowflake-plugin.jar:/opt/dremio/jars/dremio-snowflake-plugin.jar
      # DATA volume
      - dremio-data:/opt/dremio/data
  minio:
    image: minio/minio:RELEASE.2021-08-25T00-41-18Z
    volumes:
      - minio-data:/data
    ports:
      - '19000:9000'
      - '19001:19001'
    environment:
      MINIO_ACCESS_KEY: tiger_abcde_k1234567
      MINIO_SECRET_KEY: tiger_abcde_k1234567_secret1234567890123
    command: server --console-address ":19001" /data
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
      interval: 30s
      timeout: 20s
      retries: 3
volumes:
  gooddata-cn-ce-data:
  dremio-data:
  minio-data:
Note
By passing the environment variable LICENSE_AND_PRIVACY_POLICY_ACCEPTED=YES you agree to the terms and conditions in the GOODDATA.CN COMMUNITY EDITION LICENSE AGREEMENT, including GoodData’s Privacy Policy. Please read it carefully. In order to use GoodData.CN, you must agree to the terms and conditions therein.
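Once the containers are started, it can help to wait until the services answer before moving on. The following is a minimal Python sketch, not part of the official setup. It assumes the host port mappings from the example above (Dremio web console on localhost:9047, MinIO API on host port 19000) and that a plain HTTP GET succeeds once each service is up. The helper names `http_ok`, `wait_until_ready`, and `check_stack` are hypothetical.

```python
import http.client
import time
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET request to `url` answers with an HTTP 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, http.client.HTTPException):
        # Connection refused, timeouts, and HTTP error statuses all end up here.
        return False


def wait_until_ready(probe: Callable[[], bool], attempts: int = 30, delay: float = 2.0) -> bool:
    """Call `probe` until it returns True or `attempts` calls have been made."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False


def check_stack() -> dict:
    """Probe the services from the example compose file (assumed host ports)."""
    endpoints = {
        "dremio": "http://localhost:9047",
        # Same health path as the container healthcheck, reached via host port 19000.
        "minio": "http://localhost:19000/minio/health/live",
    }
    return {
        name: wait_until_ready(lambda url=url: http_ok(url))
        for name, url in endpoints.items()
    }
```

Calling `check_stack()` returns a dictionary that maps each service name to `True` or `False`. The probe is passed in as a callable so that the retry logic can be exercised without a running stack.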
Prepare Dremio for GoodData
To learn how to register Data Sources to Dremio, refer to the official Dremio documentation for connecting a Data Source.
To access the Dremio web console, load localhost:9047 in your web browser. Register the user and password for later use when you create the Data Source definition.
Depending on the Data Source you use, additional preparation may be necessary to integrate your Data Source Manager with GoodData. For general considerations, refer to Preparing Data Source Managers for GoodData.
Data Sources Providing Metadata
If you use a Data Source that accommodates metadata (for example, Postgres), consider the following to ensure your scan of the Data Sources returns data:
- Database tables and views can be scanned only if they have been queried in Dremio.
- Alternatively, you can create Dremio datasets on top of the tables or views to have them available as views without needing to query Dremio.
Data Sources that do not Provide Metadata
If you use a Data Source that does not accommodate metadata, you must always create the datasets.
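Datasets can be created in the Dremio web console. As an alternative, the sketch below generates SQL statements that could be run in Dremio to create a dataset on top of an existing table. It assumes that your Dremio version accepts the `CREATE VDS ... AS SELECT` syntax (check the Dremio SQL reference for your release), that identifiers are quoted with double quotes, and that the target space already exists. The space name `gooddata`, the source name `postgres_src`, and the helper names are hypothetical.

```python
from typing import Sequence


def quote_identifier(part: str) -> str:
    """Wrap one path segment in double quotes, doubling any embedded quote."""
    return '"' + part.replace('"', '""') + '"'


def qualified_name(parts: Sequence[str]) -> str:
    """Join path segments into a dotted, quoted name."""
    return ".".join(quote_identifier(part) for part in parts)


def create_dataset_statement(dataset_path: Sequence[str], source_path: Sequence[str]) -> str:
    """Build a statement that exposes a source table as a Dremio dataset.

    Assumption: `CREATE VDS <name> AS <query>` is supported by the target Dremio version.
    """
    return (
        f"CREATE VDS {qualified_name(dataset_path)} "
        f"AS SELECT * FROM {qualified_name(source_path)}"
    )


# Hypothetical names: a space "gooddata" and a registered source "postgres_src".
statement = create_dataset_statement(
    ("gooddata", "orders"),
    ("postgres_src", "public", "orders"),
)
print(statement)
```

Generating one statement per table keeps the step repeatable when a Data Source contains many tables or views.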
Data Source Details
Use the following information when creating a data source to use with your Dremio DSM:
- The following considerations apply when you are configuring the JDBC URL:
  - If you start Dremio as a Docker container, you can connect using this URL: jdbc:dremio:direct=dremio:31010
  - If you run Dremio outside of a Docker container, consult the official Dremio documentation for configuring the JDBC URL.
  - There are almost no limits for the driver setup, except for insecure parameters such as trustStorePassword. For all possibilities, see the official documentation.
- Basic authentication is supported. Specify the user and password accordingly.
- You can set enableCaching to true and cachePath to ["$scratch"].
  - Learn more about the caching mechanism in Enable caching and Cache Path.
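The settings above can be collected in one place before you create the data source. The following Python sketch is an illustration only: the exact shape of the data source definition is not shown in this section, so the flat dictionary and its `url` key are assumptions, as are the helper names. The deny-list contains only trustStorePassword because that is the one insecure parameter named here, and the `;key=value` suffix for extra driver properties should be checked against the Dremio JDBC documentation.

```python
import json
from typing import Mapping, Optional

# Illustrative deny-list: the text above names only trustStorePassword.
INSECURE_PARAMETERS = {"truststorepassword"}


def build_jdbc_url(host: str = "dremio", port: int = 31010,
                   properties: Optional[Mapping[str, str]] = None) -> str:
    """Build a direct-connection Dremio JDBC URL, rejecting insecure parameters."""
    properties = dict(properties or {})
    blocked = [key for key in properties if key.lower() in INSECURE_PARAMETERS]
    if blocked:
        raise ValueError(f"insecure JDBC parameters are not allowed: {blocked}")
    url = f"jdbc:dremio:direct={host}:{port}"
    # Assumption: extra driver properties are appended as ;key=value pairs.
    for key, value in properties.items():
        url += f";{key}={value}"
    return url


def build_data_source_settings(user: str, password: str,
                               enable_caching: bool = True) -> dict:
    """Collect the values described above in a flat dictionary (hypothetical shape)."""
    settings = {
        "url": build_jdbc_url(),
        "user": user,
        "password": password,
        "enableCaching": enable_caching,
    }
    if enable_caching:
        settings["cachePath"] = ["$scratch"]
    return settings


# Placeholder credentials: use the user registered in the Dremio web console.
print(json.dumps(build_data_source_settings("dremio_user", "dremio_password"), indent=2))
```

The default host and port match the URL for the Docker setup, where GoodData.CN reaches Dremio by its service name. From outside the Docker network, the example compose file exposes the same port on host port 31011.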
Performance Tips
If you want to query large datasets or even join large datasets from different data sources, we recommend that you use the Dremio reflections feature.
Query Timeout
Query timeout is not supported for Dremio yet.