Migrate from dbt-spark to dbt-databricks
Introduction
You can migrate your projects from the dbt-spark adapter to the dbt-databricks adapter. In collaboration with dbt Labs, Databricks built this adapter using dbt-spark as the foundation and added some critical improvements. With it, you get an easier setup, requiring only three inputs for authentication, and more features such as support for Unity Catalog.
Prerequisites
- Your project must be compatible with dbt 1.0 or greater. Refer to Upgrading to v1.0 for details. For the latest version of dbt, refer to Upgrading to v1.7.
- For dbt Cloud, you need administrative (admin) privileges to migrate dbt projects.
Simpler authentication
Previously, you had to provide a cluster or endpoint ID, which was hard to parse from the `http_path` you were given. Now it doesn't matter whether you're using a cluster or a SQL warehouse, because the dbt-databricks setup requires the same inputs for both. All you need to provide is:
- hostname of the Databricks workspace
- HTTP path of the Databricks SQL warehouse or cluster
- appropriate credentials
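As a minimal sketch, a dbt profile for dbt-databricks only needs those three values plus a target schema. The hostname, HTTP path, and project name below are placeholders, and the optional `catalog` field assumes you're targeting Unity Catalog:

```yaml
# profiles.yml -- hypothetical values; replace with your own
my_project:
  target: dev
  outputs:
    dev:
      type: databricks
      schema: analytics                               # target schema for dbt models
      host: dbc-a1b2345c-d6e7.cloud.databricks.com    # workspace hostname
      http_path: /sql/1.0/warehouses/abc123def456     # SQL warehouse or cluster path
      token: "{{ env_var('DATABRICKS_TOKEN') }}"      # personal access token
      catalog: main                                   # optional: Unity Catalog name
```

Reading the token from an environment variable keeps credentials out of version control.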
Better defaults
The dbt-databricks adapter provides better defaults than dbt-spark does. These defaults help optimize your workflow so you get the fast performance and cost-effectiveness of Databricks. They are:
- dbt models use the Delta table format. You can remove any declared `file_format = 'delta'` configurations since they're now redundant.
- Expensive queries are accelerated with the Photon engine.
- The `incremental_strategy` config is set to `merge`.
With dbt-spark, however, the default for `incremental_strategy` is `append`. If you want to continue using `incremental_strategy=append`, you must set this config explicitly on your incremental models, as in the sketch below. If you already specified `incremental_strategy=merge` on your incremental models, you don't need to change anything when moving to dbt-databricks, but you can keep your models tidy by removing the config since it's now redundant. Read About incremental_strategy to learn more.
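As a minimal sketch, you could pin the old `append` behavior for a group of models from `dbt_project.yml` instead of editing every model file; the project and folder names here are hypothetical:

```yaml
# dbt_project.yml -- keep dbt-spark's old default for selected models
# ("my_project" and the "events" folder are placeholder names)
models:
  my_project:
    events:
      +materialized: incremental
      +incremental_strategy: append
```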
For more information on defaults, see Caveats.
Pure Python
If you use dbt Core, you no longer have to download an independent driver to interact with Databricks. The connection logic is embedded in a pure-Python library called `databricks-sql-connector`.
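For example, installing the adapter pulls in the connector as a dependency, so there is no separate driver step:

```bash
pip install dbt-databricks   # also installs databricks-sql-connector
```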