Note: This is an English translation of the original Japanese article.
https://dev.classmethod.jp/articles/snowflake-pupr-dbt-projects/
Hi, I'm Sagara.
I learned about this from the following post by @mmotohas, but "dbt Projects on Snowflake," which allows you to develop and run dbt projects within your Snowflake account, is now in public preview!
https://x.com/mmotohas/status/1938050501352034447
I tried it out right away, so I'll summarize what I found.
What are dbt Projects?
To reiterate, dbt Projects on Snowflake is a feature that allows you to create, run, and manage dbt Core (the open-source version of dbt) directly on the Snowflake platform.
(The link below is to the official documentation, but as of June 26, 2024, only the LIMITEDACCESS link was available.)
https://docs.snowflake.com/LIMITEDACCESS/dbt-projects-on-snowflake?_fsi=JchN3RQh&_fsi=JchN3RQh
The following video is a good reference to understand what this feature does.
https://www.youtube.com/watch?v=w7C7OkmYPFs
Important Notes
Quoting from the official documentation mentioned above, the following limitations exist as of June 26, 2024. (Please check the official documentation for the latest information.)
- Only dbt Core projects are supported. If you want to use a repository developed with dbt Cloud, you'll need to create a
profiles.yml
file, among other things. - Git package limitations: Packages that directly specify a Git repository in the
packages.yml
file are not supported. - File count limit: A single dbt project can contain up to 20,000 files.
- No sharing: dbt Projects on Snowflake runs in a Workspace, but since Workspaces are created in a personal database, they cannot be shared with other users.
- Configuration: Integrating with a Git repository or using external packages requires prior setup, such as API integrations and external access integrations.
Additionally, the following features are mentioned as planned for future improvements:
- Partial command support: The dbt commands that can be run within a Workspace are still limited (though
run
,test
, etc., are supported). - Unimplemented UI: Displaying dbt run history and setting execution schedules using the Snowsight UI are planned for future updates. Currently, scheduling requires creating a Snowflake Task with SQL.
- Performance improvements: Performance enhancements are planned to significantly reduce the startup time of dbt processes (by 20-40x).
What I'll Try
I will create a new dbt Project in a Workspace to see what can be done with dbt Projects on Snowflake.
Test Environment
My test environment is a non-trial Snowflake account in the AWS Tokyo region.
Enable Secondary Roles
dbt Projects use the Workspace feature, and Workspaces are created in a personal database associated with each user.
Execute the following query after replacing my_user
with your username to enable secondary roles.
ALTER USER my_user SET DEFAULT_SECONDARY_ROLES = ('ALL');
Create a Database and Schema for Testing
Run the following queries to create a database and schema for testing.
USE ROLE SYSADMIN;
CREATE DATABASE SAGARA_DBT_ON_SNOWFLAKE_FIRST_TIME;
GRANT OWNERSHIP ON DATABASE SAGARA_DBT_ON_SNOWFLAKE_FIRST_TIME TO ROLE SAGARA_ADMIN_ROLE;
USE ROLE SAGARA_ADMIN_ROLE;
USE DATABASE SAGARA_DBT_ON_SNOWFLAKE_FIRST_TIME;
CREATE SCHEMA DBT_SSAGARA; -- For development
CREATE SCHEMA PRODUCTION; -- For production
Launch a Workspace and Set Up a dbt Project
From the left menu, launch Workspaces.
Then, click the top-left item, and from Create Workspace
, press + New
.
Give your Workspace a name and create it.
Next, press + Add New
, then dbt Project
.
Enter the Project Name
, role, warehouse, database, and schema. Here, enter the database and schema that will be the output destination when you run tests during dbt development.
As you can see in the image below, the necessary folders and files for dbt have been automatically created! The profiles.yml
file contains the role, warehouse, database, and schema you just entered.
Viewing the Lineage of the Sample Models
When you set up a dbt Project, the models
folder contains my_first_dbt_model.sql
and my_second_dbt_model.sql
by default.
To check the data lineage between these sample models, make sure Compile
is selected in the top right, and then press the button next to it.
The logs will then be displayed in the Workspace's Output panel.
After this, if you refresh your browser, a DAG tab will be added, allowing you to see the lineage. (When I tried it, the DAG tab didn't appear until I refreshed the browser.)
Building the Sample Models
To apply the sample models (whose lineage we just checked) to Snowflake, let's try building them.
Change the selection in the top right from Compile
to Build
and press the button to the right.
The logs will then be displayed in the Workspace's Output panel.
Now, if you look at the database and schema you created, you can see that the sample models have been built and applied.
Add a Profile for the Production Schema
To later run our dbt Project on Snowflake as a Snowflake Task, we need to add a profile for the production schema.
I've added it as shown in the red box in the image below. Now, prod
can be selected from the Profile
dropdown at the top of the Workspace.
Deploy the dbt Project from the Workspace to the Snowflake Account
With dbt Projects on Snowflake, you need to deploy your dbt project to your Snowflake account in order to run it as a Task.
From the Connect
menu in the top right, click Deploy dbt project
.
Specify a database and schema, select Create dbt project
for the Select or create dbt project
option, and enter a name.
The result will be displayed as shown below.
If you look at the destination schema, you'll see that the dbt Project has been deployed with the specified name.
Create and Run a Task Based on the Deployed dbt Project
Let's create and run a task based on the deployed dbt project.
Open a new worksheet and run the following query to create the task. The key point here is using --target prod
to ensure the operations are performed on the production schema.
CREATE OR ALTER TASK sagara_dbt_on_snowflake_first_time.production.dbt_execute
WAREHOUSE = sagara_dbt_prod_wh
SCHEDULE = 'USING CRON 0 9 1 1 * Asia/Tokyo'
AS
EXECUTE DBT PROJECT sagara_dbt_on_snowflake_first_time.production.sagara_dbt_deployment_first_time args='build --target prod';
After that, run the following query to manually execute the task.
EXECUTE TASK sagara_dbt_on_snowflake_first_time.production.dbt_execute;
Once the task completes successfully, you will see that the table and view have been created in the production schema.
Conclusion
"dbt Projects on Snowflake," which allows you to develop and run dbt projects within your Snowflake account, is now in public preview, so I tried it out!
As someone used to dbt Cloud, I had no trouble with the development process. It's amazing that the entire workflow, from dbt development to production runs, can be completed within a Snowflake account.
This time I didn't integrate with GitHub, but next I'd like to explore what the development process looks like with GitHub integration