November 6, 2023

dbt Materialization Types and Strategies Explained

By Arnab Mondal

dbt is one of the leading Data transformation tools in the industry. 

In this blog, we will look into some materialization types available in dbt. Each type allows for creating different models, each with its specialty and usage strategy.

What are Materializations?

dbt models are stored as materializations in a warehouse. By default, dbt models are materialized as ‘Views.’ However, there are a total of 5 types of materialization that you can use. The dbt-supported materializations that we will be discussing in the upcoming sections are as follows: 

  • Table

  • View

  • Incremental

  • Ephemeral

  • Materialized View

In this blog, we will be only covering these 5 types of materializations, which are in-built. Custom materializations can be found in different packages out on the hub, or you could also build your own type of materializations as you need.

How to Configure Materialization?

dbt has been designed so that models are, by default, materialized as views. However, this can be changed into any other 4 options listed above. Configure a model with a materialization other than the default View by providing the correct materialized configuration parameter in the dbt_project.yml file.

Example:
				
					
models:
  my_project:
    events:
      # materialize all models in models/events as tables
      +materialized: table
    csvs:
      # this is redundant, and does not need to be set
      +materialized: view


				
			

We can also configure the materialization type inside the dbt SQL file or the yaml file. phData’s recommendation would be to follow and use the hierarchical nature of dbt and configure it at the dbt_project.yml level or the model level. This is useful to compare the performance of each type for the models. 

				
					{{ config(materialized='table') }}
				
			

Types of SQL Materializations

View

For the default View materialization, the model structure is re-calibrated every time you run a view. This is done via a create view statement. On the flip side, this approach doesn’t require additional data storage.

Additionally, these views will always contain the latest data records, so there are no worries about refreshing or loading the latest data. 

The downside of this approach is that your query will be way slower to run if views are stacked on top of each other or if they have undergone a significant transformation. The easy approach to follow here is to start with the default materialization Views and then slowly change to the other materializations as applicable once you observe slow performance or problematic bugs. 

Usage Tip: For easier or non-significant transformations such as renaming or changing column data types, you can use Views. A view, in general, would not be a permanent table and would always reference the data from tables. It will compute the data dynamically when executed.

Config Examples : 

  1. Indexes: You can also use indexes in dbt to improve the query performance. This can give you a significant improvement in performance and can be used with Materialized views and sources as well. You can do this by providing either of the following.

  • Column as a list: A list of columns to be used as indexes

  • Name as a string: The name for the Index

  • Cluster as a string: The cluster to use while creating the index

  • Default as a boolean: It’s set to False, only set to True if you want to use all the columns mentioned as an index

Table

For the Table materialization, the model structure is re-calibrated on each run and designed as Table. A simple create table query or command is used to do this. One of the biggest advantages of using this type of materialization is that a Table will never create performance issues as they are pretty fast to query. 

The main downside of this approach is that it makes re-building or changes really slow and complex when you need to re-assess or edit heavy transformations. Another downside is that newly added source data is not auto-added to the Tables you create. This could lead to mistakes and incomplete results. 

Usage Tip: To optimize the performance of any slower-performing transformations, you can use Tables. In case these transformations are utilized by multiple models downstream, then this would improve your overall model performance multi-fold. For faster integration, you can also use Table materialization for models being called or queried by BI tools. 

Config Examples: 

  1. Transient: You can use the transient parameter to declare a table as a transient table in Snowflake. It is a boolean parameter which can be set to true or false.

  1. Cluster By: You can use the cluster_by config parameter to specify which column Snowflake should cluster the table.

Incremental

Models with incremental materialization allow you to insert or update new records into a table. This will allow you to update the records since the last time that source records were loaded onto that table. The benefit of this option is that it reduces your time utilized for each build since you can incrementally keep adding new records to your existing model. However, incremental models require a finer touch and more advanced configurations, which require you to be well-versed in the advanced concepts of dbt. 

Usage Tip: Consider incremental models as models where sequential events are transformed and stored. We would not suggest you utilize incremental models immediately when starting. Gradually, as you begin observing performance issues or you need to load new data records to your model, you can transition to an incremental materialization for your models.

Config Examples: 

  1. Unique: You can specify the unique columns via which the incremental model will be loaded incrementally and avoid duplicates.

Ephemeral

Ephemeral models are not a permanent part of the database. dbt will replace the ephemeral code with a common table expression in all dependent models. The ephemeral models can be reused in multiple downstream models, which would help you reduce clutter and organize your database. 

On the downside, you cannot run the select query on this model type. Also, you cannot run dbt macro operations in ephemeral models. One of the disadvantages of ephemeral models is that each use makes the queries harder to debug as there is no real-time table to trace. 

Usage Tip: You may consider splitting ephemeral models across multiple schemas. Ephemeral models are most effective for lightweight transformations that come up early in your Directed Acyclic Graph (DAG). Remember to use ephemeral models where they will not be needed downstream, as this would reduce the necessity for direct queries.

Materialized View

The materialized view type is used to create a table materialized in your target database. This materialization uses an on_configuration_change configuration block, which can align with the incremental nature of the materialized view. dbt can apply the changes without dropping them or having to recreate them. So you will be able to update your existing materialized views. The on_config_change has 3 modes:

  • Apply (Default): It will try to update the existing database object if possible and avoid a complete rebuild whenever possible.

  • Continue: It allows the runs to continue and warns that the object was not modified since it was running.

  • Fail: If a change is detected, you can force the entire run to fail.

Usage Tips: You can use a –full-refresh tag to override the configuration part and trigger a complete refresh of the materialized view. It has both of the best features of a table and view and can act as an incremental model. They might be a bit more complex than normal tables, but they can be executed in a schedule and might not be supported by all platforms.

Config example : 

  1. Cluster: You can pass the cluster connection parameter in the config block. This can also be specified in the dbt_project.yml file. In the case of both, the model config block overrides the yml file details.

Recent Addition: dbt Python Models Materialization Types

dbt Python models were added in dbt core v1.3, and their materialization types are:

  • Table

  • Incremental

Closing

In the above sections, we have covered the pros, cons, and usage tips for SQL Incremental Models. Mostly, Python incremental materializations follow the same strategies as their SQL counterparts. The specific strategy supported depends on your choice of adapter. 

As you may have already understood, Python models cannot be materialized as Views or Ephemeral. Python models also support dbt non-model resources such as tests. dbt Python models also support merging incremental strategy for multiple environments, just like dbt. The other strategy of insert_overwrite will be supported in the future.

Each type of dbt materialization has its own advantages, and you should use it accordingly to optimize your data lake/warehouse. 

phData has a team of experts that can help you determine the materialization type needed to achieve your dbt strategy.

FAQs

The default materialization in dbt is always a View. Suppose no other materialization is defined as the default in the dbt_project.yml file. In that case, dbt will assume it to be View.

dbt currently supports 5 types of SQL materializations (Table, View, Incremental, Materialized Views & Ephemeral)  and 2 types of Python materializations (Table and Incremental).

Data Coach is our premium analytics training program with one-on-one coaching from renowned experts.

Accelerate and automate your data projects with the phData Toolkit