dbt (data build tool) is a development framework that enables analysts, often called analytics engineers, to build, test, and document data pipelines using only SQL.
dbt was initially created by Fishtown Analytics, a data consultancy, to address the challenges of data transformation and analytics engineering. It was later open-sourced, and Fishtown Analytics rebranded to dbt Labs, which remains the company behind the framework today.
dbt aims to empower analytics engineers to own the entire data lifecycle, from raw data to insights, by using plain SQL to build data models and pipelines at scale, without expensive development resources, while applying software engineering best practices such as version control, testing, and code documentation.
What is dbt?
dbt is a framework that allows analysts to write SQL code that defines how raw data is transformed into analytical models. dbt then executes this code against a data warehouse, such as Snowflake, BigQuery, or Redshift, and creates tables or views that can be used for reporting and analysis. It also generates documentation and tests for the data models, ensuring data quality and reliability.
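For illustration, a dbt model is simply a SELECT statement saved as a .sql file in the project. The sketch below uses a hypothetical raw orders table; when executed, dbt wraps the query in the DDL needed to create a view (or table) named after the file.

```sql
-- models/staging/stg_orders.sql
-- Hypothetical staging model: dbt turns this SELECT into a view or table
-- called stg_orders in the target data warehouse.
select
    order_id,
    customer_id,
    order_total_usd,
    created_at
from raw.shop.orders  -- hypothetical raw table loaded by an ingestion tool
where order_total_usd is not null
```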
dbt differs from other data transformation tools in several ways.
- Although a graphical user interface (GUI) is available, dbt is built around a powerful command-line interface (CLI), and analysts write their transformations in SQL, the language most widely used for analytics.
- dbt doesn't perform data extraction or loading; it focuses only on data transformation, leaving data ingestion to other tools, such as Stitch Data or Fivetran.
- dbt adopts the principle of "source of truth as code," meaning that all the logic and metadata of the data pipeline are stored and version-controlled in a code repository, hosted on popular platforms such as GitHub or GitLab.
How Does dbt Work?
dbt follows a simple workflow: write, run, test, and document. The main steps involved are:
- Write: Analysts write SQL code that defines the data models, using a templating language called Jinja that lets them add variables, macros, and logic to their SQL. They can also use pre-built packages from the dbt community that provide standard data models for sources such as Google Analytics or Shopify. Minimal sketches of these steps follow this list.
- Run: Analysts run the dbt CLI command dbt run to execute the SQL code against the data warehouse and create the data models as tables or views. dbt determines the order and dependencies of the models from the ref() references between them, while project-level settings, such as default materializations, live in a configuration file called dbt_project.yml.
- Test: Analysts define tests for the data models, such as uniqueness, not-null, or referential integrity checks, either declaratively in YAML or as custom SQL queries. To run the tests and validate data quality, they use the dbt CLI command dbt test.
- Document: Analysts describe the data models, fields, and sources in YAML properties files and, optionally, markdown docs blocks. They run the dbt CLI command dbt docs generate to build a web-based documentation site that provides a searchable, interactive view of the data pipeline and its data lineage. Data lineage shows how the data models are derived from the source data and how they are transformed and tested along the way.
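As a sketch of the Write and Run steps, the hypothetical model below builds on the stg_orders example above and uses the Jinja ref() function to declare its dependency, so dbt knows to build stg_orders first.

```sql
-- models/marts/fct_daily_orders.sql
-- Hypothetical mart model. The ref() call resolves to the stg_orders
-- relation in the warehouse and places this model downstream of it
-- in the dependency graph.
{{ config(materialized='table') }}

select
    date_trunc('day', created_at) as order_date,
    count(*) as order_count,
    sum(order_total_usd) as revenue_usd
from {{ ref('stg_orders') }}
group by 1
```

Running dbt run then executes both models against the warehouse in dependency order.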
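For the Test and Document steps, built-in tests and descriptions are typically declared together in a YAML properties file placed next to the models; the file below is a hypothetical sketch for the model above.

```yaml
# models/marts/schema.yml -- hypothetical properties file combining
# documentation (description) with dbt's built-in tests.
version: 2

models:
  - name: fct_daily_orders
    description: "Daily order counts and revenue, derived from stg_orders."
    columns:
      - name: order_date
        description: "Calendar day on which the orders were placed."
        tests:
          - unique
          - not_null
```

dbt test then runs the declared checks as queries against the warehouse, and dbt docs generate builds the documentation site, including the lineage graph.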
dbt has two main versions: dbt Core and dbt Cloud. dbt Core is the free and open-source version that can be installed locally or on a server and run from the command line or a scheduler. dbt Cloud is the premium and hosted version that provides a web interface, a scheduler, a query editor, and other features that enhance the dbt experience.
Benefits of dbt
dbt offers many benefits for data analysts and analytics engineers, such as:
- Improving data quality, consistency, and reliability: dbt enables analysts to write tests and documentation for their data models, ensuring the data is accurate, complete, and trustworthy. It also encourages a consistent naming convention and structure for the data models, making them easier to understand and maintain.
- Increasing productivity and efficiency while reducing time to value: dbt allows analysts to write SQL code that is reusable, modular, and parameterized, reducing code duplication and complexity. The framework also automates the execution, testing, and documentation of the data pipeline, saving time and resources.
- Enhancing data documentation, testing, and version control: dbt generates a documentation site that provides a comprehensive and up-to-date view of the data pipeline, including the data models, their fields, their sources, and their tests. dbt also integrates with version control systems, such as GitHub or GitLab, that enable analysts to track, review, and collaborate on code changes.
- Supporting scalability, flexibility, and performance of data pipelines: dbt leverages the power and scalability of the data warehouse, allowing analysts to handle large and complex data sets with ease. Because dbt supports multiple data warehouse dialects, such as Snowflake, BigQuery, and Redshift, analysts can choose the platform that best fits their needs. The framework also optimizes pipeline performance through techniques such as incremental loading, partitioning, and clustering, as shown in the sketch after this list.
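As an illustration of the last point, a model can be configured as incremental so that each run processes only new rows instead of rebuilding the whole table. The sketch below assumes a hypothetical raw events table and timestamp column.

```sql
-- models/marts/fct_events.sql
-- Hypothetical incremental model: the first run builds the full table;
-- later runs only scan rows newer than what is already loaded, thanks to
-- the is_incremental() branch.
{{ config(materialized='incremental', unique_key='event_id') }}

select
    event_id,
    user_id,
    event_type,
    event_timestamp
from raw.app.events  -- hypothetical raw table loaded by an ingestion tool
{% if is_incremental() %}
where event_timestamp > (select max(event_timestamp) from {{ this }})
{% endif %}
```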
Limitations of dbt
dbt also has limitations, especially when managing large-scale projects across multiple teams. The main ones are:
- Maintaining large-scale projects across multiple teams: dbt can become challenging to manage when the data pipeline grows in size and complexity, especially when multiple teams are involved. Analysts may face issues such as code conflicts, dependency management, and code quality control. To ensure the smooth operation of the data pipeline, analysts may need to adopt additional tools and practices, such as code reviews, pull requests, and continuous integration.
- No built-in visibility into column-level dependencies: dbt provides a high-level view of the data pipeline, showing the dependencies between the data models and their sources. However, it doesn't offer a granular view showing dependencies between individual columns and fields. This makes it harder for analysts to track the origin and transformation of a specific column or field, which can affect data quality and trustworthiness. Analysts may need to use other tools, such as Foundational, to obtain column-level lineage for their data pipeline.
From SQL to Insights
dbt is a powerful and popular framework that enables analysts and data engineers to build, test, and document data pipelines using only SQL. It improves data quality, consistency, and reliability; increases productivity and efficiency while reducing time to value; enhances data documentation, testing, and version control; and supports the scalability, flexibility, and performance of data pipelines.