
Introduction
Data quality is always a hot topic: it's a top priority, yet it remains a key concern for the vast majority of data organizations. Ironically, there are many tools that address data quality, but it's not always easy to understand which solution is right for your organization.
In this blog post we'll explore Snowflake's native data quality solution, called Data Quality Monitoring. We'll review how it works and provide a step-by-step guide to creating your first Data Metric Functions (DMFs), the tests themselves that this product runs. We think that becoming familiar with Data Quality Monitoring and DMFs is critical knowledge for any organization running on Snowflake, and we'll also explore pros and cons against third-party solutions.
Let’s get started!
What is Snowflake Data Quality Monitoring?
Earlier this year, Snowflake introduced Data Quality Monitoring and Data Metric Functions (DMFs) as a powerful framework for monitoring data quality natively within Snowflake. DMFs let you define custom rules, or use pre-defined ones (called System DMFs), to validate data in Snowflake tables and columns. These rules are enforced automatically, helping you maintain high data quality across your Snowflake environment.
Recently, after several months in Public Preview, Data Quality Monitoring and DMFs became officially supported as part of the Enterprise Edition.
Why Use Snowflake’s Native Data Quality Solution?
Snowflake’s data quality solution provides several key benefits over third-party tools:
- Seamless Integration: DMFs are built directly into Snowflake, making them easy to use without needing additional setup or infrastructure.
- Cost-Effective: Since DMFs are part of Snowflake, there are no extra licensing fees, and the per-test cost (e.g., a query that checks a table) is lower when you pay only for warehouse compute than when a third party charges a premium on top.
- Real-Time Validation: One of the biggest advantages of DMFs is that they can be triggered as soon as data changes occur, catching issues immediately. This is not the case for third-party vendors, which run tests at a fixed cadence.
- Security: Data quality vendors typically need access to raw data, which is not great from a security perspective. Using DMFs minimizes third-party access to sensitive data such as PII and simplifies governance.
Of course, there are also advantages to using third-party vendors for data quality:
- Advanced Features: While DMFs are powerful, they may lack some advanced features like machine learning-based anomaly detection found in third-party tools.
- Customization Limits: Some third-party tools might offer more flexibility for complex data quality workflows.
If your organization is using Snowflake extensively, Data Quality Monitoring and DMFs could offer a seamless, cost-effective, and most importantly, real-time solution for data quality.
Step-by-Step Guide to Implementing Snowflake Data Quality with DMFs
In the following section we’ll walk you through setting up your first DMFs to upgrade your Snowflake data quality.
Step 1: Create Tables with Sample Data
Start by creating Customers and Orders tables and filling them with sample data. Run the following in your Snowflake console:
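Here is a minimal sketch of what these tables and sample rows might look like; the exact schemas are illustrative assumptions, and only the order_status, country, and email columns are referenced later in this guide:

```sql
-- Customers table: country and email are the columns we will validate later
CREATE OR REPLACE TABLE customers (
  customer_id INT,
  name        VARCHAR,
  email       VARCHAR,
  country     VARCHAR
);

-- Orders table: order_status is the column we will validate later
CREATE OR REPLACE TABLE orders (
  order_id     INT,
  customer_id  INT,
  order_status VARCHAR
);

-- A few valid sample rows
INSERT INTO customers (customer_id, name, email, country) VALUES
  (1, 'Alice', 'alice@example.com', 'US'),
  (2, 'Bob',   'bob@example.com',   'DE');

INSERT INTO orders (order_id, customer_id, order_status) VALUES
  (100, 1, 'pending'),
  (101, 2, 'completed');
```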
Step 2: Create DMFs for Data Validation
Next, create DMFs to validate specific fields. Let’s consider the following examples:
- The order_status field is an enum that must be one of the following values: pending, in progress, completed.
- The country field must be a 2-character uppercase string.
- The email field must contain a valid email format, i.e., <address>@<domain>.
Here’s the SQL code for these validation rules:
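A sketch of what these DMFs could look like; invalid_order_status_count matches the name used later in this guide, while invalid_country_count, invalid_email_count, and the email regex are illustrative assumptions:

```sql
-- Counts rows whose order_status is outside the allowed enum values
CREATE OR REPLACE DATA METRIC FUNCTION invalid_order_status_count(
  arg_t TABLE(arg_c VARCHAR))
RETURNS NUMBER
AS
$$
  SELECT COUNT(*) FROM arg_t
  WHERE arg_c NOT IN ('pending', 'in progress', 'completed')
$$;

-- Counts rows whose country is not a 2-character uppercase string
CREATE OR REPLACE DATA METRIC FUNCTION invalid_country_count(
  arg_t TABLE(arg_c VARCHAR))
RETURNS NUMBER
AS
$$
  SELECT COUNT(*) FROM arg_t
  WHERE NOT REGEXP_LIKE(arg_c, '^[A-Z]{2}$')
$$;

-- Counts rows whose email does not match <address>@<domain>
CREATE OR REPLACE DATA METRIC FUNCTION invalid_email_count(
  arg_t TABLE(arg_c VARCHAR))
RETURNS NUMBER
AS
$$
  SELECT COUNT(*) FROM arg_t
  WHERE NOT REGEXP_LIKE(arg_c, '^[^@]+@[^@]+$')
$$;
```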
Step 3: Schedule DMFs for Real-Time Monitoring
Let’s schedule our validations to run on every change made to the table. The TRIGGER_ON_CHANGES schedule will run the DMFs whenever rows in the table change, specifically when rows are inserted, updated, or deleted (via INSERT, UPDATE, or any other DML operation).
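A sketch of setting the trigger-based schedule on both tables from Step 1:

```sql
-- Run the attached DMFs whenever rows in these tables change
ALTER TABLE customers SET DATA_METRIC_SCHEDULE = 'TRIGGER_ON_CHANGES';
ALTER TABLE orders    SET DATA_METRIC_SCHEDULE = 'TRIGGER_ON_CHANGES';
```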
This part is important: this scheduling option is unique to DMFs and is not available in most observability and monitoring tools, which run on a predetermined cadence. It allows Snowflake DMFs to detect problems immediately as they occur, without waiting for the next scheduled run, which can be hours later. Increasing the cadence is usually not viable with a third party, since more frequent runs quickly get expensive. We therefore strongly believe this option is a game changer for how quickly data issues can be detected, reducing the overall impact of data incidents for customers.
Step 4: Attach DMFs to Specific Columns
Now that we have DMFs defined and have set the schedule for DMF validations on the relevant tables, we are only missing one final piece: attaching the DMFs to specific tables and columns.
DMFs are standalone Snowflake objects that are not tied to a specific table or column. This approach is quite elegant, as it allows writing a specific DMF once (e.g., a DMF for validating emails) and reusing it across many different tables and columns.
Here’s an example code to attach our DMFs to the specific columns in question:
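For example, assuming the table and DMF names from the previous steps:

```sql
-- Validate order_status on the orders table
ALTER TABLE orders
  ADD DATA METRIC FUNCTION invalid_order_status_count ON (order_status);

-- Validate country and email on the customers table
ALTER TABLE customers
  ADD DATA METRIC FUNCTION invalid_country_count ON (country);

ALTER TABLE customers
  ADD DATA METRIC FUNCTION invalid_email_count ON (email);
```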
Step 5: Test the DMFs
It’s time to test that our new DMFs are working properly. To do so, we will insert some offending rows into our tables and see whether we indeed get validation errors reported by our DMFs.
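For example, the following illustrative rows violate each of the three rules (one bad order_status, plus a bad country and email):

```sql
-- 'shipped' is not in the allowed enum, so INVALID_ORDER_STATUS_COUNT should report 1
INSERT INTO orders (order_id, customer_id, order_status) VALUES
  (102, 1, 'shipped');

-- Lowercase 3-letter country and a malformed email
INSERT INTO customers (customer_id, name, email, country) VALUES
  (3, 'Carol', 'not-an-email', 'usa');
```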
Step 6: View DMF Validation Results
Snowflake stores all DMF results in Snowflake tables in a standard format, which makes them easy to read and view. Because the format is standardized, we'd like to believe it can become the standard way of reporting validation results in Snowflake, making interoperability across different products, such as data management platforms and catalogs, a lot easier.
Let’s verify that we indeed see the DMF validation errors. This query will show the latest validation result for the INVALID_ORDER_STATUS_COUNT DMF:
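A sketch of such a query against Snowflake's built-in results view; the exact column selection is an assumption:

```sql
SELECT measurement_time, table_name, metric_name, value
FROM SNOWFLAKE.LOCAL.DATA_QUALITY_MONITORING_RESULTS
WHERE metric_name = 'INVALID_ORDER_STATUS_COUNT'
ORDER BY measurement_time DESC
LIMIT 1;
```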
Please note that it may take about 5 minutes for results to arrive. Afterwards, you should expect to see output similar to this:

You can see here that the number of rows that failed the INVALID_ORDER_STATUS_COUNT validation is exactly 1, which is the single offending row we added in the previous step.
Step 7: Set Up Email Notifications for Validation Errors
Good data quality solutions provide notifications for broadcasting their results, and Snowflake’s Data Quality Monitoring does this too. For example, we can set up an email notification to be sent on a recurring schedule whenever there are rows that failed the INVALID_ORDER_STATUS_COUNT validation.
First, make sure you have email integration configured in your Snowflake account, by running the following:
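For example (the integration name and recipient address are placeholders; use your own):

```sql
CREATE OR REPLACE NOTIFICATION INTEGRATION dq_email_int
  TYPE = EMAIL
  ENABLED = TRUE
  ALLOWED_RECIPIENTS = ('you@example.com');  -- replace with your real email address
```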
Then, define a stored procedure that sends the email if the INVALID_ORDER_STATUS_COUNT DMF has picked up bad rows. Make sure you use your real email address when running this code, so that the email gets delivered successfully:
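A minimal sketch of such a procedure, assuming the dq_email_int integration from the previous step; it reads the latest DMF result and sends an email via SYSTEM$SEND_EMAIL if any rows failed:

```sql
CREATE OR REPLACE PROCEDURE notify_invalid_order_status()
RETURNS STRING
LANGUAGE SQL
AS
$$
DECLARE
  bad_rows NUMBER DEFAULT 0;
BEGIN
  -- Latest value reported by the INVALID_ORDER_STATUS_COUNT DMF
  SELECT value INTO :bad_rows
  FROM SNOWFLAKE.LOCAL.DATA_QUALITY_MONITORING_RESULTS
  WHERE metric_name = 'INVALID_ORDER_STATUS_COUNT'
  ORDER BY measurement_time DESC
  LIMIT 1;

  IF (bad_rows > 0) THEN
    CALL SYSTEM$SEND_EMAIL(
      'dq_email_int',
      'you@example.com',  -- replace with your real email address
      'Snowflake data quality alert',
      'INVALID_ORDER_STATUS_COUNT reported ' || bad_rows || ' failing row(s).');
  END IF;
  RETURN 'done';
END;
$$;
```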
Then, we create a recurring task that runs this procedure, scheduled to run once an hour:
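A sketch of the task; the warehouse name is a placeholder for an existing warehouse in your account:

```sql
CREATE OR REPLACE TASK notify_invalid_order_status_task
  WAREHOUSE = my_wh            -- replace with an existing warehouse
  SCHEDULE = '60 MINUTE'       -- once an hour
AS
  CALL notify_invalid_order_status();

-- Tasks are created suspended, so resume it to start the schedule
ALTER TASK notify_invalid_order_status_task RESUME;
```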
For the sake of the test, we can also run it immediately, without waiting for the next scheduled run, by calling the procedure directly:
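Using the procedure name from the sketch above:

```sql
CALL notify_invalid_order_status();
```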
At this point you can check that you’ve received the notification email. It may take a few minutes for the email to get to your inbox - once it’s arrived, the test is officially done!

Step 8: Clean Up
Now that we are done with the test, let’s clean up all of the objects we created exclusively for this tutorial:
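Assuming the object names used throughout this guide, the cleanup might look like this (the exact DROP signature for DMFs may vary):

```sql
-- Detach the DMFs from their columns
ALTER TABLE orders    DROP DATA METRIC FUNCTION invalid_order_status_count ON (order_status);
ALTER TABLE customers DROP DATA METRIC FUNCTION invalid_country_count ON (country);
ALTER TABLE customers DROP DATA METRIC FUNCTION invalid_email_count ON (email);

-- Drop the notification task, procedure, and integration
DROP TASK IF EXISTS notify_invalid_order_status_task;
DROP PROCEDURE IF EXISTS notify_invalid_order_status();
DROP NOTIFICATION INTEGRATION IF EXISTS dq_email_int;

-- Drop the DMFs and the sample tables
DROP FUNCTION IF EXISTS invalid_order_status_count(TABLE(VARCHAR));
DROP FUNCTION IF EXISTS invalid_country_count(TABLE(VARCHAR));
DROP FUNCTION IF EXISTS invalid_email_count(TABLE(VARCHAR));
DROP TABLE IF EXISTS orders;
DROP TABLE IF EXISTS customers;
```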
What’s Next?
In this article we have learned how to configure a DMF to validate data in Snowflake, attach DMFs to tables to run on a fixed schedule, view validation results, and set up email notifications to alert about validation errors. With this knowledge, you can now go ahead and define DMFs for all the important tables in your Snowflake warehouse to make sure data invariants and data contracts are validated and enforced.
However, this process can be quite tedious, especially when you have hundreds or thousands of tables, each with dozens of columns and a large number of data validation rules.
Fortunately, there are ways to automate much of this work. For example, we can infer most of the rules automatically by looking at the SQL code executed in the warehouse and making assertions based on the actual queries we observe. If we see that a specific column is always compared against a small set of values, for instance, we can assume the column is an enum and create an appropriate DMF to validate it.
To make this easier, at Foundational we created a Free DMF Generator, which automatically generates DMF rules based solely on a sample SQL query. We also invite you to schedule time with us to see how Foundational can automate and keep DMFs up to date for you, so that you’d never have to worry about Snowflake data not being validated.