One of the most common complaints among people working with data is that unexpected data changes break dashboards. Other elements (queries, jobs, models and so on) can break too, but it is usually the dashboard that visibly stops working, sending data engineers off to work out where, how and why the data changed.
By their nature, queries are fragile. In software engineering, cross-system interfaces are typically managed through an API. In the data stack, every query, whether it is part of an Airflow job or a dbt model, defines an interface that needs to be managed but often is not.
Consider the following scenarios:
In all of these cases, a change is pushed in one place before the necessary downstream adjustments are complete, which then introduces incidents.
Several underlying factors explain why data contracts are needed in modern data architectures:
Therefore, implementing data contracts can be important for several reasons:
Although the concept of data contracts presumably dates back to the 1980s, it has re-emerged as a popular topic in discussions of data quality in the modern data stack and, as a result, has received different interpretations and definitions.
At its core, a data contract is a specification that applies to a single element in the data stack, for example, a table. There is no strict requirement for what should be defined, but two types of definitions commonly appear:
In addition, we recognize two parts to managing data contracts:
Better-defined contracts naturally create stronger resilience to issues, but they also add complexity: contracts must be maintained, updated, and enforced throughout the data stack. A huge challenge in implementing data contracts is therefore introducing them to an existing large-scale data stack. Creating definitions for hundreds or thousands of tables, and enforcing them in a way that does not severely impact productivity, is extremely hard. And ultimately, the top priority for everyone is to use data and power decisions, so productivity always wins.
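To make the idea of a contract as a per-table specification concrete, here is a minimal Python sketch. It assumes a contract that covers only column names and types; the class names (`ColumnSpec`, `TableContract`), the `violations` method, and the `orders` table are hypothetical and chosen for illustration, not taken from any particular tool.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnSpec:
    """One column the contract promises to consumers (hypothetical shape)."""
    name: str
    dtype: str


@dataclass(frozen=True)
class TableContract:
    """A contract for a single table: the columns and types it guarantees."""
    table: str
    columns: tuple[ColumnSpec, ...]

    def violations(self, actual_schema: dict[str, str]) -> list[str]:
        """Compare the contract with an observed {column: dtype} schema."""
        problems = []
        for col in self.columns:
            if col.name not in actual_schema:
                problems.append(f"{self.table}.{col.name}: missing column")
            elif actual_schema[col.name] != col.dtype:
                problems.append(
                    f"{self.table}.{col.name}: expected {col.dtype}, "
                    f"found {actual_schema[col.name]}"
                )
        return problems


# Example: the observed table changed `amount` from float to string.
orders_contract = TableContract(
    "orders",
    (ColumnSpec("order_id", "int"), ColumnSpec("amount", "float")),
)
print(orders_contract.violations({"order_id": "int", "amount": "string"}))
```

Extra columns in the observed schema are deliberately ignored here, on the assumption that adding a column does not break existing consumers. Even this small definition has to be written and kept up to date for every table, which is the management cost described above.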
When thinking about implementing data contracts in an existing organization, we need to assume that a mature, large-scale data stack already exists. There are thousands of different elements, and a similar number of dependencies. But ultimately, a person making a change wants to answer a very simple set of questions:
Answering these questions should happen at build time, before a developer merges code to production and while issues can still be fixed without negatively impacting the data. The natural place for this check is when a pull request is created, which allows an organized process around change management. Working with pull requests is a cornerstone of change management in software development, and it should be the same for data engineering.
Another aspect of implementing data contracts concerns the process of managing a change:
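A check of this kind can be sketched as a walk over a dependency graph: given the element a pull request changes, list everything downstream of it. The sketch below assumes lineage is already available as a mapping from each element to its direct dependents; the `downstream_of` function and the `LINEAGE` entries are hypothetical examples.

```python
from collections import deque


def downstream_of(changed: str, dependents: dict[str, list[str]]) -> list[str]:
    """Return every element that depends on `changed`, directly or
    transitively, in breadth-first order."""
    seen = {changed}
    order = []
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for child in dependents.get(node, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


# Hypothetical lineage: each key maps to the elements that read from it.
LINEAGE = {
    "raw.orders": ["stg_orders"],
    "stg_orders": ["fct_revenue", "fct_orders"],
    "fct_revenue": ["revenue_dashboard"],
}

# What could a change to raw.orders affect?
impacted = downstream_of("raw.orders", LINEAGE)
print(impacted)
```

In a pull-request workflow, a non-empty result could be posted as a comment or used to require review from the owners of the impacted elements. How the lineage mapping is produced is outside the scope of this sketch.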
Different tools and frameworks can be used for these processes according to the organization’s task and change management methodologies.
At Foundational, we are pragmatists. We consider the starting point of the data stack as a constraint that is hard, maybe even impossible, to change. Our goal is to take an existing data stack and automatically create the entire set of implied data contracts that already exist, while allowing data teams to add new contracts easily:
We believe that this approach makes it easy for large-scale organizations to think and act around data contracts without fundamentally changing the development process or negatively impacting productivity.
At Foundational, we are solving extremely complex problems that data teams face on a day-to-day basis. Automating data contracts is just one aspect. Connect with us to learn more.
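One way to picture an "implied" contract is as the set of columns that existing consumers already read from a table. The Python sketch below is only an illustration under assumed inputs and is not a description of how Foundational's product works: it assumes column usage has already been extracted as `(consumer, table, column)` tuples, and the `implied_contracts` function and sample names are hypothetical.

```python
from collections import defaultdict


def implied_contracts(usages):
    """Group (consumer, table, column) usage records into
    {table: {column: sorted list of consumers that read it}}."""
    contracts = defaultdict(lambda: defaultdict(set))
    for consumer, table, column in usages:
        contracts[table][column].add(consumer)
    return {
        table: {column: sorted(consumers) for column, consumers in columns.items()}
        for table, columns in contracts.items()
    }


# Hypothetical usage records, e.g. gathered from existing queries.
usages = [
    ("revenue_dashboard", "fct_revenue", "amount"),
    ("finance_export", "fct_revenue", "amount"),
    ("revenue_dashboard", "fct_revenue", "order_date"),
]
print(implied_contracts(usages))
```

Every column that appears in the result is one that some consumer relies on today, so removing or retyping it is a change that would need downstream attention.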