Introducing Foundational: A Code Analysis Platform for Data Engineering

Starting a company is first and foremost personal. Among many other things, it combines the desire to create a new organization with its own DNA and the passion to impact an industry.

In the past decade, we’ve seen the data ecosystem reinvent itself again, and again, and again. New products and technologies have made it faster and way easier to collect, process and analyze data at scale. Data teams are growing; CDOs are in hot demand. But at the same time, it seems that data teams everywhere are facing repeating, fundamental problems.

Governance, data quality, development procedures, and sometimes even appropriate testing are, in many cases, an afterthought. On one hand, creating dashboards, DAGs, and dbt models has never been easier. On the other hand, something as seemingly simple as deleting a column can become quite complex!

Barak, Omri and I have been working with large-scale data platforms for a long time, and with software systems for even longer. We've come to realize that the ideal reality we aim to create for data teams is reminiscent of what we've all grown to practice and love in the software industry. It's a reality that enables multiple teams across diverse platforms, disciplines, and even organizations to collaborate, work fast, and build reliably – all as part of a unified workflow.

This is what we’ve set out to solve with Foundational.

A disconnect between code and data

Modern data ecosystems are complex and, by design, change continuously. A modern data stack today leverages multiple platforms, technologies, and tools, with data moving across multiple domains and stakeholders. Even a simple path, such as operational data that goes into a warehouse and then into a BI tool, will typically include multiple tables, views, DAGs, transformation code, CDC and potentially other acronyms. And then what happens when we need to change something? It becomes a convoluted, multi-stakeholder problem. Consequently, it’s also very slow. And no one likes slow.

But why is this happening? There are many aspects to this, but one point of view is that it’s just too hard for the individual who is trying to push a change to figure out what’s going on and exactly how the new code change will impact data. Presumably this could be solved with a process, but that makes everything and everyone slow and frustrated. Is this how it’s supposed to be?

In contrast, in software engineering, thanks to technology and the software development lifecycle (SDLC), large-scale organizations are able to build and maintain increasingly complex systems in parallel, test in parallel, and deploy in parallel. It’s not perfect, but there’s a much higher degree of predictability and it scales a lot better. It feels a lot better too.

Why does this matter?

The disconnect between code and data described above prevents us from building at scale. If we solve it, we can bring the best practices from the world of software development into the world of data and data engineering. It will also allow us to close many loops that are open today, for example:

How can I tell, for this new piece of code I just wrote, if there’s any downstream dependency such as a dashboard or a notebook that I should know about?
How can I know, for this data issue I am seeing in the live data, what was the last commit which may have created it? And who is the person that did it?
How can I create a CI/CD check on pending changes given certain criteria?
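
To make the first and third questions more concrete, here is a minimal sketch in Python. It is not Foundational's implementation: it assumes a small, hand-written lineage graph in which every asset name (`orders.discount_code`, `revenue_dashboard`, and so on) is hypothetical, and it simply walks that graph to find what sits downstream of a change and whether a pending change should be blocked.

```python
from collections import deque

# Hypothetical lineage graph: each asset maps to the assets that read from it.
# Every name below is an illustrative assumption, not taken from a real system.
LINEAGE = {
    "orders.discount_code": ["stg_orders", "orders_cdc_dag"],
    "stg_orders": ["fct_revenue"],
    "fct_revenue": ["revenue_dashboard", "pricing_notebook"],
    "orders_cdc_dag": [],
    "revenue_dashboard": [],
    "pricing_notebook": [],
}


def downstream(asset, lineage):
    """Return every asset reachable from `asset`, in breadth-first order."""
    seen = []
    queue = deque(lineage.get(asset, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.append(node)
        queue.extend(lineage.get(node, []))
    return seen


def ci_check(changed_assets, lineage, protected):
    """Return (passed, hits): the check fails if a change reaches a protected asset."""
    impacted = {d for a in changed_assets for d in downstream(a, lineage)}
    hits = sorted(impacted & set(protected))
    return (not hits, hits)
```

In this toy setup, `downstream("orders.discount_code", LINEAGE)` lists the staging model, the DAG, the fact table, the dashboard and the notebook, and `ci_check` reports a failure when any of those is on the protected list. The hard part in practice is building the graph from code in the first place, which this sketch takes as given.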

To solve these problems, we have to understand what’s changing in the source code. Looking at the warehouse queries is not enough, as there are many steps before the query that could be relevant.

This is what we’re building at Foundational. It’s a new type of management system that at its core relies on a new kind of code analysis. We analyze the code directly in the GitHub repository: not just SQL, but every type of code that is related to data. What we seek to understand is not only the outcome of the code, but also what the code is actually doing. And because we access git, we can also see and analyze the changes, old and new.

We are Foundational

We started Foundational to change how data is being built and maintained, and to introduce a new approach that pushes governance and data quality to the left, before code is deployed. But ultimately, it’s about developing with trust, speed, and confidence. We want data teams to have the confidence and speed that software teams have, and to have better trust in their code.

In the past we’ve also experienced firsthand how hard it can be for new technology to get adopted if the barrier to entry is high and the implementation is complex. We’re building Foundational differently:

We only access code, and not data
There are absolutely no code changes needed, and setup can be done in minutes
Insights are integrated into existing tools, so everyone can use Foundational immediately

We are incredibly excited to solve this together. Thank you to all of our incredible partners who have supported us so far.

Chat with us

At Foundational, we are solving extremely complex problems that data teams face on a day-to-day basis. Understanding code is only one aspect of it. Connect with us to learn more.
