Spark Lineage via Code Analysis

Announcements
March 31, 2025
Barak Fargoun

Introduction: A New Era for Spark Data Lineage

Apache Spark has become a cornerstone of modern data processing, powering large-scale analytics and machine learning for organizations worldwide. Its speed, versatility, and ability to handle massive datasets in data lakes are undeniable, and it is heavily used on platforms such as Databricks and Amazon EMR.

However, a significant challenge has persisted: data lineage. While SQL-based workloads and transformations (e.g., dbt) have established solutions, Spark's unique nature has made it difficult to achieve comprehensive and automated lineage. Few solutions exist for Spark lineage, and even the existing ones, like Databricks Unity Catalog or OpenLineage, capture lineage only at runtime. This makes it hard to run impact analysis on a code change before it is deployed.

For teams using platforms like Amazon EMR, the challenge is even greater. There is no good solution other than deploying and operating OpenLineage themselves, which can be quite complex and still won't provide impact analysis for code changes.

Announcing Foundational's Spark Lineage Solution

Foundational is excited to announce a major advancement in data lineage for Spark! We're introducing a powerful new capability that automates data lineage extraction directly from Spark code. This code-based approach represents a significant leap forward, providing data teams with unprecedented visibility and control over their Spark data pipelines.

Key Highlights:

  • Code-Based Lineage: Foundational's innovative approach analyzes Spark code (PySpark, Scala Spark, Spark SQL) to extract lineage, providing more detailed and accurate information.
  • Proactive Insights: By analyzing code, Foundational enables lineage visibility for pending code changes and pull requests, allowing for proactive identification of potential data issues before they are deployed.
  • End-to-End Visibility: Foundational goes beyond basic Spark lineage, aiming to provide a more complete view that connects to upstream sources (e.g. operational databases) and downstream BI tools.
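To make the idea of code-based lineage concrete, here is a minimal sketch of what extracting lineage from source code, rather than from a running job, can look like. It is not Foundational's implementation; it is a toy illustration that uses Python's standard `ast` module to find table names passed as string literals to a few well-known PySpark calls (`spark.table`, `spark.read.table`, `saveAsTable`, `insertInto`). The function names `extract_lineage` and `lineage_diff`, and the sample table names, are hypothetical. A real analyzer would also need to handle variables, Spark SQL strings, file paths, Scala code, and column-level lineage.

```python
import ast

# PySpark method names whose first string argument is a table name.
READ_METHODS = {"table"}                        # spark.table(...), spark.read.table(...)
WRITE_METHODS = {"saveAsTable", "insertInto"}   # df.write.saveAsTable(...), df.write.insertInto(...)


def extract_lineage(source: str) -> dict:
    """Return the tables a PySpark script reads and writes, without running it."""
    reads, writes = set(), set()
    for node in ast.walk(ast.parse(source)):
        # Only method calls such as obj.method("literal") are considered.
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute)):
            continue
        if not node.args:
            continue
        first = node.args[0]
        if not (isinstance(first, ast.Constant) and isinstance(first.value, str)):
            continue  # dynamic table names are out of scope for this sketch
        if node.func.attr in READ_METHODS:
            reads.add(first.value)
        elif node.func.attr in WRITE_METHODS:
            writes.add(first.value)
    return {"reads": reads, "writes": writes}


def lineage_diff(before: str, after: str) -> dict:
    """Compare the lineage of two versions of a script, e.g. base branch vs. pull request."""
    old, new = extract_lineage(before), extract_lineage(after)
    return {
        "added_reads": new["reads"] - old["reads"],
        "removed_reads": old["reads"] - new["reads"],
        "added_writes": new["writes"] - old["writes"],
        "removed_writes": old["writes"] - new["writes"],
    }


# Hypothetical job; it is only parsed, so Spark does not need to be installed.
JOB = '''
orders = spark.read.table("raw.orders")
customers = spark.table("raw.customers")
joined = orders.join(customers, "customer_id")
joined.write.mode("overwrite").saveAsTable("analytics.orders_enriched")
'''

# A pending change that switches one upstream table.
PR_JOB = JOB.replace('"raw.customers"', '"raw.customers_v2"')

lineage = extract_lineage(JOB)
print(sorted(lineage["reads"]))   # ['raw.customers', 'raw.orders']
print(sorted(lineage["writes"]))  # ['analytics.orders_enriched']

diff = lineage_diff(JOB, PR_JOB)
print(diff["added_reads"])        # {'raw.customers_v2'}
print(diff["removed_reads"])      # {'raw.customers'}
```

Because the script is only parsed and never executed, the same analysis can run on a pull request branch, which is what makes it possible to see a lineage change before anything is deployed.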

Want to Learn More?

This is just a spotlight! For a detailed explanation of how Foundational extracts Spark lineage, including technical details and examples, please refer to our help center article on Foundational Spark Lineage.

The Future of Spark Data Management

Foundational is committed to empowering data teams with the tools they need to manage complex data environments effectively. This new Spark lineage capability is a major step towards that goal, providing increased transparency, improved governance, and greater confidence in Spark-driven data initiatives.
