Introduction: A New Era for Spark Data Lineage
Apache Spark has become a cornerstone of modern data processing, powering large-scale analytics and machine learning for organizations worldwide. Its speed, versatility, and ability to handle massive datasets in data lakes are undeniable, and it is widely used on platforms such as Databricks and Amazon EMR.
However, a significant challenge has persisted: data lineage. While SQL-based workloads and transformations (e.g., dbt) have established solutions, Spark's unique nature has made comprehensive, automated lineage difficult to achieve, and few solutions address it. Even existing ones, like Databricks Unity Catalog or OpenLineage, capture lineage only at runtime. That makes it hard to run impact analysis on a proposed code change before the change is deployed and running.
For teams on platforms like Amazon EMR, the challenge is even greater. There is no good solution other than deploying and operating OpenLineage themselves, which can be quite complex and still doesn't provide impact analysis for code changes.
Announcing Foundational's Spark Lineage Solution
Foundational is excited to announce a major advancement in data lineage for Spark! We're introducing a powerful new capability that automates data lineage extraction directly from Spark code. This code-based approach represents a significant leap forward, providing data teams with unprecedented visibility and control over their Spark data pipelines.
Key Highlights:
- Code-Based Lineage: Foundational's innovative approach analyzes Spark code (PySpark, Scala Spark, Spark SQL) to extract lineage, providing more detailed and accurate information.
- Proactive Insights: By analyzing code, Foundational enables lineage visibility for pending code changes and pull requests, allowing for proactive identification of potential data issues before they are deployed.
- End-to-End Visibility: Foundational goes beyond basic Spark lineage, aiming to provide a more complete view that connects Spark jobs to upstream sources (e.g., operational databases) and downstream BI tools.
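To make the idea of code-based lineage concrete, here is a minimal sketch in Python. It is not Foundational's implementation, which this post does not describe; it only illustrates the general technique of reading lineage from source code without running a job. The job source, the table names, and the `extract_table_lineage` helper are all hypothetical, and the sketch assumes table names appear as string literals.

```python
import ast

# Hypothetical PySpark job, held as a string so it is analyzed, never executed.
JOB_SOURCE = '''
orders = spark.read.table("raw.orders")
customers = spark.table("raw.customers")
joined = orders.join(customers, "customer_id")
joined.write.mode("overwrite").saveAsTable("analytics.customer_orders")
'''

# Method names treated as table reads and writes in this sketch.
READ_METHODS = {"table"}                        # spark.table(...), spark.read.table(...)
WRITE_METHODS = {"saveAsTable", "insertInto"}   # DataFrameWriter methods


def extract_table_lineage(source: str) -> dict:
    """Collect table names passed as string literals to read/write calls."""
    inputs, outputs = set(), set()
    for node in ast.walk(ast.parse(source)):
        # Only method calls such as obj.method(...) are of interest.
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute)):
            continue
        if not node.args:
            continue
        first = node.args[0]
        # Skip dynamic names; resolving them would need real data-flow analysis.
        if not (isinstance(first, ast.Constant) and isinstance(first.value, str)):
            continue
        if node.func.attr in READ_METHODS:
            inputs.add(first.value)
        elif node.func.attr in WRITE_METHODS:
            outputs.add(first.value)
    return {"inputs": sorted(inputs), "outputs": sorted(outputs)}


lineage = extract_table_lineage(JOB_SOURCE)
print(lineage)
```

Because nothing is executed, the same function could in principle be run on both the base and the proposed version of a file in a pull request, and the two results compared to see which tables a change would start or stop touching. A production tool would also need to handle variables, Spark SQL strings, Scala code, and column-level detail, none of which this sketch attempts.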
Want to Learn More?
This is just a spotlight! For a detailed explanation of how Foundational extracts Spark lineage, including technical details and examples, please refer to our help center article on Foundational Spark Lineage.
The Future of Spark Data Management
Foundational is committed to empowering data teams with the tools they need to manage complex data environments effectively. This new Spark lineage capability is a major step towards that goal, providing increased transparency, improved governance, and greater confidence in Spark-driven data initiatives.