Building once, reusing everywhere: global data products
The Challenge
Imagine a common scenario: your finance team needs a report on monthly revenue, your marketing team wants to analyze customer lifetime value, and your data science team is building a churn prediction model. All three initiatives rely on the same core data, such as customer transaction history or aggregated sales figures. Without a centralized, trustworthy source for these shared data assets, teams often re-create the same data logic in different pipelines. The result is duplicated effort, inconsistent definitions of key metrics, redundant computation, and a higher risk of errors. Every department ends up with its own slightly different version of "the truth," which causes confusion and inefficiency.
The Solution: Your Centralized Data Asset Library
Mage addresses this problem with Global Data Products. A Global Data Product is a curated, versioned output of a pipeline that any other pipeline in your organization can consume and reuse. You build a foundational data asset once, manage its quality and freshness in one place, and make it available to everyone who needs it, so data becomes a shared, trustworthy resource. Its main capabilities are:
Universal Referencing: Any Global Data Product can be used as a building block within any other pipeline. Other blocks can depend on it just like they would on an in-pipeline block. This fosters a "build once, reuse everywhere" mentality, breaking down data silos.
Lazy Triggering and Smart Reprocessing: A Global Data Product runs only when its data is requested and its existing output is outdated. If the output is still fresh, the request reuses it. This "lazy triggering" prevents unnecessary recomputation and saves compute resources.
Configurable Freshness: You have precise control over how "fresh" your data needs to be. You can configure how long a data product stays fresh (e.g., 12 hours, 1 week) and define when it should become eligible for reprocessing based on specific times or dates. This allows for optimized scheduling tailored to the data's use case.
Granular Partition Control: For historical analysis or time-windowed reporting, you can set a partition window when consuming a Global Data Product to control how much historical data is retrieved. Consumers can widen or narrow that window without generating new data.
Override Settings for Precision: While Global Data Products promote standardization, they also allow flexibility. Pipelines referencing a Global Data Product can override freshness thresholds, output block selections, and partition ranges for precise control over how the data is consumed.
Concurrency Protection: When multiple pipelines or teams simultaneously request the same Global Data Product, Mage ensures it runs only once and shares the output, avoiding duplicate executions and further optimizing resource usage.
Reduced Duplication and Cost Savings: Centralizing reusable pipeline outputs removes duplicated logic, lowers compute costs, and keeps data consistent across your organization. Every block produces data products that can be partitioned, versioned, and backfilled, which makes these outputs dependable data assets.
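The lazy triggering, configurable freshness, per-consumer freshness override, and concurrency protection described above can be illustrated with a small conceptual sketch. This is not Mage's API or implementation: the class LazyDataProduct, its outdated_after parameter, and the run_count counter are hypothetical names chosen for illustration, and a single in-process lock stands in for whatever coordination Mage actually uses.

```python
import threading
from datetime import datetime, timedelta
from typing import Any, Callable, Optional


class LazyDataProduct:
    """Hypothetical stand-in for a Global Data Product (not Mage's API)."""

    def __init__(self, compute: Callable[[], Any], outdated_after: timedelta):
        self._compute = compute                # stands in for the pipeline run
        self._outdated_after = outdated_after  # default freshness window
        self._lock = threading.Lock()          # serializes concurrent requests
        self._value: Any = None
        self._computed_at: Optional[datetime] = None
        self.run_count = 0                     # number of actual recomputations

    def get(
        self,
        now: Optional[datetime] = None,
        outdated_after: Optional[timedelta] = None,
    ) -> Any:
        """Return the output, recomputing only if it is missing or outdated."""
        now = now or datetime.now()
        # A consumer may override the default freshness window for this request.
        max_age = outdated_after if outdated_after is not None else self._outdated_after
        with self._lock:
            fresh = (
                self._computed_at is not None
                and now - self._computed_at < max_age
            )
            if not fresh:
                self._value = self._compute()
                self._computed_at = now
                self.run_count += 1
            return self._value


# Assumed example: a product that stays fresh for 12 hours.
product = LazyDataProduct(lambda: {"rows": 3}, timedelta(hours=12))
start = datetime(2024, 1, 1)
product.get(now=start)                        # nothing cached yet: runs
product.get(now=start + timedelta(hours=1))   # still fresh: reuses the output
product.get(now=start + timedelta(hours=13))  # outdated: runs again
```

Because the lock is held while the output is recomputed, requests that arrive at the same time wait for the first run to finish and then reuse its result instead of starting their own.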
Real-World Scenario: Standardized Customer Profile for Multiple Departments
Imagine a large subscription-based service company. The "Customer Profile" is a critical data asset, combining demographic information, subscription tier, historical usage, and support interaction summaries. This profile is needed by:
The Marketing team for personalized campaign targeting.
The Customer Success team for proactive outreach and issue resolution.
The Product team for understanding feature adoption and identifying pain points.
The Analytics team for executive dashboards.
Instead of each team building its own customer profile, the data engineering team can:
Create a core pipeline in Mage that extracts raw customer data, cleans it, and transforms it into a unified "Customer Profile" dataset.
Designate the output of this pipeline as a Global Data Product.
Configure its freshness (e.g., treat the output as outdated after one day) and enable automatic reprocessing when necessary.
Each downstream team can then easily reference this single "Customer Profile" Global Data Product in their own pipelines. The marketing pipeline might consume it to segment users, the customer success pipeline might join it with live support ticket data, and the product team's dashboards can display metrics derived from it.
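As a rough illustration of how different teams might read the same "Customer Profile" product with different partition windows, consider the sketch below. It is conceptual only and does not use Mage's API: ConsumerSettings, read_partitions, the dictionary-based partition store, and the sample rows are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass(frozen=True)
class ConsumerSettings:
    """Hypothetical per-pipeline setting (field name is illustrative only)."""
    partitions: int = 1  # how many of the most recent partitions to read


def read_partitions(
    store: Dict[date, List[dict]], settings: ConsumerSettings
) -> List[dict]:
    """Return rows from the most recent partitions, oldest partition first."""
    if settings.partitions < 1:
        raise ValueError("partitions must be at least 1")
    latest_days = sorted(store)[-settings.partitions:]
    return [row for day in latest_days for row in store[day]]


# Made-up sample data: one partition of the shared product per day.
customer_profile = {
    date(2024, 1, 1): [{"customer_id": 1, "tier": "basic"}],
    date(2024, 1, 2): [{"customer_id": 1, "tier": "pro"}],
    date(2024, 1, 3): [{"customer_id": 2, "tier": "basic"}],
}

# Marketing only needs the latest snapshot; analytics wants three days of history.
marketing_rows = read_partitions(customer_profile, ConsumerSettings(partitions=1))
analytics_rows = read_partitions(customer_profile, ConsumerSettings(partitions=3))
```

Both consumers read from the same stored partitions, so widening the window for analytics does not trigger any new data generation.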
If the definition of "active user" (a metric within the customer profile) changes, the data engineering team updates only the central "Customer Profile" pipeline. Every consuming pipeline then picks up the updated, consistent definition the next time it reads the data product, with no manual changes downstream and no risk of conflicting data.
By implementing Global Data Products, the company fosters a culture of data reuse and trust. Teams move faster, data quality improves, and the entire organization operates from a single, consistent version of their most important data assets.