3.2 Transformers

Transformers are the workhorses of your data pipelines: they clean, reshape, and enrich the data produced by other blocks. Every data engineering project needs transformation logic, and Mage keeps that logic modular and approachable.

Understanding data transformation

Every block except Scratchpads and Sensors passes the value returned by its decorated function to all of its downstream blocks. This creates an explicit data flow: each transformer receives the outputs of its upstream blocks, applies its transformations, and returns the result to the next blocks in the pipeline.
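To make that flow concrete, here is a minimal sketch of a transformer with two parent blocks. Mage's block template describes one positional parameter per upstream output; the block names, columns, and data below are hypothetical, and the function is called by hand with the values Mage would normally supply. The fallback decorator is a hypothetical stand-in so the sketch runs outside Mage.

```python
import pandas as pd

try:
    from mage_ai.data_preparation.decorators import transformer
except ImportError:
    # Outside Mage: hypothetical no-op stand-in so the sketch runs anywhere
    def transformer(func):
        return func


@transformer
def join_orders(orders: pd.DataFrame, customers: pd.DataFrame, *args, **kwargs) -> pd.DataFrame:
    # First positional argument: output of the first upstream block (orders)
    # Second positional argument: output of the second upstream block (customers)
    merged = orders.merge(customers, on="customer_id", how="left")
    # Whatever is returned here is what downstream blocks receive
    return merged


# Stand-ins for the return values of two upstream loader blocks
orders = pd.DataFrame({"order_id": [1, 2, 3], "customer_id": [10, 20, 10]})
customers = pd.DataFrame({"customer_id": [10, 20], "name": ["Ada", "Grace"]})

result = join_orders(orders, customers)
print(result)
```

Because the transformer is an ordinary function, the same call doubles as a quick local check of its logic.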

The power of transformers lies in their modularity. Instead of writing one large transformation script, you can break complex logic into smaller, testable pieces that can be reused across different pipelines.
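The same idea in miniature: each step below is a small function that can be tested on its own and reused, and a pipeline is just those steps applied in sequence. This is a plain-pandas sketch of the decomposition, not Mage code; in Mage each function would typically live in its own transformer block. The function names and columns are hypothetical.

```python
from functools import reduce
from typing import Callable, List

import pandas as pd

Step = Callable[[pd.DataFrame], pd.DataFrame]


def strip_whitespace(df: pd.DataFrame) -> pd.DataFrame:
    # Trim leading/trailing spaces in the (hypothetical) text columns
    out = df.copy()
    for col in ("name", "email"):
        out[col] = out[col].str.strip()
    return out


def drop_missing_email(df: pd.DataFrame) -> pd.DataFrame:
    # Remove rows that have no email address
    return df.dropna(subset=["email"])


def lowercase_email(df: pd.DataFrame) -> pd.DataFrame:
    # Normalize email casing
    out = df.copy()
    out["email"] = out["email"].str.lower()
    return out


def run_steps(df: pd.DataFrame, steps: List[Step]) -> pd.DataFrame:
    # Feed each step's output into the next, like chained blocks
    return reduce(lambda acc, step: step(acc), steps, df)


raw = pd.DataFrame({"name": [" Ada ", "Grace"], "email": [" ADA@EXAMPLE.COM", None]})
clean = run_steps(raw, [strip_whitespace, drop_missing_email, lowercase_email])
print(clean)
```

Each step can be unit-tested with a tiny DataFrame, and reordering or reusing steps does not require touching the others.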

Transformer block structure

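Every Python transformer follows this basic pattern: a function decorated with @transformer receives the upstream data as its first argument and returns the transformed result.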

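from pandas import DataFrame
from mage_ai.data_preparation.decorators import transformer

@transformer
def transform_df(df: DataFrame, *args, **kwargs) -> DataFrame:
    # df holds the output of the first upstream block
    # Your transformation logic here
    return df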
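Besides upstream data, the **kwargs parameter is where Mage supplies extra context; its documentation describes pipeline and runtime variables arriving there. The sketch below assumes a hypothetical variable named min_amount and falls back to a default when it is absent, so the function can also be called directly in a test. The fallback decorator is a hypothetical stand-in for running outside Mage.

```python
import pandas as pd

try:
    from mage_ai.data_preparation.decorators import transformer
except ImportError:
    # Outside Mage: hypothetical no-op stand-in so the sketch runs anywhere
    def transformer(func):
        return func


@transformer
def filter_small_orders(df: pd.DataFrame, *args, **kwargs) -> pd.DataFrame:
    # 'min_amount' is a hypothetical pipeline variable; default to 0 if not set
    min_amount = float(kwargs.get("min_amount", 0))
    # Keep only rows at or above the threshold and renumber the index
    return df[df["amount"] >= min_amount].reset_index(drop=True)


orders = pd.DataFrame({"order_id": [1, 2, 3], "amount": [5.0, 50.0, 20.0]})
filtered = filter_small_orders(orders, min_amount=10)
print(filtered)
```

Reading the variable with a default keeps the block usable both inside a pipeline run and in isolation.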

Data transformation types in Mage

When creating transformer blocks in Mage, you can choose from several built-in templates that handle common data transformation scenarios:

Python transformers:

  • Generic (no template): Start with a blank Python transformer for custom logic

  • Clean column names: Standardize column naming conventions and remove special characters

  • Remove duplicate rows: Eliminate duplicate records based on specified criteria

  • Select columns: Choose specific columns to keep while dropping others

  • Filter rows: Apply conditional filtering logic to subset your data
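What those four templates do can be expressed in a few lines of pandas. This is an illustrative sketch, not the code Mage generates; the column-name rule (lowercase, runs of non-alphanumeric characters replaced by underscores) and the example columns are assumptions.

```python
import re

import pandas as pd


def clean_column_names(df: pd.DataFrame) -> pd.DataFrame:
    # Replace runs of non-alphanumeric characters with "_", trim "_", lowercase
    def clean(name) -> str:
        return re.sub(r"[^0-9a-zA-Z]+", "_", str(name)).strip("_").lower()

    return df.rename(columns=clean)


def remove_duplicate_rows(df: pd.DataFrame, subset=None) -> pd.DataFrame:
    # subset=None compares all columns; pass a list to dedupe on specific keys
    return df.drop_duplicates(subset=subset, keep="first")


def select_columns(df: pd.DataFrame, columns) -> pd.DataFrame:
    # Keep only the listed columns, in the listed order
    return df[list(columns)]


def filter_rows(df: pd.DataFrame, condition: str) -> pd.DataFrame:
    # condition is a pandas query string, e.g. "amount > 10"
    return df.query(condition)


raw = pd.DataFrame(
    {
        "Order ID": [1, 1, 2, 3],
        "Customer Name!": ["Ada", "Ada", "Grace", "Linus"],
        "Amount ($)": [25.0, 25.0, 8.0, 40.0],
    }
)

df = clean_column_names(raw)      # order_id, customer_name, amount
df = remove_duplicate_rows(df)    # drops the repeated order 1
df = select_columns(df, ["order_id", "amount"])
df = filter_rows(df, "amount > 10")
print(df)
```

Cleaning names first matters here: the later steps can refer to predictable identifiers such as amount instead of "Amount ($)".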

SQL transformers:

  • Generic SQL: Write custom SQL transformation logic with full database capabilities

  • Automated SQL: Use Mage's visual interface for simple transformations without writing code

  • Raw SQL: Handle complex SQL operations with multiple statements and advanced database features
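Whichever SQL option you pick, the transformation itself is ordinary SQL run by the database. As an illustration only (Python rather than a Mage SQL block, with an in-memory SQLite database standing in for your warehouse), the sketch below loads a DataFrame as a table and runs a GROUP BY aggregation; the table and column names are hypothetical.

```python
import sqlite3

import pandas as pd

orders = pd.DataFrame(
    {"customer_id": [10, 20, 10, 30], "amount": [25.0, 8.0, 15.0, 30.0]}
)

conn = sqlite3.connect(":memory:")
try:
    # Stand-in for an upstream table that already lives in the database
    orders.to_sql("orders", conn, index=False)

    # The kind of statement a SQL transformer would hold
    query = """
        SELECT customer_id,
               COUNT(*)    AS order_count,
               SUM(amount) AS total_amount
        FROM orders
        GROUP BY customer_id
        ORDER BY total_amount DESC
    """
    totals = pd.read_sql_query(query, conn)
finally:
    conn.close()

print(totals)
```

Pushing aggregations like this into the database avoids pulling every row into Python, which is the main reason to prefer a SQL transformer for set-based work.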

Language-specific options:

  • Python transformer: For pandas-based data manipulation and complex business logic

  • SQL transformer: For database-native operations and optimized query performance

  • R transformer: For statistical analysis and specialized R packages

  • PySpark transformer: For big data processing and distributed computing

Python data transformation block template:

Conclusion

Transformers represent the analytical heart of your data engineering workflows in Mage. By breaking complex data processing into modular, reusable blocks, you create pipelines that are not only more maintainable but also more reliable and easier to debug. Whether you're cleaning messy data with Python, performing complex aggregations with SQL, or building sophisticated features for machine learning, Mage's transformer ecosystem provides the tools and flexibility you need.