5.2 Git commands

Git is the foundation of modern software development, and mastering its core commands is essential for any data engineer working with Mage Pro. This section covers the fundamental Git commands you'll use daily to manage your data pipeline code, collaborate with teammates, and maintain a clean development workflow.

Essential Git commands

Checking your status

Before making any changes, always check the current state of your repository:

git status

This command shows you:

  • Which files have been modified

  • Which changes are staged for commit

  • Which branch you're currently on

  • Whether your local branch is ahead or behind the remote

Pro tip: Run git status frequently—it's your compass in the Git workflow and helps prevent common mistakes.
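
To see what the status report looks like in practice, here is a minimal sketch that can be run in a throwaway repository. The temporary directory, the placeholder identity, and the file contents are assumptions made for illustration; `--short` prints a compact form of the same information.

```shell
set -e
repo=$(mktemp -d)                      # throwaway repository (assumption for this sketch)
cd "$repo"
git init -q
git config user.name "Example User"    # placeholder identity, local to this repo only
git config user.email "user@example.com"

echo "print('v1')" > my_pipeline.py
git add my_pipeline.py
git commit -q -m "Add pipeline"

echo "print('version 2')" > my_pipeline.py   # modify a tracked file
echo "key: value" > io_config.yaml           # create a new, untracked file

# " M" marks a modified but unstaged file, "??" marks an untracked file
status_output=$(git status --short)
echo "$status_output"
```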

Staging changes

After modifying your Mage pipelines or configuration files, you need to stage your changes:

# Stage all changes in the current directory
git add .

# Stage specific files
git add my_pipeline.py io_config.yaml

# Stage changes in specific directories (replace with your own directory path)
git add path/to/directory/

Think of staging as choosing exactly which changes go into your next commit. Anything you leave unstaged stays in your working directory and is not included until you stage it.
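
As a rough illustration of selective staging, the sketch below stages one directory and leaves another file out. The `pipelines` directory and all file names are hypothetical; the repository is a throwaway one created with `mktemp`.

```shell
set -e
repo=$(mktemp -d)                      # throwaway repository
cd "$repo"
git init -q

mkdir pipelines                        # hypothetical directory name
echo "print('load')" > pipelines/load_customers.py
echo "print('clean')" > pipelines/clean_customers.py
echo "todo" > scratch_notes.txt        # a file we do not want in the next commit

git add pipelines/                     # stage only the directory

staged=$(git diff --cached --name-only)                  # what the next commit would contain
untracked=$(git ls-files --others --exclude-standard)    # what was left out
echo "staged:"
echo "$staged"
echo "not staged:"
echo "$untracked"
```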

Creating commits

A commit represents a snapshot of your code at a specific point in time. Write clear, descriptive commit messages:

# Basic commit
git commit -m "Add customer segmentation pipeline"

# More detailed commit with description
git commit -m "Fix data validation in user analytics pipeline

- Updated null value handling in transformer blocks
- Added data quality tests for user_id column
- Fixed timezone conversion bug in date calculations"

Commit message best practices:

  • Use the imperative mood: "Add feature" not "Added feature"

  • Keep the first line under 50 characters

  • Include specific details about what changed and why
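
One way to keep the subject line and the details separate is to pass `-m` more than once: Git treats the first `-m` as the subject and each later one as a body paragraph. The sketch below assumes a throwaway repository and a placeholder identity, and checks the subject length against the 50-character guideline.

```shell
set -e
repo=$(mktemp -d)                      # throwaway repository
cd "$repo"
git init -q
git config user.name "Example User"    # placeholder identity
git config user.email "user@example.com"

echo "print('validate')" > my_pipeline.py
git add my_pipeline.py

# First -m is the subject line; the second -m becomes the body
git commit -q \
  -m "Fix data validation in user analytics pipeline" \
  -m "Updated null value handling in transformer blocks"

subject=$(git log -1 --format=%s)      # subject line of the latest commit
body=$(git log -1 --format=%b)         # body of the latest commit
echo "subject (${#subject} chars): $subject"
echo "body: $body"
```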

Working with branches

Branches allow you to work on features or experiments without affecting the main codebase. Think of a feature branch as an isolated copy of your project where you can make changes, test your code, and collaborate with teammates before merging the code into the permanent production branch:

# Create and switch to a new branch
git checkout -b feature/customer-pipeline

# Switch between existing branches
git checkout main
git checkout development

# See all branches
git branch

# See remote branches
git branch -r
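
The isolation that branches provide can be checked with a small experiment. This is a sketch in a throwaway repository with a placeholder identity; `customer_pipeline.py` is a hypothetical file. A commit made on the feature branch does not appear on `main` until the branches are merged.

```shell
set -e
repo=$(mktemp -d)                         # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main     # name the first branch "main" regardless of Git defaults
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"

echo "print('base')" > my_pipeline.py
git add my_pipeline.py
git commit -q -m "Add base pipeline"

# Work on a feature branch
git checkout -q -b feature/customer-pipeline
echo "print('customers')" > customer_pipeline.py   # hypothetical file
git add customer_pipeline.py
git commit -q -m "Add customer pipeline"
on_feature=$(git rev-parse --abbrev-ref HEAD)      # name of the current branch

# Back on main, the feature file is not present
git checkout -q main
if [ -e customer_pipeline.py ]; then visible_on_main=yes; else visible_on_main=no; fi

branches=$(git branch --format='%(refname:short)')
echo "committed on: $on_feature, visible on main: $visible_on_main"
echo "$branches"
```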

Synchronizing with the remote repository

Keep your local code in sync with your team's work by regularly fetching updates from the shared repository and pushing your own commits so that others can access them:

# Fetch latest changes from remote (doesn't merge)
git fetch origin

# Pull latest changes and merge into current branch
git pull origin main

# Push your commits to remote repository
git push origin feature/customer-pipeline

# Push and set upstream tracking
git push -u origin feature/customer-pipeline
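
The difference between fetching and pulling is easier to see with a stand-in remote. The sketch below is an assumption-laden simulation: a local bare repository plays the role of `origin`, and a second clone plays a teammate. It counts how many commits `main` is behind `origin/main` after a fetch and again after a pull, then sets upstream tracking for a feature branch.

```shell
set -e
tmp=$(mktemp -d)
git init -q --bare "$tmp/remote.git"      # stand-in for the shared remote (assumption)

# Your local repository
git init -q "$tmp/local"
cd "$tmp/local"
git symbolic-ref HEAD refs/heads/main     # name the first branch "main"
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"
git remote add origin "$tmp/remote.git"
echo "v1" > my_pipeline.py
git add my_pipeline.py
git commit -q -m "Add pipeline"
git push -q -u origin main

# A simulated teammate clones, commits, and pushes to main
git clone -q -b main "$tmp/remote.git" "$tmp/teammate"
(
  cd "$tmp/teammate"
  git config user.name "Teammate"
  git config user.email "teammate@example.com"
  echo "version 2" > my_pipeline.py
  git commit -q -am "Update pipeline"
  git push -q origin main
)

# Fetch downloads the new commit but leaves local main where it was
git fetch -q origin
behind_after_fetch=$(git rev-list --count main..origin/main)

# Pull also merges it into the current branch
git pull -q origin main
behind_after_pull=$(git rev-list --count main..origin/main)
echo "behind after fetch: $behind_after_fetch, after pull: $behind_after_pull"

# Push a feature branch and record its upstream in one step
git checkout -q -b feature/customer-pipeline
git push -q -u origin feature/customer-pipeline
upstream=$(git rev-parse --abbrev-ref 'feature/customer-pipeline@{upstream}')
echo "upstream: $upstream"
```

Once the upstream is set, a plain `git push` or `git pull` on that branch knows which remote branch to use.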

Viewing history and changes

Your project's history helps with debugging and collaboration by showing what changed, when it changed, and who changed it, with commit messages recording why:

# See commit history
git log

# Compact history view
git log --oneline

# See what changed in the last commit
git show

# See unstaged changes (working directory vs. staging area)
git diff

# See staged changes (staging area vs. last commit)
git diff --cached
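
The variants of `git diff` compare different pairs of states, which the following sketch tries to make concrete. It assumes a throwaway repository with a placeholder identity, one staged change, and one unstaged change; `--name-only` is used so that only file names are printed.

```shell
set -e
repo=$(mktemp -d)                         # throwaway repository
cd "$repo"
git init -q
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"

echo "print('v1')" > my_pipeline.py
echo "key: value" > io_config.yaml
git add .
git commit -q -m "Add pipeline and config"

echo "print('version 2')" > my_pipeline.py
git add my_pipeline.py                    # this change is staged
echo "key: new value" > io_config.yaml    # this change is left unstaged

unstaged=$(git diff --name-only)          # working directory vs. staging area
staged=$(git diff --cached --name-only)   # staging area vs. last commit
all_changes=$(git diff HEAD --name-only)  # working directory vs. last commit
subjects=$(git log --format=%s)           # commit subjects, newest first

echo "unstaged: $unstaged"
echo "staged:   $staged"
echo "history:  $subjects"
```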


Common workflows in Mage Pro

Daily development workflow

Here's a typical sequence you may follow when working on Mage pipelines. The workflow is designed to maintain code quality, prevent conflicts with teammates, and ensure your changes are properly tracked and backed up throughout the development process:

# 1. Start your day by updating your local code
git checkout development
git pull origin development

# 2. Create a feature branch for your work
git checkout -b feature/revenue-analysis-pipeline

# 3. Make changes to your pipelines in Mage Pro UI
# ... work on your data pipeline ...

# 4. Stage and commit your changes
git add .
git status  # Always check what you're committing
git commit -m "Add revenue analysis pipeline with quarterly aggregation"

# 5. Push to remote repository
git push -u origin feature/revenue-analysis-pipeline


Troubleshooting common issues

Setting up Git identity

Git records a name and email address with every commit. If Git refuses to commit because it doesn't know who you are, or your commits show the wrong author, configure your Git identity. These settings control commit attribution only; authentication to the remote repository (for example, access tokens or SSH keys) is set up separately:

git config --global user.name "Your Name"
git config --global user.email "your.email@company.com"

# Verify your configuration
git config --list

# Set identity for current repository only (overrides global)
git config user.name "Your Name"
git config user.email "your.email@company.com"
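
To check that a repository-level identity takes precedence over the global one, a sketch along these lines can help. It points `HOME` at a temporary directory so the real global configuration is not touched; that sandboxing, the names, and the email addresses are all assumptions for illustration.

```shell
set -e
tmp=$(mktemp -d)
export HOME="$tmp/home"                   # sandbox: keeps the real ~/.gitconfig untouched
unset XDG_CONFIG_HOME
mkdir -p "$HOME"

git config --global user.name "Global Name"
git config --global user.email "global@example.com"

git init -q "$tmp/repo"
cd "$tmp/repo"
git config user.name "Repo Name"          # repository-level setting
git config user.email "repo@example.com"

echo "v1" > my_pipeline.py
git add my_pipeline.py
git commit -q -m "Add pipeline"

# The commit author comes from the repository-level identity
author=$(git log -1 --format='%an <%ae>')
echo "$author"
```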

Resolving merge conflicts

When Git can't automatically merge changes, you'll need to resolve conflicts manually by identifying conflicting sections, choosing which changes to keep, and ensuring the final code works correctly before completing the merge:

# After a failed merge, see which files have conflicts
git status

# Edit the conflicted files to resolve issues
# Look for conflict markers: <<<<<<< ======= >>>>>>>
# After resolving, stage the files
git add resolved_file.py

# Complete the merge
git commit -m "Resolve merge conflict in pipeline configuration"

This is only a basic overview of merge conflicts, which are an important topic in Git. For more detail, see the official GitHub documentation on merge conflicts.
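
For practice, a conflict can be produced on purpose in a throwaway repository. The sketch below assumes a placeholder identity and a hypothetical `threshold` setting in `io_config.yaml`; two branches change the same line, the merge stops, and the conflict is resolved by keeping the feature branch's value.

```shell
set -e
repo=$(mktemp -d)                         # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main     # name the first branch "main"
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"

echo "threshold: 10" > io_config.yaml     # hypothetical setting
git add io_config.yaml
git commit -q -m "Add config"

# Two branches change the same line in different ways
git checkout -q -b feature/customer-pipeline
echo "threshold: 200" > io_config.yaml
git commit -q -am "Raise threshold"
git checkout -q main
echo "threshold: 5" > io_config.yaml
git commit -q -am "Lower threshold"

# The merge stops because Git cannot pick a side automatically
if git merge feature/customer-pipeline >/dev/null 2>&1; then
  merge_result=clean
else
  merge_result=conflict
fi
conflicted=$(git diff --name-only --diff-filter=U)   # files with unresolved conflicts
marker_count=$(grep -c '^<<<<<<<' io_config.yaml)    # conflict markers written by Git
echo "$merge_result in: $conflicted"

# Resolve by writing the final content, then stage and commit
echo "threshold: 200" > io_config.yaml
git add io_config.yaml
git commit -q -m "Resolve merge conflict in pipeline configuration"
second_parent=$(git rev-parse -q --verify 'HEAD^2')  # a merge commit has a second parent
```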

Undoing changes

Sometimes you need to undo work due to mistakes, failed experiments, or changing requirements. Git provides commands to undo changes at each stage of the development process. Some of them permanently discard uncommitted work, so check git status before running them:

# Discard unstaged changes to a file
git checkout -- filename.py

# Unstage a file (keep changes)
git reset HEAD filename.py

# Undo last commit (keep changes staged)
git reset --soft HEAD~1

# Undo last commit and discard changes (be careful!)
git reset --hard HEAD~1
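
The effect of each undo command can be verified safely in a throwaway repository before using it on real work. The following sketch assumes a placeholder identity and a single file, and records the state after each step.

```shell
set -e
repo=$(mktemp -d)                         # throwaway repository
cd "$repo"
git init -q
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"

echo "v1" > filename.py
git add filename.py
git commit -q -m "First commit"

# 1. Discard an unstaged edit: the file returns to its committed content
echo "broken edit" > filename.py
git checkout -- filename.py
after_discard=$(cat filename.py)

# 2. Unstage a file: the edit stays in the working directory
echo "second draft" > filename.py
git add filename.py
git reset -q HEAD filename.py
staged_after_unstage=$(git diff --cached --name-only)
kept_edit=$(cat filename.py)

# 3. Undo the last commit but keep its changes staged
git commit -q -am "Second commit"
git reset -q --soft HEAD~1
staged_after_soft=$(git diff --cached --name-only)

# 4. Undo the last commit and discard its changes
git commit -q -m "Second commit again"
git reset -q --hard HEAD~1
after_hard=$(cat filename.py)
commits_left=$(git rev-list --count HEAD)

echo "after discard: $after_discard, after hard reset: $after_hard, commits: $commits_left"
```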


Best practices for data pipeline development

Commit Frequently: Make small, focused commits rather than large ones. This makes it easier to track changes and debug issues.

Use Meaningful Branch Names: Follow conventions like feature/pipeline-name, bugfix/issue-description, or hotfix/critical-fix.

Test Before Pushing: Always run your pipelines in a development or test environment before deploying changes to production.

Keep Commits Atomic: Each commit should represent one logical change. Don't mix pipeline updates with configuration changes in the same commit.

Document Your Changes: Use commit messages to explain not just what changed, but why it changed. Future you (and your teammates) will thank you.

Conclusion

Mastering these Git commands forms the foundation of professional data pipeline development. While Mage Pro's integrated Git terminal simplifies many operations, understanding these fundamentals ensures you can work confidently with version control, collaborate effectively with your team, and maintain clean, traceable code history.

In the next section, we'll explore how to set up deployments in Mage Pro, building on these Git skills to create automated workflows that move your code safely from development to production.