6.4 Practice exercise

Lesson

6.4 Practice exercise

Scenario

You're working for NYC's Performance Analytics team and need to create a simple bronze layer table for your 311 service request data. This will be your first step in building the analytics pipeline.

Exercise requirements

Create a raw SQL block that will:

  • Input: Raw 311 CSV data from your data loader block

  • Output: A simple bronze_311_requests table

  • Goal: Store the raw data with basic column renaming

Step-by-step implementation

Step 1: Add raw SQL block

  1. In your pipeline, add a new transformation block

  2. Select "SQL" from the block types

  3. Check the "Raw SQL" checkbox

  4. Name the block: create_bronze_table

Step 2: Configure connection

  • Connection: Select your database connection

  • Profile: Choose your profile from io_config.yml

Step 3: Write simple raw SQL:

-- Create a simple table for 311 requests
CREATE TABLE IF NOT EXISTS bronze_311_requests (
    unique_key VARCHAR(50),
    created_date VARCHAR(50),
    agency VARCHAR(10),
    complaint_type VARCHAR(100),
    borough VARCHAR(20),
    status VARCHAR(20)
);

-- Insert data from upstream block
INSERT INTO bronze_311_requests
SELECT 
    "Unique Key" as unique_key,
    "Created Date" as created_date,
    "Agency" as agency,
    "Complaint Type" as complaint_type,
    "Borough" as borough,
    "Status" as status
FROM {{ df_1 }};

-- Show a quick summary
SELECT COUNT(*) as total_records FROM

Why use raw SQL?

This exercise uses Raw SQL because it includes:

  • Multiple statements: CREATE TABLE and INSERT statements

  • DDL operations: Creating the table structure

  • Data reference: Using {{ df_1 }} to access upstream data

Expected results

  • A new bronze_311_requests table in your database

  • All 311 records with simplified column names

  • A count showing total records processed

This simple bronze layer gives you hands-on experience with Raw SQL blocks while keeping the complexity manageable for your first SQL block exercise.