6.4 Practice exercise

Scenario

You're working for NYC's Performance Analytics team and need to create a simple bronze layer table for your 311 service request data. This will be your first step in building the analytics pipeline.

Exercise requirements

Create a raw SQL block that will:

Input: Raw 311 CSV data from your data loader block
Output: A simple bronze_311_requests table
Goal: Store the raw data with basic column renaming

Step-by-step implementation

Step 1: Add raw SQL block

In your pipeline, add a new transformation block
Select "SQL" from the block types
Check the "Raw SQL" checkbox
Name the block: create_bronze_table

Step 2: Configure connection

Connection: Select your database connection
Profile: Choose your profile from io_config.yml

Step 3: Write simple raw SQL:

-- Create a simple table for 311 requests
CREATE TABLE IF NOT EXISTS bronze_311_requests (
    unique_key VARCHAR(50),
    created_date VARCHAR(50),
    agency VARCHAR(10),
    complaint_type VARCHAR(100),
    borough VARCHAR(20),
    status VARCHAR(20)
);

-- Insert data from upstream block
INSERT INTO bronze_311_requests
SELECT 
    "Unique Key" as unique_key,
    "Created Date" as created_date,
    "Agency" as agency,
    "Complaint Type" as complaint_type,
    "Borough" as borough,
    "Status" as status
FROM {{ df_1 }};

-- Show a quick summary
SELECT COUNT(*) as total_records FROM

Why use raw SQL?

This exercise uses Raw SQL because it includes:

Multiple statements: CREATE TABLE and INSERT statements
DDL operations: Creating the table structure
Data reference: Using {{ df_1 }} to access upstream data

Expected results

A new bronze_311_requests table in your database
All 311 records with simplified column names
A count showing total records processed

This simple bronze layer gives you hands-on experience with Raw SQL blocks while keeping the complexity manageable for your first SQL block exercise.

Next lesson

Made in Silicon Valley

Platform

Docs

Community