How Mage uses AWS to build a SaaS startup

First published on July 28, 2021

Last updated on April 22, 2022


11 minute read

Tommy Dang



We go over each Amazon Web Services (AWS) service that Mage uses and how we use it to build AI SaaS tools for product developers.

  1. Simple Storage Service (S3)

  2. Redshift

  3. DynamoDB

  4. Relational Database Service (RDS)

  5. Elastic Beanstalk

  6. API Gateway

  7. Elastic Container Registry (ECR)

  8. Elastic Container Service (ECS)

  9. Simple Queue Service (SQS)

  10. Lambda

  11. Elastic MapReduce (EMR)

  12. Route 53

Simple Storage Service (S3)

Very nice lake

Amazon Simple Storage Service (S3) is a data storage service. S3 provides access security, easy navigation, and virtually unlimited storage, making it one of the best platforms for growing businesses.

How do we use it?

S3 acts as our data lake. We store feature sets, training sets, trained models that have been serialized for storage, metrics from training models, summary statistics for feature sets, batch predictions, feature values for online retrieval during real-time inference, and more. S3 is great at storing unlimited amounts of unstructured data.
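As a rough illustration of storing a serialized model in S3, here is a minimal sketch. It is not Mage's actual code: the key layout, function names, and bucket are hypothetical, and `s3_client` is assumed to behave like `boto3.client("s3")` (only its `put_object` and `get_object` calls are used).

```python
import pickle


def model_key(workspace_id: str, model_id: str, version: int) -> str:
    """Build a predictable object key so each model version gets its own object.

    The path layout here is a hypothetical convention, not Mage's real one.
    """
    return f"workspaces/{workspace_id}/models/{model_id}/v{version}/model.pkl"


def save_model(s3_client, bucket: str, key: str, model) -> int:
    """Serialize `model` with pickle and upload the bytes to S3.

    Returns the number of bytes uploaded.
    """
    body = pickle.dumps(model)
    s3_client.put_object(Bucket=bucket, Key=key, Body=body)
    return len(body)


def load_model(s3_client, bucket: str, key: str):
    """Download the serialized bytes and rebuild the model object.

    Only unpickle objects that your own pipeline wrote: pickle can execute
    arbitrary code when loading untrusted data.
    """
    response = s3_client.get_object(Bucket=bucket, Key=key)
    return pickle.loads(response["Body"].read())
```

Passing the client in as a parameter keeps the functions easy to test locally with an in-memory stand-in before pointing them at a real bucket.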


Redshift

Mark Rothko, Untitled (Red and Burgundy Over Blue), 1969. Courtesy of Sotheby’s.

Amazon Redshift is a tool for analyzing, processing, and storing big data. Having a tool that allows for quick understanding of big data has become essential in creating successful applications and business analytics. Redshift can be used for real-time analytics, log analytics, data reporting, data warehousing, and more. An alternative is



How do we use it?

When you use Mage to build and deploy an ML model, you can opt into using automatic A/B testing. Whenever you make an API call for a model’s prediction, 50% of your API calls will use the model’s prediction and the other 50% will get a default or random prediction value. Behind the scenes, Mage will automatically bucket your users, calculate downstream experiment metrics, and conclude the experiment when it has converged with statistical significance.
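One common way to bucket users for a 50/50 split is to hash the experiment and user identifiers, so the same user always lands in the same group without storing any state up front. The sketch below shows that general technique; it is an assumption about how such bucketing could work, not a description of Mage's implementation, and the function and group names are hypothetical.

```python
import hashlib


def assign_treatment(experiment_id: str, user_id: str, treatment_share: float = 0.5) -> str:
    """Deterministically map a user to the 'model' or 'control' group.

    The same (experiment_id, user_id) pair always returns the same group.
    """
    digest = hashlib.sha256(f"{experiment_id}:{user_id}".encode("utf-8")).hexdigest()
    # Take the first 8 hex characters as an integer in [0, 2**32)
    # and scale it to a position in [0, 1).
    position = int(digest[:8], 16) / 2**32
    return "model" if position < treatment_share else "control"
```

Including the experiment ID in the hash means a user's group in one experiment does not determine their group in another.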

We use Redshift to store experiment assignments. Users are assigned to an experiment and a treatment group as soon as a prediction is attempted for that user. We store these in Redshift so that we can perform SQL-like queries on the experimentation data to calculate its results.
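To give a feel for the kind of SQL this enables, here is a small sketch that computes a conversion rate per treatment group. It uses Python's built-in `sqlite3` as a local stand-in for Redshift so it can run anywhere; the table names, column names, and the "conversion" metric are all hypothetical, and a real Redshift connection would use a different driver and placeholder style.

```python
import sqlite3


def create_tables(conn: sqlite3.Connection) -> None:
    """Create hypothetical assignment and conversion tables."""
    conn.executescript(
        """
        CREATE TABLE assignments (user_id TEXT, experiment_id TEXT, treatment TEXT);
        CREATE TABLE conversions (user_id TEXT, experiment_id TEXT);
        """
    )


def conversion_rates(conn: sqlite3.Connection, experiment_id: str) -> dict:
    """Return {treatment: share of assigned users who converted}."""
    rows = conn.execute(
        """
        SELECT a.treatment,
               COUNT(DISTINCT a.user_id) AS users,
               COUNT(DISTINCT c.user_id) AS converted
        FROM assignments a
        LEFT JOIN conversions c
          ON c.user_id = a.user_id AND c.experiment_id = a.experiment_id
        WHERE a.experiment_id = ?
        GROUP BY a.treatment
        ORDER BY a.treatment
        """,
        (experiment_id,),
    ).fetchall()
    return {treatment: converted / users for treatment, users, converted in rows}
```

The `LEFT JOIN` keeps users who never converted in the denominator, which is what makes the per-group rates comparable.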


DynamoDB

A dynamo is an electrical generator that creates direct current using a commutator. Dynamos were the first electrical generators capable of delivering power for industry, and the foundation upon which many other later electric-power conversion devices were based, including the electric motor, the alternating-current alternator, and the rotary converter. (Wikipedia)

As websites and applications grow, databases must scale to handle the volume. Amazon DynamoDB is a database with built-in security, performance that scales, and backup.

How do we use it?

Every time a prediction is made, we store the request payload, the feature vector used in the model’s prediction, and the prediction result in DynamoDB. This data is used to debug predictions and report prediction results to users on their dashboard. We use DynamoDB because it can store schemaless data and because the stored data can be partitioned by several values. We use the prediction’s UUID as the primary key for storing each result.
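A record like this could be assembled and written roughly as follows. This is a sketch under assumptions: the attribute names are hypothetical, and `table` is assumed to behave like a boto3 DynamoDB `Table` resource (only `put_item(Item=...)` is used). One practical detail: the boto3 resource API rejects Python floats, so numeric values are converted to `Decimal` first.

```python
import json
import uuid
from decimal import Decimal


def build_prediction_item(request_payload: dict, feature_vector: list, prediction) -> dict:
    """Assemble one prediction record keyed by a freshly generated UUID."""
    item = {
        "prediction_id": str(uuid.uuid4()),  # hypothetical primary key attribute
        "request_payload": request_payload,
        "feature_vector": feature_vector,
        "prediction": prediction,
    }
    # Round-trip through JSON so every float becomes a Decimal,
    # which is the numeric type the boto3 DynamoDB resource accepts.
    return json.loads(json.dumps(item), parse_float=Decimal)


def store_prediction(table, item: dict) -> str:
    """Write the record and return its primary key for later lookups."""
    table.put_item(Item=item)
    return item["prediction_id"]
```

Because the payload and feature vector are stored as nested attributes, records for different models can carry different fields without any schema change.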

Relational Database Service (RDS)

The Matrix

Relational databases store information in tables, which can then be queried to find and sort specific information. Amazon’s Relational Database Service (RDS) is a cloud-based relational database service that assists in storing data and automating time-consuming tasks.

How do we use it?

We store all our relational data in RDS: workspaces, teams, users, datasets, models, etc. We use MySQL in RDS because it’s a database our team has worked with extensively in the past, especially at Airbnb.


Elastic Beanstalk

Elastic Beanstalk is a service for running and scaling applications in almost any language. It manages the underlying AWS infrastructure, deployments, and application scaling for you.

Beans, beans the magical tart, the more you eat, the more you…

How do we use it?

Mage’s backend service is written in Python and uses the Django framework. We currently use Elastic Beanstalk to run the service. We also use Elastic Beanstalk workers to run our background workers that process messages published to SQS. Elastic Beanstalk was quick, easy, and simple to get up and running. It’s been working great so far, but we will eventually migrate to running the backend service in a Docker container on ECS (Elastic Container Service), similar to how we run our other services.

API Gateway

Stargate SG-1

An Application Programming Interface (API) is an intermediary that allows two applications to talk to each other. Amazon API Gateway allows developers to run APIs simultaneously and monitor their performance at scale.

How do we use it?

Mage has a public-facing API with one endpoint for streaming data into a user’s dataset and another for retrieving a machine learning model’s prediction. Both of these endpoints are behind Amazon’s API Gateway. This service enables Mage to throttle API requests per API key, which limits the impact of potential bad actors or accidental spikes in traffic volume. That way, no single entity can cause downtime for every other user accessing Mage’s API.
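API Gateway does this throttling for you, and AWS describes it as based on the token bucket algorithm. To make the idea concrete, here is a small self-contained sketch of per-key token bucket throttling. It illustrates the concept only; the class names and parameters are hypothetical and this is not how API Gateway is implemented or configured.

```python
import time


class TokenBucket:
    """Allows bursts up to `burst` requests, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, burst: int, now: float):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)  # start with a full bucket
        self.updated_at = now

    def allow(self, now: float) -> bool:
        # Refill based on elapsed time, capped at the bucket size.
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(float(self.burst), self.tokens + elapsed * self.rate)
        self.updated_at = now
        if self.tokens >= 1:
            self.tokens -= 1  # each request spends one token
            return True
        return False


class ApiKeyThrottler:
    """Keeps an independent bucket per API key so one caller cannot starve others."""

    def __init__(self, rate: float, burst: int):
        self.rate = rate
        self.burst = burst
        self._buckets = {}

    def allow(self, api_key: str, now: float = None) -> bool:
        if now is None:
            now = time.monotonic()
        bucket = self._buckets.get(api_key)
        if bucket is None:
            bucket = self._buckets[api_key] = TokenBucket(self.rate, self.burst, now)
        return bucket.allow(now)
```

A request that returns `False` here corresponds to the case where a gateway would reject the call (typically with an HTTP 429 response) instead of passing it to the backend.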

Elastic Container Registry (ECR)

Register your containers or they go overboard



A Docker image houses the files needed to run an application: source code, libraries, dependencies, tools, etc. Amazon’s Elastic Container Registry (ECR) is a fully managed host for Docker images.

How do we use it?

ECR stores all our Docker images and their versions. It’s easy to use and all our Amazon services have access to ECR.

Elastic Container Service (ECS)

Shipping containers to production


Docker packages software into units called containers, which hold everything required to make software run. Using Docker containers helps ensure your code runs the same way in any environment. Elastic Container Service (ECS) automates, runs, and manages applications running in Docker containers. It connects with other AWS services, making it easy to run container workloads.

How do we use it?

ECS runs our public API service so that users can get real-time predictions from their trained machine learning models via an API request. Mage’s public API service is a Python application using the Flask framework. ECS runs the application in a Docker container. We use auto scaling with ECS so that Mage can handle large volumes of API requests from users. We have two clusters on ECS for our API service: one for production and one for staging. That way, we can test all our changes on staging before deploying them to production, and if we catch any bugs on production, we can instantly roll back to a previous working build.

Simple Queue Service (SQS)


Using a messaging queue service allows different parts within your service or application to communicate. Simple Queue Service (SQS) is Amazon’s messaging queue service, which allows for storing, sending, and receiving messages between software components.

How do we use it?

We use SQS to handle background jobs. When we need to kick off a long-running job after a user takes an action on the platform, we’ll publish a message to a queue and our background workers will consume that message and begin executing. For example, when a user adds data from Amplitude, instead of fetching all the data and making the user wait, we fetch the data in the background and notify the user when the data has been completely retrieved and saved as a new feature set.
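The publish-and-consume pattern can be sketched as below. This is a generic illustration rather than Mage's code: Elastic Beanstalk worker environments deliver queue messages to the application for you, whereas this sketch polls the queue directly. The job type names and message shape are hypothetical, and `sqs_client` is assumed to behave like `boto3.client("sqs")` (using its `send_message`, `receive_message`, and `delete_message` calls).

```python
import json


def enqueue_job(sqs_client, queue_url: str, job_type: str, payload: dict) -> None:
    """Publish a background job as a JSON message."""
    body = json.dumps({"job_type": job_type, "payload": payload})
    sqs_client.send_message(QueueUrl=queue_url, MessageBody=body)


def process_available_jobs(sqs_client, queue_url: str, handlers: dict) -> int:
    """Receive one batch of messages, run the matching handler, then delete each.

    `handlers` maps a job type to a callable that takes the payload.
    Returns the number of jobs processed.
    """
    response = sqs_client.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,  # SQS returns at most 10 messages per call
        WaitTimeSeconds=20,      # long polling to avoid busy-looping on an empty queue
    )
    processed = 0
    for message in response.get("Messages", []):
        job = json.loads(message["Body"])
        handlers[job["job_type"]](job["payload"])
        # Delete only after the handler succeeds; if it raises, the message
        # becomes visible again after the visibility timeout and is retried.
        sqs_client.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
        processed += 1
    return processed
```

Deleting after processing gives at-least-once delivery, so handlers should tolerate seeing the same job twice.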


Lambda

AWS Lambda is a serverless computing service that allows developers to run code without managing servers. Lambda handles capacity, scaling, and patching, and provides visibility into performance. Using Lambda allows developers to focus on writing code instead of maintaining backend servers.

How do we use it?

Whenever we need to execute and quickly complete a complex data processing task asynchronously, we use AWS Lambda functions. For example, whenever a user adds new data or Mage fetches new data from a user’s data warehouse, we execute Python code in a Lambda function that calculates and extracts a subsample of the entire dataset (aka feature set). At the end of the execution, we upload that subsample to S3, then show it to the user when they view the dataset details.
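A subsampling function of this kind might look like the sketch below. Everything specific here is an assumption made for illustration: the event fields (`bucket`, `key`, `sample_size`), the CSV format, the output key naming, and the choice of reservoir sampling are hypothetical, not Mage's actual implementation. The optional `s3_client` argument exists only so the handler can be exercised locally with a stand-in; in Lambda it falls back to `boto3.client("s3")`.

```python
import csv
import io
import random


def reservoir_sample(rows, k: int, seed: int = 0) -> list:
    """Pick up to k rows uniformly at random in a single pass over `rows`."""
    rng = random.Random(seed)
    sample = []
    for index, row in enumerate(rows):
        if index < k:
            sample.append(row)
        else:
            # Replace an existing pick with probability k / (index + 1).
            j = rng.randint(0, index)
            if j < k:
                sample[j] = row
    return sample


def handler(event, context, s3_client=None):
    """Read a CSV from S3, write a subsample next to it, and report where it went."""
    if s3_client is None:
        import boto3  # available in the Lambda Python runtime
        s3_client = boto3.client("s3")

    obj = s3_client.get_object(Bucket=event["bucket"], Key=event["key"])
    # Reads the whole object into memory, which is fine for a sketch
    # but would need streaming for very large files.
    reader = csv.reader(io.StringIO(obj["Body"].read().decode("utf-8")))
    header = next(reader)
    sample = reservoir_sample(reader, event.get("sample_size", 1000))

    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(header)
    writer.writerows(sample)

    sample_key = f"{event['key']}.sample.csv"  # hypothetical naming convention
    s3_client.put_object(
        Bucket=event["bucket"], Key=sample_key, Body=out.getvalue().encode("utf-8")
    )
    return {"sample_key": sample_key, "rows": len(sample)}
```

Reservoir sampling is a convenient fit here because it needs only one pass and does not require knowing the number of rows in advance.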

Elastic MapReduce (EMR)


Getting the most out of company data is a top business priority for creating the best product possible. Data processing can often be time consuming, which is where Amazon Elastic MapReduce (EMR) comes in. EMR captures, stores, and analyzes data using frameworks such as Apache Spark.




How do we use it?

The majority of our data processing uses Apache Spark running on EMR. This way, we can process nearly unlimited data relatively quickly. For example, after a user builds a model and before Mage can train it, we create the training set for that model using Spark, PySpark (the Python library for accessing the Spark API), and custom scripts running on an EMR cluster.

Route 53

M.C. Escher, “Relativity.” Copyright 2017 The M.C. Escher Company, The Netherlands. All rights reserved.

The domain name system (DNS) is a record of all website names on the internet, translating readable names into the numeric IP addresses that computers use to communicate. Route 53 gives developers an easy way to route users to their applications.

How do we use it?

Route 53 handles all our domains, subdomains, MX records for email, etc.


Game of Thrones

We are constantly trying new services and using them in different combinations to deliver the most magical experience for developers on the Mage platform. We’ll keep sharing our experience and learnings. If you’re interested in learning more or sharing your experience using AWS, please email us or join our community. We can’t wait to hear from you!

