Mage Blog

Growth

John Patrick Hinek

Growth Marketing Associate

1*0Pqqx6wGDfFm_7GLebg2Hw.png

1*zVbB8m4GvAU1Z0NcNldu7A.png

Mathematical equations lay the foundation for AI. For basic applications of AI, being a math expert isn’t necessary as there are plenty of tutorials and pre-built libraries maintained by the open source community. However, learning some math concepts will help you understand why an algorithm works the way it does and how it can be improved.

), compared to a scalar which is a single number (

A vector is a mathematical way of getting from one point to another. Vectors need to travel some distance which is measured in magnitude (length) and direction (orientation).

Vectors are used in machine learning for their ability to organize data. One example of a commonly used vector in machine learning is a feature vector. These contain information on multiple elements of an entity; for example, a feature vector can contain 5 values: the age of a user, the gender of a user, their height, their weight, and their current city. Feature vectors represent attributes of an entity in a way that a machine learning model can easily perform calculations on.

Vectors are written into Python using Numpy. They can be created using the function 

Vectors can be added, subtracted, divided, multiplied, etc.

1*zz27BuQfvnM7dMNHAS4MGQ.png

A Matrix is a 2-dimensional array (table) of numbers with at least one column and at least one row. The array in a matrix can be represented as having 

. A vector can also be considered a matrix. Row vectors can be represented as 

A matrix is measured by its size which is the number of rows by the number of columns.

Matrices are used to represent data that machine learning models learn from. By structuring the data in a matrix, machine learning algorithms can leverage linear algebra and matrix multiplication for efficient calculations across large amounts of data.

In Python, a matrix is represented by a 2-D Numpy array:

Like vectors, matrices can be added, subtracted, divided, multiplied, etc. Matrices of the same size are added together by adding the corresponding data points.

Cost function is used to optimize a machine learning model to make better predictions. The cost function calculates the error between predicted outcomes compared with actual outcomes. The goal of training a machine learning model is to minimize the error from this cost.

To minimize the cost function error so a line best fits the dataset, we need to use a gradient descent algorithm. Gradient descent is an optimization algorithm that finds the minimum of a function — finding the value that’ll give us the lowest error.

(Image)

1*ZXgcrAtJT6k9wvKwjxZF7Q.png

Gradient descent works by taking steps to find the minimum value of the parabola graph.

As the graph slopes down towards the x-axis, theta (measure of an angle) will increase as the negative slope is turned positive by alpha. As the graph goes up past the minimum, theta will become negative, signaling it has passed the minimum and needs to backtrack. Repeatedly running this algorithm will leave us with the minimum.

1*f0CuPDSWFUr9XGESWQ4JUA.png

There are 2 main categories that most machine learning problems belong to: classification and regression.

Classification is the process of taking an input and mapping it to a discrete label. Classification problems typically deal with classes or categories. For example, a photo of a cat (input) and mapping it to be labeled either cat or dog.

Regression allows us to predict a continuous outcome variable based on predictor variables. Generally, regression problems deal with real numbers. For example, trying to predict home prices based on recent sales in an area.

Evaluation metrics explain the performance of a model and can provide insight to areas that might need improvement.

In our example, we built an ML model to predict a color. Every time the model chooses a color, a score will be added to their box. Actual values are on the y-axis and the model’s predicted values are on the x-axis.

Accuracy is the total number of correct predictions divided by the total number of predictions made by the model. Accuracy works well on balanced data: a dataset with approximately the same number values across each category present. For example, 100 dogs and 95 cats is a balanced dataset. An unbalanced dataset is 100 dogs and 11 cats.

1*hRjDQP_dXgFf4DZaXnPJxA.png

18 (correct predictions) / 22 (total number of predictions) = 0.81

1*6AhwZiAXom5I6IyDtlQZgA.png

15 (correct predictions) / 22 (total number of predictions) = 0.68

Classification accuracy can give a false sense of achieving high performance. Such can occur when there is a disproportionate number of 1 color over the other (e.g. 100 blues and only 11 reds).

Models can be evaluated based on four features:

Precision is the number of correct positive results divided by the number of positive results predicted by the ML model. Using precision, we discover the amount of positive identifications that were actually correct. Precision is helpful for understanding how good a model is at predicting a specific category. This metric is helpful in multi-category classification problems. For example, predicting which color car to sell.

Precision = True positive / (True positives + False positive)

Referring back to our model 1 color identifier from above, the true positive value would be 10 and the false positive value would be 3.

Recall is the number of all positive results divided by the number of all relevant samples. Recall answers the question: what proportion of actual positives were identified correctly? A model that produces no false negatives has a recall of 1. Recall is useful when assessing whether a model can effectively detect the occurrence of a specific category. For example, if you need a model that can catch all fraudsters on your website then you’ll want a model that has high recall. A model with high recall may have low precision (gets a lot of predictions wrong) but it’s really good at detecting all the fraudsters.

Recall = True positive / (True positive + False negative)

Referring to our color identifier model, the true positives are again 10 and the false negative is 1.

1*sGiq4D5jFHKqWDn1Cj3TLg.jpg

F1 score tries to find the balance between precision and recall. It tells how precise and robust a classifier is and ranges on a scale of [0,1]. The greater the F1 score, the better the performance of the model.

1*71znpTOxhKmBu5Hh7Q67lQ.png

Engineering

Fundamentals of being an AI/ML sorcerer supreme

Harnessing the power of AI can be magical and impactful. Before mastering this ability, one must understand the basic foundation of AI. We’ll go over basic math concepts, common prediction types, and popular algorithms.