Feature Engineering - Count unique values

First published on February 25, 2022

Last updated on March 28, 2022


4 minute read

Nathaniel Tjandra

Growth

TLDR

In this Mage Academy lesson on feature engineering, we’ll learn how to count the unique values in a dataset and explore ways to use this information.

Glossary

  • Definition

  • Conceptual example

  • How to code

  • Magical no-code solution ✨🔮

Definition

To start off, let’s look at what it means for a value to be unique. A unique (distinct) value is any value that appears in a dataset at least once; no matter how many times it repeats, it counts as one unique value. When we take the count, we’re looking at how many times a value occurs in the dataset. Combining the two lets us determine how many times each unique value occurs.
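To make the distinction concrete, here is a minimal sketch in plain Python using a small, made-up list of plate colors (the values are illustrative, not from the lesson’s dataset). The set keeps each unique value once, and the dictionary maps each unique value to its number of occurrences.

```python
# Hypothetical plate colors, for illustration only
plates = ["red", "blue", "red", "green", "red", "blue"]

# Unique values: each distinct color appears exactly once in the set
unique_colors = sorted(set(plates))

# Counts: how many times each unique value occurs in the list
color_counts = {color: plates.count(color) for color in unique_colors}

print(unique_colors)  # ['blue', 'green', 'red']
print(color_counts)   # {'blue': 2, 'green': 1, 'red': 3}
```

Calling `list.count` once per unique value rescans the whole list each time, so this version only suits small examples; the dictionary and `Counter` approaches later in this lesson count everything in a single pass.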

Conceptual example

Knowing how many times each value occurs can be very helpful when running a business, like a revolving sushi bar. For those who haven’t been to one, a revolving sushi bar is a fun place that resembles a buffet, except it isn’t all-you-can-eat: you pay for each plate you take.

It revolves so everyone can try. (Source: The Frontier Post)

For a business with many different menu items, counting each item one by one is inefficient. This is where counting only the unique values, or grouping items together, helps the most.

Grouping price by color (Source: DHgate)

Having many items isn’t a problem as long as we group them by similar features. Here, dishes with the same price are grouped together by the color of their plate. We’ll use a conveyor-belt sushi dataset and calculate our revenue by counting unique values instead of tallying every individual item.

How to code

We have our dataset of all plates sold during a day’s work, along with the price per plate. We’ll use this data to determine our best-selling types of items and use that information to quickly calculate our revenue for the day.

In Python, we can create a dictionary to hold the counts and iterate through every value in the column. Each time we see a value, we add one to its count; the keys we collect along the way are our unique values, similar to what we did when we counted aggregate values.
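A minimal sketch of that dictionary approach, assuming the plate colors are available as a plain Python list (the `plates` values are made up for illustration):

```python
# Hypothetical plate colors from one day of sales, for illustration only
plates = ["red", "blue", "red", "green", "red", "blue"]

counts = {}
for color in plates:
    if color in counts:
        # Seen before: add one more occurrence
        counts[color] += 1
    else:
        # First time we see this color: it is a new unique value
        counts[color] = 1

print(counts)       # {'red': 3, 'blue': 2, 'green': 1}
print(len(counts))  # 3
```

The number of keys in `counts` is the number of unique values. Writing `counts[color] = counts.get(color, 0) + 1` is a common one-line alternative to the if/else.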

In addition to plain dictionaries, Python’s standard library provides the Counter class in the collections module. Counter is a dictionary subclass that does the same counting in a single call, with less clutter in your code.

from collections import Counter

# Count how many plates of each color were sold
Counter(df['Color'])
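The snippet above assumes `df` is already loaded. Here is a self-contained version, using a small hypothetical DataFrame with the same `ID` and `Color` columns (illustrative values, not the lesson’s data). It also shows `most_common()`, which sorts the counts so the best-selling plate colors come first:

```python
from collections import Counter

import pandas as pd

# Hypothetical sales data: one row per plate sold (illustrative values only)
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "Color": ["red", "blue", "red", "green", "red", "blue"],
})

# Counter iterates over the column's values and tallies each unique color
color_counts = Counter(df["Color"])
print(color_counts)  # Counter({'red': 3, 'blue': 2, 'green': 1})

# most_common(n) returns the n highest counts, largest first
print(color_counts.most_common(2))  # [('red', 3), ('blue', 2)]
```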

Similarly, pandas has a built-in count() method that can be applied after a groupby, as well as a unique() method. First, we’ll use unique() to list the different plate colors the restaurant uses.

print(df['Color'].unique())
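`unique()` returns the distinct values themselves, in order of first appearance. If only the number of distinct values is needed, pandas also provides `nunique()`. A quick sketch on a small hypothetical DataFrame (illustrative values only):

```python
import pandas as pd

# Hypothetical sales data, for illustration only
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "Color": ["red", "blue", "red", "green", "red", "blue"],
})

# Distinct colors, in order of first appearance (an array-like result)
colors = df["Color"].unique()
print(list(colors))  # ['red', 'blue', 'green']

# Number of distinct colors
print(df["Color"].nunique())  # 3
```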

Next, we’ll group the rows by color and count the plate IDs in each group, which gives the number of plates sold per color.

df.groupby('Color')['ID'].count()
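On a small hypothetical DataFrame, the groupby produces one count per color, sorted by color name. For comparison, `value_counts()` gives the same numbers directly from the column, sorted from most to least frequent. Note that `count()` skips missing `ID` values, so the two results can differ if the `ID` column has gaps.

```python
import pandas as pd

# Hypothetical sales data, for illustration only
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "Color": ["red", "blue", "red", "green", "red", "blue"],
})

# One row per color: number of non-missing plate IDs in each group
per_color = df.groupby("Color")["ID"].count()
print(per_color)

# value_counts() tallies the column directly, most frequent first
print(df["Color"].value_counts())
```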

Counting values simplified our data and made our revenue easier to calculate. Multiplying the number of plates sold in each color by that color’s price, then adding up the results, gives our total revenue. In this case, our restaurant made $1966. Alternatively, we could take the total sum of every individual plate’s price.
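The revenue calculation can be sketched as the count per color multiplied by the price per color, then summed. The prices and sales below are hypothetical, not the lesson’s dataset, so the total differs from $1966. The sketch also computes the same total the other way, by looking up each plate’s price and summing them all.

```python
import pandas as pd

# Hypothetical price per plate color (not the lesson's actual prices)
prices = pd.Series({"red": 2.00, "blue": 3.50, "green": 5.00})

# Hypothetical sales data: one row per plate sold
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "Color": ["red", "blue", "red", "green", "red", "blue"],
})

# Count plates per color, then multiply each count by that color's price
plates_per_color = df.groupby("Color")["ID"].count()
revenue_by_color = plates_per_color * prices  # aligned on the color labels
total_revenue = revenue_by_color.sum()
print(total_revenue)  # 18.0

# Alternative: look up every plate's price and take the total sum
total_from_each_plate = df["Color"].map(prices).sum()
print(total_from_each_plate)  # 18.0
```

Both routes give the same total; the grouped version also shows how much each plate color contributed to the day’s revenue.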

Magical no-code solution ✨🔮

The conveyor belt system was invented to help out small businesses. Do you know what else was invented to help small businesses? Mage.

Our product, Mage, has a built-in button that generates a visual summary of your unique values and their counts, showing up to the top 10 values. After uploading your data, you can find the no-code option in the right corner. Then select a column by clicking the row that contains it.

Want to learn more about machine learning (ML)? Visit Mage Academy! ✨🔮