Monday, May 15, 2017

Using the crosstab() function in Pandas

If you are doing data science or machine learning in Python, chances are you will frequently come across a function named crosstab().
So, what does it do? Let me explain that with an example.

According to the pandas documentation, the crosstab() function computes a simple cross-tabulation of two (or more) factors. By default, that means it counts how often each combination of values occurs.

Consider the following example:

import pandas as pd
df = pd.DataFrame(
    {
        "Gender": ["Male", "Male", "Female", "Female", "Female"],
        "Team": [1, 2, 3, 3, 1],
    })
print(df)

The above code snippet will print out the following:

   Gender  Team
0    Male     1
1    Male     2
2  Female     3
3  Female     3
4  Female     1

Now, it would be useful to see the distribution of genders across the teams, and this is where the crosstab() function comes in:

print("Displaying the distribution of genders in each team")
print(pd.crosstab(df.Gender, df.Team))

The above code snippet will print out the following:

Displaying the distribution of genders in each team
Team    1  2  3
Gender         
Female  1  0  2
Male    1  1  0
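
If you also want totals or proportions rather than raw counts, crosstab() accepts the margins and normalize parameters. Here is a minimal sketch that rebuilds the same DataFrame so it runs on its own; the variable names counts and shares are my own choices for illustration:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Gender": ["Male", "Male", "Female", "Female", "Female"],
        "Team": [1, 2, 3, 3, 1],
    }
)

# margins=True appends an "All" row and an "All" column holding the totals.
counts = pd.crosstab(df["Gender"], df["Team"], margins=True)
print(counts)

# normalize="index" divides each row by its row total, giving the
# share of each gender that belongs to each team.
shares = pd.crosstab(df["Gender"], df["Team"], normalize="index")
print(shares)
```

The first table adds row and column totals to the counts shown above; the second turns each row into fractions that sum to 1.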

Now you know!
