python – Count unique values per groups with Pandas

You need nunique:

df = df.groupby('domain')['ID'].nunique()

print (df)
domain
facebook.com    1
google.com      1
twitter.com     2
vk.com          3
Name: ID, dtype: int64
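For a self-contained check, here is a minimal sketch. The sample table is an assumption, chosen so that the counts match the output above; it is not quoted from the question. The result is stored in a separate, hypothetical variable (unique_ids) so the original frame stays available.

```python
import pandas as pd

# Hypothetical sample data, chosen to reproduce the counts shown above.
df = pd.DataFrame({
    "ID": [123, 123, 123, 456, 456, 456, 456, 789, 789],
    "domain": ["vk.com", "vk.com", "twitter.com", "vk.com", "facebook.com",
               "vk.com", "google.com", "twitter.com", "vk.com"],
})

# Number of distinct IDs seen for each domain (a Series indexed by domain).
unique_ids = df.groupby("domain")["ID"].nunique()
print(unique_ids)
```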

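If the domain values need stripping first (str.strip() with no argument removes leading and trailing whitespace; pass the characters to remove otherwise):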

df = df.ID.groupby([df.domain.str.strip()]).nunique()
print (df)
domain
facebook.com    1
google.com      1
twitter.com     2
vk.com          3
Name: ID, dtype: int64

Or as Jon Clements commented:

df.groupby(df.domain.str.strip())['ID'].nunique()
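To see why the stripping matters, here is a small sketch with made-up data in which some domain values carry stray whitespace. The frame and the variable names raw and clean are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical data with stray whitespace around some domain values.
df = pd.DataFrame({
    "ID": [123, 456, 789, 123],
    "domain": ["vk.com", " vk.com", "vk.com ", "twitter.com"],
})

# Without cleaning, " vk.com" and "vk.com " form their own groups.
raw = df.groupby("domain")["ID"].nunique()

# Grouping by the stripped Series merges them into one "vk.com" group.
clean = df.groupby(df["domain"].str.strip())["ID"].nunique()

print(len(raw))   # number of groups before stripping
print(clean)
```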

You can retain the column name like this:

df = df.groupby(by='domain', as_index=False).agg({'ID': pd.Series.nunique})
print(df)
    domain  ID
0       fb   1
1      ggl   1
2  twitter   2
3       vk   3

The difference is that nunique() returns a Series and agg() returns a DataFrame.
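A short sketch of that difference, on a small made-up frame. The string 'nunique' is used here in place of pd.Series.nunique; both select the same aggregation. The variable names are hypothetical.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "ID": [123, 123, 456, 789],
    "domain": ["vk.com", "vk.com", "vk.com", "twitter.com"],
})

as_series = df.groupby("domain")["ID"].nunique()
as_frame = df.groupby("domain", as_index=False).agg({"ID": "nunique"})

print(type(as_series).__name__)  # Series, with domain as the index
print(type(as_frame).__name__)   # DataFrame, with domain as a regular column

# reset_index() converts the Series into the same two-column layout.
same_layout = as_series.reset_index()
print(list(same_layout.columns))
```

If a Series is all that is needed downstream, the first form is shorter; the second is convenient when the result feeds a merge or an export that expects named columns.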

Generally, to count how often each distinct value occurs in a single column, you can use Series.value_counts:

df.domain.value_counts()

#vk.com          5
#twitter.com     2
#facebook.com    1
#google.com      1
#Name: domain, dtype: int64

To see how many unique values a column has, use Series.nunique:

df.domain.nunique()
# 4

To get the distinct values themselves, you can use unique or drop_duplicates. The slight difference between the two is that unique returns a numpy.array while drop_duplicates returns a pandas.Series:

df.domain.unique()
# array(['vk.com', 'twitter.com', 'facebook.com', 'google.com'], dtype=object)

df.domain.drop_duplicates()
#0          vk.com
#2     twitter.com
#4    facebook.com
#6      google.com
#Name: domain, dtype: object
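The return types can be checked directly. This is a minimal sketch on a made-up Series; dtype=object is set explicitly as an assumption, because with extension dtypes unique may return an extension array rather than a NumPy array.

```python
import numpy as np
import pandas as pd

# Hypothetical column; dtype=object matches the output shown above.
domains = pd.Series(
    ["vk.com", "vk.com", "twitter.com", "vk.com", "facebook.com"],
    name="domain",
    dtype=object,
)

arr = domains.unique()           # NumPy array; the original index is lost
ser = domains.drop_duplicates()  # Series; keeps the index of each first occurrence

print(isinstance(arr, np.ndarray))
print(list(ser.index))
```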

As for this specific problem: since you'd like to count distinct values with respect to another variable, besides the groupby method provided by other answers here, you can also simply drop duplicate rows first and then call value_counts():

import pandas as pd
df.drop_duplicates().domain.value_counts()

# vk.com          3
# twitter.com     2
# facebook.com    1
# google.com      1
# Name: domain, dtype: int64
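The two approaches can be compared side by side. The sketch below uses an assumed sample table consistent with the outputs above, and it restricts the frame to the ID and domain columns before dropping duplicates; that restriction is an addition here, since a bare drop_duplicates() only gives the right answer when the frame has no other columns.

```python
import pandas as pd

# Hypothetical sample data, consistent with the counts shown above.
df = pd.DataFrame({
    "ID": [123, 123, 123, 456, 456, 456, 456, 789, 789],
    "domain": ["vk.com", "vk.com", "twitter.com", "vk.com", "facebook.com",
               "vk.com", "google.com", "twitter.com", "vk.com"],
})

via_groupby = df.groupby("domain")["ID"].nunique()

# Keep one row per (ID, domain) pair, then count rows per domain.
via_dedup = df[["ID", "domain"]].drop_duplicates()["domain"].value_counts()

# Same numbers; only the ordering differs (value_counts sorts by count).
print(via_dedup.to_dict() == via_groupby.to_dict())
```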

Note that a plain value_counts() on the column counts rows per domain, not unique IDs:

>>> df.domain.value_counts()
vk.com          5
twitter.com     2
google.com      1
facebook.com    1
Name: domain, dtype: int64
