python – Pandas DataFrame Groupby two columns and get counts

python – Pandas DataFrame Groupby two columns and get counts

You are looking for size:

In [11]: df.groupby([col5, col2]).size()
Out[11]:
col5  col2
1     A       1
      D       3
2     B       2
3     A       3
      C       1
4     B       1
5     B       2
6     B       1
dtype: int64

To get the same answer as waitingkuo (the second question), but slightly cleaner, is to groupby the level:

In [12]: df.groupby([col5, col2]).size().groupby(level=1).max()
Out[12]:
col2
A       3
B       2
C       1
D       3
dtype: int64

Followed by @Andys answer, you can do following to solve your second question:

In [56]: df.groupby([col5,col2]).size().reset_index().groupby(col2)[[0]].max()
Out[56]: 
      0
col2   
A     3
B     2
C     1
D     3

python – Pandas DataFrame Groupby two columns and get counts

Idiomatic solution that uses only a single groupby

(df.groupby([col5, col2]).size() 
   .sort_values(ascending=False) 
   .reset_index(name=count) 
   .drop_duplicates(subset=col2))

  col5 col2  count
0    3    A      3
1    1    D      3
2    5    B      2
6    3    C      1

Explanation

The result of the groupby size method is a Series with col5 and col2 in the index. From here, you can use another groupby method to find the maximum value of each value in col2 but it is not necessary to do. You can simply sort all the values descendingly and then keep only the rows with the first occurrence of col2 with the drop_duplicates method.

Leave a Reply

Your email address will not be published. Required fields are marked *