python – Pandas DataFrame Groupby two columns and get counts
python – Pandas DataFrame Groupby two columns and get counts
You are looking for size
:
In [11]: df.groupby([col5, col2]).size()
Out[11]:
col5 col2
1 A 1
D 3
2 B 2
3 A 3
C 1
4 B 1
5 B 2
6 B 1
dtype: int64
To get the same answer as waitingkuo (the second question), but slightly cleaner, is to groupby the level:
In [12]: df.groupby([col5, col2]).size().groupby(level=1).max()
Out[12]:
col2
A 3
B 2
C 1
D 3
dtype: int64
Followed by @Andys answer, you can do following to solve your second question:
In [56]: df.groupby([col5,col2]).size().reset_index().groupby(col2)[[0]].max()
Out[56]:
0
col2
A 3
B 2
C 1
D 3
python – Pandas DataFrame Groupby two columns and get counts
Idiomatic solution that uses only a single groupby
(df.groupby([col5, col2]).size()
.sort_values(ascending=False)
.reset_index(name=count)
.drop_duplicates(subset=col2))
col5 col2 count
0 3 A 3
1 1 D 3
2 5 B 2
6 3 C 1
Explanation
The result of the groupby size
method is a Series with col5
and col2
in the index. From here, you can use another groupby method to find the maximum value of each value in col2
but it is not necessary to do. You can simply sort all the values descendingly and then keep only the rows with the first occurrence of col2
with the drop_duplicates
method.