Python pandas groupby aggregate on multiple columns, then pivot

Python pandas groupby aggregate on multiple columns, then pivot

Edited for Pandas 0.22+ considering the deprecation of the use of dictionaries in a group by aggregation.

We set up a very similar dictionary where we use the keys of the dictionary to specify our functions and the dictionary itself to rename the columns.

rnm_cols = dict(size=Size, sum=Sum, mean=Mean, std=Std)
df.set_index([Category, Item]).stack().groupby(Category) 
  .agg(rnm_cols.keys()).rename(columns=rnm_cols)

            Size   Sum        Mean        Std
Category                                     
Books          3    58   19.333333   2.081666
Clothes        3   148   49.333333   4.041452
Technology     6  1800  300.000000  70.710678

option 1
use agg ← link to docs

agg_funcs = dict(Size=size, Sum=sum, Mean=mean, Std=std)
df.set_index([Category, Item]).stack().groupby(level=0).agg(agg_funcs)

                  Std   Sum        Mean  Size
Category                                     
Books        2.081666    58   19.333333     3
Clothes      4.041452   148   49.333333     3
Technology  70.710678  1800  300.000000     6

option 2
more for less
use describe ← link to docs

df.set_index([Category, Item]).stack().groupby(level=0).describe().unstack()

            count        mean        std    min    25%    50%    75%    max
Category                                                                   
Books         3.0   19.333333   2.081666   17.0   18.5   20.0   20.5   21.0
Clothes       3.0   49.333333   4.041452   45.0   47.5   50.0   51.5   53.0
Technology    6.0  300.000000  70.710678  200.0  262.5  300.0  337.5  400.0
df.groupby(Category).agg({Item:size,shop1:[sum,mean,std],shop2:[sum,mean,std],shop3:[sum,mean,std]})

Or if you want it across all shops then:

df1 = df.set_index([Item,Category]).stack().reset_index().rename(columns={level_2:Shops,0:costs})
df1.groupby(Category).agg({Item:size,costs:[sum,mean,std]})

Python pandas groupby aggregate on multiple columns, then pivot

If I understand correctly, you want to calculate aggregate metrics for all shops, not for each individually. To do that, you can first stack your dataframe and then group by Category:

stacked = df.set_index([Item, Category]).stack().reset_index()
stacked.columns = [Item, Category, Shop, Price]
stacked.groupby(Category).agg({Price:[count,sum,mean,std]})

Which results in

           Price                             
           count   sum        mean        std
Category                                     
Books          3    58   19.333333   2.081666
Clothes        3   148   49.333333   4.041452
Technology     6  1800  300.000000  70.710678

Leave a Reply

Your email address will not be published. Required fields are marked *