python – T-test in Pandas
It depends on what sort of t-test you want to do (one-sided or two-sided, dependent or independent), but it should be as simple as:
from scipy.stats import ttest_ind
# Select the rows belonging to each category
cat1 = my_data[my_data['Category'] == 'cat1']
cat2 = my_data[my_data['Category'] == 'cat2']
ttest_ind(cat1['values'], cat2['values'])
>>> (1.4927289925706944, 0.16970867501294376)
It returns a tuple-like result containing the t-statistic and the p-value.
See here for other t-tests: http://docs.scipy.org/doc/scipy/reference/stats.html
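As a rough, self-contained sketch of the snippet above: the toy numbers below are made up for illustration, and the column names `Category` and `values` are simply the ones used in this answer. The `alternative` keyword, which selects a one-sided test, is only available in SciPy 1.6 and later.

```python
import pandas as pd
from scipy.stats import ttest_ind

# Hypothetical toy data; the numbers are invented for illustration only.
my_data = pd.DataFrame({
    "Category": ["cat1"] * 5 + ["cat2"] * 5,
    "values": [2.1, 2.5, 1.9, 2.8, 2.4, 3.0, 3.4, 2.9, 3.6, 3.1],
})

cat1 = my_data[my_data["Category"] == "cat1"]
cat2 = my_data[my_data["Category"] == "cat2"]

# Two-sided test (the default): are the two means different?
two_sided = ttest_ind(cat1["values"], cat2["values"])

# One-sided test: is the mean of cat1 less than the mean of cat2?
# Requires SciPy >= 1.6.
one_sided = ttest_ind(cat1["values"], cat2["values"], alternative="less")

print(two_sided.statistic, two_sided.pvalue)
print(one_sided.pvalue)
```

When the observed t-statistic points in the direction named by `alternative`, the one-sided p-value is half the two-sided one.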
EDIT: I had not realized this was about the data format. You could use:
import pandas as pd
import scipy.stats
# Assumes data is a dict of columns; for an existing DataFrame, use data.set_index('Category') instead.
two_data = pd.DataFrame(data, index=data['Category'])
Then accessing the categories is as simple as:
scipy.stats.ttest_ind(two_data.loc['cat1', 'values'], two_data.loc['cat2', 'values'], equal_var=False)
The loc operator accesses rows by label.
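A minimal sketch of the label-based approach, assuming `data` is a plain dict of columns (the numbers are invented). It uses `set_index` rather than the `index=` argument, which is a substitution on my part and not necessarily what the answer intended; it has the side effect of dropping the string column from the frame.

```python
import pandas as pd
from scipy import stats

# Hypothetical input; invented numbers.
data = {
    "Category": ["cat1", "cat1", "cat1", "cat2", "cat2", "cat2"],
    "values": [2.1, 2.5, 1.9, 3.0, 3.4, 2.9],
}

# Use the category as the row label.
two_data = pd.DataFrame(data).set_index("Category")

# With a repeated label, .loc returns every row carrying that label.
a = two_data.loc["cat1", "values"]
b = two_data.loc["cat2", "values"]

result = stats.ttest_ind(a, b, equal_var=False)
print(result.statistic, result.pvalue)
```

Selecting the numeric column explicitly keeps non-numeric columns out of the test.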
On "one sided or two sided, dependent or independent":
If you have two independent samples but do not know whether they have equal variance, you can use Welch's t-test. It is as simple as:
scipy.stats.ttest_ind(cat1['values'], cat2['values'], equal_var=False)
For reasons to prefer Welch's test, see https://stats.stackexchange.com/questions/305/when-conducting-a-t-test-why-would-one-prefer-to-assume-or-test-for-equal-vari.
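To make concrete what `equal_var=False` changes, here is a small sketch that computes Welch's statistic by hand and compares it with SciPy. The two arrays are invented; the formulas are the standard Welch statistic and the Welch-Satterthwaite degrees of freedom.

```python
import numpy as np
from scipy import stats

# Hypothetical samples with different sizes and visibly different spread.
a = np.array([2.1, 2.5, 1.9, 2.8, 2.4])
b = np.array([3.0, 4.4, 2.2, 5.6, 3.1, 4.8, 2.7])

# Per-sample variance of the mean (unbiased variance divided by n).
va = a.var(ddof=1) / len(a)
vb = b.var(ddof=1) / len(b)

# Welch's t-statistic: no pooled variance.
t_manual = (a.mean() - b.mean()) / np.sqrt(va + vb)

# Welch-Satterthwaite approximation of the degrees of freedom.
df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))

# Two-sided p-value from the t distribution.
p_manual = 2 * stats.t.sf(abs(t_manual), df)

welch = stats.ttest_ind(a, b, equal_var=False)
print(t_manual, welch.statistic)
print(p_manual, welch.pvalue)
```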
For two dependent (paired) samples of equal length, you can use:
scipy.stats.ttest_rel(cat1['values'], cat2['values'])
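A short sketch of the paired case, with invented before/after measurements on the same six subjects. It also shows that the paired test amounts to a one-sample t-test on the per-pair differences.

```python
import numpy as np
from scipy import stats

# Hypothetical paired measurements: element i of each array is the same subject.
before = np.array([72.0, 68.5, 80.1, 75.3, 69.9, 77.2])
after = np.array([70.2, 67.9, 78.4, 74.8, 68.1, 75.0])

# Paired (dependent) t-test; both arrays must have the same length.
paired = stats.ttest_rel(before, after)

# Equivalent formulation: test whether the mean difference is zero.
on_diffs = stats.ttest_1samp(before - after, 0.0)

print(paired.statistic, paired.pvalue)
print(on_diffs.statistic, on_diffs.pvalue)
```

Because the pairing matters, the rows must be aligned by subject before calling `ttest_rel`; filtering a DataFrame by category does not guarantee that on its own.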
I simplified the code a little bit:
from scipy.stats import ttest_ind
# Unpack the per-category value lists as the two samples
ttest_ind(*my_data.groupby('Category')['value'].apply(lambda x: list(x)))
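A runnable sketch of the groupby one-liner on invented data. The star-unpacking passes one positional argument per group, so it only works when the column holds exactly two categories; the explicit loop below is an alternative way to collect the same groups and makes that assumption easy to check.

```python
import pandas as pd
from scipy.stats import ttest_ind

# Hypothetical data with interleaved categories; invented numbers.
my_data = pd.DataFrame({
    "Category": ["cat1", "cat2", "cat1", "cat2", "cat1", "cat2"],
    "value": [2.1, 3.0, 2.5, 3.4, 1.9, 2.9],
})

# The one-liner: a Series of lists, unpacked into two positional arguments.
one_liner = ttest_ind(*my_data.groupby("Category")["value"].apply(list))

# The same thing spelled out, with a guard on the number of groups.
groups = [g.to_numpy() for _, g in my_data.groupby("Category")["value"]]
if len(groups) != 2:
    raise ValueError("ttest_ind needs exactly two groups")
explicit = ttest_ind(*groups)

print(one_liner.statistic, one_liner.pvalue)
```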