python pandas extract year from datetime: df[year] = df[date].year is not working

python pandas extract year from datetime: df[year] = df[date].year is not working

If youre running a recent-ish version of pandas then you can use the datetime attribute dt to access the datetime components:

In [6]:

df[date] = pd.to_datetime(df[date])
df[year], df[month] = df[date].dt.year, df[date].dt.month
df
Out[6]:
        date  Count  year  month
0 2010-06-30    525  2010      6
1 2010-07-30    136  2010      7
2 2010-08-31    125  2010      8
3 2010-09-30     84  2010      9
4 2010-10-29   4469  2010     10

EDIT

It looks like youre running an older version of pandas in which case the following would work:

In [18]:

df[date] = pd.to_datetime(df[date])
df[year], df[month] = df[date].apply(lambda x: x.year), df[date].apply(lambda x: x.month)
df
Out[18]:
        date  Count  year  month
0 2010-06-30    525  2010      6
1 2010-07-30    136  2010      7
2 2010-08-31    125  2010      8
3 2010-09-30     84  2010      9
4 2010-10-29   4469  2010     10

Regarding why it didnt parse this into a datetime in read_csv you need to pass the ordinal position of your column ([0]) because when True it tries to parse columns [1,2,3] see the docs

In [20]:

t=date   Count
6/30/2010   525
7/30/2010   136
8/31/2010   125
9/30/2010   84
10/29/2010  4469
df = pd.read_csv(io.StringIO(t), sep=s+, parse_dates=[0])
df.info()
<class pandas.core.frame.DataFrame>
Int64Index: 5 entries, 0 to 4
Data columns (total 2 columns):
date     5 non-null datetime64[ns]
Count    5 non-null int64
dtypes: datetime64[ns](1), int64(1)
memory usage: 120.0 bytes

So if you pass param parse_dates=[0] to read_csv there shouldnt be any need to call to_datetime on the date column after loading.

This works:

df[date].dt.year

Now:

df[year] = df[date].dt.year
df[month] = df[date].dt.month

gives this data frame:

        date  Count  year  month
0 2010-06-30    525  2010      6
1 2010-07-30    136  2010      7
2 2010-08-31    125  2010      8
3 2010-09-30     84  2010      9
4 2010-10-29   4469  2010     10

python pandas extract year from datetime: df[year] = df[date].year is not working

When to use dt accessor

A common source of confusion revolves around when to use .year and when to use .dt.year.

The former is an attribute for pd.DatetimeIndex objects; the latter for pd.Series objects. Consider this dataframe:

df = pd.DataFrame({Dates: pd.to_datetime([2018-01-01, 2018-10-20, 2018-12-25])},
                  index=pd.to_datetime([2000-01-01, 2000-01-02, 2000-01-03]))

The definition of the series and index look similar, but the pd.DataFrame constructor converts them to different types:

type(df.index)     # pandas.tseries.index.DatetimeIndex
type(df[Dates])  # pandas.core.series.Series

The DatetimeIndex object has a direct year attribute, while the Series object must use the dt accessor. Similarly for month:

df.index.month               # array([1, 1, 1])
df[Dates].dt.month.values  # array([ 1, 10, 12], dtype=int64)

A subtle but important difference worth noting is that df.index.month gives a NumPy array, while df[Dates].dt.month gives a Pandas series. Above, we use pd.Series.values to extract the NumPy array representation.

Leave a Reply

Your email address will not be published. Required fields are marked *