python – Pandas: convert dtype object to int


Documenting the answer that worked for me based on the comment by @piRSquared.

I needed to convert to a string first, then an integer.

>>> df['purchase'].astype(str).astype(int)
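A minimal sketch of the same string-then-integer cast, assuming a hypothetical purchase column whose object values are whole numbers, some stored as strings and some as ints. The data here is made up for illustration:

```python
import pandas as pd

# Hypothetical data: an object column mixing numeric strings and ints
df = pd.DataFrame({'purchase': ['3', 10, '7']}, dtype=object)

# astype(str) gives every value the same string form ('3', '10', '7'),
# which astype(int) then parses into integers
converted = df['purchase'].astype(str).astype(int)
print(converted.tolist())  # [3, 10, 7]
```

This only works when every value is a clean whole number; a missing value or stray text makes the second cast fail.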

pandas >= 1.0: convert_dtypes

The (self) accepted answer doesn't account for the possibility of NaNs in object columns.

import numpy as np
import pandas as pd

df = pd.DataFrame({
     'a': [1, 2, np.nan],
     'b': [True, False, np.nan]}, dtype=object)
df

     a      b
0    1   True
1    2  False
2  NaN    NaN

df['a'].astype(str).astype(int)  # raises ValueError

This chokes because the NaN is converted to the string 'nan', and any further attempt to coerce it to an integer fails. To avoid this issue, we can soft-convert columns to their corresponding nullable types using convert_dtypes:

df.convert_dtypes()                                                        

      a      b
0     1   True
1     2  False
2  <NA>   <NA>

df.convert_dtypes().dtypes                                                 

a      Int64
b    boolean
dtype: object
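A small sketch of what the nullable result allows, using the same example frame. The fillna(0) step is an assumption for illustration only: it matters only if a plain NumPy int64 column is required, and 0 is an arbitrary placeholder for the missing value.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, np.nan],
    'b': [True, False, np.nan]}, dtype=object)

converted = df.convert_dtypes()
a = converted['a']            # nullable Int64; the NaN became <NA>
print(a.dtype)                # Int64
print(a.isna().tolist())      # [False, False, True]

# Only if a plain NumPy int64 column is required: replace the missing
# value first (0 is an arbitrary placeholder), then cast
plain = a.fillna(0).astype('int64')
print(plain.tolist())         # [1, 2, 0]
```

Keeping the Int64 column as-is is usually preferable, since the placeholder makes a missing value indistinguishable from a real 0.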

If your data has junk text mixed in with your ints, you can use pd.to_numeric as an initial step:

s = pd.Series([1, 2, '...'])
s.convert_dtypes()  # converts to string, which is not what we want

0      1
1      2
2    ...
dtype: string 

# coerces non-numeric junk to NaNs
pd.to_numeric(s, errors='coerce')

0    1.0
1    2.0
2    NaN
dtype: float64

# one final `convert_dtypes` call to convert to nullable int
pd.to_numeric(s, errors='coerce').convert_dtypes()

0       1
1       2
2    <NA>
dtype: Int64
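The two steps above can be wrapped in a small reusable function. This is a sketch only; to_nullable_int is a hypothetical helper name, not a pandas API:

```python
import pandas as pd

def to_nullable_int(s: pd.Series) -> pd.Series:
    """Hypothetical helper: coerce junk to NaN, then convert to nullable Int64."""
    numeric = pd.to_numeric(s, errors='coerce')  # non-numeric values become NaN
    return numeric.convert_dtypes()              # whole-number floats become Int64

s = pd.Series([1, 2, '...'])
result = to_nullable_int(s)
print(result.dtype)             # Int64
print(result.isna().tolist())   # [False, False, True]
```

If the coerced values include non-whole numbers such as 1.5, convert_dtypes should return a nullable float dtype instead of Int64, so it is worth checking result.dtype before relying on integer semantics.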


It's simple:

pd.factorize(df.purchase)[0]

Example:

labels, uniques = pd.factorize(['b', 'b', 'a', 'c', 'b'])
labels
# array([0, 0, 1, 2, 0])
uniques
# array(['b', 'a', 'c'], dtype=object)
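Note that factorize assigns each distinct value an integer code in order of first appearance; the codes are labels, not the parsed numbers. A sketch contrasting the two on a hypothetical purchase column of numeric strings (the data is made up for illustration):

```python
import pandas as pd

# Hypothetical purchase column stored as object dtype
purchase = pd.Series(['10', '10', '5', '7', '10'], dtype=object)

codes, uniques = pd.factorize(purchase)
print(codes.tolist())    # [0, 0, 1, 2, 0]  (labels by first appearance)
print(uniques.tolist())  # ['10', '5', '7']

# The original values can be recovered by indexing uniques with the codes
restored = uniques[codes]

# Parsing the strings instead yields the actual numbers
values = purchase.astype(str).astype(int)
print(values.tolist())   # [10, 10, 5, 7, 10]
```

So factorize suits encoding categories as integers, while the astype or to_numeric approaches suit columns whose values are numbers stored as objects.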
