python – Pandas: convert dtype object to int
Documenting the answer that worked for me based on the comment by @piRSquared.
I needed to convert to a string first, then an integer.
>>> df['purchase'].astype(str).astype(int)
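For reference, here is a minimal self-contained sketch of that two-step cast. The sample values are made up for illustration; only the column name comes from the question.

```python
import pandas as pd

# Hypothetical data: whole numbers stored in an object column,
# some as ints and some as digit strings
df = pd.DataFrame({"purchase": [10, "20", 30]}, dtype=object)

# Step 1: normalise every value to text; step 2: parse the text as integers
converted = df["purchase"].astype(str).astype(int)

print(converted.tolist())
print(converted.dtype)
```

This only works when every value parses cleanly as an integer; missing values or stray text will make the second cast raise.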
pandas >= 1.0
convert_dtypes
The (self) accepted answer doesn't take into consideration the possibility of NaNs in object columns.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, np.nan],
    'b': [True, False, np.nan]}, dtype=object)
df
     a      b
0    1   True
1    2  False
2  NaN    NaN
df['a'].astype(str).astype(int) # raises ValueError
This fails because the NaN is converted to the string "nan", which cannot then be coerced to an integer. To avoid this issue, we can soft-convert columns to their corresponding nullable types using convert_dtypes:
df.convert_dtypes()
      a      b
0     1   True
1     2  False
2  <NA>   <NA>
df.convert_dtypes().dtypes
a      Int64
b    boolean
dtype: object
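To make the mechanism concrete, here is a small sketch that reproduces the parsing failure with plain Python and then checks the nullable dtypes that convert_dtypes picks. The variable names are only illustrative.

```python
import numpy as np
import pandas as pd

# str() turns a float NaN into the text "nan" ...
nan_text = str(np.nan)

# ... and int() cannot parse that text
try:
    int(nan_text)
    parse_failed = False
except ValueError:
    parse_failed = True

df = pd.DataFrame({
    "a": [1, 2, np.nan],
    "b": [True, False, np.nan]}, dtype=object)

# convert_dtypes keeps the missing value as <NA> instead of stringifying it
converted = df.convert_dtypes()

print(parse_failed)
print(converted.dtypes)
print(converted["a"].isna().tolist())
```

The missing entries survive as <NA>, so the integer column does not have to fall back to float or object.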
If your data has junk text mixed in with your ints, you can use pd.to_numeric as an initial step:
s = pd.Series(['1', '2', '...'])
s.convert_dtypes() # converts to string, which is not what we want
0      1
1      2
2    ...
dtype: string
# coerces non-numeric junk to NaNs
pd.to_numeric(s, errors='coerce')
0    1.0
1    2.0
2    NaN
dtype: float64
# one final `convert_dtypes` call to convert to nullable int
pd.to_numeric(s, errors='coerce').convert_dtypes()
0       1
1       2
2    <NA>
dtype: Int64
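If you need this in more than one place, the two calls can be wrapped in a small helper. This is only a sketch: to_nullable_int is a hypothetical name, not a pandas function.

```python
import pandas as pd

def to_nullable_int(s):
    """Hypothetical helper: coerce non-numeric junk to missing values,
    then let convert_dtypes pick a nullable dtype for the result."""
    return pd.to_numeric(s, errors="coerce").convert_dtypes()

s = pd.Series(["1", "2", "..."])
result = to_nullable_int(s)

print(result.dtype)
print(result.isna().tolist())
```

Note that errors="coerce" silently discards anything it cannot parse, so it is worth checking how many values became missing before relying on the result.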
It's simple:
pd.factorize(df.purchase)[0]
Example:
labels, uniques = pd.factorize(['b', 'b', 'a', 'c', 'b'])
labels
# array([0, 0, 1, 2, 0])
uniques
# array(['b', 'a', 'c'], dtype=object)
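One thing to be aware of with this approach: factorize assigns integer labels in order of first appearance, so it encodes categories rather than parsing the numbers inside the strings. A short sketch with made-up purchase values shows the difference.

```python
import pandas as pd

# Hypothetical data: digit strings in an object column
df = pd.DataFrame({"purchase": ["30", "10", "30", "20"]})

codes, uniques = pd.factorize(df["purchase"])

# Codes are positions into `uniques`, not the numeric values of the strings
print(codes.tolist())    # [0, 1, 0, 2]
print(list(uniques))     # ['30', '10', '20']
```

So this is a good fit when the column holds labels you want to encode as integers, and less so when it holds numbers stored as text.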