# python – How to check whether a pandas DataFrame is empty?

You can use the attribute `df.empty` to check whether it's empty or not:

```
if df.empty:
    print("DataFrame is empty!")
```

Source: Pandas Documentation
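As a minimal sketch of what `df.empty` reports: it is `True` when any axis has length 0, and, as the pandas documentation notes, a frame holding only `NaN` values is not considered empty. The frame names below are illustrative only.

```python
import numpy as np
import pandas as pd

no_rows_no_cols = pd.DataFrame()            # 0 rows, 0 columns
no_rows = pd.DataFrame({"A": []})           # 0 rows, 1 column
only_nan = pd.DataFrame({"A": [np.nan]})    # 1 row that holds only NaN

frames = {
    "no_rows_no_cols": no_rows_no_cols,
    "no_rows": no_rows,
    "only_nan": only_nan,
}
for name, frame in frames.items():
    print(name, frame.empty)

# Dropping the NaN rows first makes the NaN-only frame empty as well.
only_nan_dropped = only_nan.dropna()
print("only_nan_dropped", only_nan_dropped.empty)
```

If `NaN`-only frames should count as empty in your code, call `dropna()` before checking `empty`.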

I use the `len` function. It's much faster than `empty`. `len(df.index)` is even faster.

```
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(10000, 4), columns=list("ABCD"))

def empty(df):
    return df.empty

def lenz(df):
    return len(df) == 0

def lenzi(df):
    return len(df.index) == 0

# Timings from an IPython session
%timeit empty(df)
# 10000 loops, best of 3: 13.9 µs per loop
%timeit lenz(df)
# 100000 loops, best of 3: 2.34 µs per loop
%timeit lenzi(df)
# 1000000 loops, best of 3: 695 ns per loop

# len on index seems to be faster
```
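`%timeit` only works inside IPython or Jupyter. A rough equivalent for a plain Python script, using the standard-library `timeit` module, might look like the sketch below. The `checks` and `results` names are illustrative, and the absolute numbers will vary with the pandas version and the machine.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(10000, 4), columns=list("ABCD"))

# The three emptiness checks being compared.
checks = {
    "df.empty": lambda: df.empty,
    "len(df) == 0": lambda: len(df) == 0,
    "len(df.index) == 0": lambda: len(df.index) == 0,
}

calls = 10_000
results = {}
for label, check in checks.items():
    # Best of 3 repeats, reported as microseconds per call.
    best = min(timeit.repeat(check, number=calls, repeat=3))
    results[label] = best / calls * 1e6
    print(f"{label:<20} {results[label]:.3f} µs per call")
```

All three checks take at most a few microseconds, so the difference only matters when the check runs in a tight loop.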

To see if a dataframe is empty, I argue that one should test for the **length of a dataframe's columns index**:

```
if len(df.columns) == 0:
    ...  # the dataframe has no columns at all
```

## Reason:

According to the Pandas Reference API, there is a distinction between:

- an empty dataframe with 0 rows and *0 columns*
- an empty dataframe with rows containing `NaN`, hence *at least 1 column*

Arguably, they are not the same. The other answers are imprecise in that `df.empty`, `len(df)`, or `len(df.index)` make no distinction and return **index is 0** and **empty is True** in both cases.

## Examples

Example 1: An empty dataframe with 0 rows and 0 columns

```
In [1]: import pandas as pd
        df1 = pd.DataFrame()
        df1
Out[1]: Empty DataFrame
        Columns: []
        Index: []

In [2]: len(df1.index)  # or len(df1)
Out[2]: 0

In [3]: df1.empty
Out[3]: True
```

Example 2: A dataframe which is emptied to 0 rows but still retains `n` columns

```
In [4]: df2 = pd.DataFrame({"AA": [1, 2, 3], "BB": [11, 22, 33]})
        df2
Out[4]:    AA  BB
        0   1  11
        1   2  22
        2   3  33

In [5]: df2 = df2[df2["AA"] == 5]
        df2
Out[5]: Empty DataFrame
        Columns: [AA, BB]
        Index: []

In [6]: len(df2.index)  # or len(df2)
Out[6]: 0

In [7]: df2.empty
Out[7]: True
```

In both of the previous examples, the *index is 0* and *empty is True*. Reading the **length of the columns index** tells them apart: for the first dataframe, df1, it returns 0 columns, which shows that it is indeed empty.

```
In [8]: len(df1.columns)
Out[8]: 0
In [9]: len(df2.columns)
Out[9]: 2
```

**Critically**, while the second dataframe df2 contains no data, it is **not completely empty**, because its columns index still reports the number of empty columns that persist.
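Both dimensions can also be read at once from `df.shape`, which is a `(rows, columns)` tuple. A short sketch, rebuilding the two frames from the examples above so that it runs on its own:

```python
import pandas as pd

# Example 1: never had any rows or columns.
df1 = pd.DataFrame()

# Example 2: filtered down to 0 rows, columns retained.
df2 = pd.DataFrame({"AA": [1, 2, 3], "BB": [11, 22, 33]})
df2 = df2[df2["AA"] == 5]

print(df1.shape)  # (0, 0)
print(df2.shape)  # (0, 2)
```

Both frames report `empty` as `True`; only the second element of `shape` (equivalently, `len(df.columns)`) tells them apart.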

## Why it matters

Let's add a new column to these dataframes to understand the implications:

```
# As expected, the empty dataframe displays 1 series
In [10]: df1["CC"] = [111, 222, 333]
         df1
Out[10]:     CC
         0  111
         1  222
         2  333

In [11]: len(df1.columns)
Out[11]: 1

# Note the persisting series with rows containing `NaN` values in df2
In [12]: df2["CC"] = [111, 222, 333]
         df2
Out[12]:    AA  BB   CC
         0 NaN NaN  111
         1 NaN NaN  222
         2 NaN NaN  333

In [13]: len(df2.columns)
Out[13]: 3
```

It is evident that the original columns in df2 have re-surfaced. Therefore, it is prudent to instead read the **length of the columns index** with `len(df.columns)` to see if a dataframe is empty.

## Practical solution

```
# New dataframe df
In [1]: df = pd.DataFrame({"AA": [1, 2, 3], "BB": [11, 22, 33]})
        df
Out[1]:    AA  BB
        0   1  11
        1   2  22
        2   3  33

# This data manipulation approach results in an empty df
# because no row satisfies the filter
In [2]: df = df[df["AA"] == 5]
        df
Out[2]: Empty DataFrame
        Columns: [AA, BB]
        Index: []

# NOTE: the df is empty, BUT the columns are persistent
In [3]: len(df.columns)
Out[3]: 2

# And accordingly, the other answers on this page
In [4]: len(df.index)  # or len(df)
Out[4]: 0

In [5]: df.empty
Out[5]: True
```

```
# SOLUTION: conditionally check for empty columns
In [6]: if len(df.columns) != 0:  # <--- here
            # Do something, e.g.
            # drop any columns whose values are all `NaN`
            # to make the df really empty
            df = df.dropna(how="all", axis=1)
        df
Out[6]: Empty DataFrame
        Columns: []
        Index: []

# Testing shows it is indeed empty now
In [7]: len(df.columns)
Out[7]: 0
```

Adding a new data series now works as expected, without the re-surfacing of empty columns (that is, without any series whose rows contained only `NaN`):

```
In [8]: df["CC"] = [111, 222, 333]
        df
Out[8]:     CC
        0  111
        1  222
        2  333

In [9]: len(df.columns)
Out[9]: 1
```
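The steps above could be wrapped in a small helper. The function name `reset_if_no_rows` is hypothetical and not part of pandas. This is only a sketch of the approach described in this answer, and some pandas versions may emit a `SettingWithCopyWarning` when the new column is assigned.

```python
import pandas as pd


def reset_if_no_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Return a frame without columns when df has no rows.

    Hypothetical helper for illustration; frames that still hold
    rows are returned unchanged.
    """
    if len(df.index) == 0 and len(df.columns) != 0:
        # With zero rows, every column counts as all-NaN and is dropped.
        return df.dropna(how="all", axis=1)
    return df


df = pd.DataFrame({"AA": [1, 2, 3], "BB": [11, 22, 33]})
df = reset_if_no_rows(df[df["AA"] == 5])  # the filter matches no rows

df["CC"] = [111, 222, 333]
print(list(df.columns))  # ['CC']
```

Guarding on `len(df.index) == 0` as well keeps the helper from touching frames that still contain data.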