python – Convert list of dictionaries to a pandas DataFrame
Supposing d is your list of dicts, simply:
import pandas as pd
df = pd.DataFrame(d)
Note: this does not work with nested data.
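As a quick illustration of that limitation (the records below are made up for the example), pd.DataFrame does not expand a nested dict into its own columns. The whole inner dict ends up as one Python object in a single cell:

```python
import pandas as pd

# Hypothetical records: the "address" value is itself a dict
d = [
    {"name": "x", "address": {"city": "A", "zip": "1"}},
    {"name": "y", "address": {"city": "B", "zip": "2"}},
]

df = pd.DataFrame(d)
print(df.shape)              # (2, 2): no extra columns for city/zip
print(df.loc[0, "address"])  # {'city': 'A', 'zip': '1'}, stored whole in one cell
```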
How do I convert a list of dictionaries to a pandas DataFrame?
The other answers are correct, but not much has been explained in terms of the advantages and limitations of these methods. The aim of this post is to show examples of these methods in different situations, discuss when to use them (and when not to), and suggest alternatives.
DataFrame(), DataFrame.from_records(), and .from_dict()
Depending on the structure and format of your data, there are situations where all three methods work, where some work better than others, or where some don't work at all.
Consider a very contrived example.
import numpy as np
import pandas as pd

np.random.seed(0)
data = pd.DataFrame(
    np.random.choice(10, (3, 4)), columns=list('ABCD')).to_dict('records')
print(data)
[{'A': 5, 'B': 0, 'C': 3, 'D': 3},
 {'A': 7, 'B': 9, 'C': 3, 'D': 5},
 {'A': 2, 'B': 4, 'C': 7, 'D': 6}]
This list consists of records with every key present. This is the simplest case you could encounter.
# The following methods all produce the same output.
pd.DataFrame(data)
pd.DataFrame.from_dict(data)
pd.DataFrame.from_records(data)
A B C D
0 5 0 3 3
1 7 9 3 5
2 2 4 7 6
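A quick way to check that claim for this particular input is to compare the three frames with DataFrame.equals. The sketch below rebuilds data the same way, spelling the orient out as 'records' (the short form 'r' is deprecated in newer pandas):

```python
import numpy as np
import pandas as pd

np.random.seed(0)
data = pd.DataFrame(
    np.random.choice(10, (3, 4)), columns=list('ABCD')).to_dict('records')

df1 = pd.DataFrame(data)
df2 = pd.DataFrame.from_dict(data)
df3 = pd.DataFrame.from_records(data)

# equals() requires matching shape, labels, column dtypes and values
print(df1.equals(df2) and df1.equals(df3))  # True
```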
Word on Dictionary Orientations: orient='index'/'columns'
Before continuing, it is important to distinguish between the different types of dictionary orientations and how pandas supports them. There are two primary types: 'columns' and 'index'.
orient='columns'
Dictionaries with the columns orientation will have their keys correspond to columns in the equivalent DataFrame.
For example, data above is in the 'columns' orient.
data_c = [
 {'A': 5, 'B': 0, 'C': 3, 'D': 3},
 {'A': 7, 'B': 9, 'C': 3, 'D': 5},
 {'A': 2, 'B': 4, 'C': 7, 'D': 6}]
pd.DataFrame.from_dict(data_c, orient='columns')
A B C D
0 5 0 3 3
1 7 9 3 5
2 2 4 7 6
Note: If you are using pd.DataFrame.from_records, the orientation is assumed to be 'columns' (you cannot specify otherwise), and the dictionaries will be loaded accordingly.
orient='index'
With this orient, keys are assumed to correspond to index values. This kind of data is best suited for pd.DataFrame.from_dict.
data_i = {
 0: {'A': 5, 'B': 0, 'C': 3, 'D': 3},
 1: {'A': 7, 'B': 9, 'C': 3, 'D': 5},
 2: {'A': 2, 'B': 4, 'C': 7, 'D': 6}}
pd.DataFrame.from_dict(data_i, orient='index')
A B C D
0 5 0 3 3
1 7 9 3 5
2 2 4 7 6
This case is not considered in the OP, but is still useful to know.
Setting Custom Index
If you need a custom index on the resultant DataFrame, you can set it using the index=... argument.
pd.DataFrame(data, index=['a', 'b', 'c'])
# pd.DataFrame.from_records(data, index=['a', 'b', 'c'])
A B C D
a 5 0 3 3
b 7 9 3 5
c 2 4 7 6
This is not supported by pd.DataFrame.from_dict.
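If you do want to start from from_dict, one workaround (a sketch, not a from_dict feature) is to relabel the rows after construction with DataFrame.set_axis:

```python
import pandas as pd

data = [
    {'A': 5, 'B': 0, 'C': 3, 'D': 3},
    {'A': 7, 'B': 9, 'C': 3, 'D': 5},
    {'A': 2, 'B': 4, 'C': 7, 'D': 6}]

# from_dict has no index= parameter, so replace the default RangeIndex afterwards
df = pd.DataFrame.from_dict(data).set_axis(['a', 'b', 'c'], axis=0)

# Same result as passing index= to the constructor
print(df.equals(pd.DataFrame(data, index=['a', 'b', 'c'])))  # True
```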
Dealing with Missing Keys/Columns
All methods work out-of-the-box when handling dictionaries with missing keys/column values. For example,
data2 = [
 {'A': 5, 'C': 3, 'D': 3},
 {'A': 7, 'B': 9, 'F': 5},
 {'B': 4, 'C': 7, 'E': 6}]
# The methods below all produce the same output.
pd.DataFrame(data2)
pd.DataFrame.from_dict(data2)
pd.DataFrame.from_records(data2)
A B C D E F
0 5.0 NaN 3.0 3.0 NaN NaN
1 7.0 9.0 NaN NaN NaN 5.0
2 NaN 4.0 7.0 NaN 6.0 NaN
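Notice in the output above that every column with a missing value is upcast to float64, because NaN is a float. If you would rather keep integers, one option (assuming pandas 1.0 or later) is convert_dtypes, which switches to nullable dtypes such as Int64 and represents the gaps as <NA>:

```python
import pandas as pd

data2 = [
    {'A': 5, 'C': 3, 'D': 3},
    {'A': 7, 'B': 9, 'F': 5},
    {'B': 4, 'C': 7, 'E': 6}]

df = pd.DataFrame(data2)
print(df['A'].dtype)  # float64: the missing key became NaN

# convert_dtypes turns whole-number float columns into nullable Int64
df_int = df.convert_dtypes()
print(df_int['A'].dtype)            # Int64
print(df_int['A'].isna().tolist())  # [False, False, True]
```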
Reading Subset of Columns
What if you don't want to read in every single column? You can easily specify this using the columns=... parameter.
For example, from the example dictionary of data2 above, if you wanted to read only columns 'A', 'D', and 'F', you can do so by passing a list:
pd.DataFrame(data2, columns=['A', 'D', 'F'])
# pd.DataFrame.from_records(data2, columns=['A', 'D', 'F'])
A D F
0 5.0 3.0 NaN
1 7.0 NaN 5.0
2 NaN NaN NaN
This is not supported by pd.DataFrame.from_dict with the default orient 'columns'.
pd.DataFrame.from_dict(data2, orient='columns', columns=['A', 'B'])
ValueError: cannot use columns parameter with orient='columns'
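A simple workaround is to build the full frame and select the columns afterwards. The sketch below uses reindex(columns=...) rather than [...] so that a name missing from every record yields an all-NaN column instead of a KeyError:

```python
import pandas as pd

data2 = [
    {'A': 5, 'C': 3, 'D': 3},
    {'A': 7, 'B': 9, 'F': 5},
    {'B': 4, 'C': 7, 'E': 6}]

# from_dict rejects columns= when orient='columns'
try:
    pd.DataFrame.from_dict(data2, orient='columns', columns=['A', 'B'])
except ValueError as exc:
    print(exc)

# Workaround: load everything, then keep only the wanted columns
subset = pd.DataFrame.from_dict(data2).reindex(columns=['A', 'D', 'F'])
print(subset.equals(pd.DataFrame(data2, columns=['A', 'D', 'F'])))  # True
```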
Reading Subset of Rows
Not supported by any of these methods directly. You will have to iterate over your data and perform a reverse delete in-place as you iterate. For example, to extract only the 0th and 2nd rows from data2 above, you can use:
rows_to_select = {0, 2}
for i in reversed(range(len(data2))):
    if i not in rows_to_select:
        del data2[i]
pd.DataFrame(data2)
# pd.DataFrame.from_dict(data2)
# pd.DataFrame.from_records(data2)
A B C D E
0 5.0 NaN 3 3.0 NaN
1 NaN 4.0 7 NaN 6.0
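The reverse-delete loop mutates data2 in place. If you would rather keep the original list intact, a non-mutating alternative is to filter the records with a list comprehension before building the frame:

```python
import pandas as pd

data2 = [
    {'A': 5, 'C': 3, 'D': 3},
    {'A': 7, 'B': 9, 'F': 5},
    {'B': 4, 'C': 7, 'E': 6}]
rows_to_select = {0, 2}

# Keep only the records whose position is in rows_to_select; data2 is untouched
subset = [rec for i, rec in enumerate(data2) if i in rows_to_select]
df = pd.DataFrame(subset)

print(len(data2))  # 3
print(df.shape)    # (2, 5): column F only appeared in the dropped record
```

Selecting rows with .iloc after building the full frame would not give quite the same result: column F would still be present (all NaN), and the original index labels 0 and 2 would be kept.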
The Panacea: json_normalize for Nested Data
A strong, robust alternative to the methods outlined above is the json_normalize function, which works with lists of dictionaries (records) and can also handle nested dictionaries.
pd.json_normalize(data)
A B C D
0 5 0 3 3
1 7 9 3 5
2 2 4 7 6
pd.json_normalize(data2)
A B C D E
0 5.0 NaN 3 3.0 NaN
1 NaN 4.0 7 NaN 6.0
Again, keep in mind that the data passed to json_normalize needs to be in the list-of-dictionaries (records) format.
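Here is a small sketch of the simplest nested case, using made-up records: without record_path, json_normalize flattens nested dicts into dotted column names, and the sep parameter changes the separator.

```python
import pandas as pd

# Hypothetical records with one level of nesting
records = [
    {'id': 1, 'info': {'name': 'a', 'score': 10}},
    {'id': 2, 'info': {'name': 'b', 'score': 20}},
]

flat = pd.json_normalize(records)
print(sorted(flat.columns))     # ['id', 'info.name', 'info.score']

flat_us = pd.json_normalize(records, sep='_')
print(sorted(flat_us.columns))  # ['id', 'info_name', 'info_score']
```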
As mentioned, json_normalize can also handle nested dictionaries. Here's an example taken from the documentation.
data_nested = [
  {'counties': [{'name': 'Dade', 'population': 12345},
                {'name': 'Broward', 'population': 40000},
                {'name': 'Palm Beach', 'population': 60000}],
   'info': {'governor': 'Rick Scott'},
   'shortname': 'FL',
   'state': 'Florida'},
  {'counties': [{'name': 'Summit', 'population': 1234},
                {'name': 'Cuyahoga', 'population': 1337}],
   'info': {'governor': 'John Kasich'},
   'shortname': 'OH',
   'state': 'Ohio'}
]
pd.json_normalize(data_nested,
                  record_path='counties',
                  meta=['state', 'shortname', ['info', 'governor']])
name population state shortname info.governor
0 Dade 12345 Florida FL Rick Scott
1 Broward 40000 Florida FL Rick Scott
2 Palm Beach 60000 Florida FL Rick Scott
3 Summit 1234 Ohio OH John Kasich
4 Cuyahoga 1337 Ohio OH John Kasich
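For contrast, here is a sketch of what happens when the same data_nested is passed without record_path or meta: the info dict is still flattened, but the counties list is left as a single list-valued column rather than being expanded into one row per county.

```python
import pandas as pd

data_nested = [
    {'counties': [{'name': 'Dade', 'population': 12345},
                  {'name': 'Broward', 'population': 40000},
                  {'name': 'Palm Beach', 'population': 60000}],
     'info': {'governor': 'Rick Scott'},
     'shortname': 'FL',
     'state': 'Florida'},
    {'counties': [{'name': 'Summit', 'population': 1234},
                  {'name': 'Cuyahoga', 'population': 1337}],
     'info': {'governor': 'John Kasich'},
     'shortname': 'OH',
     'state': 'Ohio'}
]

flat = pd.json_normalize(data_nested)
print(flat.shape)                    # (2, 4): one row per state, not per county
print(sorted(flat.columns))          # ['counties', 'info.governor', 'shortname', 'state']
print(len(flat.loc[0, 'counties']))  # 3: the list of counties sits in one cell
```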
For more information on the meta and record_path arguments, check out the documentation.
Summarising
Here's a table of all the methods discussed above, along with supported features/functionality.
* Use orient='columns' and then transpose to get the same effect as orient='index'.
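A minimal sketch of that transpose trick, reusing data_i from the orient='index' section: the plain constructor (like orient='columns') puts the outer dict keys on the columns, so transposing moves them back onto the row index.

```python
import pandas as pd

data_i = {
    0: {'A': 5, 'B': 0, 'C': 3, 'D': 3},
    1: {'A': 7, 'B': 9, 'C': 3, 'D': 5},
    2: {'A': 2, 'B': 4, 'C': 7, 'D': 6}}

by_index = pd.DataFrame.from_dict(data_i, orient='index')

# Outer keys 0, 1, 2 become columns here...
by_columns = pd.DataFrame.from_dict(data_i, orient='columns')
# ...so .T puts them back on the row index
print(by_columns.T.equals(by_index))          # True
print(pd.DataFrame(data_i).T.equals(by_index))  # True
```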
In pandas 0.16.2, I had to do pd.DataFrame.from_records(d) to get this to work.