python filter list of dictionaries based on key value
You can try a list comprehension:
>>> exampleSet = [{'type': 'type1'}, {'type': 'type2'}, {'type': 'type2'}, {'type': 'type3'}]
>>> keyValList = ['type2', 'type3']
>>> expectedResult = [d for d in exampleSet if d['type'] in keyValList]
>>> expectedResult
[{'type': 'type2'}, {'type': 'type2'}, {'type': 'type3'}]
Another way is to use filter:
>>> list(filter(lambda d: d['type'] in keyValList, exampleSet))
[{'type': 'type2'}, {'type': 'type2'}, {'type': 'type3'}]
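Both forms above index with d['type'], which raises KeyError if a dictionary has no such key. A minimal sketch, assuming you want such entries skipped rather than treated as errors (the {'name': 'no type here'} entry is made up for illustration):

```python
exampleSet = [{'type': 'type1'}, {'type': 'type2'}, {'name': 'no type here'}, {'type': 'type3'}]
keyValList = ['type2', 'type3']

# dict.get returns None for a missing key, and None is not in keyValList,
# so dictionaries without 'type' are skipped instead of raising KeyError.
expectedResult = [d for d in exampleSet if d.get('type') in keyValList]
print(expectedResult)
```

If None could itself be a legitimate value in keyValList, this shortcut would no longer distinguish "missing" from "None", and an explicit 'type' in d check would be safer.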
Use filter, or if the number of dictionaries in exampleSet is too high, use ifilter from the itertools module (Python 2 only; in Python 3 the built-in filter already returns an iterator). It returns an iterator instead of filling up your system's memory with the entire list at once:
from itertools import ifilter  # Python 2 only; Python 3's built-in filter is already lazy

for elem in ifilter(lambda x: x['type'] in keyValList, exampleSet):
    print(elem)
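On Python 3 the same lazy behaviour is available without itertools. A small sketch, reusing the sample data from the first answer (the names lazy_filter, lazy_genexp, first and remaining are only for illustration):

```python
exampleSet = [{'type': 'type1'}, {'type': 'type2'}, {'type': 'type2'}, {'type': 'type3'}]
keyValList = ['type2', 'type3']

# Python 3: filter() returns an iterator; nothing is evaluated until you iterate.
lazy_filter = filter(lambda d: d['type'] in keyValList, exampleSet)

# A generator expression is an equivalent lazy form without the lambda.
lazy_genexp = (d for d in exampleSet if d['type'] in keyValList)

first = next(lazy_filter)      # pulls only the first matching dictionary
remaining = list(lazy_genexp)  # consumes the whole generator into a list

print(first)
print(remaining)
```

Either iterator can be consumed only once; wrap it in list() if you need to go over the results again.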
I tested the performance of a few answers from this post. As I initially guessed, the list comprehension is the fastest, filter combined with list is second, and pandas is third by a wide margin.
Defined variables:
import pandas as pd
exampleSet = [{'type': 'type' + str(number)} for number in range(0, 1_000_000)]
keyValList = ['type21', 'type950000']
1st – list comprehension
%%timeit
expectedResult = [d for d in exampleSet if d['type'] in keyValList]
60.7 ms ± 188 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
2nd – filter and list
%%timeit
expectedResult = list(filter(lambda d: d['type'] in keyValList, exampleSet))
94 ms ± 328 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
3rd – pandas
%%timeit
df = pd.DataFrame(exampleSet)
expectedResult = df[df['type'].isin(keyValList)].to_dict('records')
336 ms ± 1.84 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
On a side note, using pandas to deal with a dict is not a great idea, since a pandas.DataFrame is basically a more memory-consuming dict, and if you are not going to use a DataFrame in the end, the conversion is just overhead.
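The timings above use only two lookup values, so the membership test against a list is cheap. If keyValList were much longer, converting it to a set first would probably help, since set membership is constant time on average. A rough sketch using the standard timeit module instead of the %%timeit magic; the data sizes and the names keyValSet, with_list and with_set are assumptions chosen for illustration, and no timings are claimed here:

```python
import timeit

# Smaller data than in the benchmark above, but with 100 lookup values.
exampleSet = [{'type': 'type' + str(number)} for number in range(0, 20_000)]
keyValList = ['type' + str(number) for number in range(0, 20_000, 200)]
keyValSet = set(keyValList)  # built once, outside the timed code


def with_list():
    # Each membership test scans the list: O(len(keyValList)) per dictionary.
    return [d for d in exampleSet if d['type'] in keyValList]


def with_set():
    # Each membership test is a hash lookup: O(1) on average per dictionary.
    return [d for d in exampleSet if d['type'] in keyValSet]


print('list:', timeit.timeit(with_list, number=3))
print('set: ', timeit.timeit(with_set, number=3))
```

Both functions return the same result; only the cost of the membership test differs, so measure on your own data before drawing conclusions.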