python filter list of dictionaries based on key value


You can try a list comprehension:

>>> exampleSet = [{'type': 'type1'}, {'type': 'type2'}, {'type': 'type2'}, {'type': 'type3'}]
>>> keyValList = ['type2', 'type3']
>>> expectedResult = [d for d in exampleSet if d['type'] in keyValList]
>>> expectedResult
[{'type': 'type2'}, {'type': 'type2'}, {'type': 'type3'}]
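The list comprehension can also be wrapped in a small reusable function. This is a sketch, not part of the original answer: the name `filter_by_key` is hypothetical, it assumes the compared values are hashable (so the allowed values can go into a `set` for fast membership checks), and it uses `d.get(key)` so dicts that lack the key are skipped instead of raising `KeyError`.

```python
# Hypothetical helper (not from the original answer): keep only the dicts
# whose value for `key` is one of `allowed_values`.
def filter_by_key(dicts, key, allowed_values):
    # Assumes values are hashable; set membership is O(1) on average,
    # which matters when allowed_values is long.
    allowed = set(allowed_values)
    # d.get(key) returns None for dicts missing the key instead of raising KeyError.
    return [d for d in dicts if d.get(key) in allowed]


exampleSet = [{'type': 'type1'}, {'type': 'type2'}, {'type': 'type2'}, {'type': 'type3'}]
print(filter_by_key(exampleSet, 'type', ['type2', 'type3']))
# [{'type': 'type2'}, {'type': 'type2'}, {'type': 'type3'}]
```

One caveat of the `d.get(key)` choice: if `None` is among the allowed values, dicts without the key will match too.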

Another way is to use filter:

>>> list(filter(lambda d: d['type'] in keyValList, exampleSet))
[{'type': 'type2'}, {'type': 'type2'}, {'type': 'type3'}]

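Use filter, or, if exampleSet holds a very large number of dictionaries, iterate over the matches lazily instead of building a list. A lazy iterator yields one match at a time instead of filling up your system's memory with the entire result list at once. In Python 2 that meant itertools.ifilter; in Python 3 the built-in filter already returns an iterator, and ifilter no longer exists:

# Python 3: filter() is lazy (Python 2 equivalent: from itertools import ifilter)
for elem in filter(lambda x: x['type'] in keyValList, exampleSet):
    print(elem)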
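A minimal sketch of that laziness in Python 3, reusing the small example data from the first answer: a generator expression behaves like a filter object, and `next()` pulls only the first match, so iteration stops as soon as one is found instead of scanning the whole list. The variable names `lazy_matches`, `gen_matches` and `first_match` are illustrative only.

```python
exampleSet = [{'type': 'type1'}, {'type': 'type2'}, {'type': 'type2'}, {'type': 'type3'}]
keyValList = ['type2', 'type3']

# Lazy filter object: nothing is evaluated until you iterate over it.
lazy_matches = filter(lambda d: d['type'] in keyValList, exampleSet)

# Equivalent generator expression, also lazy.
gen_matches = (d for d in exampleSet if d['type'] in keyValList)

# Take only the first match; the generator stops after finding it.
first_match = next(gen_matches, None)
print(first_match)  # {'type': 'type2'}

# Stream all matches from the filter object without building a list.
for elem in lazy_matches:
    print(elem)
```

Both objects are single-use: once consumed, iterating again yields nothing, so wrap them in `list()` if you need the results more than once.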


I tried a few answers from this post and measured the performance of each.

As I initially guessed, the list comprehension is the fastest, list(filter(...)) comes second, and pandas is a distant third.

Defined variables:

import pandas as pd

exampleSet = [{'type': 'type' + str(number)} for number in range(0, 1_000_000)]

keyValList = ['type21', 'type950000']


1st – list comprehension

%%timeit
expectedResult = [d for d in exampleSet if d['type'] in keyValList]

60.7 ms ± 188 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

2nd – filter and list

%%timeit
expectedResult = list(filter(lambda d: d['type'] in keyValList, exampleSet))

94 ms ± 328 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

3rd – pandas

%%timeit
df = pd.DataFrame(exampleSet)
expectedResult = df[df['type'].isin(keyValList)].to_dict('records')

336 ms ± 1.84 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


On a side note, using pandas just to filter a list of dicts is not a great idea: building the DataFrame copies all the data into a new structure, converting back with to_dict('records') copies it again, and if you are not going to use a DataFrame afterwards that extra time and memory is simply wasted.
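The `%%timeit` cells above only work in IPython or Jupyter. As a rough, stdlib-only sketch for repeating the first two measurements in a plain Python script, the `timeit` module can be used as below. The function names are hypothetical, pandas is left out to avoid the dependency, `repeat`/`number` are lowered to keep the runtime short, and the script reports the best run rather than mean ± std. dev., so expect numbers that differ from the ones above and vary by machine.

```python
import timeit


def build_data(n):
    """Same shape as the benchmark data: [{'type': 'type0'}, {'type': 'type1'}, ...]."""
    return [{'type': 'type' + str(number)} for number in range(n)]


def with_list_comprehension(data, wanted):
    return [d for d in data if d['type'] in wanted]


def with_filter(data, wanted):
    return list(filter(lambda d: d['type'] in wanted, data))


def run_benchmark(n=1_000_000, repeat=5, number=1):
    """Return the best time per call, in milliseconds, for each approach."""
    data = build_data(n)
    wanted = ['type21', 'type950000']
    results = {}
    for name, func in [('list comprehension', with_list_comprehension),
                       ('filter + list', with_filter)]:
        # timeit.repeat accepts a callable as the statement to time.
        timings = timeit.repeat(lambda: func(data, wanted), repeat=repeat, number=number)
        results[name] = min(timings) / number * 1000
    return results


if __name__ == '__main__':
    for name, ms in run_benchmark().items():
        print(f'{name}: {ms:.1f} ms per call (best run)')
```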
