Python Killed: 9 when running a code using dictionaries created from 2 csv files

Most likely the kernel kills it because your script consumes too much memory.
You need to take a different approach and try to minimize the amount of data held in memory.

You may also find this question useful: Very large matrices using Python and NumPy

In the following code snippet I tried to avoid loading the huge data1.csv into memory by processing it line by line. Give it a try.

import csv
from collections import OrderedDict  # to preserve the key order

# Load the smaller file, data.csv, fully into memory, keyed by column 2.
with open('data.csv', 'rb') as csvfile:
    reader = csv.reader(csvfile, delimiter=',')
    next(reader)  # skip header
    d = OrderedDict((rows[2], {'val': rows[1], 'flag': False}) for rows in reader)

# Stream the huge data1.csv one row at a time instead of loading it all.
with open('data1.csv', 'rb') as csvfile:
    reader = csv.reader(csvfile, delimiter=',')
    next(reader)  # skip header
    for rows in reader:
        if rows[0] in d:
            d[rows[0]]['flag'] = True

import sys
sys.stdout = open('rs_pos_ref_alt.csv', 'w')  # redirect print output to the result file

for k, v in d.iteritems():
    if v['flag']:
        print '%s,%s' % (v['val'], k)  # one CSV row per matched key
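Note that this still holds all of data.csv in memory (the OrderedDict); only data1.csv is streamed one row at a time, which is what keeps the peak memory down.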

First off, create a Python script and run the following code to find all Python processes.

import subprocess

# wmic is a Windows command-line tool; the query needs its own quoting.
wmic_cmd = 'wmic process where "name=\'python.exe\' or name=\'pythonw.exe\'" get commandline,processid'
wmic_prc = subprocess.Popen(wmic_cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)
wmic_out, wmic_err = wmic_prc.communicate()

# Drop the header row and split each remaining line into [commandline, pid].
pythons = [item.rsplit(None, 1) for item in wmic_out.splitlines() if item][1:]
pythons = [[cmdline, int(pid)] for [cmdline, pid] in pythons]
for line in pythons:
    cv = str(line).split('\\')
    fin = cv[-1]  # last path component, i.e. the executable name
    if fin[0:11] != 'pythonw.exe':
        print 'pythonw.exe', fin
    if fin[0:10] != 'python.exe':
        print 'python.exe', fin

After you have run it, paste the output here under the question, where I will see a notification.
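Note that wmic exists only on Windows. A Killed: 9 message normally comes from a Unix-like system (macOS or Linux) delivering SIGKILL, so if that is your platform, use the psutil approach below instead.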

EDIT:

To list all processes and post them in your question, use the following:

import psutil
for process in psutil.process_iter():
    print process
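If you also want to see how much memory each process is using, which is the more useful number when diagnosing a kill, psutil can report the resident set size. A minimal sketch, assuming a psutil version where memory_info() and name() are available (older releases used get_memory_info()):

import psutil

# Print pid, executable name and resident memory (RSS) in MB for each process.
for process in psutil.process_iter():
    try:
        rss_mb = process.memory_info().rss / (1024 * 1024)
        print process.pid, process.name(), '%d MB' % rss_mb
    except (psutil.NoSuchProcess, psutil.AccessDenied):
        pass  # the process exited, or we are not allowed to inspect it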

How much memory does your computer have?

You can add a couple of optimizations that will save some memory, and if that's not enough, you can trade off some CPU and I/O for better memory efficiency.

If you're only comparing the keys and don't really do anything with the values, you can extract only the keys:

d1 = set([rows[0] for rows in my_data1])
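For example, building the key set straight from the CSV reader, so only the keys are ever kept; a sketch reusing the file name and column index from the snippets above:

import csv

# Keep only the join keys from data1.csv; the rest of each row is discarded,
# so memory holds one string per row instead of a whole parsed row.
with open('data1.csv', 'rb') as csvfile:
    reader = csv.reader(csvfile, delimiter=',')
    next(reader)  # skip header
    d1 = set(rows[0] for rows in reader)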

Then, instead of OrderedDict, you can try using an ordered set, either from this answer (Does Python have an ordered set?) or from the ordered-set module on PyPI.
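If you'd rather avoid an extra dependency, OrderedDict itself can stand in for an ordered set; a small sketch with made-up keys:

from collections import OrderedDict

# An OrderedDict with None values behaves like an insertion-ordered set:
# O(1) membership tests, iteration in insertion order, duplicates collapsed.
ordered_keys = OrderedDict.fromkeys(['rs123', 'rs456', 'rs123'])
print list(ordered_keys)       # ['rs123', 'rs456']
print 'rs456' in ordered_keys  # True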

Once you have all the intersecting keys, you can write another program that looks up the matching values in the source CSV.
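A sketch of that second pass, assuming the intersecting keys were saved one per line to a file (keys.txt and matches.csv are placeholder names; the column indices follow the first answer's snippet):

import csv

# Load the (much smaller) set of intersecting keys...
with open('keys.txt') as f:
    keys = set(line.strip() for line in f)

# ...then stream the source CSV and write out only the matching values.
with open('data.csv', 'rb') as src, open('matches.csv', 'wb') as dst:
    reader = csv.reader(src, delimiter=',')
    writer = csv.writer(dst)
    next(reader)  # skip header
    for rows in reader:
        if rows[2] in keys:  # rows[2] is the key column in data.csv
            writer.writerow([rows[1], rows[2]])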

If these optimizations aren't enough, you can extract all the keys from the bigger set, save them to a file, and then load the keys one by one from that file using a generator, so the program only keeps one full set of keys plus a single key in memory instead of two sets.
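A sketch of that approach (big_keys.txt is a placeholder for the file of keys extracted from the bigger file; d1 is the smaller key set built as above):

def keys_from_file(path):
    # Generator: yields one key at a time, so only a single key from
    # the bigger file is in memory at any moment.
    with open(path) as f:
        for line in f:
            yield line.strip()

for key in keys_from_file('big_keys.txt'):
    if key in d1:
        print key  # an intersecting key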

I'd also suggest using Python's pickle module for storing intermediate results.
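For instance, a key set computed in one pass can be pickled to disk and loaded back in the next run instead of re-parsing the CSV (keys.pkl is a placeholder name):

import pickle

# Save the intermediate key set to disk...
with open('keys.pkl', 'wb') as f:
    pickle.dump(d1, f, pickle.HIGHEST_PROTOCOL)

# ...and load it back later without touching the original CSV again.
with open('keys.pkl', 'rb') as f:
    d1 = pickle.load(f)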
