Python : Compare two csv files and print out differences

The problem is that you are comparing each line in fileone with the line at the same position in filetwo. As soon as one file has an extra line, every later pair is misaligned and the lines never compare equal again. Instead, check whether each line of filetwo appears anywhere in fileone:

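# Read both files fully so every line of the new file can be checked
# against the whole of the old file, regardless of position.
with open('old.csv', 'r') as t1, open('new.csv', 'r') as t2:
    fileone = t1.readlines()
    filetwo = t2.readlines()

# Write only the lines that appear in new.csv but not in old.csv.
with open('update.csv', 'w') as outFile:
    for line in filetwo:
        if line not in fileone:
            outFile.write(line)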
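
The gap between the positional comparison and the membership test shows up on a small in-memory example. This is a minimal sketch: the helper names `positional_diff` and `membership_diff` and the sample rows are made up for illustration and are not part of the original answer.

```python
def positional_diff(old_lines, new_lines):
    """Compare line i of old with line i of new (the approach that breaks)."""
    return [new for old, new in zip(old_lines, new_lines) if old != new]


def membership_diff(old_lines, new_lines):
    """Keep every new line that does not appear anywhere in old."""
    seen = set(old_lines)  # set lookup avoids rescanning the list per line
    return [line for line in new_lines if line not in seen]


# Hypothetical data: new_lines has one extra row inserted near the top.
old_lines = ["id,name\n", "1,alice\n", "2,bob\n"]
new_lines = ["id,name\n", "0,zoe\n", "1,alice\n", "2,bob\n"]

# The positional version also reports rows that merely shifted down.
print(positional_diff(old_lines, new_lines))
# The membership version reports only the row that is actually new.
print(membership_diff(old_lines, new_lines))
```

Turning the old lines into a set first is optional, but it keeps the lookup fast when the files are large.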

You may also find the csv-diff package useful:

pip install csv-diff

Once installed, you can run it from the command line, using --key to name the column that uniquely identifies each row:

csv-diff one.csv two.csv --key=id
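
To see what a key-based comparison means, here is a rough standard-library sketch. It is not how csv-diff is implemented. The function names `load_by_key` and `diff_by_key`, the shape of the result, and the sample data are all assumptions made for illustration.

```python
import csv
import io


def load_by_key(text, key):
    """Index CSV rows by the value in the `key` column (assumed unique)."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(text))}


def diff_by_key(old_text, new_text, key="id"):
    """Report which keys were added, removed, or kept with changed values."""
    old = load_by_key(old_text, key)
    new = load_by_key(new_text, key)
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


# Hypothetical contents standing in for one.csv and two.csv.
one = "id,name\n1,alice\n2,bob\n3,carol\n"
two = "id,name\n1,alice\n2,robert\n4,dave\n"

result = diff_by_key(one, two)
print(result)
```

Matching rows on a key lets an edited row be reported as changed, where a whole-line comparison would report it as one removal plus one addition.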

Sets are a natural fit for detecting differences:

#!/usr/bin/env python3

import sys
import argparse
import csv


def get_dataset(f):
    return set(map(tuple, csv.reader(f)))


def main(f1, f2, outfile, sorting_column):
    set1 = get_dataset(f1)
    set2 = get_dataset(f2)
    different = set1 ^ set2

    output = csv.writer(outfile)

    for row in sorted(different, key=lambda x: x[sorting_column], reverse=True):
        output.writerow(row)


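if __name__ == '__main__':
    parser = argparse.ArgumentParser()

    # Two input files, an optional output file (stdout by default), and the
    # index of the column used to sort the differing rows.
    parser.add_argument('infile', nargs=2, type=argparse.FileType('r'))
    parser.add_argument('outfile', nargs='?', type=argparse.FileType('w'), default=sys.stdout)
    parser.add_argument('-sc', '--sorting-column', nargs='?', type=int, default=0)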

    args = parser.parse_args()

    main(*args.infile, args.outfile, args.sorting_column)
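
The symmetric difference (`^`) mixes rows from both files in one result. If you need to know which file a row came from, plain set subtraction in each direction is one option. The sketch below reuses `get_dataset` from the script above, with `io.StringIO` and made-up rows standing in for real files. The names `only_in_old` and `only_in_new` are illustrative.

```python
import csv
import io


def get_dataset(f):
    # Each row becomes a tuple so it is hashable and can live in a set.
    return set(map(tuple, csv.reader(f)))


# Hypothetical file contents.
old = get_dataset(io.StringIO("id,name\n1,alice\n2,bob\n"))
new = get_dataset(io.StringIO("id,name\n1,alice\n3,carol\n"))

only_in_old = old - new  # rows removed or changed since the old file
only_in_new = new - old  # rows added or changed in the new file

print(sorted(only_in_old))
print(sorted(only_in_new))
```

Because sets discard duplicates and ordering, this approach cannot tell you that a row appears twice in one file and once in the other, or that rows were merely reordered.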
