csv – Python Pandas Error tokenizing data

you could also try;

data = pd.read_csv("file1.csv", on_bad_lines="skip")

Do note that this will cause the offending lines to be skipped. (The on_bad_lines argument requires pandas 1.3 or later; older versions use error_bad_lines=False instead.)
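A minimal sketch of the behavior, using an in-memory file with a deliberately malformed row (the data here is made up for illustration):

```python
import io

import pandas as pd

# A small CSV where the second data row has an extra field (a "bad line").
raw = "a,b,c\n1,2,3\n4,5,6,7\n8,9,10\n"

# on_bad_lines="skip" drops rows with too many fields
# instead of raising a ParserError.
df = pd.read_csv(io.StringIO(raw), on_bad_lines="skip")
print(df.shape)  # (2, 3) -- the row 4,5,6,7 was dropped
```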

It might be an issue with

  • the delimiters in your data
  • the first row, as @TomAugspurger noted

To solve it, try specifying the sep and/or header arguments when calling read_csv. For instance,

df = pandas.read_csv(filepath, sep=delimiter, header=None)

In the code above, sep defines your delimiter and header=None tells pandas that your source data has no row for headers / column titles. As the docs put it: "If file contains no header row, then you should explicitly pass header=None." In that case, pandas automatically generates integer column labels (0, 1, 2, …).
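A quick sketch of what header=None does, using in-memory, semicolon-delimited data invented for the example:

```python
import io

import pandas as pd

# Headerless data: three fields per row, semicolon-delimited.
raw = "1;2;3\n4;5;6\n"

# sep names the delimiter; header=None stops pandas from treating
# the first data row as column titles, so columns are labeled 0, 1, 2.
df = pd.read_csv(io.StringIO(raw), sep=";", header=None)
print(list(df.columns))  # [0, 1, 2]
```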

According to the docs, the delimiter should not be an issue: if sep is None (not specified), read_csv "will try to automatically determine this." I have not had good luck with this, however, including cases with obvious delimiters.

Another solution may be to try to auto-detect the delimiter:

import csv

# use the first two lines of the file to detect the separator
temp_lines = csv_file.readline() + "\n" + csv_file.readline()
dialect = csv.Sniffer().sniff(temp_lines, delimiters=";,")

# remember to seek back to the start of the file before the next read
csv_file.seek(0)

df = pd.read_csv(csv_file, sep=dialect.delimiter)
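A self-contained version of the same idea, with an in-memory StringIO standing in for the opened file (the data is a made-up example):

```python
import csv
import io

import pandas as pd

# In-memory stand-in for an opened CSV file.
csv_file = io.StringIO("x;y;z\n1;2;3\n4;5;6\n")

# Sniff the separator from the first two lines, then rewind.
sample = csv_file.readline() + csv_file.readline()
dialect = csv.Sniffer().sniff(sample, delimiters=";,")
csv_file.seek(0)

df = pd.read_csv(csv_file, sep=dialect.delimiter)
print(dialect.delimiter, df.shape)  # ; (2, 3)
```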

The parser is getting confused by the header of the file. It reads the first row and infers the number of columns from it, but the first two rows aren't representative of the actual data in the file.

Try it with data = pd.read_csv(path, skiprows=2)
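To see why this works, here is a sketch with a fabricated file whose first two preamble lines have a different field count than the real table; skiprows=2 jumps straight to the true header:

```python
import io

import pandas as pd

# Two preamble lines before the real header: the first has 1 field,
# the second has 6, so the tokenizer infers the wrong column count.
raw = (
    "report generated 2024\n"
    "some,free,text,with,extra,fields\n"
    "a,b,c\n"
    "1,2,3\n"
)

# Skipping the two preamble rows lets pandas read the real table.
df = pd.read_csv(io.StringIO(raw), skiprows=2)
print(list(df.columns))  # ['a', 'b', 'c']
```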
