How to import a CSV file using Python with headers intact, where the first column is non-numerical

For Python 3

Drop the 'rb' mode argument and either pass 'r' or omit the mode entirely (read mode is the default).

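import csv

# newline="" is what the csv module documentation recommends for file objects
with open("<path-to-file>", "r", newline="") as theFile:
    reader = csv.DictReader(theFile)
    for line in reader:
        # line is {"workers": "w0", "constant": "7.334", "age": "-1.406", ...}
        # e.g. print(line["workers"]) yields w0
        print(line)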
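
As a self-contained illustration of the Python 3 approach, here is a minimal sketch that reads the sample data from an in-memory string instead of a file. The `SAMPLE` text and the helper name `read_rows` are assumptions made for this example, not part of the original answer. It keeps the header names as dictionary keys, leaves the first column as text, and converts the remaining columns to floats.

```python
import csv
import io

# Sample data inlined so the sketch runs without a file on disk (an assumption).
SAMPLE = """workers,constant,age
w0,7.334,-1.406
w1,5.235,-4.936
w2,3.2225,-1.478
w3,0,0
"""

def read_rows(handle):
    """Return a list of dicts, one per data row, keyed by the header names.

    The first column is kept as text; every other column is parsed as a float.
    """
    reader = csv.DictReader(handle)
    label = reader.fieldnames[0]  # name of the non-numeric first column
    rows = []
    for line in reader:
        row = {label: line[label]}
        for name in reader.fieldnames[1:]:
            row[name] = float(line[name])
        rows.append(row)
    return rows

rows = read_rows(io.StringIO(SAMPLE))
print(rows[0])  # {'workers': 'w0', 'constant': 7.334, 'age': -1.406}
```

With a real file, the same helper could be called as `read_rows(theFile)` inside the `with open(...)` block.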

For Python 2

import csv
with open("<path-to-file>", "rb") as theFile:
    reader = csv.DictReader(theFile)
    for line in reader:
        # line is {"workers": "w0", "constant": "7.334", "age": "-1.406", ...}
        # e.g. print(line["workers"]) yields w0
        print(line)

Python has a powerful built-in CSV handler. In fact, most things are already built in to the standard library.

Python's csv module handles data row-wise, which is the usual way of looking at such data. You seem to want a column-wise approach. Here's one way of doing it.

Assuming your file is named myclone.csv and contains

workers,constant,age
w0,7.334,-1.406
w1,5.235,-4.936
w2,3.2225,-1.478
w3,0,0

this code should give you an idea or two:

>>> import csv
>>> f = open('myclone.csv', 'r')
>>> reader = csv.reader(f)
>>> headers = next(reader, None)
>>> headers
['workers', 'constant', 'age']
>>> column = {}
>>> for h in headers:
...    column[h] = []
...
>>> column
{'workers': [], 'constant': [], 'age': []}
>>> for row in reader:
...   for h, v in zip(headers, row):
...     column[h].append(v)
...
>>> column
{'workers': ['w0', 'w1', 'w2', 'w3'], 'constant': ['7.334', '5.235', '3.2225', '0'], 'age': ['-1.406', '-4.936', '-1.478', '0']}
>>> column['workers']
['w0', 'w1', 'w2', 'w3']
>>> column['constant']
['7.334', '5.235', '3.2225', '0']
>>> column['age']
['-1.406', '-4.936', '-1.478', '0']
>>> f.close()
>>>

To convert the numeric values to floats (the csv module returns every field as a string), add this

converters = [str.strip] + [float] * (len(headers) - 1)

up front, and do this

for h, v, conv in zip(headers, row, converters):
  column[h].append(conv(v))

for each row instead of the similar two lines above.
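
Putting the column-wise loop and the converters together, a minimal sketch might look like the following. The function name `read_columns` and the inlined `SAMPLE` string are assumptions made so the example runs on its own; it also assumes every column after the first holds values that `float` can parse.

```python
import csv
import io

# Same contents as myclone.csv, inlined so no file is needed (an assumption).
SAMPLE = """workers,constant,age
w0,7.334,-1.406
w1,5.235,-4.936
w2,3.2225,-1.478
w3,0,0
"""

def read_columns(handle):
    """Return {header: [values...]} with the first column as text, the rest as floats."""
    reader = csv.reader(handle)
    headers = next(reader, None)
    if headers is None:  # empty input: no header row to work with
        return {}
    # One converter per column: strip whitespace for the labels, float for the rest.
    converters = [str.strip] + [float] * (len(headers) - 1)
    column = {h: [] for h in headers}
    for row in reader:
        for h, v, conv in zip(headers, row, converters):
            column[h].append(conv(v))
    return column

column = read_columns(io.StringIO(SAMPLE))
print(column["workers"])   # ['w0', 'w1', 'w2', 'w3']
print(column["constant"])  # [7.334, 5.235, 3.2225, 0.0]
```

To read from disk instead, pass an open file object, for example the result of `open('myclone.csv', newline='')` in Python 3.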

You can use the pandas library and reference the rows and columns like this:

import pandas as pd

# named df rather than input, which would shadow the built-in input()
df = pd.read_csv(path_to_file)

# access the i-th row (by position)
df.iloc[i]

# access the column named X
df.X

# access the value in the i-th row of the column named X
df.iloc[i].X
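
Because the first column here holds labels rather than numbers, one option is to let pandas use it as the row index. The sketch below is an illustration under assumptions: the data is inlined in a `SAMPLE` string instead of being read from disk, and the variable names are made up for the example. `index_col` is a standard `read_csv` parameter.

```python
import io

import pandas as pd

# Same sample data as above, inlined so the sketch runs without a file (an assumption).
SAMPLE = """workers,constant,age
w0,7.334,-1.406
w1,5.235,-4.936
w2,3.2225,-1.478
w3,0,0
"""

# index_col=0 turns the first column (the worker names) into row labels;
# the header row becomes the column names and numeric columns are parsed as floats.
df = pd.read_csv(io.StringIO(SAMPLE), index_col=0)

first_row = df.iloc[0]            # row by position
ages = df["age"]                  # column by name
value = df.loc["w1", "constant"]  # single cell by row label and column name

print(list(df.columns))  # ['constant', 'age']
print(value)             # 5.235
```

Label-based lookups with `df.loc` then use the worker names directly, while `df.iloc` still works by position.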
