Simple CSV to XML Conversion – Python
One possible solution is to load the CSV into pandas first and then convert it row by row into XML, like so:
import pandas as pd

# The file name and the "|" separator are guesses about the input file
df = pd.read_csv("untitled.txt", sep="|")
With the sample data loaded (assuming the separator and so on), the DataFrame looks like this:
          Title                   Type Format  Year Rating  Stars  \
0  Enemy Behind           War,Thriller    DVD  2003     PG     10
1  Transformers  Anime,Science Fiction    DVD  1989      R      9

             Description
0          Talk about...
1  A Schientific fiction
Then convert each row to XML with a custom function:
def convert_row(row):
    # Fill the XML template with the values of one DataFrame row
    return """<movie title="%s">
    <type>%s</type>
    <format>%s</format>
    <year>%s</year>
    <rating>%s</rating>
    <stars>%s</stars>
    <description>%s</description>
</movie>""" % (
        row.Title, row.Type, row.Format, row.Year, row.Rating, row.Stars, row.Description)

# axis=1 calls convert_row once per row; the results are joined line by line
print("\n".join(df.apply(convert_row, axis=1)))
This way you get a string containing the XML:
<movie title="Enemy Behind">
    <type>War,Thriller</type>
    <format>DVD</format>
    <year>2003</year>
    <rating>PG</rating>
    <stars>10</stars>
    <description>Talk about...</description>
</movie>
<movie title="Transformers">
    <type>Anime,Science Fiction</type>
    <format>DVD</format>
    <year>1989</year>
    <rating>R</rating>
    <stars>9</stars>
    <description>A Schientific fiction</description>
</movie>
which you can then write to a file or process further.
Inspired by this great answer.
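One caveat with plain string formatting is that values containing `&`, `<` or quotes would produce malformed XML. Below is a minimal sketch of the same idea that escapes the values first, assuming the standard-library helpers `xml.sax.saxutils.escape` and `quoteattr` are acceptable here. Only three fields are shown, and the sample row is made up for illustration.

```python
from xml.sax.saxutils import escape, quoteattr


def convert_row(row):
    # quoteattr() escapes the value and adds the surrounding quotes,
    # escape() replaces &, < and > in element text
    return """<movie title=%s>
    <type>%s</type>
    <description>%s</description>
</movie>""" % (
        quoteattr(row["Title"]), escape(row["Type"]), escape(row["Description"]))


# Hypothetical row, not part of the original sample data
sample = {"Title": "Tom & Jerry", "Type": "Anime", "Description": "Cat < mouse"}
xml_snippet = convert_row(sample)
print(xml_snippet)
```

The same two calls can be dropped into the full seven-field template without changing anything else.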
Edit: Using the loading method you posted (or rather, a version that actually loads the data into a variable):
import csv

# Read every line of the CSV file into a list of rows
with open("movies2.csv") as f:
    csv_f = csv.reader(f)
    data = []
    for row in csv_f:
        data.append(row)

# Skip the header row
print(data[1:])
We get:
[['Enemy Behind', 'War', 'Thriller', 'DVD', '2003', 'PG', '10', 'Talk about...'], ['Transformers', 'Anime', 'Science Fiction', 'DVD', '1989', 'R', '9', 'A Schientific fiction']]
And we can convert to XML with minor modifications:
def convert_row(row):
    # Rows are plain lists here, so the fields are addressed by position
    return """<movie title="%s">
    <type>%s</type>
    <format>%s</format>
    <year>%s</year>
    <rating>%s</rating>
    <stars>%s</stars>
    <description>%s</description>
</movie>""" % (row[0], row[1], row[2], row[3], row[4], row[5], row[6])

print("\n".join([convert_row(row) for row in data[1:]]))
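This gives the same structure, but note that the values are shifted by one tag: the comma inside the type splits it into two fields, so each row has eight fields and the description is dropped: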
<movie title="Enemy Behind">
    <type>War</type>
    <format>Thriller</format>
    <year>DVD</year>
    <rating>2003</rating>
    <stars>PG</stars>
    <description>10</description>
</movie>
<movie title="Transformers">
    <type>Anime</type>
    <format>Science Fiction</format>
    <year>DVD</year>
    <rating>1989</rating>
    <stars>R</stars>
    <description>9</description>
</movie>
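In the output above, the values sit one tag too far down (for example `<format>Thriller</format>`), because the comma inside the type is treated as a field separator. Here is a sketch of one way around this, assuming the file is actually pipe-separated as in the pandas example: pass the delimiter to `csv.DictReader` and address the fields by name instead of by position. The inline sample text stands in for `movies2.csv` and is an assumption about its layout.

```python
import csv
import io

# Stand-in for movies2.csv; assumed to be pipe-separated
sample = io.StringIO(
    "Title|Type|Format|Year|Rating|Stars|Description\n"
    "Enemy Behind|War,Thriller|DVD|2003|PG|10|Talk about...\n"
    "Transformers|Anime,Science Fiction|DVD|1989|R|9|A Schientific fiction\n"
)

# DictReader takes the field names from the first line
rows = list(csv.DictReader(sample, delimiter="|"))


def convert_row(row):
    # Fields are looked up by column name, so they cannot shift
    return """<movie title="%s">
    <type>%s</type>
    <format>%s</format>
    <year>%s</year>
    <rating>%s</rating>
    <stars>%s</stars>
    <description>%s</description>
</movie>""" % (row["Title"], row["Type"], row["Format"], row["Year"],
               row["Rating"], row["Stars"], row["Description"])


xml_text = "\n".join(convert_row(row) for row in rows)
print(xml_text)
```

If the file really is comma-separated, the type would instead have to be quoted in the file itself for `csv` to keep it in one field.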
I tried to generalize robertoia's convert_row function so that it works with any header instead of writing the tags by hand.
import csv
import pandas as pd

# Read the raw rows with the csv module
with open("movies2.csv") as f:
    csv_f = csv.reader(f)
    data = []
    for row in csv_f:
        data.append(row)

# pandas is only used here to get the column names
df = pd.read_csv("movies2.csv")
header = list(df.columns)

def convert_row(row):
    # One "<tag>value</tag>" line for every column except the first
    str_row = "<%s>%s</%s>\n" * (len(header) - 1)
    # The first column opens and closes the whole record
    str_row = "<%s>%s" + "\n" + str_row + "</%s>"
    # Interleave tag name, value, tag name for columns 1..n-1
    var_values = [list_of_elements[k] for k in range(1, len(header)) for list_of_elements in [header, row, header]]
    var_values = [header[0], row[0]] + var_values + [header[0]]
    var_values = tuple(var_values)
    return str_row % var_values

text = '<collection shelf="New Arrivals">' + "\n" + "\n".join([convert_row(row) for row in data[1:]]) + "\n" + "</collection>"
print(text)
with open("output.xml", "w") as myfile:
    myfile.write(text)
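As an alternative sketch of the same generalization, the standard library's `xml.etree.ElementTree` can build the tags for any header and takes care of escaping and closing them. It assumes every column name is a valid XML tag name. The function name `rows_to_xml`, its `root_tag` and `row_tag` parameters, and the inline sample data are made up for illustration. Unlike the code above, each row is wrapped in its own element and the tag names are lowercased.

```python
import xml.etree.ElementTree as ET


def rows_to_xml(header, rows, root_tag="collection", row_tag="movie"):
    # Hypothetical helper: one child element per row, one sub-element per column
    root = ET.Element(root_tag, {"shelf": "New Arrivals"})
    for row in rows:
        item = ET.SubElement(root, row_tag)
        # zip pairs each column name with the value in the same position
        for name, value in zip(header, row):
            ET.SubElement(item, name.lower()).text = value
    return ET.tostring(root, encoding="unicode")


# Made-up sample data with a character that needs escaping
header = ["Title", "Type", "Format"]
rows = [["Enemy Behind", "War & Thriller", "DVD"]]
text = rows_to_xml(header, rows)
print(text)
```

The returned string can be written to a file in the same way as `text` above.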
Of course, with pandas now it is simpler to just use to_xml():
df = pd.read_csv("movies2.csv")
with open("outputf.xml", "w") as myfile:
    # to_xml() without a path returns the XML as a string
    myfile.write(df.to_xml())
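`to_xml()` also takes arguments that control the element names. A short sketch, assuming pandas 1.3 or later: the inline DataFrame replaces the CSV file for illustration, and `parser="etree"` is chosen here so that `lxml` does not need to be installed.

```python
import pandas as pd

# Made-up subset of the sample data, used instead of reading movies2.csv
df = pd.DataFrame({
    "Title": ["Enemy Behind", "Transformers"],
    "Format": ["DVD", "DVD"],
    "Year": [2003, 1989],
})

xml_text = df.to_xml(
    index=False,             # leave out the DataFrame index
    root_name="collection",  # name of the enclosing element
    row_name="movie",        # name of the element created for each row
    parser="etree",          # standard-library parser instead of lxml
    xml_declaration=False,   # omit the <?xml ...?> header line
)
print(xml_text)
```

Each column becomes a child element named after the column, so the tags here are `Title`, `Format` and `Year`.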