utf-8: Correctly reading text from a Windows-1252 (cp1252) file in Python

CP1252 cannot represent ā (a with macron, U+0101); your input contains the similar-looking character â (a with circumflex, byte 0xE2 in CP1252). In Python 2.x, repr just displays an ASCII-escaped representation of a unicode string:

>>> print(repr(b'J\xe2nis'.decode('cp1252')))
u'J\xe2nis'
>>> print(b'J\xe2nis'.decode('cp1252'))
Jânis
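For Python 3 readers, a minimal sketch of the same point using the bytes from the session above: byte 0xE2 decodes to â (U+00E2) under cp1252, ascii() reproduces the escaped form that Python 2's repr showed, and ā (U+0101) has no cp1252 byte at all, so encoding it fails.

```python
# Python 3 sketch: what cp1252 byte 0xE2 actually decodes to.
raw = b"J\xe2nis"            # the bytes from the session above
text = raw.decode("cp1252")  # 0xE2 -> U+00E2 'â' (a with circumflex)

print(text)         # Jânis
print(ascii(text))  # 'J\xe2nis'  (escaped, like Python 2's repr)

# 'ā' is U+0101 (a with macron); cp1252 has no byte for it.
try:
    "J\u0101nis".encode("cp1252")
    representable = True
except UnicodeEncodeError:
    representable = False
print("U+0101 representable in cp1252:", representable)  # False
```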

I think u'J\xe2nis' is correct; see:

>>> print u'J\xe2nis'.encode('utf-8')
Jânis
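The same idea in Python 3: the decoded text is already correct, and what matters is which encoding is used when it leaves the program. A minimal sketch; transcode is a hypothetical helper name, not a library function.

```python
def transcode(data: bytes, src: str = "cp1252", dst: str = "utf-8") -> bytes:
    """Decode bytes with the `src` codec and re-encode the same text with `dst`."""
    return data.decode(src).encode(dst)


utf8_bytes = transcode(b"J\xe2nis")
print(utf8_bytes)                   # b'J\xc3\xa2nis'
print(utf8_bytes.decode("utf-8"))   # Jânis

# Decoding the UTF-8 bytes with the wrong codec produces mojibake.
print(utf8_bytes.decode("cp1252"))  # JÃ¢nis
```

If output shows something like JÃ¢nis, that usually means UTF-8 bytes are being displayed as cp1252, rather than the input having been decoded wrongly.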

Are you getting actual errors from SQLAlchemy, or in your application's output?

I had the same problem with some XML files. I solved it by reading the file with ANSI encoding (Windows-1252) and writing it back out with UTF-8 encoding:

import os

path = os.path.dirname(__file__)

file_name = 'my_input_file.xml'

if __name__ == '__main__':
    # Read the whole file, decoding it as Windows-1252 ("ANSI" on Western-locale Windows)...
    with open(os.path.join(path, file_name), 'r', encoding='cp1252') as f1:
        text = f1.read()
    # ...then write the same text back out, encoded as UTF-8.
    with open(os.path.join(path, 'my_output_file.xml'), 'w', encoding='utf-8') as f2:
        f2.write(text)
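A variant of the same conversion that streams the data instead of reading the whole file into memory may help with large files. This is a minimal sketch, assuming the input really is cp1252; cp1252_to_utf8 and the temporary file names are hypothetical. newline="" is passed on both sides so Windows \r\n line endings are copied unchanged rather than translated.

```python
import os
import shutil
import tempfile


def cp1252_to_utf8(src_path: str, dst_path: str) -> None:
    """Re-encode a Windows-1252 text file as UTF-8 (hypothetical helper)."""
    # newline="" turns off newline translation on read and write,
    # so "\r\n" line endings are preserved exactly.
    with open(src_path, "r", encoding="cp1252", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        # copyfileobj reads and writes in chunks, keeping memory use bounded.
        shutil.copyfileobj(src, dst)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src_file = os.path.join(tmp, "in.xml")
        dst_file = os.path.join(tmp, "out.xml")
        with open(src_file, "wb") as f:
            f.write("<name>J\u00e2nis</name>\r\n".encode("cp1252"))
        cp1252_to_utf8(src_file, dst_file)
        with open(dst_file, "rb") as f:
            print(f.read())  # b'<name>J\xc3\xa2nis</name>\r\n'
```

One caveat for XML: if the input starts with a declaration such as <?xml version="1.0" encoding="windows-1252"?>, this copy leaves it untouched, so it should be changed to UTF-8 (or removed), or an XML parser will decode the new file incorrectly.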