unicode – How to completely sanitize a string of illegal characters in python?

You can pass 'ignore' as the second argument to .encode/.decode to skip invalid characters,
like 'ILLEGAL'.decode('utf8', 'ignore')

>>> 'ILLEGA\xa0L'.decode('utf8')
...
UnicodeDecodeError: 'utf8' codec can't decode byte 0xa0 in position 6: unexpected code byte

>>> 'ILLEGA\xa0L'.decode('utf8', 'ignore')
u'ILLEGAL'
>>>
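
The session above is Python 2. As a minimal sketch, assuming Python 3 (where only bytes objects have .decode), the same idea looks like this; the variable names are illustrative only:

```python
# Python 3 sketch: the input must be bytes, since str has no .decode().
raw = b"ILLEGA\xa0L"  # 0xa0 on its own is not valid UTF-8

try:
    raw.decode("utf-8")  # the default errors="strict" raises
except UnicodeDecodeError as exc:
    print("strict decoding failed:", exc.reason)

# errors="ignore" silently drops the offending byte.
ignored = raw.decode("utf-8", errors="ignore")

# errors="replace" substitutes U+FFFD, so the data loss stays visible.
replaced = raw.decode("utf-8", errors="replace")

print(ignored)
print(replaced)
```

'ignore' discards data without any trace, so 'replace' may be the safer choice when you need to notice that something was dropped.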

Declare the coding on the first or second line of your script. If the script starts with a shebang line, it really has to be second. Like

#!/usr/bin/python
# coding=utf-8

This might be enough to solve your problem all by itself. If not, see str.encode('utf-8') and str.decode('utf-8').
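
One way to combine encode and decode into a sanitizing step is an ASCII round trip. This is a rough sketch assuming Python 3; sanitize_to_ascii is a hypothetical helper name, not something from the standard library:

```python
def sanitize_to_ascii(text: str) -> str:
    """Drop every character that cannot be represented in ASCII.

    Hypothetical helper for illustration: encode with errors="ignore"
    to discard non-ASCII characters, then decode back to str.
    """
    return text.encode("ascii", errors="ignore").decode("ascii")


# "café – naïve" written with escapes so the source file itself stays ASCII.
cleaned = sanitize_to_ascii("caf\u00e9 \u2013 na\u00efve")
print(repr(cleaned))
```

This is lossy: accented letters are removed rather than transliterated, so it only fits cases where plain ASCII output is really what you want.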

you can also use:

python3 -c "import sys, urllib.parse; print(urllib.parse.quote_plus(sys.stdin.read()))"

adapted from https://wiki.python.org/moin/Powerful%20Python%20One-Liners

** ps, the website gives the Python 2 form (python -c "import urllib, sys ; print urllib.quote_plus(sys.stdin.read())"); under python3, quote_plus lives in urllib.parse and print needs parentheses
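
The same percent-encoding can be done inside a script rather than on the command line. This is a minimal sketch assuming Python 3, where quote_plus and unquote_plus live in urllib.parse; the sample string is made up for illustration:

```python
from urllib.parse import quote_plus, unquote_plus

# Sample input containing a non-breaking space and URL-reserved characters.
text = "ILLEGAL chars: \u00a0 & / ?"

# quote_plus percent-encodes unsafe characters and turns spaces into '+'.
quoted = quote_plus(text)
print(quoted)

# unquote_plus reverses the encoding, so nothing is lost.
restored = unquote_plus(quoted)
print(restored == text)
```

Unlike decoding with 'ignore', this keeps every character recoverable; the result is just an ASCII-safe representation of the original string.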
