unicode – How to completely sanitize a string of illegal characters in python?
unicode – How to completely sanitize a string of illegal characters in python?
You can pass, ignore to skip invalid characters in .encode/.decode
like ILLEGAL.decode(utf8,ignore)
>>> ILLEGAxa0L.decode(utf8)
...
UnicodeDecodeError: utf8 codec cant decode byte 0xa0 in position 6: unexpected code byte
>>> ILLEGAxa0L.decode(utf8,ignore)
uILLEGAL
>>>
Declare the coding on the second line of your script. It really has to be second. Like
#!/usr/bin/python
# coding=utf-8
This might be enough to solve your problem all by itself. If not, see str.encode(utf-8) and str.decode(utf-8).
unicode – How to completely sanitize a string of illegal characters in python?
you can also use:
python3 -c import urllib, sys ; print urllib.quote_plus(sys.stdin.read());
taken from https://wiki.python.org/moin/Powerful%20Python%20One-Liners
** ps, in the website its pointed to use python, but I tested in python3 and it works just fine