How to split a Python string on new line characters

✨ Splitting lines in Python:

Have you tried using the str.splitlines() method?

From the docs:

str.splitlines([keepends])

Return a list of the lines in the string, breaking at line boundaries.
Line breaks are not included in the resulting list unless keepends is
given and true.

For example:

>>> 'Line 1\n\nLine 3\rLine 4\r\n'.splitlines()
['Line 1', '', 'Line 3', 'Line 4']

>>> 'Line 1\n\nLine 3\rLine 4\r\n'.splitlines(True)
['Line 1\n', '\n', 'Line 3\r', 'Line 4\r\n']

Which delimiters are considered?

This method uses the universal newlines approach to splitting lines.

The main difference between Python 2.X and Python 3.X is that the former uses the universal newlines approach to splitting lines, so \r, \n, and \r\n are considered line boundaries for 8-bit strings, while the latter uses a superset of those boundaries that also includes:

  • \v or \x0b: Line Tabulation (added in Python 3.2).
  • \f or \x0c: Form Feed (added in Python 3.2).
  • \x1c: File Separator.
  • \x1d: Group Separator.
  • \x1e: Record Separator.
  • \x85: Next Line (C1 Control Code).
  • \u2028: Line Separator.
  • \u2029: Paragraph Separator.
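
As a rough illustration of the list above (Python 3 only; the sample string is made up for this sketch), a form feed and a Unicode line separator both act as line boundaries for str.splitlines(), but not for str.split('\n'):

```python
# Sample text containing a form feed (\x0c), a Unicode line
# separator (\u2028) and an ordinary newline (\n).
text = "a\x0cb\u2028c\nd"

# splitlines() breaks on every boundary in the list above.
by_splitlines = text.splitlines()

# split("\n") breaks only on the literal "\n" character.
by_split = text.split("\n")

print(by_splitlines)  # ['a', 'b', 'c', 'd']
print(len(by_split))  # 2
```

If the input may contain these characters and you only want to break on \n, str.split('\n') is probably the safer choice.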

splitlines VS split:

Unlike str.split() when a delimiter string sep is given, this method
returns an empty list for the empty string, and a terminal line break
does not result in an extra line:

>>> ''.splitlines()
[]

>>> 'Line 1\n'.splitlines()
['Line 1']

While str.split('\n') returns:

>>> ''.split('\n')
['']

>>> 'Line 1\n'.split('\n')
['Line 1', '']
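
If you want to split only on \n but still get the splitlines()-style handling of the empty string and of a terminal line break, one possible sketch is a small helper. The name split_on_newline is hypothetical and not part of the standard library:

```python
def split_on_newline(text):
    """Split on "\n" only, mimicking splitlines() for the two edge cases above.

    Hypothetical helper for illustration; not a standard-library function.
    """
    # The empty string yields an empty list, as with splitlines().
    if not text:
        return []
    # Drop a single terminal line break so it does not add an extra item.
    if text.endswith("\n"):
        text = text[:-1]
    return text.split("\n")


print(split_on_newline(""))          # []
print(split_on_newline("Line 1\n"))  # ['Line 1']
```

Unlike splitlines(), this leaves \r and the other boundary characters untouched inside the resulting items.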

✂️ Removing additional whitespace:

If you also need to remove additional leading or trailing whitespace, like spaces, that are ignored by str.splitlines(), you could use str.splitlines() together with str.strip():

>>> [line.strip() for line in 'Line 1  \n  \nLine 3 \rLine 4 \r\n'.splitlines()]
['Line 1', '', 'Line 3', 'Line 4']

Removing empty strings (''):

Lastly, if you want to filter out the empty strings from the resulting list, you could use filter():

>>> # Python 2.X:
>>> filter(bool, 'Line 1\n\nLine 3\rLine 4\r\n'.splitlines())
['Line 1', 'Line 3', 'Line 4']

>>> # Python 3.X:
>>> list(filter(bool, 'Line 1\n\nLine 3\rLine 4\r\n'.splitlines()))
['Line 1', 'Line 3', 'Line 4']
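
The stripping and filtering steps can also be combined. A minimal sketch, assuming you want to drop lines that are empty or contain only whitespace (the variable names are illustrative):

```python
text = "Line 1  \n  \nLine 3 \rLine 4 \r\n"

# Strip each line once, then keep only the ones that are non-empty
# after stripping (a whitespace-only line becomes '' and is dropped).
stripped = (line.strip() for line in text.splitlines())
lines = [line for line in stripped if line]

print(lines)  # ['Line 1', 'Line 3', 'Line 4']
```

This behaves the same way on Python 2.X and Python 3.X, since the list comprehension always builds a list.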

Additional comment regarding the original question:

As the error you posted indicates, and as Burhan suggested, the problem comes from the print call. There's a related question that could be useful to you: UnicodeEncodeError: 'charmap' codec can't encode - character maps to <undefined>, print function

a.txt

this is line 1
this is line 2

code:

Python 3.4.0 (default, Mar 20 2014, 22:43:40) 
[GCC 4.6.3] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> file = open('a.txt').read()
>>> file
'this is line 1\nthis is line 2\n'
>>> file.split('\n')
['this is line 1', 'this is line 2', '']

I'm on Linux, but I guess you could just use \r\n on Windows and it would also work.
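
For what it's worth, in Python 3 a file opened in the default text mode has \r\n and \r translated to \n on read, so the same split usually works unchanged for files with Windows line endings. A small sketch of that, using a temporary file as a stand-in for a.txt (the temporary file and variable names are assumptions made for this example):

```python
import os
import tempfile

# Write a file with Windows-style line endings. newline="" turns off
# newline translation on write, so "\r\n" is stored exactly as given.
with tempfile.NamedTemporaryFile(
    "w", newline="", suffix=".txt", delete=False
) as tmp:
    tmp.write("this is line 1\r\nthis is line 2\r\n")
    path = tmp.name

try:
    # Default text mode (newline=None) translates "\r\n" to "\n" on read.
    with open(path) as f:
        content = f.read()
finally:
    os.remove(path)

# splitlines() also avoids the trailing '' that split('\n') produces.
lines = content.splitlines()
print(lines)  # ['this is line 1', 'this is line 2']
```

Iterating over the file object directly (for line in f) is another common way to get the lines, although each line then keeps its trailing \n.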
