python – When should I ever use file.read() or file.readlines()?
The short answer to your question is that each of these three methods of reading a file has different use cases. As noted above, f.read() reads the file as a single string, and so allows relatively easy file-wide manipulations, such as a file-wide regex search or substitution.
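As a minimal sketch of the file-wide case, the snippet below reads a whole file with f.read() and runs one regex substitution whose pattern spans line breaks, something a line-by-line loop could not match directly. The function name, the sample text, and the temporary file are made up for illustration.

```python
import os
import re
import tempfile

def collapse_blank_lines(path):
    """Return the file's text with runs of blank lines collapsed to one."""
    with open(path, "r") as f:
        text = f.read()  # the whole file as a single string
    # The pattern spans line boundaries, so it needs the full text at once.
    return re.sub(r"\n{3,}", "\n\n", text)

# Hypothetical sample file, created only for this demonstration.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first\n\n\n\nsecond\n")
    sample_path = tmp.name

result = collapse_blank_lines(sample_path)
os.remove(sample_path)
print(repr(result))
```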
f.readline() reads a single line of the file, allowing the user to parse a single line without necessarily reading the entire file. Using f.readline() also makes it easier to apply logic while reading the file than a complete line-by-line iteration does, such as when a file changes format partway through.
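To illustrate the format-change case, here is a small sketch that uses f.readline() to consume a header section and then parses the remainder of the file differently. The file layout (key=value header lines, a blank separator line, then comma-separated rows) and the function name are assumptions made for the example; io.StringIO stands in for a real file.

```python
import io

def parse_mixed_file(f):
    """Parse 'key=value' header lines, then comma-separated rows."""
    header = {}
    while True:
        line = f.readline()
        if line.strip() == "":   # blank line (or end of file) ends the header
            break
        key, value = line.strip().split("=", 1)
        header[key] = value
    # The format changes here: the rest of the file is comma-separated.
    rows = []
    while True:
        line = f.readline()
        if line == "":           # readline() returns '' only at end of file
            break
        rows.append(line.strip().split(","))
    return header, rows

sample = io.StringIO("name=demo\nversion=2\n\na,b,c\n1,2,3\n")
header, rows = parse_mixed_file(sample)
print(header)
print(rows)
```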
Using the syntax for line in f: allows the user to iterate over the file line by line, as noted in the question.
As noted in the other answer, this documentation is a very good read:
https://docs.python.org/3/tutorial/inputoutput.html#methods-of-file-objects
Note:
It was previously claimed that f.readline() could be used to skip a line during a for loop iteration. However, this doesn't work in Python 2.7 and is perhaps a questionable practice, so the claim has been removed.
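If the goal is to skip a line while iterating, one option that avoids mixing f.readline() with the loop is the built-in next(), which advances the same iterator the for loop uses. This is only a sketch, not the approach from the removed claim; the marker string and the function name are invented for the example.

```python
import io

def lines_skipping_after_marker(f, marker="#skip-next"):
    """Collect lines, dropping the marker line and the line right after it."""
    kept = []
    for line in f:
        if line.strip() == marker:
            # Advance the loop's own iterator; the None default
            # prevents StopIteration if the marker is the last line.
            next(f, None)
            continue
        kept.append(line.rstrip("\n"))
    return kept

sample = io.StringIO("a\n#skip-next\nb\nc\n")
kept = lines_skipping_after_marker(sample)
print(kept)
```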
Hope this helps!
https://docs.python.org/2/tutorial/inputoutput.html#methods-of-file-objects
When size is omitted or negative, the entire contents of the file will be read and returned; it’s your problem if the file is twice as large as your machine’s memory.
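When a file may not fit in memory, passing a size to f.read() bounds how much is loaded at once. The sketch below counts newlines chunk by chunk; the 64 KiB default chunk size and the function name are arbitrary choices for illustration, and io.BytesIO stands in for a file opened in binary mode.

```python
import io

def count_newlines(f, chunk_size=64 * 1024):
    """Count newline bytes without loading the whole file into memory."""
    total = 0
    while True:
        chunk = f.read(chunk_size)  # at most chunk_size bytes per call
        if not chunk:               # empty result means end of file
            break
        total += chunk.count(b"\n")
    return total

# io.BytesIO stands in for a file opened with open(path, "rb").
sample = io.BytesIO(b"one\ntwo\nthree\n")
newline_count = count_newlines(sample, chunk_size=4)
print(newline_count)
```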
Sorry for all the edits!
For reading lines from a file, you can loop over the file object. This is memory efficient, fast, and leads to simple code:
for line in f:
    print line,   # Python 2 syntax, as in the linked tutorial; in Python 3 write print(line, end='')
This is the first line of the file.
Second line of the file
Note that readline() is not comparable to reading all lines in a for-loop, since it reads line by line and carries an overhead that others have already pointed out.
I ran timeit on two otherwise identical snippets, one with a for-loop and the other with readlines(). You can see my snippet below:
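from timeit import timeit

def test_read_file_1():
    # Read every line into a list first, then iterate over the list.
    with open('ml/README.md', 'r') as f:
        for line in f.readlines():
            print(line)

def test_read_file_2():
    # Iterate over the file object directly, one line at a time.
    with open('ml/README.md', 'r') as f:
        for line in f:
            print(line)

def test_time_read_file():
    # Run each variant the same number of times and report total seconds.
    duration_1 = timeit(lambda: test_read_file_1(), number=1000000)
    duration_2 = timeit(lambda: test_read_file_2(), number=1000000)
    print('duration using readlines():', duration_1)
    print('duration using for-loop:', duration_2)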
And the results:
duration using readlines(): 78.826229238
duration using for-loop: 69.487692794
The bottom line, I would say, is that the for-loop is faster, but when both are possible, I'd rather use readlines().
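Because print() is likely to dominate the run time in a benchmark like the one above, a variant that only counts lines may isolate the difference between the two approaches better. The following is a sketch under assumptions: the temporary file, its size, and the repetition count are arbitrary, and no timing figures are claimed here since they depend on the machine and the Python version.

```python
import os
import tempfile
from timeit import timeit

def count_with_readlines(path):
    with open(path, "r") as f:
        return len(f.readlines())   # builds the full list of lines first

def count_with_iteration(path):
    with open(path, "r") as f:
        return sum(1 for _ in f)    # one line at a time, no list

# Hypothetical test file: 10,000 short lines.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("some text on a line\n" * 10000)
    bench_path = tmp.name

n_readlines = count_with_readlines(bench_path)
n_iteration = count_with_iteration(bench_path)
t_readlines = timeit(lambda: count_with_readlines(bench_path), number=20)
t_iteration = timeit(lambda: count_with_iteration(bench_path), number=20)
os.remove(bench_path)

print("readlines():", t_readlines)
print("for-loop:   ", t_iteration)
```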