Numerous propositions for counting the number of lines in a file can be found here
One of the suggestions is (effectively):
with open("foo.txt", "rb") as handle:
    line_count = sum(1 for _ in handle)
When I looked at that I thought "That can't be right" but it does indeed produce the correct result.
Here's what I don't understand... The file is opened in binary mode. Therefore, I would expect iterating over handle (which is an _io.BufferedReader) to reveal one byte at a time.
It seems odd to me that a file opened in binary mode could be considered as line-oriented.
I must be missing something fundamental here.
Asked Feb 6 at 9:29 by Adon Bilivit
2 Answers
io.BufferedIOBase inherits from io.IOBase, where it's documented that:
IOBase (and its subclasses) supports the iterator protocol, meaning that an IOBase object can be iterated over yielding the lines in a stream. Lines are defined slightly differently depending on whether the stream is a binary stream (yielding bytes), or a text stream (yielding character strings)
So apparently it has been a design choice to always return lines when iterating over a file object, the only difference being the type that the iterator returns.
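To illustrate the documented behaviour, here is a minimal sketch that iterates over the same file in binary and in text mode. The sample content and the helper name iterate_both_modes are assumptions made for the demonstration, not part of the original question.

```python
import os
import tempfile

# Hypothetical sample content: three lines, the last without a trailing newline.
SAMPLE = b"alpha\nbeta\ngamma"


def iterate_both_modes(data: bytes):
    """Write data to a temporary file, then iterate it in binary and text mode."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(data)
        path = tmp.name
    try:
        with open(path, "rb") as handle:
            binary_lines = list(handle)  # each item is a bytes object
        with open(path, "r", encoding="ascii") as handle:
            text_lines = list(handle)    # each item is a str
    finally:
        os.remove(path)
    return binary_lines, text_lines


binary_lines, text_lines = iterate_both_modes(SAMPLE)
print(binary_lines)  # [b'alpha\n', b'beta\n', b'gamma']
print(text_lines)    # ['alpha\n', 'beta\n', 'gamma']
```

Both modes yield three items, which is why counting iterations gives the line count either way; only the item type differs.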
You can use the read(size) method of your handle to return the next size bytes of your file (see doc):
byte_list = []
with open("foo.txt", "rb") as handle:
    while b := handle.read(1):
        byte_list.append(b)
Of course, that is sub-optimal, as you could simply read the whole file and then separate the bytes.
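A minimal sketch of the whole-file alternative, assuming the file fits comfortably in memory. The helper name read_single_bytes and the temporary demo file are hypothetical. Note that iterating directly over a bytes object yields integers, so the sketch slices instead to keep one-byte bytes objects.

```python
import os
import tempfile


def read_single_bytes(path):
    """Return the file's content as a list of length-1 bytes objects."""
    with open(path, "rb") as handle:
        data = handle.read()  # whole file in one call
    # Iterating over bytes yields ints, so slice to keep bytes objects.
    return [data[i:i + 1] for i in range(len(data))]


# Demo with a hypothetical temporary file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"ab\nc")
    demo_path = tmp.name
try:
    byte_list = read_single_bytes(demo_path)
finally:
    os.remove(demo_path)

print(byte_list)  # [b'a', b'b', b'\n', b'c']
```

If integer byte values are acceptable, list(data) does the same job without slicing.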
"I would expect iterating over handle ... to reveal one byte at a time" - I'm not sure why; iterating over a file handle consistently gives lines, whatever the other settings. If you did want to read it in non-line chunks, e.g. byte-by-byte, see e.g. stackoverflow.com/q/4566498/3001761. – jonrsharpe, commented Feb 6 at 9:42