python - Iterating over a BufferedReader gives unexpected results

Numerous suggestions for counting the number of lines in a file can be found here

One of the suggestions is (effectively):

with open("foo.txt", "rb") as handle:
    line_count = sum(1 for _ in handle)

When I looked at that I thought "That can't be right" but it does indeed produce the correct result.

Here's what I don't understand... The file is opened in binary mode. Therefore, I would expect iterating over handle (which is an _io.BufferedReader) to reveal one byte at a time.

It seems odd to me that a file opened in binary mode could be considered as line-oriented.

I must be missing something fundamental here.

asked Feb 6 at 9:29 by Adon Bilivit

"I would expect iterating over handle... to reveal one byte at a time" - I'm not sure why; iterating over a file handle consistently gives lines whatever the other settings. If you did want to read it in non-line chunks, e.g. byte-by-byte, see e.g. stackoverflow.com/q/4566498/3001761. – jonrsharpe Commented Feb 6 at 9:42
@jonrsharpe I know how to get bytes from the binary stream. It just seems counter-intuitive (to me) that iterating over the handle reveals lines rather than individual bytes. Fortunately, @robertklep found the documentation that I couldn't and has explained it in his answer. – Adon Bilivit Commented Feb 6 at 10:16
@jonrsharpe To be honest, I'm also surprised that Python would look for line breaks in my movie file. The link you posted is helpful, thank you. – Jeyekomon Commented Feb 6 at 10:23

2 Answers

io.BufferedReader inherits from io.BufferedIOBase, which in turn inherits from io.IOBase, where it's documented that:

IOBase (and its subclasses) supports the iterator protocol, meaning that an IOBase object can be iterated over yielding the lines in a stream. Lines are defined slightly differently depending on whether the stream is a binary stream (yielding bytes), or a text stream (yielding character strings)

So apparently it has been a design choice to always return lines when iterating over a file object, the only difference being the type that the iterator returns.
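To see that behaviour for yourself, here is a small sketch. The temporary file and its contents are made up for illustration; they are not from the question. It iterates over the same file once in binary mode and once in text mode and keeps what each iteration yields.

```python
import os
import tempfile

# Write a small sample file. The contents are invented for this demo.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"first\nsecond\r\nthird")
    path = tmp.name

try:
    # Binary mode: iteration yields bytes objects, each ending after b"\n".
    with open(path, "rb") as handle:
        binary_lines = list(handle)

    # Text mode (default newline handling): iteration yields str objects,
    # with "\r\n" translated to "\n".
    with open(path, "r", encoding="ascii") as handle:
        text_lines = list(handle)
finally:
    os.remove(path)

print(binary_lines)  # [b'first\n', b'second\r\n', b'third']
print(text_lines)    # ['first\n', 'second\n', 'third']
```

Both modes give three items, which is why the line count from the question comes out right in binary mode: only the type of each item (and the newline translation) differs.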

You can use the read(size) method of your handle to return up to size bytes from the current position in your file (see doc):

byte_list = []
with open("foo.txt", "rb") as handle:
    while b := handle.read(1):
        byte_list.append(b)

Of course, that is sub-optimal as you could simply read the whole file then separate the bytes.
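A rough sketch of that "read everything, then separate" approach, again using a made-up temporary file rather than the asker's foo.txt. One thing to watch: iterating over a bytes object gives integers, so slicing is used here to get one-byte bytes objects like the loop above produces.

```python
import os
import tempfile

# Sample file invented for this demo.
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"ab\nc")
    path = tmp.name

try:
    with open(path, "rb") as handle:
        data = handle.read()  # the whole file as a single bytes object
finally:
    os.remove(path)

# Iterating over bytes yields ints (0-255), not bytes objects.
as_ints = list(data)

# Slicing with a width of 1 keeps each element as a bytes object.
byte_list = [data[i:i + 1] for i in range(len(data))]

print(as_ints)    # [97, 98, 10, 99]
print(byte_list)  # [b'a', b'b', b'\n', b'c']
```

This holds the whole file in memory at once, so for very large files a chunked read(size) loop with a bigger size may be the better trade-off.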
