
python - Iterating over a BufferedReader gives unexpected results - Stack Overflow


Numerous suggestions for counting the number of lines in a file can be found here

One of the suggestions is (effectively):

with open("foo.txt", "rb") as handle:
    line_count = sum(1 for _ in handle)

When I looked at that I thought "That can't be right" but it does indeed produce the correct result.

Here's what I don't understand... The file is opened in binary mode. Therefore, I would expect iterating over handle (which is an _io.BufferedReader) to reveal one byte at a time.

It seems odd to me that a file opened in binary mode could be considered as line-oriented.

I must be missing something fundamental here.


asked Feb 6 at 9:29 by Adon Bilivit
  • "I would expect iterating over handle... to reveal one byte at a time" - I'm not sure why; iterating over a file handle consistently gives lines whatever the other settings. If you did want to read it in non-line chunks, e.g. byte-by-byte, see e.g. stackoverflow.com/q/4566498/3001761. – jonrsharpe Commented Feb 6 at 9:42
  • @jonrsharpe I know how to get bytes from the binary stream. It just seems counter-intuitive (to me) that iterating over the handle reveals lines rather than individual bytes. Fortunately, @robertklep found the documentation that I couldn't and has explained it in his answer – Adon Bilivit Commented Feb 6 at 10:16
  • @jonrsharpe To be honest, I'm also surprised that Python would look for line breaks in my movie file. The link you posted is helpful, thank you. – Jeyekomon Commented Feb 6 at 10:23

2 Answers


io.BufferedReader derives from io.BufferedIOBase, which in turn inherits from io.IOBase, where it's documented that:

IOBase (and its subclasses) supports the iterator protocol, meaning that an IOBase object can be iterated over yielding the lines in a stream. Lines are defined slightly differently depending on whether the stream is a binary stream (yielding bytes), or a text stream (yielding character strings)

So apparently it has been a design choice to always return lines when iterating over a file object, the only difference being the type that the iterator returns.
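A minimal sketch of that behaviour, using an in-memory io.BytesIO stream as a stand-in for a file opened with "rb" (the sample data is made up for illustration). For binary streams only b"\n" counts as a line terminator, whereas a text stream with default newline handling also translates "\r\n":

```python
import io

# Hypothetical sample data; io.BytesIO stands in for open("foo.txt", "rb").
data = b"first\nsecond\r\nthird"

# Iterating over a binary stream yields bytes objects, one per line,
# split on b"\n" only (the b"\r" is left in place).
binary_lines = list(io.BytesIO(data))

# The same content as a text stream yields str objects; with the default
# newline handling, "\r\n" is translated to "\n".
text_lines = list(io.TextIOWrapper(io.BytesIO(data), encoding="ascii"))

# The counting idiom from the question works on either kind of stream.
line_count = sum(1 for _ in io.BytesIO(data))

print(binary_lines)
print(text_lines)
print(line_count)
```

This is also why the iterator will happily "find lines" in a non-text file such as a movie: it simply splits wherever a b"\n" byte happens to occur.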

You can use the read(size) method of your handle to return the next size bytes of your file (see doc):

byte_list = []
with open("foo.txt", "rb") as handle:
    while b := handle.read(1):
        byte_list.append(b)

Of course, that is sub-optimal as you could simply read the whole file then separate the bytes.
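A rough sketch of the "read the whole file, then separate the bytes" alternative. The temporary file and its contents are assumptions made only so the example is self-contained. Note that iterating over a bytes object yields integers, so slicing is used to get one-byte bytes objects equivalent to what handle.read(1) returns:

```python
import os
import tempfile

# Hypothetical sample file, created only so the sketch is self-contained.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"ab\ncd")
    path = tmp.name

try:
    with open(path, "rb") as handle:
        content = handle.read()  # the whole file as one bytes object
finally:
    os.remove(path)

# Iterating over a bytes object yields integers in range(256)...
byte_values = list(content)

# ...so slice to get one-byte bytes objects, matching handle.read(1).
byte_list = [content[i:i + 1] for i in range(len(content))]

print(byte_values)
print(byte_list)
```

Reading everything at once assumes the file fits comfortably in memory; for very large files, reading in fixed-size chunks with read(size) is the usual compromise.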
