
python - For a custom Mapping class that returns self as iterator, list() returns empty. How do I fix it? - Stack Overflow


The following is a simplified version of what I am trying to do (the actual implementation has a number of nuances):

from __future__ import annotations

from collections.abc import MutableMapping

class SideDict(MutableMapping, dict):
    """
    The purpose of this special dict is to side-attach another dict. A key
    and its value from main dict are preferred over same key in the
    side-dict. If only a key is not present in main dict, then it is used
    from the side-dict.        
    """

    # The starting SideDict instance will have side_dict=None, a subsequent
    # SideDict instance can use the first instance as its side_dict.
    def __init__(self, data, side_dict: SideDict | None):
        self._store = dict(data)
        self._side_dict = side_dict

        self._iter_keys_seen = []
        self._iter_in_side_dict = False
        self._iter = None
        # Also other stuff

    # Also implements __bool__, __contains__, __delitem__, __eq__, __getitem__,
    # __missing__, __or__, __setitem__ and others.

    def __iter__(self):
        self._iter_keys_seen = []
        self._iter_in_side_dict = False
        self._iter = None
        return self

    def __next__(self):
        while True:
            # Start with an iterator that is on self._store
            if self._iter is None:
                self._iter = self._store.__iter__()

            try:
                next_ = self._iter.__next__()
                if next_ in self._iter_keys_seen:
                    continue
                # Some other stuff I do with next_
                self._iter_keys_seen.append(next_)
                return next_
            except StopIteration as e:
                if self._side_dict is None or self._iter_in_side_dict:
                    raise e
                else:
                    # Switching to side-dict iterator
                    self._iter_in_side_dict = True
                    self._iter = self._side_dict.__iter__()

    def __len__(self):
        return len([k for k in self])  # It's not the most efficient, but
                                       # I don't know any other way.

sd_0 = SideDict(data={"a": "A"}, side_dict=None)
sd_1 = SideDict(data={"b": "B"}, side_dict=sd_0)
sd_2 = SideDict(data={"c": "C"}, side_dict=sd_1)

print(len(sd_0), len(sd_1), len(sd_2))  # all work fine
print(list(sd_0))  # ! Here is the problem, shows empty list `[]` !

After adding some print() calls, here is the sequence of calls I observed:

  1. list() triggers obj.__iter__() first.
  2. It then calls obj.__len__(). As I understand it, this lets list() preallocate a suitably sized list.
  3. Because obj.__len__() contains a list comprehension ([k for k in self]), it triggers obj.__iter__() again.
  4. obj.__next__() is then called multiple times as the comprehension iterates through obj._store and obj._side_dict.
  5. When obj.__next__() raises the final, unsilenced StopIteration, the list comprehension in obj.__len__() ends.
  6. Here the problem starts. list() seems to call obj.__next__() again immediately after obj.__len__() returns, and it hits StopIteration again. There is no second obj.__iter__() call, so the final result is an empty list!

What I think might be happening is that list() starts an iterator on its argument, but before doing anything else, it wants to find out the length. My __len__() uses an iterator itself, and because __iter__() returns self, both end up using the same iterator. That iterator is consumed inside obj.__len__(), so nothing is left for the outer list() to consume. Please correct me if I am wrong.

So how can I change my obj.__len__() to use a non-clashing iterator?

asked Mar 22 at 1:39 by fishfin; edited Mar 24 at 6:27
  • Can you show an actual Python implementation of __next__ instead of an English approximation? (Not necessarily your real implementation, but one that demonstrates the issue.) – mkrieger1 Commented Mar 22 at 8:03
  • If you want to show a solution to your problem (which is not already shown in an answer), please add an answer instead of editing it into the question. – mkrieger1 Commented Mar 22 at 8:06
  • I have since deleted the __next__ code, as the generator method is both fast and succinct, so I put what I remember into the question. I also added the actual implementations as separate answers. – fishfin Commented Mar 22 at 9:24
  • Please make a minimal reproducible example. Here, I get TypeError: 'ellipsis' object is not iterable at sd_0 = SideDict(data=..., side_dict=None) – wjandrea Commented Mar 22 at 14:04
  • @wjandrea Thanks for your edits and notes. (1) Replaced the ellipsis with actual code. (2) list() itself calls __len__() first, so I could not use list(self) within __len__(), as it would be recursive, as you found out. (3) Valid point, I now initialize the variables in __init__(). I was curious why you used next() without iter() first, but it's valid as __iter__() returns self anyway. (4) I modified the title to make it as close as possible to what I think my challenge was. – fishfin Commented Mar 24 at 6:34

2 Answers

The problem is that your object is its own iterator. Most objects should not be their own iterator - it only makes sense to do that if the object's only job is to be an iterator, or if there's some other inherent reason you shouldn't be able to perform two independent loops over the same object.
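The difference between an iterable and an object that is its own iterator can be seen with built-in types. A small sketch (the variable names are only for illustration):

```python
numbers = [10, 20, 30]

# A list is iterable but is not its own iterator: each iter() call
# returns a fresh, independent iterator object.
first = iter(numbers)
second = iter(numbers)
assert first is not second
assert next(first) == 10
assert next(first) == 20
assert next(second) == 10  # unaffected by how far `first` has advanced

# A generator is its own iterator: iter() hands back the same object,
# so a second pass continues where the first one stopped.
gen = (n for n in numbers)
assert iter(gen) is gen
assert next(gen) == 10
remaining = list(iter(gen))
print(remaining)  # [20, 30]
```

A mapping is expected to behave like the list here, since code such as list() may start more than one pass over it.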

Most iterable objects should return a new iterator object from __iter__, and not implement __next__. The simplest way to do this is usually by either writing __iter__ as a generator function, or returning an iterator over some other object that happens to have the right elements. For example, using the set-like union functionality of dict key views:

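def __iter__(self):
    # side_dict is None for the first SideDict in the chain
    side_keys = () if self._side_dict is None else self._side_dict.keys()
    return iter(self._store.keys() | side_keys)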

Or using a generator:

def __iter__(self):
    yield from self._store

    if self._side_dict is not None:
        for key in self._side_dict:
            if key not in self._store:
                yield key

In this case, the generator has the advantage of not building the self._store.keys() | self._side_dict.keys() set.
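With a generator-based __iter__, __len__ can safely iterate self, because every call to __iter__ creates an independent generator. The following is a read-only sketch under that assumption; SideDictSketch is a hypothetical, stripped-down class built on Mapping, not the asker's real implementation:

```python
from collections.abc import Mapping


class SideDictSketch(Mapping):
    """Hypothetical minimal version: main dict preferred over side dict."""

    def __init__(self, data, side_dict=None):
        self._store = dict(data)
        self._side_dict = side_dict

    def __getitem__(self, key):
        if key in self._store:
            return self._store[key]
        if self._side_dict is not None:
            return self._side_dict[key]
        raise KeyError(key)

    def __iter__(self):
        # Generator function: each call returns a new iterator object.
        yield from self._store
        if self._side_dict is not None:
            for key in self._side_dict:
                if key not in self._store:
                    yield key

    def __len__(self):
        # Counts keys with its own generator, so it cannot clash
        # with an iteration that list() has already started.
        return sum(1 for _ in self)


sd_0 = SideDictSketch({"a": "A"})
sd_1 = SideDictSketch({"b": "B"}, side_dict=sd_0)
sd_2 = SideDictSketch({"c": "C"}, side_dict=sd_1)

print(len(sd_0), len(sd_1), len(sd_2))  # 1 2 3
print(list(sd_2))                       # ['c', 'b', 'a']
```

Counting with sum() avoids building a throwaway list, although it still walks every key.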


Also, unless you're writing this thing as a learning exercise, you should probably just use collections.ChainMap. It handles all of this already.
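As a rough illustration of the ChainMap route, the three chained dicts from the question could be modelled like this. The extra "a" entry in sd_2 is an assumption added here only to show which mapping wins on lookup:

```python
from collections import ChainMap

sd_0 = ChainMap({"a": "A"})
sd_1 = sd_0.new_child({"b": "B"})  # sd_0's maps act as the "side" dicts
sd_2 = sd_1.new_child({"c": "C", "a": "shadowed A"})

# Lookups search the newest mapping first, then fall back to older ones.
print(sd_2["a"])     # shadowed A
print(sd_1["a"])     # A
print(len(sd_2))     # 3 (duplicate keys are counted once)
print(sorted(sd_2))  # ['a', 'b', 'c']

# Writes only touch the first mapping, so the older layers stay unchanged.
sd_2["d"] = "D"
print("d" in sd_1)   # False
```

Both len() and list() work without any custom iterator code, since ChainMap builds a fresh iterator on each call.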

Based on the code and hints in the answer by @user2357112, I implemented this in two different ways. I am documenting both here in case they are useful to others.

1. The Better Solution

~6x faster than Solution 2 for list(side_dict_with_5_items)

class SideDict(...):
    def __iter__(self):
        yield from self._store
        # This works too:
        # for key in self._store:
        #     yield key

        if self._side_dict is not None:
            for key in self._side_dict:
                if key not in self._store:
                    yield key

    # Removed __next__(...), all other stuff remains the same

2. Another Working Solution

Just for the concept

class SideDict(...):
    def __iter__(self):
        return SideDictIterator(self)

    # Removed __next__(...), all other stuff remains the same

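class SideDictIterator:
    def __init__(self, side_dict: SideDict):
        self._side_dict = side_dict
        self._iter_keys_seen = []
        self._iter_in_side_dict = False
        self._iter = self._side_dict._store.__iter__()

    def __iter__(self):
        return self

    def __next__(self):
        # Same logic as the old SideDict.__next__(), with state kept on this iterator
        while True:
            try:
                next_ = self._iter.__next__()
            except StopIteration:
                side = self._side_dict._side_dict
                if side is None or self._iter_in_side_dict:
                    raise
                # Switching to side-dict iterator
                self._iter_in_side_dict = True
                self._iter = iter(side)
                continue
            if next_ not in self._iter_keys_seen:
                self._iter_keys_seen.append(next_)
                return next_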
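The speed comparison between the two solutions can be checked with timeit. The following is only a sketch: GeneratorSideDict, ClassSideDict, KeyIterator and build are hypothetical stand-ins for the two solutions, and a chain of five single-key dicts is an assumption about what "5 items" means. The measured ratio will vary by machine and Python version.

```python
import timeit


class GeneratorSideDict:
    """Stand-in for Solution 1: __iter__ is a generator function."""

    def __init__(self, data, side_dict=None):
        self._store = dict(data)
        self._side_dict = side_dict

    def __iter__(self):
        yield from self._store
        if self._side_dict is not None:
            for key in self._side_dict:
                if key not in self._store:
                    yield key


class ClassSideDict:
    """Stand-in for Solution 2: __iter__ returns a separate iterator object."""

    def __init__(self, data, side_dict=None):
        self._store = dict(data)
        self._side_dict = side_dict

    def __iter__(self):
        return KeyIterator(self)


class KeyIterator:
    def __init__(self, owner):
        self._owner = owner
        self._seen = set()
        self._in_side = False
        self._iter = iter(owner._store)

    def __iter__(self):
        return self

    def __next__(self):
        while True:
            try:
                key = next(self._iter)
            except StopIteration:
                if self._owner._side_dict is None or self._in_side:
                    raise
                # Main dict exhausted: continue with the side dict.
                self._in_side = True
                self._iter = iter(self._owner._side_dict)
                continue
            if key not in self._seen:
                self._seen.add(key)
                return key


def build(cls, depth=5):
    """Chain `depth` single-key dicts, each using the previous as its side dict."""
    chain = None
    for i in range(depth):
        chain = cls({f"k{i}": i}, chain)
    return chain


gen_chain = build(GeneratorSideDict)
cls_chain = build(ClassSideDict)
assert list(gen_chain) == list(cls_chain)  # both yield the same keys

gen_time = timeit.timeit(lambda: list(gen_chain), number=2_000)
cls_time = timeit.timeit(lambda: list(cls_chain), number=2_000)
print(f"generator: {gen_time:.4f}s  iterator class: {cls_time:.4f}s")
```

Both variants hand out a new iterator per __iter__ call, so either one avoids the shared-state problem from the question.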
