
python - Performance impact of inheriting from many classes - Stack Overflow


I am investigating the performance impact of a very broad inheritance setup.

  1. Start with 260 distinct attribute names, from a0 through z9.
  2. Create 260 classes with 1 uniquely-named attribute each. Create one class that inherits from those 260 classes.
  3. Create 130 classes with 2 uniquely-named attributes each. Create one class that inherits from those 130 classes.
  4. Repeat for 52 classes with 5 attributes each, 26 classes with 10 attributes each, and 1 class with all 260 attributes.
  5. Create one instance of each of the five classes, and then measure the time to read (and add together) all 260 attributes on each.
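A minimal sketch of steps 1-5 (the helper names, the use of perf_counter, and the exact interleaving are illustrative; the real harness averages 2.5M reads in interleaved order):

from itertools import product
from time import perf_counter

# Step 1: 260 attribute names, a0 through z9
names = [f"{c}{d}" for c, d in product("abcdefghijklmnopqrstuvwxyz", range(10))]
defaults = {name: i + 1 for i, name in enumerate(names)}

def composite(num_bases):
    # Split the 260 attributes evenly across num_bases classes, then build
    # one class inheriting from all of them (steps 2-4).
    per_class = len(names) // num_bases
    bases = [
        type(f"Part{i}", (object,),
             {n: defaults[n] for n in names[i * per_class:(i + 1) * per_class]})
        for i in range(num_bases)
    ]
    return type(f"From{num_bases}", tuple(bases), {})

# Step 5: one instance per composite class; read and add all 260 attributes
for num_bases in (260, 130, 52, 26, 1):
    obj = composite(num_bases)()
    start = perf_counter()
    for _ in range(1_000):
        total = sum(getattr(obj, name) for name in names)
    print(f"From{num_bases}: {perf_counter() - start:.3f}s")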

Average performance from 2.5M reads, interleaved in different orders:

 From260: 2.48
 From130: 1.55
  From52: 1.22
  From26: 1.15
AllInOne: 1.00

These values almost fit a linear regression, but not quite, and the relationships hold across many different runs of various sizes and test orders.

The values come closer to fitting a second-degree polynomial or an exponential curve, but again, the fit is not clean enough to be obvious.

As I massively increase the number of subclasses, will the performance falloff be linear, or non-linear?


Here's some updated code that samples many different subclass combinations up to 2310:

from time import time

TOTAL_ATTRS = 2310

# Create attribute names "a0000" through "a2309"
attr_names = [f"a{i:04d}" for i in range(TOTAL_ATTRS)]

# Map each attribute to a default value (1..2310)
all_defaults = {name: i + 1 for i, name in enumerate(attr_names)}

# The provided factors of 2310
factors = [1, 2, 3, 5, 6, 7, 10, 11, 14, 15, 21, 22, 30, 33, 35, 42,
           55, 66, 70, 77, 105, 110, 154, 165, 210, 231, 330, 385,
           462, 770, 1155, 2310]

# Build a dictionary mapping each factor to a composite class.
# For factor f, create f subclasses each with (2310 // f) attributes,
# then create a composite class inheriting from all f subclasses.
composite_classes = {}
for f in factors:
    group_size = TOTAL_ATTRS // f
    subclasses = []
    for i in range(f):
        group = attr_names[i * group_size:(i + 1) * group_size]
        group_defaults = {name: all_defaults[name] for name in group}
        subclass = type(f"Sub_{f}_{i}", (object,), group_defaults)
        subclasses.append(subclass)
    composite_classes[f] = type(f"From_{f}", tuple(subclasses), {})

# Time 1,000 full passes over all 2,310 attributes for each composite class
iterations = range(1_000)

for n, c in composite_classes.items():
    i = c()
    t = time()
    for _ in iterations:
        for a in attr_names:
            getattr(i, a)
    print(f"Inheriting from {n} subclasses: {time()-t:.3f}s")

and the results, which look far more linear than polynomial, but which have odd "ledges" in them.


asked Feb 3 at 17:36 by Phrogz, edited Feb 3 at 18:47 by jsbueno
  • 1 How massive is massive? Since you have a testing framework, why not create a condition with 2500 attributes and see if that reveals anything? – JonSG Commented Feb 3 at 17:54
  • 1 CPython has an internal type attribute cache with a max size of 4096 entries, and instance attribute lookups do check this cache when trying to find an attribute through the class. I would expect most of your test lookups to hit the cache, but something is causing a significant slowdown anyway, so there seems to be something weird going on. – user2357112 Commented Feb 3 at 18:03
  • 1 @JonSG Let me clean up my test code and I'll include it. Good point. – Phrogz Commented Feb 3 at 18:05
  • 1 Is this just academic curiosity? What's the use case for a class with hundreds of superclasses? – Barmar Commented Feb 3 at 18:06
  • 1 @JonSG "How massive is massive?" Probably not more than 1000. I'm uncertain at this stage. – Phrogz Commented Feb 3 at 18:25

2 Answers


The slowdown will be weird and hard to fully predict, depending on subtle details of memory allocation and attribute access order.

Worst-case, not only will you experience a linear slowdown, you'll slow down completely unrelated attribute accesses in unrelated code.


CPython has a 4096-entry type attribute cache, and instance attribute lookups do check this cache when looking for an attribute through the class. The cache entry used for an attribute lookup is determined using a simple hash based on the type's version tag and the address of the string object for the attribute being looked up:

#define MCACHE_HASH(version, name_hash)                                 \
        (((unsigned int)(version) ^ (unsigned int)(name_hash))          \
         & ((1 << MCACHE_SIZE_EXP) - 1))

#define MCACHE_HASH_METHOD(type, name)                                  \
    MCACHE_HASH(FT_ATOMIC_LOAD_UINT32_RELAXED((type)->tp_version_tag),   \
                ((Py_ssize_t)(name)) >> 3)

If an attribute is found through the cache, the lookup is quick, and doesn't depend on the depth of the inheritance hierarchy. But if an attribute is not found through the cache, the lookup has to go through the class's MRO, one class at a time, performing a dict lookup in each class until it finds (or fails to find) the attribute. This takes an amount of time linear in the number of classes it has to look through.

Note that because of the descriptor protocol, Python has to do this even if it finds an entry for the attribute directly in the instance dict.
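As a rough Python-level sketch of that slow path (illustrative only; the real lookup, _PyType_Lookup, is implemented in C and also consults the cache):

def uncached_class_lookup(obj, name):
    # Walk the MRO of the instance's class, checking each class's own
    # __dict__ in order; cost is linear in the number of classes visited.
    for klass in type(obj).__mro__:
        if name in klass.__dict__:
            return klass.__dict__[name]
    raise AttributeError(name)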


So the more attributes you use, the greater the chance you run into a hash collision, either with another one of your attributes or with something completely unrelated, like list.append. The longer your MRO, the greater the impact of a collision.

If two attributes happen to produce a hash collision, then accessing one while the other is in the cache will need to perform a slow MRO search for the attribute. Then it'll evict the other attribute from the cache. If you access this attribute again before the other one, it'll be quick, but the next time you access the attribute that just got evicted, you'll go through another slow MRO search and eviction.
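You can mirror that hash in Python to get a feel for how slots collide. The type's version tag is not exposed to Python code, so the value below is an arbitrary stand-in, and id() is used for the string object's address:

MCACHE_SIZE_EXP = 12  # 4096 cache entries, as described above

def mcache_slot(version_tag, name):
    # Same arithmetic as MCACHE_HASH_METHOD: xor the version tag with
    # the string object's address shifted right by 3, masked to the cache size.
    return (version_tag ^ (id(name) >> 3)) & ((1 << MCACHE_SIZE_EXP) - 1)

names = [f"a{i:04d}" for i in range(2310)]
version_tag = 123456  # hypothetical; real tags are assigned internally
slots = {mcache_slot(version_tag, n) for n in names}
print(f"{len(names) - len(slots)} of {len(names)} names collide for this tag")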

Because the cache cares about the address of the attribute string instead of the actual attribute name, using a different string object for a lookup will also cause a cache miss. In normal Python code, attribute names get interned, so this isn't a problem, but when you're generating attribute names dynamically, the interning doesn't happen.
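If you do build attribute names on the fly, sys.intern hands back the canonical string object, so equal names share one address again. A small illustration:

import sys

x = sys.intern(f"a{7:04d}")   # name built dynamically here...
y = sys.intern(f"a{7:04d}")   # ...and rebuilt somewhere else
assert x is y                 # interning collapses them to one string object,
                              # so cache slots keyed on its address stay stable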

Anyway - the behavior seems to be "close enough to linear" for all practical purposes.

What may make a difference here is taking into account the behavior for reading and writing attributes.

(1) If the attribute is created at the instance level, it doesn't matter, once the class exists, how deep in the hierarchy its value was set: all instance attributes live in the instance's flat namespace, which is a dictionary. (OK, the descriptor protocol does mandate an MRO search for the attribute even when it is present in the instance dict, but the attribute cache described in @user2357112's answer compensates for that. TL;DR: repeated instance attribute access will be O(1), regardless of inheritance depth.)
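A quick way to see point (1): no matter how long the MRO is, instance attributes end up in one flat __dict__ on the instance (the class names below are made up):

bases = [type(f"B{i}", (object,), {}) for i in range(250)]
Wide = type("Wide", tuple(bases), {})

obj = Wide()
obj.x = 1                    # set at the instance level
print(obj.__dict__)          # {'x': 1} -- a single flat dict
print(len(Wide.__mro__))     # 252 classes, but irrelevant for reading obj.x repeatedly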

(2) Reading an attribute from a class, in code (which is what your experiment does), triggers a traversal that searches each superclass namespace for the attribute, and this is the only case where a deep inheritance chain makes a difference. So, with 250 superclasses, reading an attribute defined in the deepest one requires 250 namespace checks, while reading one defined 125 classes closer needs only half as many. Each attribute therefore takes a different amount of time to read, depending on the depth at which it was defined, and this factor alone is likely responsible for the "non-linearity" in your findings. If you instead define all the attributes on the first class, then stack 248 empty classes and read those attributes from the 250th-generation child, the shape of your plot will change.
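That reshaped experiment might look roughly like this (the class names and the use of perf_counter are my own choices):

from time import perf_counter

names = [f"a{i}" for i in range(260)]

# Define every attribute on one root class...
cls = type("Root", (object,), {name: i for i, name in enumerate(names)})

# ...then stack empty classes below it, so every lookup that misses the
# cache has to walk past all of them before reaching Root.
for depth in range(249):
    cls = type(f"Empty{depth}", (cls,), {})

obj = cls()
start = perf_counter()
for _ in range(1_000):
    for name in names:
        getattr(obj, name)
print(f"MRO length {len(cls.__mro__)}: {perf_counter() - start:.3f}s")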

However, this is mostly irrelevant: any real system that would be impacted by this could trivially implement a metaclass, or an __init_subclass__ method, that flattens the namespace for each class with a large inheritance chain. If that needs to stay live, i.e. a runtime change to an attribute on super-grand-parent-1 is reflected when the attribute is read on the child-250 class, a slightly more sophisticated mechanism with an intermediary mapping could be built on the __getattribute__ special method, and attribute access would still be flattened to O(1).
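A minimal sketch of the __init_subclass__ flattening idea (it copies static class attributes only; the names are my own, and a live-updating variant would need the __getattribute__ approach mentioned above):

class Flattening:
    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        own = set(vars(cls))  # names the subclass defines itself
        # Walk the MRO from farthest to nearest so closer classes win,
        # copying inherited attributes into the subclass's own __dict__.
        for klass in reversed(cls.__mro__[1:]):
            for name, value in vars(klass).items():
                if not name.startswith("__") and name not in own:
                    setattr(cls, name, value)

class Sub0(Flattening): a0 = 1
class Sub1(Flattening): a1 = 2
class Combined(Sub0, Sub1): pass

assert {"a0", "a1"} <= set(vars(Combined))  # found without walking the MRO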

Just say if you need such a mechanism (you could post a new question with the "metaclass" tag, for example).
