CPython implements arbitrary-precision integers as PyLongObject, which extends PyObject. On 64-bit platforms they take at least 28 bytes each, which is quite memory intensive. It is also well known that CPython keeps a cache of small integer objects from -5 to 256.
I am interested in how PyPy implements integers, and in particular what optimizations it has for limited-size integer objects. It is difficult to find documentation online. The PyPy docs mention a tagged pointer optimization for "small ints" of 63 bits (signed?). The most obvious optimization to me would be treating an integer as a primitive instead of a general-purpose object when possible.
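For reference, both CPython claims are easy to check directly; here is a quick sketch (the exact getsizeof() numbers vary a little between CPython versions, and the identity check for 257 is an implementation detail rather than a guarantee):

import sys

print(sys.getsizeof(1))       # typically 28 bytes on a 64-bit build
print(sys.getsizeof(2**100))  # bigger values append more internal "digits"

# The small-int cache: values from -5 to 256 are pre-built singletons.
a = int("256")   # built at runtime to sidestep compile-time constant folding
b = int("256")
print(a is b)    # True: both names refer to the cached 256 object
c = int("257")
d = int("257")
print(c is d)    # typically False: 257 is outside the cache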
2 Answers
The PyPy docs mention a tagged pointer optimization as something that you need to enable explicitly, and it's never enabled by default because it comes with a large performance cost.
Instead, the story is:
There are two different internal representations, one for 64-bit signed integers and one for larger integers.
The common, small representation is a "PyObject"-like object, but only 2x8 bytes in total (including headers etc.). The reason is that our production GC on 64-bit machines adds only a single 64-bit header, which packs a number identifying the RPython class together with the GC flags; the W_IntObject class then adds just one 64-bit field for the integer value.
There is a separate optimization for lists of small integers, which are implemented as an array of 64-bit ints instead of an array of objects (so 8 bytes per item instead of 16-plus-8-for-the-pointer, plus increased locality).
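To make the arithmetic concrete, here is a small sketch using the byte counts quoted above; the final strategy check relies on PyPy's __pypy__ introspection module, which only exists on PyPy, so treat that part as an assumption (it is guarded accordingly):

# Rough per-item cost of a list of N small integers, using the figures above.
N = 10_000_000

# Boxed layout: an 8-byte pointer in the list plus a 16-byte W_IntObject.
boxed_bytes = N * (8 + 16)

# Specialized "list of small ints" layout: a flat array of 64-bit values.
unboxed_bytes = N * 8

print(f"boxed:   ~{boxed_bytes / 1e6:.0f} MB")    # ~240 MB
print(f"unboxed: ~{unboxed_bytes / 1e6:.0f} MB")  # ~80 MB

# On PyPy, the strategy chosen for a list can reportedly be inspected via
# __pypy__.strategy(); the import fails everywhere else, hence the guard.
try:
    from __pypy__ import strategy  # PyPy-only module
    print(strategy(list(range(10))))  # expected to name an integer-list strategy
except ImportError:
    pass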
I think you found all the docs there are about this. Empirically, yes: 63-bit signed ints on a 64-bit box. This was determined by eyeball, under [PyPy 7.3.12 with MSC v.1929 64 bit (AMD64)] on win32, running this:
import gc

N = 10_000_000
for i in (0, 31, 62, 63, 64):
    base = 1 << i
    print(f"2**{i} for {N:_}")
    # Build 10 million consecutive ints starting at 2**i.
    xs = list(range(base, base + N))
    gc.collect()
    input("MORE")  # pause so memory use can be read off a task manager
    print("neg")
    # Same magnitudes, negated.
    xs = [-x for x in xs]
    gc.collect()
    input("MORE")
So it tries various ranges and pauses after each one; at each pause, check memory use in your favorite task manager.
Up until the "2**63 for 10_000_000" case, memory use remains small, consistent with about 80 million bytes used for the ints in the list (10 million ints times 8 bytes each).
Then (crossing to 64 bits) memory use explodes.
PyPy is indeed time and space-efficient for "small enough" ints.
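If you'd rather not eyeball a task manager, here is a sketch of the same kind of measurement that prints the process's resident set size directly. It assumes the third-party psutil package is installed (it is not part of PyPy or CPython), and the deltas are only approximate, since freed memory is not necessarily handed back to the OS right away:

import gc
import psutil  # third-party; assumed installed

proc = psutil.Process()

def rss_mb():
    # Resident set size of this process, in (decimal) megabytes.
    return proc.memory_info().rss / 1e6

N = 10_000_000
for i in (0, 31, 62, 63, 64):
    base = 1 << i
    before = rss_mb()
    xs = list(range(base, base + N))
    gc.collect()
    print(f"2**{i}: ~{rss_mb() - before:.0f} MB for {N:_} ints")
    del xs
    gc.collect()

The jump when crossing from 63-bit to 64-bit values should still be clearly visible in the printed deltas.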
"On 64-bit platforms they [PyLongObject] take at least 28 bytes"
Alas, it's worse than that. You're presumably looking at sys.getsizeof() output, but that's only part of the story. Small objects (like ints) are allocated by a special small-object allocator, which aligns memory to a 16-byte boundary; getsizeof() knows nothing about that. So the actual minimum allocated size is 32 bytes, the next-smallest is 32+16 = 48 bytes, and so on.
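Here is a small CPython sketch of that accounting, which just rounds the getsizeof() figure up to the 16-byte granularity described above (the exact getsizeof() values vary a little between CPython versions, and this is a back-of-the-envelope model rather than a query of the real allocator):

import sys

def allocated(n):
    # Round the reported size up to the allocator's 16-byte granularity.
    reported = sys.getsizeof(n)
    return reported, -(-reported // 16) * 16

for n in (1, 2**30, 2**60, 2**100):
    reported, rounded = allocated(n)
    print(f"{n}: getsizeof={reported}, allocated (approx)={rounded}")

On a typical 64-bit build the first line reports 28 bytes from getsizeof() but 32 bytes once rounded up, matching the figures above.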