The x86-64 ABI requires that int128 and long double types be placed on 16byte boundaries.
However, Python's pymalloc implementation is designed to return blocks of memory with 8-byte boundaries. .. .. Are you okay? If you try to malloc a structure containing a type that requires 16byte alignment, it should be a multiple of 16byte, so malloc (sizoef (struct)) is safe and malloc (sizeof (structure) + not a multiple of 16). ) Seems to be a problem in such cases.
The more problematic is the GC header. The GC header in a 64-bit environment is 24 bytes. As the name "header" implies, it is placed at the beginning of GC-targeted objects (objects that may make up circular references, such as tuples, lists, and dicts).
When creating a GC target object, allocate memory like malloc (sizeof (GC header) + object size)
and then use the address shifted by the GC header for a pointer of type PyObject *
. So even if malloc returns a 16byte boundary address, the Python object will be stored at the +24byte address, and if the object's structure has an int128 or long double type, it will be placed across the 16byte boundary. I will.
It seems that recent compilers are actually using instructions that assume 16-byte boundaries, and instead of "theoretically violating but actually working without problems", they are "really segfaulting". It has become.
I didn't notice it until recently, but Python 2.7 reported this phenomenon earlier, and 2.7.15 increased the GC header from 24byte to 32byte. (changelog, [commit](https://github.com/python/cpython/ commit / 0b91f8a668201fc58fa732b8acc496caedfdbae0), issue)
A similar issue was recently reported in Python 3.7 (Crash issue, Old ubsan error issue. org / issue27987)) So it looks like we'll have to apply the same fix unless we find another good way to not break ABI compatibility. For rare types such as long double and int128, it's a shame for me to have to +8 bytes of memory consumption for tuples that are large and never contain those types.
By the way, in Python 3.8, the contents of the GC header are reduced from 3 words (24 bytes in a 64-bit environment) to 2 words (16 bytes in the same), and the memory usage of the GC target objects str, bytes, int, float, etc. is reduced to 8. I was successful in cutting the bite. Therefore, it is not affected by this problem and no extra padding is required. The result is a 16 byte reduction compared to Python 2.7.15 or later and maybe 3.7.4 or later. I'm really glad that this improvement was successful.
Recommended Posts