Here are some tips for working with binary data in Python.
There are two ways to handle binary data in Python: the struct module and the ctypes.Structure class.
As a rule of thumb, use the struct module when you only need to handle a few bytes, and the ctypes.Structure class when you need to handle larger amounts of data or exchange data with C/C++.
struct module
As an example, let's read the binary data of a PNG file. A PNG file always begins with a fixed 8-byte signature. The IHDR chunk follows immediately; its first 18 bytes (bytes 9 to 26 of the file) hold the chunk length, the chunk type, and the first part of the IHDR data: the image width and height, the bit depth, and the color type.
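import struct
# Read the whole file as bytes (the original file name had a stray trailing space)
with open("sample.png", "rb") as f:
    png_data = f.read()
# Big-endian: chunk length, chunk type, width, height, bit depth, color type
struct.unpack_from(">I4sIIBB", png_data, 8)
# (13, b'IHDR', 250, 156, 8, 2)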
struct.unpack can also read the data, but it raises struct.error unless the buffer is exactly the size the format requires.
To read only part of a larger buffer, struct.unpack_from is more convenient: it takes an offset and ignores any trailing bytes.
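The difference between the two calls can be checked without a real file. The following is a minimal sketch that builds a synthetic PNG-like header in memory; the dimensions and the trailing filler bytes are made up for illustration and do not come from an actual image.

```python
import struct

# Synthetic data: the 8-byte PNG signature followed by the start of an IHDR chunk.
# The width/height values are made up for this illustration.
signature = b"\x89PNG\r\n\x1a\n"
ihdr = struct.pack(">I4sIIBB", 13, b"IHDR", 250, 156, 8, 2)
png_data = signature + ihdr + b"\x00" * 16  # filler standing in for the rest of the file

fmt = ">I4sIIBB"

# struct.unpack insists that the buffer is exactly calcsize(fmt) bytes long.
try:
    struct.unpack(fmt, png_data)
    unpack_failed = False
except struct.error:
    unpack_failed = True

# Slicing to the exact size works, but unpack_from does the same without a copy.
start = len(signature)
sliced = struct.unpack(fmt, png_data[start:start + struct.calcsize(fmt)])
direct = struct.unpack_from(fmt, png_data, start)
```

Here `unpack_failed` ends up true, while `sliced` and `direct` hold the same tuple.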
x format
When reading binary data, you will inevitably run into padding (junk bytes inserted for alignment).
The x format character is convenient because it skips such a byte.
data = b'd\x00\xb0\x04'
# Clumsy: the pad byte has to be unpacked into a throwaway variable
kind, _, value = struct.unpack("BBH", data)
# Better: x skips the pad byte
kind, value = struct.unpack("BxH", data)
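The x character also works when packing, and it accepts a repeat count. The sketch below assumes a little-endian layout (the `<` prefix) so that the bytes are the same on every machine; the second record layout is hypothetical.

```python
import struct

# "<" fixes little-endian order and standard sizes so the result is the same everywhere.
packed = struct.pack("<BxH", 100, 1200)   # the pad byte is written as \x00
kind, value = struct.unpack("<BxH", packed)

# A repeat count skips several bytes at once: here 3 pad bytes before a 4-byte integer.
flag, number = struct.unpack("<B3xI", b"\x01\xff\xff\xff\x2a\x00\x00\x00")
```

When packing, pad bytes are always written as zero bytes; when unpacking, their content is ignored.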
The struct.Struct class wraps a struct module format string in a class.
Because the format string is parsed when the class is instantiated, creating the instance in advance is faster when you pack / unpack repeatedly in a loop. Be careful not to confuse it with the ctypes.Structure class.
point = struct.Struct("HH")
for x, y in zip(range(10), range(10)):
    point.pack(x, y)
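A Struct instance is just as useful for reading. As a rough sketch (the buffer contents are made up, and `<` is added so the record size is fixed at 4 bytes), a buffer of fixed-size records can be walked with `iter_unpack`, or indexed using `size`:

```python
import struct

point = struct.Struct("<HH")          # parsed once, reused below

# Pack ten points into one contiguous buffer.
buffer = b"".join(point.pack(x, x * 2) for x in range(10))

# point.size is the byte length of one record; iter_unpack walks the buffer record by record.
points = list(point.iter_unpack(buffer))

# Random access: jump straight to the third record (index 2).
third = point.unpack_from(buffer, 2 * point.size)
```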
Format character | C type | Standard size |
---|---|---|
x | pad byte | 1 |
c | char | 1 |
b | signed char | 1 |
B | unsigned char, BYTE | 1 |
? | _Bool | 1 |
h | short | 2 |
H | unsigned short, WORD | 2 |
i | int | 4 |
I | unsigned int, DWORD | 4 |
l | long, LONG | 4 |
L | unsigned long, ULONG | 4 |
q | long long, LONGLONG | 8 |
Q | unsigned long long, ULONGLONG | 8 |
n | ssize_t (Python 3.3 or later) | Native only |
N | size_t (Python 3.3 or later) | Native only |
f | float | 4 |
d | double | 8 |
s | char[] | - |
p | char[] | - |
P | void * | Native only |
Format character example:
BITMAPINFOHEADER structure
typedef struct tagBITMAPINFOHEADER {
    DWORD biSize;
    LONG biWidth;
    LONG biHeight;
    WORD biPlanes;
    WORD biBitCount;
    DWORD biCompression;
    DWORD biSizeImage;
    LONG biXPelsPerMeter;
    LONG biYPelsPerMeter;
    DWORD biClrUsed;
    DWORD biClrImportant;
} BITMAPINFOHEADER;
Format characters for the BITMAPINFOHEADER structure
"IllHHIIllII"
Character | Byte order | Size | Alignment |
---|---|---|---|
@ | Native | Native | Native |
= | Native | Standard size | None |
< | Little endian | Standard size | None |
> | Big endian | Standard size | None |
! | Big endian | Standard size | None |
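To make the byte-order column concrete, here is a small sketch that packs the same (arbitrarily chosen) value with each explicit prefix:

```python
import struct

value = 0x01020304

little = struct.pack("<I", value)    # least significant byte first
big = struct.pack(">I", value)       # most significant byte first
network = struct.pack("!I", value)   # network order, same bytes as ">"

# Reading bytes with the wrong byte order gives a different number, not an error.
misread = struct.unpack("<I", big)[0]
```

Because a wrong prefix fails silently, it is worth stating the byte order explicitly whenever the data comes from a file or the network.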
Difference between @ and = (CPU: amd64, OS: 64-bit Ubuntu)
struct.calcsize("BI")
# 8
struct.calcsize("=BI")
# 5
Note that explicitly specifying the byte order also turns alignment off, so no padding bytes are inserted.
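This matters for the BITMAPINFOHEADER format string shown earlier. BMP files are little-endian and the structure is 40 bytes, so in practice the format needs a `<` prefix. The sketch below is one possible way to use it; the `BitmapInfo` namedtuple and the header values are made up for illustration.

```python
import struct
from collections import namedtuple

# "<" selects little-endian order with standard sizes, so l is always 4 bytes.
BITMAPINFOHEADER = struct.Struct("<IllHHIIllII")

# Field names taken from the C structure; the namedtuple itself is a hypothetical helper.
BitmapInfo = namedtuple(
    "BitmapInfo",
    "biSize biWidth biHeight biPlanes biBitCount biCompression "
    "biSizeImage biXPelsPerMeter biYPelsPerMeter biClrUsed biClrImportant",
)

# Made-up header values (250x156 pixels, 24 bits per pixel), for illustration only.
raw = BITMAPINFOHEADER.pack(40, 250, 156, 1, 24, 0, 0, 2835, 2835, 0, 0)
info = BitmapInfo._make(BITMAPINFOHEADER.unpack(raw))

# Without a prefix, native sizes apply; where C long is 8 bytes this is larger than 40.
native_size = struct.calcsize("IllHHIIllII")
```

Unpacking into a namedtuple keeps long format strings readable, since each value can be accessed by name rather than by position.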
The ctypes.Structure class lets you work with C/C++ structures.
If you try to read a lot of data with the struct module, the format string ends up looking like a magic spell, so when you need robust code for reading large amounts of binary data, use the ctypes.Structure class instead.
Subclass ctypes.Structure and define the field types in _fields_.
from ctypes import *
"""
typedef struct {
    char identity[4];
    uint16_t x;
    uint16_t y;
} TestStructure;
"""
class TestStructure(Structure):
    _fields_ = (
        ('identity', c_char * 4),
        ('x', c_uint16),
        ('y', c_uint16),
    )
An instance is created as follows.
t = TestStructure(b"TEST", 100, 100)
In C, the sizes of int and short depend on the environment. Since C99, fixed-size types such as int16_t and int32_t are available, so use them wherever possible. Accordingly, on the Python side, use fixed-size types such as ctypes.c_int16 instead of ctypes.c_int.
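The difference is easy to inspect with `ctypes.sizeof`. In this sketch the fixed-size results are the same everywhere, while the values in `platform_sizes` depend on the C compiler Python was built with; the dictionary names are just for illustration.

```python
import ctypes

# Fixed-size types have the same width on every platform.
fixed_sizes = {
    "c_int8": ctypes.sizeof(ctypes.c_int8),
    "c_int16": ctypes.sizeof(ctypes.c_int16),
    "c_int32": ctypes.sizeof(ctypes.c_int32),
    "c_int64": ctypes.sizeof(ctypes.c_int64),
}

# These follow the platform's C compiler, so the numbers can differ between systems
# (c_long is typically 8 on 64-bit Linux and 4 on 64-bit Windows).
platform_sizes = {
    "c_int": ctypes.sizeof(ctypes.c_int),
    "c_long": ctypes.sizeof(ctypes.c_long),
}
```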
You can write a ctypes.Structure instance as-is by passing it to the write method of an io object or a file object.
import io
buffer = io.BytesIO()
buffer.write(TestStructure(b"TEST", 100, 100))
buffer.getvalue()
# b'TESTd\x00d\x00'
You can read into a ctypes.Structure instance by passing it as-is to readinto.
buffer = io.BytesIO(b'TESTd\x00d\x00')
t = TestStructure()
buffer.readinto(t)
t.identity, t.x, t.y
# (b'TEST', 100, 100)
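When a stream contains several records back to back, the return value of readinto tells you whether a full record was read. The following is a sketch under the assumption of a simple fixed-size record; the `Record` class and the test data are hypothetical.

```python
import io
from ctypes import Structure, c_char, c_uint16, sizeof

class Record(Structure):  # hypothetical record type for this sketch
    _fields_ = (
        ('identity', c_char * 4),
        ('x', c_uint16),
        ('y', c_uint16),
    )

# Three records written back to back, plus 3 stray bytes at the end.
stream = io.BytesIO()
for i in range(3):
    stream.write(Record(b"TEST", i, i * 10))
stream.write(b"\x00\x00\x00")
stream.seek(0)

records = []
record = Record()
# readinto returns the number of bytes actually read; a short read means
# the stream ended before a full record was available.
while stream.readinto(record) == sizeof(Record):
    records.append((record.identity, record.x, record.y))
```

The same `record` object is reused on every iteration, so the field values are copied out into a tuple before the next read overwrites them.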
The offset of a structure member is available as the class attribute ClassName.member_name.offset.
class Point(Structure):
    _fields_ = (
        ('x', c_uint16),
        ('y', c_uint16),
    )
Point.y.offset
# 2
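The same field descriptors also expose a size attribute, so the whole layout of a structure can be listed in one pass. A small sketch, using a hypothetical `Header` structure with mixed field sizes:

```python
from ctypes import Structure, c_uint8, c_uint16, c_uint32

class Header(Structure):  # hypothetical structure for this sketch
    _fields_ = (
        ('kind', c_uint8),
        ('length', c_uint16),
        ('checksum', c_uint32),
    )

# Each field descriptor on the class carries its byte offset and size.
layout = {
    name: (getattr(Header, name).offset, getattr(Header, name).size)
    for name, _ctype in Header._fields_
}
```

With natural alignment, `length` starts at offset 2 rather than 1, and `checksum` at offset 4, which makes any padding visible at a glance.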
sizeof
You can get the size of a structure with ctypes.sizeof.
class TestStructure(Structure):
    _fields_ = (
        ('flags', c_ubyte),
        ('value', c_int32),
    )
sizeof(TestStructure)
# 8
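The result is 8 rather than 5 because ctypes.Structure follows the native alignment rules of the C compiler, just like the @ mode of the struct module. The sketch below compares the two; `Aligned` is a hypothetical class with the same fields as TestStructure.

```python
import struct
from ctypes import Structure, c_int32, c_ubyte, sizeof

class Aligned(Structure):  # same fields as TestStructure above
    _fields_ = (
        ('flags', c_ubyte),
        ('value', c_int32),
    )

native_size = struct.calcsize("@Bi")    # native alignment, like the C compiler
packed_size = struct.calcsize("=Bi")    # standard sizes, no padding
padding = Aligned.value.offset - sizeof(c_ubyte)  # pad bytes between the two fields
```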
memset / memmove
The equivalents of C's memset and memmove are ctypes.memset and ctypes.memmove.
c_array = (c_char * 12)()
memset(c_array, 0, sizeof(c_array))
memmove(c_array, b"test\x00", len(b"test\x00"))
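To see what these calls actually do to the buffer, the array can be inspected through its raw and value attributes. A minimal sketch, using a fill byte of b"x" instead of zero so the effect is visible:

```python
from ctypes import c_char, memmove, memset, sizeof

c_array = (c_char * 12)()

memset(c_array, ord("x"), sizeof(c_array))   # fill all 12 bytes with b"x"
filled = c_array.raw

memmove(c_array, b"test\x00", 5)             # copy 5 bytes to the start of the array
after_copy = c_array.raw                     # .raw returns every byte of the array
as_c_string = c_array.value                  # .value stops at the first NUL byte
```

As in C, neither function checks bounds, so the length argument must never exceed the size of the destination.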
As in C/C++, you can map data onto a structure by casting a pointer.
To get a pointer to a structure, create the pointer type with ctypes.POINTER and cast to it with ctypes.cast. The value the pointer refers to is available through contents.
class PointText(Structure):
    _fields_ = (
        ('x', c_uint16),
        ('y', c_uint16),
        ('text', c_char * 0),
    )
data = b'd\x00d\x00null terminate text\x00'
p_point = cast(data, POINTER(PointText))
p_point.contents.x, p_point.contents.y
# (100, 100)
# Read the null-terminated string
string_at(addressof(p_point.contents) + PointText.text.offset)
# b'null terminate text'
Read a null-terminated string with ctypes.string_at, and use ctypes.wstring_at for wide-character (Unicode) strings.
However, be aware that pointer manipulation can crash Python itself, so avoid members of unspecified length such as char[] where possible.
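One way to avoid raw pointers for this kind of data is to copy only the fixed-size part into a structure and handle the variable-length tail as ordinary bytes. This is a sketch of that alternative, not the approach used above; `PointHeader` is a hypothetical class, and little-endian data is assumed.

```python
from ctypes import LittleEndianStructure, c_uint16, sizeof

class PointHeader(LittleEndianStructure):  # hypothetical fixed-size part of the record
    _fields_ = (
        ('x', c_uint16),
        ('y', c_uint16),
    )

data = b'd\x00d\x00null terminate text\x00'

# from_buffer_copy raises ValueError if data is shorter than the structure,
# instead of reading past the end of the buffer.
header = PointHeader.from_buffer_copy(data)

# The variable-length tail is handled with ordinary bytes operations.
tail = data[sizeof(PointHeader):]
text = tail.split(b"\x00", 1)[0]

# A buffer that is too short is rejected safely.
try:
    PointHeader.from_buffer_copy(b"\x00")
    too_short_rejected = False
except ValueError:
    too_short_rejected = True
```

The trade-off is an extra copy of the header bytes, in exchange for bounds checking on every read.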
You can convert a ctypes object to a Python bytes object through memoryview.
p = Point(200, 120)
memoryview(p).tobytes()
# b'\xc8\x00x\x00'
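To fix the byte order of the fields regardless of the platform, inherit from BigEndianStructure or LittleEndianStructure.
class BPoint(BigEndianStructure):
    _fields_ = (
        ('x', c_uint16),
        ('y', c_uint16),
    )
class LPoint(LittleEndianStructure):
    _fields_ = (
        ('x', c_uint16),
        ('y', c_uint16),
    )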
bpoint = BPoint(0x0102, 0x0304)
lpoint = LPoint(0x0102, 0x0304)
memoryview(bpoint).tobytes()
# b'\x01\x02\x03\x04'
memoryview(lpoint).tobytes()
# b'\x02\x01\x04\x03'
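The two halves of this article meet here: a BigEndianStructure lays out its bytes the same way as a struct format with the > prefix. The sketch below checks that correspondence and also shows that `bytes()` can be used in place of `memoryview(...).tobytes()`.

```python
import struct
from ctypes import BigEndianStructure, c_uint16

class BPoint(BigEndianStructure):
    _fields_ = (
        ('x', c_uint16),
        ('y', c_uint16),
    )

bpoint = BPoint(0x0102, 0x0304)

as_bytes = bytes(bpoint)                         # same bytes as memoryview(bpoint).tobytes()
via_struct = struct.pack(">HH", 0x0102, 0x0304)  # equivalent struct format

# Round trip: parse the struct-packed bytes back into the ctypes structure.
parsed = BPoint.from_buffer_copy(via_struct)
```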
http://docs.python.jp/3.5/library/struct.html
http://docs.python.jp/3.5/library/ctypes.html