LOCAL Student Department Advent Calendar Day 11
Everyone else's articles in this calendar are so hardcore that I started to worry mine would be too weak, so I decided to throw a curveball. As far as I could find, there is no Japanese-language material on this topic, so I expect anyone who researches it in the future will almost certainly land on this article. One request to all of you: if you think "What is this person talking about? Isn't this full of mistakes?", please leave a comment. I will do my best to fix it.
Official: kaitai.io
Kaitai Struct is a declarative language used to describe binary data structures.
From a data structure described in this dedicated language, the source code of a binary data parser can be generated automatically.
License: the Compiler and Visualizer described below are GPL v3+, and the runtime library for each language is MIT (the JS one is Apache v2). Does this mean that source code generated with the Compiler also falls under the GPL? If anyone knows the details, please tell me.
Kaitai Struct Compiler (KSC)
For installation details, see here (http://kaitai.io/#download).
On Mac, a single brew install kaitai-struct-compiler is all it takes.
For Windows, go to the link above and download the installer.
On Debian-based Linux, add the repository and install with apt:
# Import GPG key, if you never used any BinTray repos before
sudo apt-key adv --keyserver hkp://pool.sks-keyservers.net --recv 379CE192D401AB61
# Add stable repository
echo "deb https://dl.bintray.com/kaitai-io/debian jessie main" | sudo tee /etc/apt/sources.list.d/kaitai.list
# ... or unstable repository
echo "deb https://dl.bintray.com/kaitai-io/debian_unstable jessie main" | sudo tee /etc/apt/sources.list.d/kaitai.list
sudo apt-get update
sudo apt-get install kaitai-struct-compiler
Kaitai Struct Visualizer (KSV)
This is a simple visualizer for .ksy files.
It is written in Ruby and is available as a gem package.
gem install kaitai-struct-visualizer
For well-known file formats, .ksy files are available in the official GitHub repository (https://github.com/kaitai-io/kaitai_struct_formats).
(If you want to use a .ksy file from there, check the license given in meta/license inside the file.)
If you write a new .ksy, consider sending a pull request (see kaitai_struct_formats/CONTRIBUTING.md).
matrix.py
import struct

import numpy as np


def create_header(*mats: np.ndarray, magic: bytes = b'') -> bytes:
    """Build the header: magic, matrix count, then one 8-byte entry per matrix."""
    header = magic
    header += struct.pack('<H', len(mats))  # number of matrices (u2, little-endian)
    # Offset of the first matrix body: bytes written so far plus all header entries.
    length = len(header) + 8 * len(mats)
    for mat in mats:
        header += struct.pack('<HH', mat.shape[0], mat.shape[1])  # rows, columns
        header += struct.pack('<I', length)  # absolute offset of this matrix body
        length += 4 * mat.shape[0] * mat.shape[1]  # each element is a 4-byte int
    return header


mat1 = np.random.randint(-1024, 1024, [3, 3], dtype=np.int32)
mat2 = np.random.randint(-1024, 1024, [5, 9], dtype=np.int32)
mat3 = np.random.randint(-1024, 1024, [2, 2], dtype=np.int32)

with open('test.matrix', 'wb') as o:
    # 6-byte magic, matching the hex dump below and the contents in matrix.ksy
    magic = b'MAT\x01\x02/'
    o.write(create_header(mat1, mat2, mat3, magic=magic))
    for mat in [mat1, mat2, mat3]:
        for y in mat:
            for x in y:
                o.write(struct.pack('<i', x))
test.matrix
Offset: 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
00000000: 4D 41 54 01 02 2F 03 00 03 00 03 00 20 00 00 00 MAT../..........
00000010: 05 00 09 00 44 00 00 00 02 00 02 00 F8 00 00 00 ....D.......x...
00000020: DC FE FF FF 49 01 00 00 A7 FF FF FF 17 02 00 00 \~..I...'.......
00000030: 25 FC FF FF 35 FF FF FF B5 00 00 00 CF FE FF FF %|..5...5...O~..
00000040: E2 FF FF FF 5D 00 00 00 15 FE FF FF 30 FC FF FF b...]....~..0|..
00000050: 4C 03 00 00 C1 FF FF FF B0 FD FF FF 31 02 00 00 L...A...0}..1...
00000060: 54 03 00 00 C4 FF FF FF 65 FF FF FF D0 FE FF FF T...D...e...P~..
00000070: 75 01 00 00 DE FE FF FF ED 00 00 00 ED FC FF FF u...^~..m...m|..
00000080: BE FD FF FF E5 02 00 00 EC FE FF FF 22 FE FF FF >}..e...l~.."~..
00000090: C3 02 00 00 11 00 00 00 29 03 00 00 00 01 00 00 C.......).......
000000a0: 78 00 00 00 C4 FC FF FF 4C 02 00 00 88 00 00 00 x...D|..L.......
000000b0: 43 FF FF FF 35 FF FF FF A4 00 00 00 CF 02 00 00 C...5...$...O...
000000c0: 3A FF FF FF 33 FF FF FF BD FE FF FF F9 01 00 00 :...3...=~..y...
000000d0: 22 FF FF FF 3A 02 00 00 7C 00 00 00 15 FF FF FF "...:...|.......
000000e0: D8 FE FF FF 42 00 00 00 82 02 00 00 24 02 00 00 X~..B.......$...
000000f0: 8A FE FF FF AF FF FF FF EF 02 00 00 96 01 00 00 .~../...o.......
00000100: 83 01 00 00 2F 02 00 00
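From the beginning, the file is laid out as follows: the magic bytes b'MAT\x01\x02/', the number of matrices (2-byte unsigned integer), one 8-byte header per matrix (row count and column count as 2-byte unsigned integers, followed by the 4-byte offset of the matrix data), and finally the matrix data as 4-byte signed integers. Everything is little-endian.
Let's describe this in matrix.ksy.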
A KSY (Kaitai Struct YAML) file declares a single user-defined type (translated literally from the official documentation). A user-defined type consists of the following sections:
meta
doc
seq
types
instances
enums
Not all of them are required.
See the official reference for more information.
meta
meta
meta:
  id: matrix
  endian: le
meta/id gives the name of the user-defined type being described. It must be present in the .ksy file.
meta/endian gives the default endianness used in the structure (le / be).
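To make the effect of meta/endian concrete, here is a minimal sketch in plain Python (using the standard struct module, not the Kaitai runtime) that decodes the same two bytes both ways. The bytes are the two that follow the magic in test.matrix; the variable names are only for illustration.

```python
import struct

# The two bytes right after the magic in test.matrix (the matrix count).
raw = bytes([0x03, 0x00])

# With endian: le, a u2 field reads the low byte first.
little = struct.unpack('<H', raw)[0]

# With endian: be, the same bytes would be read high byte first.
big = struct.unpack('>H', raw)[0]

print(little, big)  # 3 768
```

Since matrix.py writes everything with '<' format strings, le is the setting that matches this file.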
seq
seq
seq:
  - id: magic
    contents: ['MAT', 1, 0x2, '/']
  - id: header_num
    type: u2
  - id: headers
    repeat: expr
    repeat-expr: header_num
    type: header
seq describes the data structure in the order it appears in the file.
id is the field name. If the data is a constant, write the constant in contents. If you want to read a value, specify its data type in type (click here for details (https://doc.kaitai.io/ksy_reference.html#primitive-data-types)). You can also use the types defined in types, described later. Here, the header type is used.
repeat can be one of expr, eos, or until (see the official reference for details).
If you specify expr, put the number of repetitions in repeat-expr.
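As a rough mental model of what this seq amounts to, the following hand-written sketch reads the same fields with the standard library. It is not the code Kaitai generates; the function name parse_seq and the dictionary layout are made up for illustration. The input is the first 32 bytes of the hex dump above.

```python
import io
import struct

# First 32 bytes of test.matrix, copied from the hex dump.
data = bytes.fromhex(
    "4d 41 54 01 02 2f 03 00 03 00 03 00 20 00 00 00"
    "05 00 09 00 44 00 00 00 02 00 02 00 f8 00 00 00"
)


def parse_seq(stream):
    # contents: read the bytes and fail if they differ from the constant.
    magic = stream.read(6)
    if magic != b'MAT\x01\x02/':
        raise ValueError('unexpected magic: %r' % magic)

    # type: u2 with endian: le
    (header_num,) = struct.unpack('<H', stream.read(2))

    # repeat: expr with repeat-expr: header_num
    headers = []
    for _ in range(header_num):
        shape0, shape1, offset = struct.unpack('<HHI', stream.read(8))
        headers.append({'shape0': shape0, 'shape1': shape1, 'offset': offset})
    return header_num, headers


header_num, headers = parse_seq(io.BytesIO(data))
print(header_num)
for h in headers:
    print(h)
```

Each field is consumed in order from the stream, which is why repeat-expr can refer to header_num: it has already been read by the time headers is parsed.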
types
types
types:
  header:
    seq:
      - id: shape0
        type: u2
      - id: shape1
        type: u2
      - id: offset
        type: u4
    instances:
      mat_body:
        pos: offset
        io: _root._io
        type: matrix
  matrix:
    seq:
      - id: dim0
        repeat: expr
        repeat-expr: _parent.shape0
        type: dim1
    types:
      dim1:
        seq:
          - id: dim1
            repeat: expr
            # each row holds shape1 (column count) values
            repeat-expr: _parent._parent.shape1
            type: s4
User-defined types can be nested inside types.
The header type uses instances, which can read data located somewhere other than the sequential position that seq would read from.
header.instances
instances:
  mat_body:
    pos: offset
    io: _root._io
    type: matrix
Usage is very similar to seq.
The id here is mat_body. io is the IO stream to use.
pos is the number of bytes from the beginning of io.
type is the same as for seq.
Some fields (in this case repeat-expr, pos, and io) can reference other fields as well as constant values. You cannot reference data that has not been read yet. The parsed data forms a tree (this is easy to see in ksv), and you can refer to the parent element with _parent. You can also refer to the top-level element with _root.
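The following sketch mimics, by hand, what an instance with pos and io does: jump to an absolute offset, read the data, then put the stream position back so that sequential parsing is not disturbed. It is an illustration with the standard library only; read_mat_body is a hypothetical helper, not something Kaitai provides. The input is the first 68 bytes of the hex dump, which contain the headers and the first 3x3 matrix.

```python
import io
import struct

# First 0x44 bytes of test.matrix: magic, count, three headers, first matrix body.
data = bytes.fromhex("""
4d 41 54 01 02 2f 03 00 03 00 03 00 20 00 00 00
05 00 09 00 44 00 00 00 02 00 02 00 f8 00 00 00
dc fe ff ff 49 01 00 00 a7 ff ff ff 17 02 00 00
25 fc ff ff 35 ff ff ff b5 00 00 00 cf fe ff ff
e2 ff ff ff
""")


def read_mat_body(stream, shape0, shape1, offset):
    """Read shape0 rows of shape1 little-endian s4 values starting at offset."""
    saved = stream.tell()      # remember where sequential parsing was
    stream.seek(offset)        # pos: offset
    try:
        rows = []
        for _ in range(shape0):
            row = struct.unpack('<%di' % shape1, stream.read(4 * shape1))
            rows.append(list(row))
        return rows
    finally:
        stream.seek(saved)     # restore the position afterwards


stream = io.BytesIO(data)
stream.seek(8)  # skip the magic (6 bytes) and header_num (2 bytes)
shape0, shape1, offset = struct.unpack('<HHI', stream.read(8))
body = read_mat_body(stream, shape0, shape1, offset)

print(body)
print(stream.tell())  # 16: still right after the first header
```

Note that the inner loop reads shape1 values per row and the outer loop runs shape0 times, matching how matrix.py writes the rows.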
Visualize
At this point, you have written the following code.
matrix.ksy
meta:
  id: matrix
  endian: le
seq:
  - id: magic
    contents: ['MAT', 1, 0x2, '/']
  - id: header_num
    type: u2
  - id: headers
    repeat: expr
    repeat-expr: header_num
    type: header
types:
  header:
    seq:
      - id: shape0
        type: u2
      - id: shape1
        type: u2
      - id: offset
        type: u4
    instances:
      mat_body:
        pos: offset
        io: _root._io
        type: matrix
  matrix:
    seq:
      - id: dim0
        repeat: expr
        repeat-expr: _parent.shape0
        type: dim1
    types:
      dim1:
        seq:
          - id: dim1
            repeat: expr
            # each row holds shape1 (column count) values
            repeat-expr: _parent._parent.shape1
            type: s4
Let's visualize this using ksv (Kaitai Struct Visualizer).
The usage is ksv <file_to_parse.bin> <format.ksy>.
shell
$ ksv test.matrix matrix.ksy
ksv
[-] [root] 00000000: 4d 41 54 01 02 2f 03 00 03 00 03 00 20 00 00 00 | MAT../...... ...
[.] magic = 4d 41 54 01 02 2f 00000010: 05 00 09 00 44 00 00 00 02 00 02 00 f8 00 00 00 | ....D...........
[.] header_num = 3 00000020: dc fe ff ff 49 01 00 00 a7 ff ff ff 17 02 00 00 | ....I...........
[-] headers (3 = 0x3 entries) 00000030: 25 fc ff ff 35 ff ff ff b5 00 00 00 cf fe ff ff | %...5...........
[-] 0 00000040: e2 ff ff ff 5d 00 00 00 15 fe ff ff 30 fc ff ff | ....].......0...
[.] shape0 = 3 00000050: 4c 03 00 00 c1 ff ff ff b0 fd ff ff 31 02 00 00 | L...........1...
[.] shape1 = 3 00000060: 54 03 00 00 c4 ff ff ff 65 ff ff ff d0 fe ff ff | T.......e.......
[.] offset = 32 00000070: 75 01 00 00 de fe ff ff ed 00 00 00 ed fc ff ff | u...............
[-] mat_body 00000080: be fd ff ff e5 02 00 00 ec fe ff ff 22 fe ff ff | ............"...
[-] dim0 (3 = 0x3 entries) 00000090: c3 02 00 00 11 00 00 00 29 03 00 00 00 01 00 00 | ........).......
[-] 0 000000a0: 78 00 00 00 c4 fc ff ff 4c 02 00 00 88 00 00 00 | x.......L.......
[-] dim1 (3 = 0x3 entries) 000000b0: 43 ff ff ff 35 ff ff ff a4 00 00 00 cf 02 00 00 | C...5...........
[.] 0 = -292 000000c0: 3a ff ff ff 33 ff ff ff bd fe ff ff f9 01 00 00 | :...3...........
[.] 1 = 329 000000d0: 22 ff ff ff 3a 02 00 00 7c 00 00 00 15 ff ff ff | "...:...|.......
[.] 2 = -89 000000e0: d8 fe ff ff 42 00 00 00 82 02 00 00 24 02 00 00 | ....B.......$...
[-] 1 000000f0: 8a fe ff ff af ff ff ff ef 02 00 00 96 01 00 00 | ................
[-] dim1 (3 = 0x3 entries) 00000100: 83 01 00 00 2f 02 00 00 | ..../...
[.] 0 = 535
[.] 1 = -987
[.] 2 = -203
[-] 2
[+] dim1
[-] 1
[.] shape0 = 5
[.] shape1 = 9
[.] offset = 68
[-] mat_body
[+] dim0
[+] 2
It looks like the file is being parsed correctly.
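As a cross-check on the offsets shown by ksv (32 and 68), the arithmetic from create_header can be redone on its own. This is a small sketch; the constant names and the function expected_offsets are made up for illustration, while the sizes come from the format described above.

```python
MAGIC_LEN = 6    # b'MAT\x01\x02/'
COUNT_LEN = 2    # header_num (u2)
HEADER_LEN = 8   # shape0 (u2) + shape1 (u2) + offset (u4)
VALUE_LEN = 4    # each matrix element (s4)


def expected_offsets(shapes):
    """Return the body offset of each matrix and the total file size."""
    offset = MAGIC_LEN + COUNT_LEN + HEADER_LEN * len(shapes)
    offsets = []
    for rows, cols in shapes:
        offsets.append(offset)
        offset += VALUE_LEN * rows * cols
    return offsets, offset


offsets, total = expected_offsets([(3, 3), (5, 9), (2, 2)])
print(offsets, total)  # [32, 68, 248] 264
```

248 is 0xF8, the offset stored in the third header, and 264 is 0x108, which is where the hex dump ends.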
Now for the main topic. In the article here, I created my own compressed archive format. This time, let's decompress such an archive using KS. See that article for the structure of the file.
mcp.ksy
meta:
  id: mcp
  encoding: UTF-8
  endian: le
seq:
  - id: file
    type: file
    repeat: eos
types:
  file:
    seq:
      - id: filename_len
        type: u4
      - id: filebody_len
        type: u4
      - id: filename
        type: str
        size: filename_len
      - id: filebody
        size: filebody_len
        process: zlib
meta/encoding specifies the default encoding used with type: str.
repeat: eos repeats until the end of the stream.
process: zlib decompresses the data that was read using zlib. (Very convenient)
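To show what the mcp.ksy description above boils down to, here is a hand-written sketch using only the standard library. It builds a tiny archive in memory following the layout implied by mcp.ksy and parses it back. pack_entry and parse_archive are hypothetical helpers written for this illustration; the actual packer is the one in the linked article, and this is not the code ksc generates.

```python
import io
import struct
import zlib


def pack_entry(filename, body):
    """Build one entry: u4 name length, u4 body length, name, zlib-compressed body."""
    name = filename.encode('utf-8')
    compressed = zlib.compress(body)
    return struct.pack('<II', len(name), len(compressed)) + name + compressed


def parse_archive(data):
    stream = io.BytesIO(data)
    files = []
    # repeat: eos -> keep reading entries until the stream is exhausted
    while stream.tell() < len(data):
        filename_len, filebody_len = struct.unpack('<II', stream.read(8))
        # type: str with encoding: UTF-8
        filename = stream.read(filename_len).decode('utf-8')
        # size: filebody_len, then process: zlib
        filebody = zlib.decompress(stream.read(filebody_len))
        files.append((filename, filebody))
    return files


archive = pack_entry('a.txt', b'hello') + pack_entry('dir/b.bin', b'\x00' * 100)
files = parse_archive(archive)
for name, body in files:
    print(name, len(body))
```

Note that filebody_len is the compressed size on disk; the decompressed length only becomes known after process: zlib has run.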
For details on process, click here (https://doc.kaitai.io/ksy_reference.html#spec-process).
Generate code from mcp.ksy using ksc (Kaitai Struct Compiler).
usage
Usage: kaitai-struct-compiler [options] <file>...
<file>... source files (.ksy)
-t, --target <language> target languages (graphviz, csharp, all, perl, java, go, cpp_stl, php, lua, python, ruby, javascript)
-d, --outdir <directory>
output directory (filenames will be auto-generated)
-I, --import-path <directory>:<directory>:...
.ksy library search path(s) for imports (see also KSPATH env variable)
--go-package <package> Go package (Go only, default: none)
--java-package <package>
Java package (Java only, default: root package)
--java-from-file-class <class>
Java class to be invoked in fromFile() helper (default: io.kaitai.struct.ByteBufferKaitaiStream)
--dotnet-namespace <namespace>
.NET Namespace (.NET only, default: Kaitai)
--php-namespace <namespace>
PHP Namespace (PHP only, default: root package)
--python-package <package>
Python package (Python only, default: root package)
--opaque-types <value> opaque types allowed, default: false
--ksc-exceptions ksc throws exceptions instead of human-readable error messages
--ksc-json-output output compilation results as JSON to stdout
--verbose <value> verbose output
--debug enable debugging helpers (mostly used by visualization tools)
--help display this help and exit
--version output version information and exit
shell
$ ksc -t python mcp.ksy
mcp.py
# This is a generated file! Please edit source .ksy file and use kaitai-struct-compiler to rebuild

from pkg_resources import parse_version
from kaitaistruct import __version__ as ks_version, KaitaiStruct, KaitaiStream, BytesIO
import zlib


if parse_version(ks_version) < parse_version('0.7'):
    raise Exception("Incompatible Kaitai Struct Python API: 0.7 or later is required, but you have %s" % (ks_version))

class Mcp(KaitaiStruct):
    def __init__(self, _io, _parent=None, _root=None):
        self._io = _io
        self._parent = _parent
        self._root = _root if _root else self
        self._read()

    def _read(self):
        self.file = []
        i = 0
        while not self._io.is_eof():
            self.file.append(self._root.File(self._io, self, self._root))
            i += 1

    class File(KaitaiStruct):
        def __init__(self, _io, _parent=None, _root=None):
            self._io = _io
            self._parent = _parent
            self._root = _root if _root else self
            self._read()

        def _read(self):
            self.filename_len = self._io.read_u4le()
            self.filebody_len = self._io.read_u4le()
            self.filename = (self._io.read_bytes(self.filename_len)).decode(u"UTF-8")
            self._raw_filebody = self._io.read_bytes(self.filebody_len)
            self.filebody = zlib.decompress(self._raw_filebody)
This generates mcp.py.
Let's use it to write a decompression script.
extract.py
from mcp import Mcp
import os
import sys

mcps = Mcp.from_file(sys.argv[1])
out = 'output/'
if len(sys.argv) >= 3:
    out = sys.argv[2]

for f in mcps.file:
    target = os.path.join(out, f.filename)
    # Create the destination directory (including the output folder itself) if needed.
    os.makedirs(os.path.dirname(target) or '.', exist_ok=True)
    with open(target, 'wb') as o:
        o.write(f.filebody)
You can decompress an archive with python extract.py <target.mcp> [output_folder].
To read from a file, use KaitaiStruct.from_file(file_path).
If you want to parse a byte string directly, use KaitaiStruct.from_bytes(bytes).
For IO streams, use KaitaiStruct.from_io(io).
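One caveat with extract.py above: f.filename comes straight from the archive, so a crafted file could contain names such as ../something or an absolute path and end up written outside the output folder. The following is a minimal sketch of a guard, assuming you want to reject such entries; safe_target is a hypothetical helper and not part of Kaitai Struct.

```python
import os


def safe_target(out_dir, filename):
    """Return the absolute output path, or raise if it escapes out_dir."""
    base = os.path.realpath(out_dir)
    target = os.path.realpath(os.path.join(base, filename))
    # commonpath equals base only when target lies inside base.
    if os.path.commonpath([base, target]) != base:
        raise ValueError('refusing to write outside %s: %r' % (base, filename))
    return target


print(safe_target('output', 'dir/b.bin'))

try:
    safe_target('output', '../evil.txt')
except ValueError as e:
    print('rejected:', e)
```

In extract.py, the result of safe_target(out, f.filename) could be used in place of os.path.join(out, f.filename).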
I think KS is quite convenient. It is easy to write and you can use it from your favorite language, so the cost of learning it is very low. The official reference is honestly hard to read, but I expect more and more people like me will write articles about KS in the future (probably).
Why not try "dissecting" some binaries with KS?