Try working with binary data in Python

I recently had to process binary data in Python. It was my first time working with binary data and I had to look up a lot of things, so I am writing this down as a memo!

The files I dealt with had the extension **.sl2**, so I will use .sl2 data as the example throughout!

About binary data

The structure of binary data is defined by its format. I was dealing with .sl2, a file type I had never seen before, so the first step was to find out how an .sl2 file is laid out. Without that knowledge, the bytes cannot be interpreted!

In my case I relied on the following page, so the explanation below is based on it. (**As it turned out, this page was wrong ...**) https://wiki.openstreetmap.org/wiki/SL2

Header

Most binary formats seem to start with a header: a fixed-size region in the first few bytes that describes the data format, for example its version.

For .sl2, the "**Basic Structure**" section of the reference page gives the following table and explanation. There appear to be several variants of the .sl2 file, but for now the header is taken to be 10 bytes.

The files show up with a 10 byte header. First 2 bytes describe the format version (01=SLG, 02=SL2). Bytes 5,4 provide the block size. It varies depending on the sensor: 0x07b2 for HDI=Primary/Secondary+DSI and 0x0c80 for Sidescan. Seen values are: [Image: table of seen header values from the reference page]
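
To make the quoted description concrete, here is a minimal sketch that reads the two header fields it mentions. The header bytes are made up for illustration, and the layout (version in the first two bytes, block size in bytes 4 and 5) is only what the reference page states, which, as noted above, turned out not to be fully reliable. The function name parse_header is hypothetical.

```python
import struct

# Hypothetical 10-byte header built by hand for illustration only:
# format version 02 (SL2) and block size 0x0c80 (Sidescan), little endian.
sample_header = bytes([0x02, 0x00, 0x00, 0x00, 0x80, 0x0c, 0x00, 0x00, 0x00, 0x00])


def parse_header(buf):
    """Read the fields described on the reference page (layout is an assumption)."""
    # First 2 bytes: format version as an unsigned 16-bit integer
    fmt_version = struct.unpack_from('<H', buf, 0)[0]
    # Bytes 4 and 5: block size, low byte first
    block_size = struct.unpack_from('<H', buf, 4)[0]
    return fmt_version, block_size


fmt_version, block_size = parse_header(sample_header)
print(fmt_version, hex(block_size))
```

Checking these two values against the ones listed on the reference page is a quick way to see whether a file matches the assumed layout before reading any further.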

Byte order

Roughly speaking, byte order is the order in which the bytes of a multi-byte value are arranged when the value is stored in memory or written to a file.

As far as I can tell, the byte order is almost always either "big endian" or "little endian", so you need to find out which one the format uses.

Here it is little endian, according to the following description in the "**Basic Structure**" section of the reference page.

The file is a binary file with little endian format. Float values are stored as IEEE 754 floating-point "single format".
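
The effect of byte order is easiest to see by decoding the same bytes both ways. The following is a small sketch using Python's struct module; the byte values are chosen for illustration and are not taken from a real .sl2 file.

```python
import struct

# The same two bytes read as an unsigned 16-bit integer in both byte orders
raw = bytes([0x80, 0x0c])
little = struct.unpack('<H', raw)[0]  # low byte first:  0x0c80
big = struct.unpack('>H', raw)[0]     # high byte first: 0x800c
print(hex(little), hex(big))

# A little-endian IEEE 754 single-precision float: these four bytes encode 1.5
float_bytes = bytes([0x00, 0x00, 0xc0, 0x3f])
value = struct.unpack('<f', float_bytes)[0]
print(value)
```

Reading a little-endian file with the wrong byte order does not raise an error. It simply produces wrong numbers, which is why this has to be settled before anything else.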

Byte block

Now for the data after the header. The reference page defines the data type and length of each field in a block, as in the table below (partial excerpt).

[Image: excerpt of the block structure table from the reference page]

First, look at the description in the rightmost column and pick the fields you want to extract. Then check the data type (variable type) and offset of each one.

The offset is a position relative to a reference point, in other words the address of the data. Here each set of data is 144 bytes long, and the offset gives the number of bytes from the start of that set at which the field is written.

This should close the 144 byte frame.
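
As a rough illustration of how offsets work, the sketch below builds an empty 144-byte frame, writes two values at the offsets used later in this article (30 for the channel, 62 for the water depth), and reads them back. The frame contents are synthetic, and the 10-byte file header in front of it is the assumption from the reference page.

```python
import struct

FRAME_SIZE = 144  # size of one set of data, per the reference page

# Synthetic frame: all zeros except for the two fields written below
frame = bytearray(FRAME_SIZE)
struct.pack_into('<H', frame, 30, 2)     # channel at offset 30
struct.pack_into('<f', frame, 62, 12.5)  # water depth in feet at offset 62

# Put the frame behind an (assumed) 10-byte file header
data = bytes(10) + bytes(frame)
block_offset = 10  # where the frame starts inside the file

# Absolute position = start of the frame + offset of the field within the frame
channel = struct.unpack_from('<H', data, block_offset + 30)[0]
depth_ft = struct.unpack_from('<f', data, block_offset + 62)[0]
print(channel, depth_ft)
```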

Python struct module

That covers the background on binary data. To handle it in Python, we use the **struct** module from the standard library (see the official documentation).

About byte order

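The official documentation has a table of the characters that specify the byte order. The one needed here is '<', which means little endian ('>' means big endian). [Image: byte order table from the official documentation]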

About data format

As confirmed in the "Byte block" section, each field has its own data type (variable type). The processing has to change depending on the type, but with struct you only need to pass the type as a format character, and it takes care of converting the bytes to match.

However, the type names used by the file format may differ from those in the official documentation, so they have to be mapped accordingly. In this case the mapping is as follows.

short int → unsigned short(H)
int       → unsigned long(L)
byte(int) → unsigned char(B)
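
One way to double-check this mapping is to ask struct how many bytes each format character occupies. The sketch below uses struct.calcsize with the '<' prefix, which selects standard sizes. Without a prefix, the size of L follows the platform's C long and may differ between systems.

```python
import struct

# Type names as used by the reference page, mapped to struct format characters
type_map = [('byte', 'B'), ('short', 'H'), ('int', 'L'), ('float', 'f')]

sizes = {}
for name, code in type_map:
    # With the '<' prefix, sizes are fixed regardless of platform
    sizes[code] = struct.calcsize('<' + code)
    print(name, code, sizes[code])
```

The sizes should match the field lengths in the format's table. If they do not, the chosen format character is probably wrong.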

Reading binary data

A file can be read as binary by setting the mode to rb (read binary) when opening it. Python also accepts the two letters in the other order, br, as in the code below.

with open(file_name, 'br') as f:
   data = f.read()
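
To see what this returns, here is a self-contained sketch that writes a small temporary file and reads it back in binary mode. The temporary file and its contents are stand-ins for a real .sl2 file.

```python
import os
import tempfile

# Create a small stand-in file (a real .sl2 file would be used instead)
with tempfile.NamedTemporaryFile(suffix='.sl2', delete=False) as tmp:
    tmp.write(bytes(range(16)))
    file_name = tmp.name

# Binary mode returns a bytes object, with no decoding or newline translation
with open(file_name, 'rb') as f:
    data = f.read()

os.remove(file_name)
print(type(data), len(data), data[:4])
```

Because f.read() loads the whole file into memory, this approach suits files of moderate size. For very large files, reading in chunks would be a safer choice.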

Interpretation of binary data

The binary data that was read can be converted with the struct.unpack_from() function. The basic form is struct.unpack_from(format, data, offset), and it always returns a tuple, even for a single value. The data types and offsets are already known, so all that is left is to specify them!
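
Here is a small sketch of struct.unpack_from() on its own, using bytes packed by hand rather than a real file. It shows the tuple return value, which is why the full script indexes results with [0].

```python
import struct

# Pack three values back to back: unsigned short, unsigned long, float
data = struct.pack('<HLf', 3, 123456, 2.5)

# One value at a time; each call returns a 1-element tuple
first = struct.unpack_from('<H', data, 0)
second = struct.unpack_from('<L', data, 2)  # starts after the 2-byte short
third = struct.unpack_from('<f', data, 6)   # starts after the 4-byte long
print(first, second, third)

# Several values in one call, using a combined format string
all_values = struct.unpack_from('<HLf', data, 0)
print(all_values)
```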

The whole script is below. It turned out longer than I expected, but the idea is simple: skip the file header first, then for each block unpack every item at its offset and move on to the next block.

import sys
import struct

POLAR_EARTH_RADIUS = 6356752.3142
# PI = math.pi
MAX_UINT4 = 4294967295
FT2M = 1/3.2808399  # factor for feet to meter conversions
KN2KM = 1.852       # factor for knots to km/h conversions (1 kn = 1.852 km/h)

args = sys.argv

# Exit with a usage message when no input file is given
if len(args) < 2:
    print('Usage:  python sl2decoder.py your_file.sl2')
    sys.exit(1)

block_offset = 0

# Skip the 10-byte file header
block_offset += 10

# Datatypes:
# ===================================================================================================
# Type    Definition                                          Format character for Python's struct
# ---------------------------------------------------------------------------------------------------
# byte    UInt8                                               B
# short   UInt16LE                                            H
# int     UInt32LE                                            L
# float   FloatLE (32 bits IEEE 754 floating point number)    f
# flags   UInt16LE                                            H
# ---------------------------------------------------------------------------------------------------

#Define offset and data type for each item
block_def = {
    'blockSize'         : {'offset': 26, 'type': '<H'},
    #  'lastBlockSize': {'offset': 28, 'type': '<H'},
    'channel'           : {'offset': 30, 'type': '<H'},
    'packetSize'        : {'offset': 32, 'type': '<H'},
    'frameIndex'        : {'offset': 34, 'type': '<L'},
    'upperLimit'        : {'offset': 38, 'type': '<f'},
    'lowerLimit'        : {'offset': 42, 'type': '<f'},
    'frequency'         : {'offset': 51, 'type': '<B'},
    #  'time1': {'offset': 58, 'type': '<H'}          # unknown resolution, unknown epoch
    'waterDepthFt'      : {'offset': 62, 'type': '<f'},  # in feet
    'keelDepthFt'       : {'offset': 66, 'type': '<f'},  # in feet
    'speedGpsKnots'     : {'offset': 98, 'type': '<f'},  # in knots
    'temperature'       : {'offset': 102, 'type': '<f'}, # in °C
    'lowrance_longitude': {'offset': 106, 'type': '<L'}, # Lowrance encoding (easting)
    'lowrance_latitude' : {'offset': 110, 'type': '<L'}, # Lowrance encoding (northing)
    'speedWaterKnots'   : {'offset': 114, 'type': '<f'}, # from "water wheel sensor" if present, else GPS value(?)
    'courseOverGround'  : {'offset': 118, 'type': '<f'}, # course over ground in radians
    'altitudeFt'        : {'offset': 122, 'type': '<f'}, # in feet
    'heading'           : {'offset': 126, 'type': '<f'}, # in radians
    'flags'             : {'offset': 130, 'type': '<H'},
    #  'time': {'offset': 138, 'type': '<H', 'len': 4}          # unknown resolution, unknown epoch
}

# Name the CSV after the input file (args[0] would be the script name)
with open('%s_output_py.csv' % args[1], 'w') as f_raw:
    title = ','.join(['Channel', 'Frequency', 'UpperLimit[ft]', 'LowerLimit[ft]', 'Depth[ft]', 'WaterTemp[C]', 'WaterSpeed[kn]',
 'PositionX', 'PositionY', 'Speed[kn]', 'Track[rad]','Altitude[ft]', 'Heading[rad]']) + '\n'
    f_raw.write(title)

    alive_counter = 0

    with open(args[1], 'br') as f:
        data = f.read()
        sl2_file_size = len(data)

        while block_offset < sl2_file_size:
            h = {}
            # Report progress once every 100 blocks
            if alive_counter % 100 == 0:
                print('%d done...' % round(100.0*block_offset/sl2_file_size))
            alive_counter += 1

            for k, v in block_def.items():
                t_offset = block_offset + v['offset']
                h[k] = struct.unpack_from(v['type'], data, t_offset)

            print(h['blockSize'])
            block_offset += h['blockSize'][0]

            #Combine into one line of data
            csv_line = ','.join([str(h['channel'][0]), str(h['frequency'][0]), 
                                 str(h['upperLimit'][0]), str(h['lowerLimit'][0]), 
                                 str(h['waterDepthFt'][0]), str(h['temperature'][0]), 
                                 str(h['speedWaterKnots'][0]), str(h['lowrance_longitude'][0]), 
                                 str(h['lowrance_latitude'][0]), str(h['speedGpsKnots'][0]), 
                                 str(h['courseOverGround'][0]), str(h['altitudeFt'][0]), 
                                 str(h['heading'][0])]) + '\n'

            f_raw.write(csv_line)

print('Read up to block_offset %d' % block_offset)
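
As a variation on the script above, the block loop can be wrapped in a generator. This is only a sketch under stated assumptions, not a replacement for the script: the names iter_frames, make_frame and FIELDS are hypothetical, only three of the fields are read, and the input is synthetic. It also adds two guards the script does not have. It stops when fewer than 144 bytes remain, and it stops when blockSize is 0, which would otherwise loop forever.

```python
import struct

FILE_HEADER_SIZE = 10    # as assumed in this article
FRAME_HEADER_SIZE = 144  # size of one set of data, per the reference page
FT2M = 1 / 3.2808399     # feet to meters

# A subset of block_def: field name -> (offset within the block, format)
FIELDS = {
    'blockSize': (26, '<H'),
    'channel': (30, '<H'),
    'waterDepthFt': (62, '<f'),
}


def iter_frames(data):
    """Yield one dict per block until the data runs out."""
    offset = FILE_HEADER_SIZE
    # Stop before a truncated block instead of raising struct.error
    while offset + FRAME_HEADER_SIZE <= len(data):
        frame = {name: struct.unpack_from(fmt, data, offset + rel)[0]
                 for name, (rel, fmt) in FIELDS.items()}
        if frame['blockSize'] == 0:
            break  # avoid looping forever on a zero block size
        yield frame
        offset += frame['blockSize']


def make_frame(channel, depth_ft):
    """Build a synthetic block for demonstration purposes only."""
    frame = bytearray(FRAME_HEADER_SIZE)
    struct.pack_into('<H', frame, 26, FRAME_HEADER_SIZE)
    struct.pack_into('<H', frame, 30, channel)
    struct.pack_into('<f', frame, 62, depth_ft)
    return bytes(frame)


data = bytes(FILE_HEADER_SIZE) + make_frame(0, 10.0) + make_frame(1, 20.5)
frames = list(iter_frames(data))
for fr in frames:
    print(fr['channel'], round(fr['waterDepthFt'] * FT2M, 2))
```

Separating the parsing from the CSV writing this way makes it easier to test the offsets against hand-built data before running on a real file.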

