Reading Fortran binary output files with Python

This article is for people who do their numerical calculations in Fortran and their figures and analysis in Python. It assumes that the output of the numerical calculation is a binary file and that numpy is used for the analysis in Python. It first describes the Fortran binary output formats and then describes how to read them in Python.

Fortran output format

Fortran has three access modes for binary output: sequential, direct, and stream. Let's look at each in turn. Note that binary output here means form="unformatted", not form="binary". (I don't know much about form="binary".)

sequential (sequential access)

Records are written in order from the beginning of the file. Each write statement produces one record, with a 4-byte marker added before and after the data (older compilers may use 8-byte markers). Each marker holds the number of bytes of data in the record. The file normally has to be read from the beginning (although it is not impossible to read it with stream access if you account for the marker bytes ...)

real(4) :: a=1,b=2
open(10,file="test.seq", form="unformatted", action="write",access="sequential")
write(10) a
write(10) b
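
To make the record layout concrete, here is a small Python sketch that builds by hand the bytes the two write statements above would be expected to produce. It assumes 4-byte markers and little-endian byte order; the real file depends on the compiler and machine. The helper name fortran_record is made up for illustration.

```python
import struct

def fortran_record(payload: bytes, endian: str = "<") -> bytes:
    """Wrap payload in 4-byte length markers (assumed sequential record layout)."""
    marker = struct.pack(endian + "i", len(payload))
    return marker + payload + marker

# Two records, each holding a single real(4): a=1 and b=2
rec_a = fortran_record(struct.pack("<f", 1.0))
rec_b = fortran_record(struct.pack("<f", 2.0))
file_bytes = rec_a + rec_b

print(len(file_bytes))  # 2 records x (4 + 4 + 4) bytes = 24
print(rec_a.hex())      # marker, data, marker
```

Under these assumptions, 8 bytes of data cost 24 bytes on disk, which is why many small writes are wasteful in sequential files.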

direct

Output is written in fixed-length records whose length is given by recl. Each write specifies the record number with rec, which determines the output position. It is therefore not necessary to write from the beginning (although that is the usual practice), and conveniently you do not have to read from the beginning either. With Intel's compiler (ifort), the default unit of recl is 4 bytes (for example, recl=4 produces 16-byte records). It is safer to pass the -assume byterecl option so that recl is counted in bytes.

real(4) :: a=1,b=2
open(10,file="test.dir", form="unformatted", action="write",access="direct",recl=4)
write(10,rec=1) a
write(10,rec=2) b
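
The arithmetic behind record lengths and positions can be sketched in Python as follows. The function names are hypothetical, and the 4-byte unit is the ifort default described above.

```python
def record_bytes(recl: int, byterecl: bool) -> int:
    """Bytes per record: recl counts bytes with byterecl, 4-byte units without."""
    return recl if byterecl else 4 * recl

def record_offset(rec: int, recl_bytes: int) -> int:
    """Byte offset of record number rec (1-based, as in Fortran)."""
    return (rec - 1) * recl_bytes

print(record_bytes(4, byterecl=False))  # 16
print(record_bytes(4, byterecl=True))   # 4
print(record_offset(2, 4))              # 4: record 2 starts right after record 1
```

If a file is four times larger than expected, a recl counted in 4-byte units is a likely cause.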

stream

Stream I/O was added in Fortran 2003. It is similar to sequential, except that no markers are added before and after each record.

real(4) :: a=1,b=2
open(10,file="test.stm", form="unformatted", action="write",access="stream")
write(10) a
write(10) b ! pos advances automatically

The output position can also be given explicitly, in bytes, with pos. If you specify pos when reading, you do not have to read from the beginning of the file.

About endianness

Byte order is either big endian or little endian. If it is not specified, the machine default is used. Either is fine, but it is safest to pick one and use it consistently so that you always know which one a file uses. To specify it at compile time:

$ gfortran -fconvert=big-endian test.f90
$ ifort -convert big_endian -assume byterecl test.f90

It can also be specified in the open statement with convert (a non-standard extension, but probably supported by most compilers).

open(10, file="test.dat", form="unformatted", action="write", access="stream", &
& convert="big_endian" )
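
As a quick illustration of what the two byte orders mean on disk, this Python sketch packs the same 4-byte real both ways using only the standard library.

```python
import struct
import sys

big = struct.pack(">f", 1.0)     # big endian: most significant byte first
little = struct.pack("<f", 1.0)  # little endian: least significant byte first

print(big.hex())      # 3f800000
print(little.hex())   # 0000803f
print(sys.byteorder)  # byte order of the machine running Python
```

The two byte strings are reverses of each other, so reading with the wrong byte order gives nonsense values rather than an error.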

Reading with Python

The files can be read with the standard Python library: read the bytes and convert them with np.frombuffer. Wrapping this in a class like the ones below makes it convenient to handle Fortran output. The result is a one-dimensional array, so convert it with reshape if necessary. Only typical dtype specifiers are explained here; for more information, see [https://docs.scipy.org/doc/numpy/reference/arrays.dtypes.html#].

symbol	meaning
>	Big endian
<	Little endian
i4	4-byte integer
f4	4-byte floating point
f8	8-byte floating point
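
The following sketch shows these dtype strings together with reshape. The 2 x 3 array and its values are made up for illustration. Because Fortran stores arrays in column-major order, order="F" is assumed to be what you want when recovering a multi-dimensional Fortran array.

```python
import numpy as np

# Six big-endian 4-byte reals, standing in for bytes read from a file
raw = np.arange(1, 7).astype(">f4").tobytes()

flat = np.frombuffer(raw, dtype=">f4")  # one-dimensional array of 6 elements
# Fortran's a(2,3) is stored column by column, so reshape with order="F"
a = flat.reshape((2, 3), order="F")

print(flat.dtype)
print(a)
```

With order="F", a[i, j] in Python corresponds to a(i+1, j+1) in Fortran.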

Examples follow. They read a record of 200 4-byte real numbers.

sequential (sequential access)

import numpy as np
import struct

class seq_read :
    def __init__(self, filename, endian=">") :
        self.f = open(filename, "rb")
        self.endian = endian  # byte order of the 4-byte record markers

    def read(self, dtype) :
        # leading marker: number of bytes of data in the record
        num, = struct.unpack(self.endian+"i", self.f.read(4))
        data = np.frombuffer(self.f.read(num), dtype)
        # trailing marker: repeats the length; read it to reach the next record
        num, = struct.unpack(self.endian+"i", self.f.read(4))
        return data

    def rewind(self) :
        self.f.seek(0)

    def __del__(self) :
        self.f.close()

### example ###
f = seq_read("test.seq", endian=">")
data = f.read(">f4") # big-endian 4-byte real
f.rewind() # back to the top of the file

direct

With direct access the data type and record length are the same for every record, so set them when creating the instance.

import numpy as np
class dir_read :
    def __init__(self, filename, recl, dtype) : 
        self.f = open(filename, "rb")
        self.recl = recl
        self.dtype = dtype        
        
    def read(self, rec) : #rec starts from 1(Fortran-like)
        self.f.seek((rec-1)*self.recl)
        data = np.frombuffer(self.f.read(self.recl), self.dtype)
        return data
    
    def __del__(self) :
        self.f.close()
### example ### 
f2 = dir_read("test.dir",4*200,">f")
print(f2.read(2))

You can also read it with numpy.fromfile. Use offset to give the byte position at which to start reading, and dtype to give the element type and the number of elements per record.

numpy 1.17 or later

import numpy as np
recl = 200
data = np.fromfile("test.dir",dtype=">"+str(recl)+"f4",count=1,offset=4*recl)[0]

stream

You can read it with seek and np.frombuffer as used above. seek plays the same role as pos, so anyone who can use stream output in Fortran should be able to do it right away. Note that pos starts at 1 while seek offsets start at 0.
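
A minimal sketch of such a reader is shown below. The function name read_stream and its arguments are hypothetical, and the test file is generated with numpy as a stand-in for real Fortran stream output (200 big-endian 4-byte reals with no record markers).

```python
import os
import tempfile
import numpy as np

def read_stream(filename, dtype, count, pos=1):
    """Read count elements of dtype starting at Fortran-style byte position pos (1-based)."""
    dt = np.dtype(dtype)
    with open(filename, "rb") as f:
        f.seek(pos - 1)  # Fortran pos starts at 1, seek offsets start at 0
        return np.frombuffer(f.read(count * dt.itemsize), dtype=dt)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.stm")
    np.arange(200).astype(">f4").tofile(path)  # stand-in for Fortran output

    data = read_stream(path, ">f4", 200)                  # whole file
    tail = read_stream(path, ">f4", 10, pos=4 * 190 + 1)  # last 10 elements only

print(data.shape)
print(tail)
```

Opening the file inside a with statement closes it as soon as the read finishes, which avoids relying on __del__ as the classes above do.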
