Japan Meteorological Agency analysis rainfall (GRIB2 format) is handled by Python as a Numpy array

I want to read the analysis rainfall in Python

The bin file of 1km mesh analyzed rainfall (2006-) provided by the Japan Meteorological Agency / Meteorological Business Support Center is in GRIB2 format. However, as of April 2020, it cannot be handled by GDAL or pygrib because it contains its own extensions. There is also a method of binary dumping using wgrib2 that supports analysis rainfall or converting it to NetCDF and reading it, but if you want to extract only the numerical value, it is more convenient to read it directly.

Analysis rainfall GRIB2 file structure

The data consists of ** 8 ** (Sections 0 to 8, of which Section 2 is omitted), of which the integer level value representing precipitation in Section 7 is run-length compressed west. 2560 elements are stored from north to east [^ 1]. Precipitation is converted by converting this level value based on a look-up table with the ** data representative value ** (10 times the precipitation representing the 1 km grid) defined in Section 5, and multiplying it by 1/10. You can get it by

Each data section

The specifications of the binary data are the Technical Information on Distribution Materials (Meteorological Edition) No. 238 and No. 193 Revised Edition of the Japan Meteorological Agency. It is described in /238.pdf). The templates used in each section are specified in International Meteorological Report Ceremony / Separate Volume.

Each section has a fixed-length section and a variable-length section, and the section length (number of bytes) and section number are stored at the beginning of the section. To read a particular section, start reading by adding the lengths of the previous section. In the analysis rainfall data format, Sections 0, 1, 3, 4, and 6 are defined as fixed lengths of 16, 21, 72, 82, and 6 bytes in order, and Sections 5 and 7 are the number of levels and precipitation. It is variable according to the distribution. However, Section 5 is actually 213 bytes (= 17 + 2 * 98) because the current analysis rainfall has not changed at the maximum level of 98.

Data representative value

The following table shows the correspondence between the level values and data representative values defined in Section 5.

level Data representative value Corresponding rainfall
1 0 No rainfall, 0mm
2 4 Minimum value, 0.Less than 5mm
3 10 1.0mm
(...) (In 10 increments) (1mm increments)
79 770 77.0mm
80 800 80.0mm
(...) (In 50 increments) (In 5mm increments)
89 1250 125.0mm
90 1300 130.0mm
(...) (In 100 increments) (In 10mm increments)
97 2000 200.0mm
98 2550 205.0 mm or more

In addition to this, Section 7 includes level 0 as a "missing value". When decoding as an integer value, it is convenient to assign an appropriate value, so in the code below, it is set to "-10".

Run-length encoding

The run-length coding method used in Section 7 expresses the number of iterations by the sum of powers. [Explanation of run-length coding method](http://www.jmbsc.or.jp/jp/jp /online/joho-sample/Run-Length_Encoding.pdf) Expands based on the formula.

Reference: Toyoda, E., 2012: GRIB2 templates for JMA Radar / Rain gauge-Analyzed Precipitation data: PDT 4.50008 and DRT 5.200

Before expansion, it is a byte string of unsigned integers (0 to 255). The level value from 0 to the highest value (<= 98) or less that appears is the level value as it is, and the value from 0 to 255 corresponding to the number of repetitions of the previous level value. If the latter is continuous, the sum of the powerful numbers raised in order is the number of repetitions.

Decode

The following is the precipitation included in the sample of the analysis rainfall file downloaded from Meteorological Business Support Center. Code to read as Numpy's ndarray (North-South 3360x East-West 2560).

from itertools import repeat
import struct
import numpy as np

def set_table(section5):
    max_level = struct.unpack_from('>H', section5, 15)[0]
    table = (
        -10, # define representative of level 0 (Missing Value)
        *struct.unpack_from('>'+str(max_level)+'H', section5, 18)
    )
    return np.array(table, dtype=np.int16)

def decode_runlength(code, hi_level):
    for raw in code:
        if raw <= hi_level:
            level = raw
            pwr = 0
            yield level
        else:
            length = (0xFF - hi_level)**pwr * (raw - (hi_level + 1))
            pwr += 1
            yield from repeat(level, length)

def load_jmara_grib2(file):
    with open(file, 'rb') as f:
        binary = f.read()

    len_ = {'sec0':16, 'sec1':21, 'sec3':72, 'sec4':82, 'sec6':6}
    
    end4 = len_['sec0'] + len_['sec1'] + len_['sec3'] + len_['sec4'] - 1
    len_['sec5'] = struct.unpack_from('>I', binary, end4+1)[0]
    section5 = binary[end4:(end4+len_['sec5']+1)]

    end6 = end4 + len_['sec5'] + len_['sec6']
    len_['sec7'] = struct.unpack_from('>I', binary, end6+1)[0]
    section7 = binary[end6:(end6+len_['sec7']+1)]
    
    highest_level = struct.unpack_from('>H', section5, 13)[0]
    level_table = set_table(section5)
    decoded = np.fromiter(
        decode_runlength(section7[6:], highest_level), dtype=np.int16
    ).reshape((3360, 2560))

    # convert level to representative
    return level_table[decoded]

file = rb'./SRF_GPV_Ggis1km_Prr60lv_ANAL_20180707/Z__C_RJTD_20180707000000_SRF_GPV_Ggis1km_Prr60lv_ANAL_grib2.bin'
rain = load_jmara_grib2(file) / 10
When writing in Fortran

If you write the above code in Fortran and make it a library, it will be about 40 times faster.

jmara.f90


subroutine load_jmara_grib2(filepath, decoded_2d)
    implicit none
    character(*), intent(in) :: filepath
    integer, parameter :: len_sec0 = 16, len_sec1 = 21, len_sec2 = 0
    integer, parameter :: len_sec3 = 72, len_sec4 = 82, len_sec6 = 6
    integer :: len_sec5, len_sec7, pos, filesize

    integer, parameter :: x_n = 2560, y_n = 3360
    integer(1), allocatable :: binary(:)
    integer(2), allocatable :: table(:)
    integer(2) :: highest_level, decoded(x_n*y_n)
    integer(2), intent(out) :: decoded_2d(x_n, y_n)

    inquire(file=filepath, size=filesize)
    allocate(binary(filesize))
    open(100, file=trim(filepath), status='old', access='direct', form='unformatted', recl=filesize)
    read(100, rec=1) binary
    close(100)

    pos = len_sec0 + len_sec1 + len_sec2 + len_sec3 + len_sec4
    len_sec5 = pack_int32(binary(pos+1), binary(pos+2), binary(pos+3), binary(pos+4))
    highest_level = pack_int16(binary(pos+13), binary(pos+14))
    call set_table(binary(pos+1 : pos+len_sec5), table)

    pos = pos + len_sec5 + len_sec6
    len_sec7 = pack_int32(binary(pos+1), binary(pos+2), binary(pos+3), binary(pos+4))
    call decode_runlength(binary(pos+6 : pos+len_sec7), highest_level, decoded)

    where (decoded == 0)
        decoded = -10   ! define representative of level 0 (Missing Value)
    elsewhere
        decoded = table(decoded)
    endwhere

    decoded_2d = reshape(decoded, (/ x_n, y_n /))

contains

    subroutine set_table(section5, table)
        implicit none
        integer(1), intent(in) :: section5(:)
        integer(2), allocatable, intent(out) :: table(:)
        integer(2) :: max_level, level

        max_level = pack_int16(section5(15), section5(16))
        allocate(table(max_level))
        do level = 1, max_level
            table(level) = pack_int16(section5(16 + 2*level), section5(17 + 2*level))
        end do
    end subroutine

    subroutine decode_runlength(code, hi_level, decoded)
        implicit none
        integer(1), intent(in) :: code(:)
        integer(2), intent(in) :: hi_level
        integer(2), intent(out) :: decoded(0:)
        integer :: i, pwr, offset, length
        integer(2) :: raw, level
        
        offset = 0
        do i = 1, size(code)
            raw = get_int16(code(i))
            if(raw <= hi_level) then
                level = raw
                pwr = 0
                decoded(offset) = level
                offset = offset + 1
            else
                length = (255 - hi_level)**pwr * (raw - (hi_level + 1))
                pwr = pwr + 1
                decoded(offset : offset+length-1) = level
                offset = offset + length
            end if
        end do
    end subroutine

    integer(2) function pack_int16(arg1, arg2)
        implicit none
        integer(1) :: arg1, arg2

        pack_int16 = ishft( iand(int(B'11111111',2), int(arg1, 2)), 8 )     &
                   +        iand(int(B'11111111',2), int(arg2, 2))
    end function

    integer(4) function pack_int32(arg1, arg2, arg3, arg4)
        implicit none
        integer(1) :: arg1, arg2, arg3, arg4

        pack_int32 = ishft( iand(int(B'11111111',4), int(arg1, 4)), 8*3)    &
                   + ishft( iand(int(B'11111111',4), int(arg2, 4)), 8*2)    &
                   + ishft( iand(int(B'11111111',4), int(arg3, 4)), 8  )    &
                   +        iand(int(B'11111111',4), int(arg4, 4))
    end function

    integer(2) function get_int16(arg1)
        implicit none
        integer(1) :: arg1

        get_int16 = iand(int(B'11111111',2), int(arg1, 2))
    end function
end subroutine

jmara.Compile to dll


gfortran -shared jmara.f90 -Ofast -march=native -fopt-info -o ./jmara.dll

Call from Python with ctypes


import ctypes
import numpy as np

def load_jmara_grib2(file):
    lib = np.ctypeslib.load_library('jmara.dll', '.')

    lib.load_jmara_grib2_.argtypes = [
        ctypes.c_char_p,
        np.ctypeslib.ndpointer(dtype=np.int16),
        ctypes.c_size_t     # hidden argument for character length
        ]

    lib.load_jmara_grib2_.restype = ctypes.c_void_p
    representative = np.zeros((3360, 2560), dtype=np.int16, order='C')

    lib.load_jmara_grib2_(
        ctypes.c_char_p(file),
        representative,
        ctypes.c_size_t(len(file))
        )
    
    return representative

file = rb'./SRF_GPV_Ggis1km_Prr60lv_ANAL_20180707/Z__C_RJTD_20180707000000_SRF_GPV_Ggis1km_Prr60lv_ANAL_grib2.bin'
rain = load_jmara_grib2(file) / 10
Simple drawing
import matplotlib.pyplot as plt

plt.figure(figsize=(12, 8))
cmap=plt.cm.jet
cmap.set_under('white')
plt.imshow(rain, aspect=8/12, vmin=0.4, cmap=cmap)
plt.colorbar()
plt.show()

png

Draw with Matplotlib and Basemap

For the obtained two-dimensional array rain of precipitation, generate latitude and longitude and draw with Matplotlib. Here, considering that the data is a representative value in the grid, the coordinates are divided into 80 equal parts of the cubic mesh grid (80 equal parts of 1 degree in the longitude direction and 40 minutes (= 1 / 1.5 degrees) in the latitude direction). ) Is set to be the center (1/2).

Install Basemap (Anaconda environment)
conda install -c conda-forge basemap basemap-data-hires
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap

lon = np.linspace(118, 150, 2560, endpoint=False) + 1/80 / 2
lat = np.linspace(48, 20, 3360, endpoint=False) - 1/80/1.5 / 2
X, Y = np.meshgrid(lon, lat)

fig, ax = plt.subplots(figsize=(10,8))
ax.set_title('JMA Radar/Raingauge-Analyzed Precipitation\n2018-07-07T00:00:00+00:00')
m = Basemap(llcrnrlat=20, urcrnrlat=48, 
            llcrnrlon=118, urcrnrlon=150,
            resolution='l', ax=ax)
m.drawmeridians(np.arange(0, 360, 5), labels=[1, 0, 0, 1], linewidth=0)
m.drawparallels(np.arange(-90, 90, 5), labels=[1, 0, 1, 0], linewidth=0)
m.drawcoastlines(color='black')

levels = [0, 1, 5, 10, 20, 30, 40, 50, 60, 70, 80]
colors = ['whitesmoke', 'gainsboro', 'darkgray', 'cyan', 'deepskyblue', 
          'blue', 'lime', 'yellow', 'orange', 'red']

im = m.contourf(X, Y, rain, levels, colors=colors, extend='both')
im.cmap.set_under('white')
im.cmap.set_over('magenta')
cb = m.colorbar(im, 'right', ticks=levels, size='2.5%')
cb.set_label('mm/hr')

plt.show()

GrADS風コンター


If you expand one file, it will be 65.6MB in np.float64. If you return the value as it is 10 times, it will be about 16.4MB of np.int16.

File path generation (specified period)

Data media purchased from the Meteorological Business Support Center are included in the annual disk in a hierarchy such as DATA / yyyy / mm / dd. Now if you save this yearly as parent_dir / yyyy / DATA / yyyy / mm / dd, you can generate the paths for individual bin files within the specified time period as follows:

import datetime

def half_hour_range(start, end):
    for n in range(int((end - start).total_seconds() / 3600)):
        yield start + datetime.timedelta(hours=n)
        yield start + datetime.timedelta(hours=n, minutes=30)

def jmara_path(time, parent_dir='.'):
    path = parent_dir.rstrip('/').rstrip('\\') + (
        time.strftime('/%Y/DATA/%Y/%m/%d/')
        + 'Z__C_RJTD_'
        + time.strftime('%Y%m%d%H%M%S')
        + '_SRF_GPV_Ggis1km_Prr60lv_ANAL_grib2.bin'
    )
    return path

start = datetime.datetime(2018, 7, 1, 0, 0, 0)
end = datetime.datetime(2018, 9, 1, 0, 0, 0)
parent_dir = 'G:'

filepaths = [jmara_path(t, parent_dir) for t in half_hour_range(start, end)]

filepaths


['G:/2018/DATA/2018/07/01/Z__C_RJTD_20180701000000_SRF_GPV_Ggis1km_Prr60lv_ANAL_grib2.bin',
 'G:/2018/DATA/2018/07/01/Z__C_RJTD_20180701003000_SRF_GPV_Ggis1km_Prr60lv_ANAL_grib2.bin',
 'G:/2018/DATA/2018/07/01/Z__C_RJTD_20180701010000_SRF_GPV_Ggis1km_Prr60lv_ANAL_grib2.bin',
 'G:/2018/DATA/2018/07/01/Z__C_RJTD_20180701013000_SRF_GPV_Ggis1km_Prr60lv_ANAL_grib2.bin',
 'G:/2018/DATA/2018/07/01/Z__C_RJTD_20180701020000_SRF_GPV_Ggis1km_Prr60lv_ANAL_grib2.bin',
 (...)
 'G:/2018/DATA/2018/08/31/Z__C_RJTD_20180831213000_SRF_GPV_Ggis1km_Prr60lv_ANAL_grib2.bin',
 'G:/2018/DATA/2018/08/31/Z__C_RJTD_20180831220000_SRF_GPV_Ggis1km_Prr60lv_ANAL_grib2.bin',
 'G:/2018/DATA/2018/08/31/Z__C_RJTD_20180831223000_SRF_GPV_Ggis1km_Prr60lv_ANAL_grib2.bin',
 'G:/2018/DATA/2018/08/31/Z__C_RJTD_20180831230000_SRF_GPV_Ggis1km_Prr60lv_ANAL_grib2.bin',
 'G:/2018/DATA/2018/08/31/Z__C_RJTD_20180831233000_SRF_GPV_Ggis1km_Prr60lv_ANAL_grib2.bin']

[^ 1]: In the 72nd octet of Section 4, 0x00 is specified as the scanning mode, which is the first data at the northwestern end of the range, borrowing the representation of format.txt attached to the analysis latitude. Yes, if you line up from west to east at the same latitude and advance by the number of grid points along the latitude line, you will move to the data at the western end of one grid south, and the data of the same latitude will be lined up again by the number of grid points along the latitude line.

Recommended Posts

Japan Meteorological Agency analysis rainfall (GRIB2 format) is handled by Python as a Numpy array
Create a python numpy array
How to sort by specifying a column in the Python Numpy array.
Why Python slicing is represented by a colon (:)