The bin file of 1km mesh analyzed rainfall (2006-) provided by the Japan Meteorological Agency / Meteorological Business Support Center is in GRIB2 format. However, as of April 2020, it cannot be handled by GDAL or pygrib because it contains its own extensions. There is also a method of binary dumping using wgrib2 that supports analysis rainfall or converting it to NetCDF and reading it, but if you want to extract only the numerical value, it is more convenient to read it directly.
The data consists of ** 8 ** (Sections 0 to 8, of which Section 2 is omitted), of which the integer level value representing precipitation in Section 7 is run-length compressed west. 2560 elements are stored from north to east [^ 1]. Precipitation is converted by converting this level value based on a look-up table with the ** data representative value ** (10 times the precipitation representing the 1 km grid) defined in Section 5, and multiplying it by 1/10. You can get it by
The specifications of the binary data are the Technical Information on Distribution Materials (Meteorological Edition) No. 238 and No. 193 Revised Edition of the Japan Meteorological Agency. It is described in /238.pdf). The templates used in each section are specified in International Meteorological Report Ceremony / Separate Volume.
Each section has a fixed-length section and a variable-length section, and the section length (number of bytes) and section number are stored at the beginning of the section. To read a particular section, start reading by adding the lengths of the previous section. In the analysis rainfall data format, Sections 0, 1, 3, 4, and 6 are defined as fixed lengths of 16, 21, 72, 82, and 6 bytes in order, and Sections 5 and 7 are the number of levels and precipitation. It is variable according to the distribution. However, Section 5 is actually 213 bytes (= 17 + 2 * 98) because the current analysis rainfall has not changed at the maximum level of 98.
The following table shows the correspondence between the level values and data representative values defined in Section 5.
level | Data representative value | Corresponding rainfall |
---|---|---|
1 | 0 | No rainfall, 0mm |
2 | 4 | Minimum value, 0.Less than 5mm |
3 | 10 | 1.0mm |
(...) | (In 10 increments) | (1mm increments) |
79 | 770 | 77.0mm |
80 | 800 | 80.0mm |
(...) | (In 50 increments) | (In 5mm increments) |
89 | 1250 | 125.0mm |
90 | 1300 | 130.0mm |
(...) | (In 100 increments) | (In 10mm increments) |
97 | 2000 | 200.0mm |
98 | 2550 | 205.0 mm or more |
In addition to this, Section 7 includes level 0 as a "missing value". When decoding as an integer value, it is convenient to assign an appropriate value, so in the code below, it is set to "-10".
The run-length coding method used in Section 7 expresses the number of iterations by the sum of powers. [Explanation of run-length coding method](http://www.jmbsc.or.jp/jp/jp /online/joho-sample/Run-Length_Encoding.pdf) Expands based on the formula.
Before expansion, it is a byte string of unsigned integers (0 to 255). The level value from 0 to the highest value (<= 98) or less that appears is the level value as it is, and the value from 0 to 255 corresponding to the number of repetitions of the previous level value. If the latter is continuous, the sum of the powerful numbers raised in order is the number of repetitions.
The following is the precipitation included in the sample of the analysis rainfall file downloaded from Meteorological Business Support Center. Code to read as Numpy's ndarray (North-South 3360x East-West 2560).
from itertools import repeat
import struct
import numpy as np
def set_table(section5):
max_level = struct.unpack_from('>H', section5, 15)[0]
table = (
-10, # define representative of level 0 (Missing Value)
*struct.unpack_from('>'+str(max_level)+'H', section5, 18)
)
return np.array(table, dtype=np.int16)
def decode_runlength(code, hi_level):
for raw in code:
if raw <= hi_level:
level = raw
pwr = 0
yield level
else:
length = (0xFF - hi_level)**pwr * (raw - (hi_level + 1))
pwr += 1
yield from repeat(level, length)
def load_jmara_grib2(file):
with open(file, 'rb') as f:
binary = f.read()
len_ = {'sec0':16, 'sec1':21, 'sec3':72, 'sec4':82, 'sec6':6}
end4 = len_['sec0'] + len_['sec1'] + len_['sec3'] + len_['sec4'] - 1
len_['sec5'] = struct.unpack_from('>I', binary, end4+1)[0]
section5 = binary[end4:(end4+len_['sec5']+1)]
end6 = end4 + len_['sec5'] + len_['sec6']
len_['sec7'] = struct.unpack_from('>I', binary, end6+1)[0]
section7 = binary[end6:(end6+len_['sec7']+1)]
highest_level = struct.unpack_from('>H', section5, 13)[0]
level_table = set_table(section5)
decoded = np.fromiter(
decode_runlength(section7[6:], highest_level), dtype=np.int16
).reshape((3360, 2560))
# convert level to representative
return level_table[decoded]
file = rb'./SRF_GPV_Ggis1km_Prr60lv_ANAL_20180707/Z__C_RJTD_20180707000000_SRF_GPV_Ggis1km_Prr60lv_ANAL_grib2.bin'
rain = load_jmara_grib2(file) / 10
If you write the above code in Fortran and make it a library, it will be about 40 times faster.
jmara.f90
subroutine load_jmara_grib2(filepath, decoded_2d)
implicit none
character(*), intent(in) :: filepath
integer, parameter :: len_sec0 = 16, len_sec1 = 21, len_sec2 = 0
integer, parameter :: len_sec3 = 72, len_sec4 = 82, len_sec6 = 6
integer :: len_sec5, len_sec7, pos, filesize
integer, parameter :: x_n = 2560, y_n = 3360
integer(1), allocatable :: binary(:)
integer(2), allocatable :: table(:)
integer(2) :: highest_level, decoded(x_n*y_n)
integer(2), intent(out) :: decoded_2d(x_n, y_n)
inquire(file=filepath, size=filesize)
allocate(binary(filesize))
open(100, file=trim(filepath), status='old', access='direct', form='unformatted', recl=filesize)
read(100, rec=1) binary
close(100)
pos = len_sec0 + len_sec1 + len_sec2 + len_sec3 + len_sec4
len_sec5 = pack_int32(binary(pos+1), binary(pos+2), binary(pos+3), binary(pos+4))
highest_level = pack_int16(binary(pos+13), binary(pos+14))
call set_table(binary(pos+1 : pos+len_sec5), table)
pos = pos + len_sec5 + len_sec6
len_sec7 = pack_int32(binary(pos+1), binary(pos+2), binary(pos+3), binary(pos+4))
call decode_runlength(binary(pos+6 : pos+len_sec7), highest_level, decoded)
where (decoded == 0)
decoded = -10 ! define representative of level 0 (Missing Value)
elsewhere
decoded = table(decoded)
endwhere
decoded_2d = reshape(decoded, (/ x_n, y_n /))
contains
subroutine set_table(section5, table)
implicit none
integer(1), intent(in) :: section5(:)
integer(2), allocatable, intent(out) :: table(:)
integer(2) :: max_level, level
max_level = pack_int16(section5(15), section5(16))
allocate(table(max_level))
do level = 1, max_level
table(level) = pack_int16(section5(16 + 2*level), section5(17 + 2*level))
end do
end subroutine
subroutine decode_runlength(code, hi_level, decoded)
implicit none
integer(1), intent(in) :: code(:)
integer(2), intent(in) :: hi_level
integer(2), intent(out) :: decoded(0:)
integer :: i, pwr, offset, length
integer(2) :: raw, level
offset = 0
do i = 1, size(code)
raw = get_int16(code(i))
if(raw <= hi_level) then
level = raw
pwr = 0
decoded(offset) = level
offset = offset + 1
else
length = (255 - hi_level)**pwr * (raw - (hi_level + 1))
pwr = pwr + 1
decoded(offset : offset+length-1) = level
offset = offset + length
end if
end do
end subroutine
integer(2) function pack_int16(arg1, arg2)
implicit none
integer(1) :: arg1, arg2
pack_int16 = ishft( iand(int(B'11111111',2), int(arg1, 2)), 8 ) &
+ iand(int(B'11111111',2), int(arg2, 2))
end function
integer(4) function pack_int32(arg1, arg2, arg3, arg4)
implicit none
integer(1) :: arg1, arg2, arg3, arg4
pack_int32 = ishft( iand(int(B'11111111',4), int(arg1, 4)), 8*3) &
+ ishft( iand(int(B'11111111',4), int(arg2, 4)), 8*2) &
+ ishft( iand(int(B'11111111',4), int(arg3, 4)), 8 ) &
+ iand(int(B'11111111',4), int(arg4, 4))
end function
integer(2) function get_int16(arg1)
implicit none
integer(1) :: arg1
get_int16 = iand(int(B'11111111',2), int(arg1, 2))
end function
end subroutine
jmara.Compile to dll
gfortran -shared jmara.f90 -Ofast -march=native -fopt-info -o ./jmara.dll
Call from Python with ctypes
import ctypes
import numpy as np
def load_jmara_grib2(file):
lib = np.ctypeslib.load_library('jmara.dll', '.')
lib.load_jmara_grib2_.argtypes = [
ctypes.c_char_p,
np.ctypeslib.ndpointer(dtype=np.int16),
ctypes.c_size_t # hidden argument for character length
]
lib.load_jmara_grib2_.restype = ctypes.c_void_p
representative = np.zeros((3360, 2560), dtype=np.int16, order='C')
lib.load_jmara_grib2_(
ctypes.c_char_p(file),
representative,
ctypes.c_size_t(len(file))
)
return representative
file = rb'./SRF_GPV_Ggis1km_Prr60lv_ANAL_20180707/Z__C_RJTD_20180707000000_SRF_GPV_Ggis1km_Prr60lv_ANAL_grib2.bin'
rain = load_jmara_grib2(file) / 10
import matplotlib.pyplot as plt
plt.figure(figsize=(12, 8))
cmap=plt.cm.jet
cmap.set_under('white')
plt.imshow(rain, aspect=8/12, vmin=0.4, cmap=cmap)
plt.colorbar()
plt.show()
For the obtained two-dimensional array rain
of precipitation, generate latitude and longitude and draw with Matplotlib. Here, considering that the data is a representative value in the grid, the coordinates are divided into 80 equal parts of the cubic mesh grid (80 equal parts of 1 degree in the longitude direction and 40 minutes (= 1 / 1.5 degrees) in the latitude direction). ) Is set to be the center (1/2).
conda install -c conda-forge basemap basemap-data-hires
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
lon = np.linspace(118, 150, 2560, endpoint=False) + 1/80 / 2
lat = np.linspace(48, 20, 3360, endpoint=False) - 1/80/1.5 / 2
X, Y = np.meshgrid(lon, lat)
fig, ax = plt.subplots(figsize=(10,8))
ax.set_title('JMA Radar/Raingauge-Analyzed Precipitation\n2018-07-07T00:00:00+00:00')
m = Basemap(llcrnrlat=20, urcrnrlat=48,
llcrnrlon=118, urcrnrlon=150,
resolution='l', ax=ax)
m.drawmeridians(np.arange(0, 360, 5), labels=[1, 0, 0, 1], linewidth=0)
m.drawparallels(np.arange(-90, 90, 5), labels=[1, 0, 1, 0], linewidth=0)
m.drawcoastlines(color='black')
levels = [0, 1, 5, 10, 20, 30, 40, 50, 60, 70, 80]
colors = ['whitesmoke', 'gainsboro', 'darkgray', 'cyan', 'deepskyblue',
'blue', 'lime', 'yellow', 'orange', 'red']
im = m.contourf(X, Y, rain, levels, colors=colors, extend='both')
im.cmap.set_under('white')
im.cmap.set_over('magenta')
cb = m.colorbar(im, 'right', ticks=levels, size='2.5%')
cb.set_label('mm/hr')
plt.show()
If you expand one file, it will be 65.6MB in np.float64. If you return the value as it is 10 times, it will be about 16.4MB of np.int16.
Data media purchased from the Meteorological Business Support Center are included in the annual disk in a hierarchy such as DATA / yyyy / mm / dd
. Now if you save this yearly as parent_dir / yyyy / DATA / yyyy / mm / dd
, you can generate the paths for individual bin files within the specified time period as follows:
import datetime
def half_hour_range(start, end):
for n in range(int((end - start).total_seconds() / 3600)):
yield start + datetime.timedelta(hours=n)
yield start + datetime.timedelta(hours=n, minutes=30)
def jmara_path(time, parent_dir='.'):
path = parent_dir.rstrip('/').rstrip('\\') + (
time.strftime('/%Y/DATA/%Y/%m/%d/')
+ 'Z__C_RJTD_'
+ time.strftime('%Y%m%d%H%M%S')
+ '_SRF_GPV_Ggis1km_Prr60lv_ANAL_grib2.bin'
)
return path
start = datetime.datetime(2018, 7, 1, 0, 0, 0)
end = datetime.datetime(2018, 9, 1, 0, 0, 0)
parent_dir = 'G:'
filepaths = [jmara_path(t, parent_dir) for t in half_hour_range(start, end)]
filepaths
['G:/2018/DATA/2018/07/01/Z__C_RJTD_20180701000000_SRF_GPV_Ggis1km_Prr60lv_ANAL_grib2.bin',
'G:/2018/DATA/2018/07/01/Z__C_RJTD_20180701003000_SRF_GPV_Ggis1km_Prr60lv_ANAL_grib2.bin',
'G:/2018/DATA/2018/07/01/Z__C_RJTD_20180701010000_SRF_GPV_Ggis1km_Prr60lv_ANAL_grib2.bin',
'G:/2018/DATA/2018/07/01/Z__C_RJTD_20180701013000_SRF_GPV_Ggis1km_Prr60lv_ANAL_grib2.bin',
'G:/2018/DATA/2018/07/01/Z__C_RJTD_20180701020000_SRF_GPV_Ggis1km_Prr60lv_ANAL_grib2.bin',
(...)
'G:/2018/DATA/2018/08/31/Z__C_RJTD_20180831213000_SRF_GPV_Ggis1km_Prr60lv_ANAL_grib2.bin',
'G:/2018/DATA/2018/08/31/Z__C_RJTD_20180831220000_SRF_GPV_Ggis1km_Prr60lv_ANAL_grib2.bin',
'G:/2018/DATA/2018/08/31/Z__C_RJTD_20180831223000_SRF_GPV_Ggis1km_Prr60lv_ANAL_grib2.bin',
'G:/2018/DATA/2018/08/31/Z__C_RJTD_20180831230000_SRF_GPV_Ggis1km_Prr60lv_ANAL_grib2.bin',
'G:/2018/DATA/2018/08/31/Z__C_RJTD_20180831233000_SRF_GPV_Ggis1km_Prr60lv_ANAL_grib2.bin']
[^ 1]: In the 72nd octet of Section 4, 0x00
is specified as the scanning mode, which is the first data at the northwestern end of the range, borrowing the representation of format.txt attached to the analysis latitude. Yes, if you line up from west to east at the same latitude and advance by the number of grid points along the latitude line, you will move to the data at the western end of one grid south, and the data of the same latitude will be lined up again by the number of grid points along the latitude line
.