The data in the title is distributed on this site. Very detailed and accurate elevation data (DEM) in Japan is available. What is downloaded is a zipped file for each secondary division area mesh defined by the Statistics Bureau of the Ministry of Internal Affairs and Communications. It contains elevation data in xml format. The regional mesh will be described later.
The code is at the end of this article. An example is as follows.
** (The file name is also used as meta information, so please apply it to the downloaded one) **
command
python zdem2tif.py FG-GML-1234-56-DEM10B.zip FG-GML-6543-21-DEM5A.zip ...
One geotiff image is output for each zip file (= secondary partition area mesh). Supports both 5m and 10m resolutions. We have confirmed the operation with python3 series. The gdal package must be installed.
This time, I made a lot of reference to the code placed in this site. However, if it is left as it is, a bug will occur in the case of data loss, so this code has been fixed.
There seem to be various other tools, but I came up with this code because it can be executed on commands, it can be shared with foreigners because it is written in English, and there is no need to extract the zip. ..
Also, as I noticed after posting, qiita already has Article on basic map information and python. The explanation of the file format is more detailed here.
The detailed definition is explained in this document. Regional meshes are defined in the 1st to 3rd order partitions according to the spatial resolution. The primary mesh is assigned a partition code of 4 digits, the secondary mesh is assigned an additional 2 digits, and the tertiary mesh is assigned an additional 2 digits. The latitude and longitude of the four corners of the compartment can be known using the number of the compartment code.
In the future, there may be changes in the specifications of the distributed data, and there is a possibility that it will not be supported in areas that the author has not confirmed, so I will briefly explain the code.
The whole flow (convert
) is
--Get area mesh information and DEM resolution from zip file name
--Define an array of output files (so far DEMOutArray
, mesh_info_from_zipname
)
--Loop about xml file in zip file (for filename in filelist
)
--Get the required information from the xml file (XMLContents
, read_xml
)
--Generate DEM array for each xml file (DEMInArray
)
--Overwrite output array (ʻupdate_raster) --Save in Tiff format (
save_raster`)
Calculate the latitude and longitude of the four corners by reading the 4-digit primary partition code and the 2-digit secondary partition code. Furthermore, the resolution is interpreted by whether the file name is written as DEM5X or DEM10X. For each resolution, the array size of one xml file is fixed (for example, in the case of 10mDEM, it is assumed that there is only one xml in the zip).
Create in advance the size of one secondary compartment. In the case of 5mDEM, it is assumed that a maximum of 10 x 10 xml (tertiary partition) is contained in one zip (secondary partition). However, it seems that there are many areas that have not been observed, and there are cases where only about 10 xml exist. An outlier of -9999. Was used for the location corresponding to the area where xml does not exist.
I first learned about the module that is included by default called zipfile
.
By using this, you can read the files in the zip without decompressing each one.
For 5mDEM, read the last two digits of the tertiary partition code in the file name to determine the insertion position in the output array. In the case of 10mDEM, as mentioned above, it is assumed that there is no description of the tertiary partition code in the first place and there is only one xml. Get the necessary information in text format for the time being. The rules of tags in xml etc. are assumed as in the code. If you read xml with different rules, it should appear as a bug immediately.
Generate an array using the information described in the xml file. File Specifications has a definition related to DEM data. According to it, the land division for each pixel is either {ground surface, surface surface, sea level, inland water surface, no data, etc.}, and the number of -9999 is entered for places where the value is undefined. ..
In the code originally referred to, DEM information is acquired depending on whether it is "ground surface" or not, but in this code, it is divided according to whether the DEM value in xml is -9999 or not (indicated by "Other"). Because there was a DEM value in the area. The reason for saying "other" is unknown). For DEM undefined locations, we applied outliers -1 and separated the use of outliers from cases where the xml file mentioned above did not exist in the first place.
Regarding the orientation of the array, it is assumed that the latitude direction is defined as northward. There is a data called gml: sequenceRule
in the data obtained from xml, and the fact that it is written as'+ x-y'
supports it. If this is '+ x + y'
etc., it's probably facing south, but I haven't divided the conditions around that.
There is nothing special to say.
Converted files downloaded with 5mDEM on the top and 10mDEM on the bottom for the same secondary plot area (almost unedited axis values, etc.). Negative values are masked. It can be seen that the abundance rate of data at each resolution differs depending on the region. I'd be happy if I knew it before I downloaded it, but isn't there any information missing? .. ..
Finally, the whole code.
zdem2tif.py
import os
import sys
import zipfile
import numpy as np
import xml.etree.ElementTree as et
import osgeo.gdal
import osgeo.osr
class XMLContents(object):
def __init__(self):
self.ns = {
'ns': 'http://fgd.gsi.go.jp/spec/2008/FGD_GMLSchema',
'gml': 'http://www.opengis.net/gml/3.2',
'xsi': 'http://www.w3.org/2001/XMLSchema-instance',
'xlink': 'http://www.w3.org/1999/xlink'
}
def read_xml(self, zf, filename, dem_mode):
if dem_mode == 'DEM10':
self.posix = 0
self.posiy = 0
elif dem_mode == 'DEM5':
_split = os.path.basename(filename).split('-')
self.posix = int(_split[4][1])
self.posiy = int(_split[4][0])
with zf.open(filename, 'r') as file:
self.text = file.read().decode('utf_8')
self.root = et.fromstring(self.text)
self.dem = self.root.find('ns:DEM', self.ns)
self.coverage = self.dem.find('ns:coverage', self.ns)
self.envelope = self.coverage.find(
'gml:boundedBy//gml:Envelope', self.ns)
self.lower = self.envelope.find('gml:lowerCorner', self.ns).text
self.upper = self.envelope.find('gml:upperCorner', self.ns).text
self.grid = self.coverage.find(
'gml:gridDomain//gml:Grid//gml:limits//gml:GridEnvelope', self.ns)
self.low = self.grid.find('gml:low', self.ns).text
self.high = self.grid.find('gml:high', self.ns).text
self.tuplelist = self.coverage.find(
'gml:rangeSet//gml:DataBlock//gml:tupleList', self.ns).text
self.gridfunc = self.coverage.find(
'gml:coverageFunction//gml:GridFunction', self.ns)
self.rule = self.gridfunc.find('gml:sequenceRule', self.ns)
self.start = self.gridfunc.find('gml:startPoint', self.ns).text
if self.rule.attrib['order'] != '+x-y':
print('warning sequence order not +x-y')
if self.rule.text != 'Linear':
print('warning sequence rule not Linear')
file.close()
class DEMInArray(object):
def __init__(self, contents):
self._noValue = -1.
self.llat, self.llon = np.array(
contents.lower.split(' '), dtype=np.float64)
self.ulat, self.ulon = np.array(
contents.upper.split(' '), dtype=np.float64)
self.lowx, self.lowy = np.array(
contents.low.split(' '), dtype=np.int)
self.highx, self.highy = np.array(
contents.high.split(' '), dtype=np.int)
self.sizex, self.sizey = self.get_size()
self.x_init, self.y_init = np.array(
contents.start.split(' '), dtype=np.int)
self.datapoints = contents.tuplelist.splitlines()
self.raster = self.get_raster()
def get_size(self):
sizex = self.highx - self.lowx + 1
sizey = self.highy - self.lowy + 1
return sizex, sizey
def get_raster(self):
_x, _y = self.x_init, self.y_init
raster = np.zeros([self.sizey, self.sizex])
raster.fill(self._noValue)
for dp in self.datapoints:
if _y > self.highy:
print('DEM datapoints are unexpectedly too long')
break
s = dp.split(',')
if len(s) == 1:
continue
desc, value = s[0], np.float64(s[1])
# if desc == 'Ground surface':
raster[_y, _x] = value if value > -9998 else self._noValue
_x += 1
if _x > self.highx:
_x = 0
_y += 1
return raster
class DEMOutArray(object):
def __init__(self):
self._fillValue = -9999.
def initialize_raster(self):
raster = np.zeros([self.sizey, self.sizex])
raster.fill(self._fillValue)
return raster
def get_trans(self):
# +x-y
return [self.llon, self.resx, 0, self.ulat, 0, -self.resy]
def update_raster(self, tile, posiy, posix):
xmin = posix * tile.shape[1]
ymin = posiy * tile.shape[0]
xmax = xmin + tile.shape[1]
ymax = ymin + tile.shape[0]
self.raster[ymin:ymax, xmin:xmax] = tile[::-1]
def get_size(self):
sizex = (self.highx - self.lowx + 1) * self.tilex
sizey = (self.highy - self.lowy + 1) * self.tiley
return sizex, sizey
def mesh_info_from_zipname(self, zipname):
'''
Ref: Table 4 of https://www.stat.go.jp/data/mesh/pdf/gaiyo1.pdf
'''
_split = os.path.basename(zipname).split('-')
_lat1 = int(_split[2][:2])
_lon1 = int(_split[2][2:])
_lat2 = int(_split[3][0])
_lon2 = int(_split[3][1])
self.llat = _lat1 * 0.66666666666 + _lat2 * 0.08333333333
self.llon = _lon1 + 100. + _lon2 * 0.125
self.ulat = self.llat + 0.08333333333
self.ulon = self.llon + 0.125
if _split[4][:5] == 'DEM10':
self.mode = 'DEM10'
self.tilex, self.tiley = 1, 1
self.lowx, self.lowy = 0, 0
self.highx, self.highy = 1124, 749
self.sizex, self.sizey = self.get_size()
self.resx, self.resy = 1.1111111111e-04, 1.1111111111e-04
elif _split[4][:4] == 'DEM5':
self.mode = 'DEM5'
self.tilex, self.tiley = 10, 10
self.lowx, self.lowy = 0, 0
self.highx, self.highy = 224, 149
self.sizex, self.sizey = self.get_size()
self.resx, self.resy = 5.5555555555e-05, 5.5555555555e-05
else:
raise ValueError('Unexpected definition of DEM:', _split[4])
self.raster = self.initialize_raster()
def save_raster(self, out_filename):
trans = self.get_trans()
srs = osgeo.osr.SpatialReference()
srs.ImportFromEPSG(4612)
driver = osgeo.gdal.GetDriverByName('GTiff')
output = driver.Create(out_filename, self.sizex, self.sizey,
1, osgeo.gdal.GDT_Float32)
output.GetRasterBand(1).WriteArray(self.raster)
output.GetRasterBand(1).SetNoDataValue(self._fillValue)
output.SetGeoTransform(trans)
output.SetProjection(srs.ExportToWkt())
output.FlushCache()
def convert(in_filename, out_filename):
output_array = DEMOutArray()
output_array.mesh_info_from_zipname(in_filename)
dem_mode = output_array.mode
with zipfile.ZipFile(in_filename, 'r') as zf:
filelist = zf.namelist()
for filename in filelist:
print('loading', filename)
contents = XMLContents()
contents.read_xml(zf, filename, dem_mode)
posix, posiy = contents.posix, contents.posiy
dem_tile = DEMInArray(contents)
output_array.update_raster(dem_tile.raster, posiy, posix)
zf.close()
print('save', out_filename)
output_array.save_raster(out_filename)
def main(argv):
for i in range(1, len(argv)):
convert(argv[i], argv[i].replace('.zip', '.tif'))
if __name__ == '__main__':
main(sys.argv)
Recommended Posts