- Satellite data refers to data acquired by **remote sensing** from artificial satellites.
- Because satellite data used to require specialized tools and large-capacity data processing infrastructure, the organizations that could use it were limited to universities and a handful of specialized institutions. With the recent spread of open-source libraries, open data, and cloud processing platforms, however, an environment has emerged in which even ordinary organizations can handle satellite data with relative ease.
- By using satellite data, you can analyze, as big data, **places, times, and target states** that could not be captured before.
- In this article, therefore, I would like to introduce **how to handle satellite data** without specialized analysis tools, using only **Python** (one of the most familiar tools) and entirely **for free**, so that anyone can feel free to try it.
- I have also chosen a satellite dataset that is easy to apply to business and social implementation, so I hope you will read on while imagining the scenes where it could be used.
- As a concrete example, this article introduces how to use a satellite dataset through **night light data**. (Image and data processing by NOAA's National Geophysical Data Center; DMSP data collected by the US Air Force Weather Agency.)
- Night light data is one kind of data observed by satellites. Put simply, it is continuous sensing of the amount of light emitted by cities at night.
- **Night light data has been reported to correlate with economic activity (GDP and energy consumption)**, and empirical research is under way that uses it as a proxy variable for economic activity.
- In particular, its application is being considered for countries where it is difficult to measure the amount of economic activity in each city or to obtain accurate economic indicators.
- Night light data is measured by several satellites; this time we will use **DMSP-OLS**, a well-established dataset in night light research.
- As downloaded, satellite data covers the whole globe and is huge in size, so it is common to clip out only the area you need before analysis.
- Two kinds of data are therefore required: the downloaded original data (raster data) and data that specifies the clipping extent (vector data). Both can be downloaded from the web.
- We will use the following libraries (this article introduces only the minimum required functionality):
  - Data loading: rasterio
  - Clipping the target area: geopandas
  - Numerical computation: numpy
  - Visualization: matplotlib
1. Data acquisition
2. Reading the data
3. Extracting the required area
4. Visualizing and analyzing the data (visualize and compare night light data for each country)
- You can download the data from the DMSP-OLS data download page: clicking the link for a file starts the download.
- As noted on that page, however, each file is about **300 MB** compressed but expands to roughly **3 GB**, which is somewhat tough to handle on an ordinary laptop.
- We therefore run the whole process, from download to decompression, on Google Colab. (If you have a high-spec PC, you can simply download and unzip with a click.)
- First, install the necessary libraries on Google Colab.
```python
# Install the required libraries
!pip install sh
!pip install rasterio
!pip install geopandas
```
- Next, create a folder (directory) in which to save the downloaded dataset.
```python
import os
import tarfile
from sh import wget, gunzip, mv

# Create a directory 'data' for storing the dataset
if not os.path.exists('data'):
    os.mkdir('data')
```
- Download the compressed data with the wget command, then decompress the downloaded file.
- The file is doubly compressed (a .tar archive containing a .gz file), which is confusing, but in the end a .tif file comes out.
```python
# Build the URL to download
target_data = 'F101992'
url = f'https://ngdc.noaa.gov/eog/data/web_data/v4composites/{target_data}.v4.tar'

# Download the data
wget(url)

# Open the tar archive and extract only the relevant member
with tarfile.open(f'/content/{target_data}.v4.tar') as tar:
    # Find the member whose name ends with stable_lights.avg_vis.tif.gz
    file = [tarinfo for tarinfo in tar.getmembers()
            if tarinfo.name.endswith("web.stable_lights.avg_vis.tif.gz")]
    # Extract the target member (.gz)
    tar.extractall(path='/content/', members=[file[0]])

# Decompress the .gz file to get the .tif
gunzip(f'/content/{target_data}.v4b_web.stable_lights.avg_vis.tif.gz')

# Move the .tif into the data directory
mv(f'/content/{target_data}.v4b_web.stable_lights.avg_vis.tif', '/content/data/')
```
- Read the decompressed tif file.
- It is easy to read with rasterio.open('path to file').
- Calling read() on the opened object returns the data as a numpy array.
- Once the data is in numpy format, you can analyze and visualize it however you like.
```python
import rasterio
import numpy as np
import matplotlib.pyplot as plt

with rasterio.open('/content/data/F101992.v4b_web.stable_lights.avg_vis.tif') as src:
    data = src.read()  # Read as a numpy array

# Check the size of the data
data.shape

# Visualize the data
plt.imshow(data[0])
plt.colorbar()
```
**Global night light data for 1992**
- At this point, however, the data is still in its downloaded state (global data), so we need to clip it to the unit we want to analyze.
- To extract the required area, you need data that specifies the clipping extent.
- Various formats exist for such data; here we use geojson data (shapefiles are another well-known format).
- You can download border data for each country from the Datahub site.
- You could also click and download it in a browser, but here we use wget on Google Colab.
```python
# Download the vector file
wget('https://datahub.io/core/geo-countries/r/countries.geojson')
```
- Next, load the downloaded geojson file.
- With geopandas, you can not only read geojson files but also handle them like a pandas.DataFrame.
- This gives us the border data for any area (country).
```python
import geopandas as gpd

# Read the geojson file
countries = gpd.read_file('/content/countries.geojson')

# Check the contents
countries.head()

# Visualize the border data
countries.plot()

# Extract Japan's border (an ordinary pandas.DataFrame operation)
countries.query('ADMIN == "Japan"')
```
- Next, apply the border data for Japan to the global dataset and extract only the data for the Japan region.
- The clipping is done with a method called rasterio.mask.mask.
- rasterio.mask.mask(the opened tif object, the geometry column of the border data, crop=True)
- This returns two objects, out_image and out_transform. The clipped data in numpy format is stored in out_image (out_transform holds the coordinate transform information of the clipped data, but it is a bit involved, so I will omit it here).
```python
import rasterio.mask

with rasterio.open('/content/data/F101992.v4b_web.stable_lights.avg_vis.tif') as src:
    out_image, out_transform = rasterio.mask.mask(
        src, countries.query('ADMIN == "Japan"').geometry, crop=True)
```
- Let's visualize out_image, the data clipped along Japan's border, as sketched below.
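The article shows only the resulting image; a minimal visualization sketch, reusing the same plotting calls as the global plot above, would be:

```python
# Visualize the clipped array (same approach as the global plot)
plt.imshow(out_image[0])
plt.colorbar()
```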
**Japan's night light data in 1992** (you can see that the amount of light in the Greater Tokyo area is large)
- So far, we have introduced how to obtain night light data for any country.
- Let's now visualize and analyze the night light data of several countries using the methods introduced above.
```python
# Wrap the whole process in a function
def load_ntl(target_data, area):
    # Download only if the data does not already exist
    if not os.path.exists(f'/content/data/{target_data}.v4b_web.stable_lights.avg_vis.tif'):
        url = f'https://ngdc.noaa.gov/eog/data/web_data/v4composites/{target_data}.v4.tar'
        # Download the data
        wget(url)
        # Open the tar archive and extract only the relevant member
        with tarfile.open(f'/content/{target_data}.v4.tar') as tar:
            # Find the member whose name ends with stable_lights.avg_vis.tif.gz
            file = [tarinfo for tarinfo in tar.getmembers()
                    if tarinfo.name.endswith("web.stable_lights.avg_vis.tif.gz")]
            # Extract the target member (.gz)
            tar.extractall(path='/content/', members=[file[0]])
        # Decompress the .gz file to get the .tif
        gunzip(f'/content/{target_data}.v4b_web.stable_lights.avg_vis.tif.gz')
        # Move the .tif into the data directory
        mv(f'/content/{target_data}.v4b_web.stable_lights.avg_vis.tif', '/content/data/')
    # Clip the target area out of the tif file
    with rasterio.open(f'/content/data/{target_data}.v4b_web.stable_lights.avg_vis.tif') as src:
        out_image, out_transform = rasterio.mask.mask(
            src, countries.query(f'ADMIN == "{area}"').geometry, crop=True)
    return out_image

# Function for visualization
def show(data):
    plt.figure(figsize=(15, 5))
    plt.subplot(121)
    plt.imshow(data[0])
    plt.subplot(122)
    plt.hist(data.reshape(-1), bins=np.arange(1, 63, 1))

# Usage example (Japan's data for 1992)
japan_1992 = load_ntl(target_data='F101992', area='Japan')
```
- Let's visualize the night lights of various countries. (You can look at the data for any country and year you like by changing the target_data and area arguments.)
```python
# Get data for Japan, China, Thailand, and Cambodia
japan_1992 = load_ntl(target_data='F101992', area='Japan')
china_1992 = load_ntl(target_data='F101992', area='China')
thailand_1992 = load_ntl(target_data='F101992', area='Thailand')
cambodia_1992 = load_ntl(target_data='F101992', area='Cambodia')

# Visualization
show(japan_1992)
show(china_1992)
show(thailand_1992)
show(cambodia_1992)
```
**Japan's night light data in 1992**
**China's night light data in 1992**
**Thailand's night light data in 1992**
**Cambodia's night light data in 1992**
- Even from this you can see each country's tendencies as of 1992: Japan is bright almost everywhere; China, with its vast land area, shows large gaps between cities; Thailand is bright along its highways and in regional cities centered on Bangkok; and Cambodia is dark outside the capital.
- Here I stop at this kind of visual comparison, but it would be interesting to compare other years and countries as well; see the sketch after this list.
- In research, the Sum of NTL (total night-time light per area) is commonly used as an index and compared against GDP and energy consumption indicators.
- You can also analyze units smaller than countries (prefectures and municipalities) by downloading other geojson files (or shapefiles).
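As a reference, here is a minimal sketch of the Sum of NTL index computed from the arrays obtained above. It assumes the variables from the previous cell, and that a plain sum is acceptable because rasterio.mask fills pixels outside the border with 0 by default:

```python
# A minimal Sum of NTL sketch: total the pixel values (0-63) per country
for name, ntl in [('Japan', japan_1992), ('China', china_1992),
                  ('Thailand', thailand_1992), ('Cambodia', cambodia_1992)]:
    print(name, int(ntl.sum()))

# Other years/satellites could be compared the same way, e.g. (assuming the
# corresponding file exists on the server):
# japan_2010 = load_ntl(target_data='F182010', area='Japan')
```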
- We have introduced night light data so far, but several problems with it have also been reported.
- For example, because the sensor differs from satellite to satellite (the F10, F12, and so on at the head of the file names), slightly different sensor biases arise. When analyzing light intensity as a long-term trend, preprocessing is therefore needed to remove these biases (many research papers have been published on this process, called calibration).
- In addition, this dataset expresses the amount of night light as an integer from 0 to 63. While that makes it easy to handle, saturation occurs at points where the amount of light is very large, so the true light intensity cannot be measured correctly there; see the sketch below.
- Also, the DMSP-OLS mission introduced here ended in 2013, and its successor, a satellite carrying the higher-precision VIIRS sensor (Suomi NPP), is now in operation.
- Satellite datasets are not all-purpose, so when using them it is necessary to consult prior research and understand these problems before analysis.
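As one illustration of the saturation issue, a quick sketch (variable names assume the earlier cells) to gauge how many pixels hit the ceiling:

```python
# Count pixels stuck at the maximum value of 63; these are saturated and
# may underestimate the true light intensity in city centers
saturated = int((japan_1992 == 63).sum())
print(f'Saturated pixels in Japan, 1992: {saturated}')
```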
- In this article, we introduced how to handle satellite datasets in the Google Colab environment.
- We used the night light dataset as an example, but a wide variety of satellite datasets are published as open data, and they can basically be handled in the same way.
- I believe new possibilities will emerge from combining and analyzing the **micro-side big data** that individual companies collect and accumulate with **macro-side big data** such as satellite datasets.
- I hope the various satellite datasets released as open data will be used more and more for the benefit of society.