This is,
KPI Item A:" Get used to terminal commands "Item A:" Get used to terminal commands "
It is an output article of.
This time, I used grep awk sort to process the data on the Linux server.
This time, in order to perform search and text processing, we obtained a sample CSV from the following. It is a data set that summarizes official statistics by region. This time, I deleted the header part and extracted only the data part and placed it on the Linux server.
The data format is as follows.
Region code | Prefectures | Municipality | Total population | 123 pieces(Omission) | Number of children in nursery schools, etc. |
---|---|---|---|---|---|
R01100 | Hokkaido | Sapporo | 1952356 | ・ ・ ・ | 25742 |
For the omitted part, refer to the explanation below as appropriate.
If you want to extract only the data (row) of a certain prefecture, use the grep command. The Grep command is a command that displays a ** line ** that contains the specified string or pattern.
Let's extract only the data of "Wakayama prefecture".
# grep "Character string to search"File
grep "Wakayama Prefecture" data/ssdse-2019a.csv
[test@srv1 testAwk]$ grep "Wakayama Prefecture" data/ssdse-2019a.csv
R30201 Wakayama City, Wakayama Prefecture 364154 171215 192939 360592 169798 190794 44519 22757 21762 211753 102859 108894 105954 44446 61508 50822 19092 31730 2466 2727 4449 7627 7938 153089 152798 355502 91995 48369 37387 20594 22322 1819 727 20884 14 658 1751 1435 618 1554 108 1378 159 21 2628 14298 177281 326 10259 25874 1139 2240 11469 34092 5855 4467 4079 15176 6481 10411 25096 882 11457 7907 392 36138 140751 2173 1478 95.6 11.6 152643733 63453773 151658392 65642801 20551722 13750658 0 28 3665 54 1131 17614 25 688 10313 14 12329 162655 90595 72060 8131 5266 2865 132002 43765 88237 3023 37094 114600 1 2 373593 43481 3094 1519 55 34 443 235 1616 335 1172 58 6830
(Omitted below)
Use the previously executed command in the redirect. This time, I will try to find the move-in ratio from the information of the move-out person and the move-in person.
I want to extract only the information of the move-out person and the move-in person (FY2017), so I use the awk command. The awk command is a command that formats the data divided by the specified delimiter.
The default is to split spaces ("") as delimiters. At the time of output, it is possible to specify a specific item such as "{print $ 1}" and output.
This time, the information to be output is the number of in-migrants, the number of out-migrants, and the number of in-migrants-the number of out-migrants. The third output outputs the difference calculation result.
Output number | Data column number | Item name or formula | Remarks |
---|---|---|---|
1 | 3 | Municipal name | |
2 | 25 | Number of transferees | |
3 | 26 | Number of out-migrants | |
4 | - | Number of transferees-Number of out-migrants | 正数:Number of out-migrants以上のNumber of transferees 負数:Number of transferees以上のNumber of out-migrants |
# awk -F'[Delimiter(Multiple)]' '{awk command}' file
[test@srv1 testAwk]$ grep "Wakayama Prefecture" data/ssdse-2019a.csv | awk '{print $3,$25,$26,$25-$26}'
Wakayama City 7627 7938-311
Hainan City 1042 1324-282
Hashimoto City 1260 1560-300
Arida City 469 713-244
Gobo City 632 787-155
Tanabe City 1691 2216-525
Shingu City 815 1068-253
Kinokawa City 1278 1689-411
Iwade City 1720 1753-33
Kimino Town 154 225-71
Katsuragi Town 360 430-70
Kudoyama Town 91 112-21
Koya Town 88 168-80
Yuasa Town 223 313-90
Hirokawa Town 142 202-60
Aridagawa Town 611 593 18
Mihama Town 275 289-14
Hidaka Town 260 239 21
Yura Town 99 169-70
Inami Town 157 204-47
Minabe Town 196 345-149
Hidakagawa Town 270 289-19
Shirahama Town 691 749-58
Kamitonda Town 636 510 126
Susami Town 101 123-22
Nachikatsuura Town 319 527-208
Taiji Town 107 127-20
Kozagawa Town 74 66 8
Kitayama Village 24 18 6
Kushimoto Town 430 584-154
It also uses the previously executed command for redirects.
To sort, use the sort command. The sort command is a command that sorts the data divided by the specified delimiter. This time, set the following options.
option | argument | Description |
---|---|---|
-k | 4 | Item number that serves as a sort standard |
-n | Compare sort criteria as numbers | |
-r | Display in descending order |
[test@srv1 testAwk]$ grep "Wakayama Prefecture" data/ssdse-2019a.csv | awk '{print $3,$25,$26,$25-$26}' | sort -k 4 -n -r
Kamitonda Town 636 510 126
Hidaka Town 260 239 21
Aridagawa Town 611 593 18
Kozagawa Town 74 66 8
Kitayama Village 24 18 6
Mihama Town 275 289-14
Hidakagawa Town 270 289-19
Taiji Town 107 127-20
Kudoyama Town 91 112-21
Susami Town 101 123-22
Iwade City 1720 1753-33
Inami Town 157 204-47
Shirahama Town 691 749-58
Hirokawa Town 142 202-60
Yura Town 99 169-70
Katsuragi Town 360 430-70
Kimino Town 154 225-71
Koya Town 88 168-80
Yuasa Town 223 313-90
Minabe Town 196 345-149
Kushimoto Town 430 584-154
Gobo City 632 787-155
Nachikatsuura Town 319 527-208
Arida City 469 713-244
Shingu City 815 1068-253
Hainan City 1042 1324-282
Hashimoto City 1260 1560-300
Wakayama City 7627 7938-311
Kinokawa City 1278 1689-411
Tanabe City 1691 2216-525
I was able to process simple data on a Linux server. There are more out-migrants than I expected. I would like to know the reason why Kamitonda Town is increasing by far.
Recommended Posts