Voici ce qu'il faut faire lorsque vous voulez un tableau avec l'année, la région, la sous-région, le pays, le code du pays et la population.
LocID,Location,Time,PopTotal,SubRegName,GeoRegName
392,Japan,2019,126860299,Eastern Asia,Asia
Vous pouvez télécharger les données requises sur https://population.un.org/wpp/.
Le contenu de WPP2019_TotalPopulationBySex.csv ressemble à ceci.
LocID,Location,VarID,Variant,Time,MidPeriod,PopMale,PopFemale,PopTotal,PopDensity
...
392,Japan,2,Medium,1950,1950.5,40602.499,42199.585,82802.084,227.132
392,Japan,2,Medium,1951,1951.5,41380.556,42935.709,84316.265,231.285
En plus du nom du pays, les éléments LocID et Location incluent également des noms de région tels que Asie et Asie de l'Est, de sorte qu'ils ne peuvent pas être utilisés tels quels. Reportez-vous à WPP2019_F01_LOCATIONS.XLSX pour vous aligner uniquement sur les pays, à l'exception des noms de région. Dans ce dossier
Il y a trois feuilles, dont la base de données est simplement lisible par machine. Par exemple, l'élément du Japon
Index, Location, Notes, LocID, ISO3_Code, LocType, LocTypeName, ParentID, WorldID, SubRegID, SubRegName, SDGSubRegID, SDGSubRegName, SDGRegID, SDGRegName, GeoRegID, GeoRegName
133, Japan, 392, JPN, 4, Country/Area, 906, 900, 906, Eastern Asia, 1832, Eastern and South-Eastern Asia, 935, Asia
est. Vous pouvez donc voir ce qui suit.
(Lien vers le bloc-notes: https://colab.research.google.com/drive/160xZ5tAGKb1enC0LU2JYEOA6m3l3w1cn?usp=sharing)
Après avoir étudié jusqu'à ce point, le travail est enfin commencé. Commencez par charger WPP2019_TotalPopulationBySex.csv.
import pandas as pd
population_src = pd.read_csv("WPP2019_TotalPopulationBySex.csv")
population_src.head()
LocID | Location | VarID | Variant | Time | MidPeriod | PopMale | PopFemale | PopTotal | PopDensity | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 4 | Afghanistan | 2 | Medium | 1950 | 1950.5 | 4099.243 | 3652.874 | 7752.117 | 11.874 |
1 | 4 | Afghanistan | 2 | Medium | 1951 | 1951.5 | 4134.756 | 3705.395 | 7840.151 | 12.009 |
2 | 4 | Afghanistan | 2 | Medium | 1952 | 1952.5 | 4174.450 | 3761.546 | 7935.996 | 12.156 |
3 | 4 | Afghanistan | 2 | Medium | 1953 | 1953.5 | 4218.336 | 3821.348 | 8039.684 | 12.315 |
4 | 4 | Afghanistan | 2 | Medium | 1954 | 1954.5 | 4266.484 | 3884.832 | 8151.316 | 12.486 |
Extrayez uniquement les informations nécessaires.
population = population_src[population_src.Variant == "Medium"][["LocID", "Location", "Time", "PopTotal"]]
population["PopTotal"] = (population["PopTotal"] * 1000).astype(int)
population
LocID | Location | Time | PopTotal | |
---|---|---|---|---|
0 | 4 | Afghanistan | 1950 | 7752116 |
1 | 4 | Afghanistan | 1951 | 7840151 |
2 | 4 | Afghanistan | 1952 | 7935996 |
3 | 4 | Afghanistan | 1953 | 8039684 |
4 | 4 | Afghanistan | 1954 | 8151316 |
Chargement de WPP2019_F01_LOCATIONS.XLSX.
locations_src = pd.read_excel('WPP2019_F01_LOCATIONS.XLSX', sheet_name="DB")
locations_src.head()
Index | Location | Notes | LocID | ISO3_Code | LocType | LocTypeName | ParentID | WorldID | SubRegID | SubRegName | SDGSubRegID | SDGSubRegName | SDGRegID | SDGRegName | GeoRegID | GeoRegName | MoreDev | LessDev | LeastDev | oLessDev | LessDev_ExcludingChina | LLDC | SIDS | WB_HIC | WB_MIC | WB_UMIC | WB_LMIC | WB_LIC | WB_NoIncomeGroup | MaxHIV_Male | MaxHIV_Female | MaxHIV_BothSexes | YearMaxHIV_BothSexes | HIVAIDSMortalityImpact_AgePattern | HIVAIDSMortalityImpact_e0 | TotPop2019LessThan90k | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | WORLD | NaN | 900 | NaN | NaN | NaN | 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1 | 2 | UN development groups | a | 1803 | NaN | 25.0 | Label/Separator | 900 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2 | 3 | More developed regions | b | 901 | NaN | 5.0 | Development group | 1803 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
3 | 4 | Less developed regions | c | 902 | NaN | 5.0 | Development group | 1803 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
4 | 5 | Least developed countries | d | 941 | NaN | 5.0 | Development group | 902 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
Extrayez uniquement les informations nécessaires.
location = locations_src[locations_src.LocType == 4][["LocID", "SubRegName", "GeoRegName"]]
location.head()
LocID | SubRegName | GeoRegName | |
---|---|---|---|
26 | 108 | Eastern Africa | Africa |
27 | 174 | Eastern Africa | Africa |
28 | 262 | Eastern Africa | Africa |
29 | 232 | Eastern Africa | Africa |
30 | 231 | Eastern Africa | Africa |
Combinez les données démographiques et nationales.
population_by_countries = population.merge(location)
population_by_countries.head()
LocID | Location | Time | PopTotal | SubRegName | GeoRegName | |
---|---|---|---|---|---|---|
0 | 4 | Afghanistan | 1950 | 7752116 | Southern Asia | Asia |
1 | 4 | Afghanistan | 1951 | 7840151 | Southern Asia | Asia |
2 | 4 | Afghanistan | 1952 | 7935996 | Southern Asia | Asia |
3 | 4 | Afghanistan | 1953 | 8039684 | Southern Asia | Asia |
4 | 4 | Afghanistan | 1954 | 8151316 | Southern Asia | Asia |
Cherchons les données japonaises.
population_by_countries[(population_by_countries.Location == "Japan") & (population_by_countries.Time == 2019)]
LocID | Location | Time | PopTotal | SubRegName | GeoRegName | |
---|---|---|---|---|---|---|
16377 | 392 | Japan | 2019 | 126860299 | Eastern Asia | Asia |
Ça a l'air bien donc je vais le sauver.
population_by_countries.to_csv("population_by_countries.csv", index=False)
!head population_by_countries.csv
LocID,Location,Time,PopTotal,SubRegName,GeoRegName
4,Afghanistan,1950,7752116,Southern Asia,Asia
4,Afghanistan,1951,7840151,Southern Asia,Asia
4,Afghanistan,1952,7935996,Southern Asia,Asia
4,Afghanistan,1953,8039684,Southern Asia,Asia
4,Afghanistan,1954,8151316,Southern Asia,Asia
4,Afghanistan,1955,8270992,Southern Asia,Asia
4,Afghanistan,1956,8398873,Southern Asia,Asia
4,Afghanistan,1957,8535157,Southern Asia,Asia
4,Afghanistan,1958,8680097,Southern Asia,Asia
Donc, je vais enregistrer ici que j'ai passé beaucoup de temps juste à obtenir les données banales de la population mondiale.
Recommended Posts