"GCI Data Scientist Training Course" is offered by the University of Tokyo (Matsuo Laboratory) "* Practical data The contents of the exercise part are published in Jupyter NoteBook format (CC-BY-NC-ND) in the Scientist Training Course and Deep Learning Course </ u> * ". Chapter 8 is "** Database and SQL Basics **", where you will learn how to operate a relational database management system. Instead of the "Like" button for the valuable and wonderful teaching materials that you can learn in Japanese, I will post the answers you have solved. Please point out any mistakes.
I didn't know how to run MariaDB on Jupyter Notebook, so I installed MySQL on WSL and entered commands on WSL Terminal.
8.1.2 RDBMS
** <Practice 1> ** Add a cell (press + in the upper left) to display the existing database. Then create a new database (such as TEST2) and choose to use that database.
** <Practice 2> ** Create a new table "meibo2" in the database created and selected in Exercise 1. After creating it, make sure that the table is complete.
** <Practice 3> ** Let's add data to the newly created table. After adding, please make sure that the data is included.
/*Exercise 1*/
create database TEST2;
use TEST2;
/*Exercise 2*/
create table meibo2
(id int primary key, name varchar(20))
engine = MyISAM
default charset = utf8;
show tables;
/*Exercise 3*/
insert into meibo2 (id, name) values (1, "Yamada");
select * from meibo2;
+-----------------+
| Tables_in_TEST2 |
+-----------------+
| meibo2 |
+-----------------+
1 row in set (0.00 sec)
+----+--------+
| id | name |
+----+--------+
| 1 | Yamada |
+----+--------+
1 row in set (0.00 sec)
** [Let's try] ** Let's search the data by changing the above conditional expression in various ways. For example, how do you search for people whose id is other than 1 or whose name ends with a?
select * from meibo where name like '%a';
+----+--------+
| id | name |
+----+--------+
| 1 | Yamada |
| 2 | Tanaka |
+----+--------+
2 rows in set (0.00 sec)
** <Practice 1> ** Select the database of TEST1 and extract the person record with id = 4 in the above table meibo.
select * from meibo where id=4;
+----+------+-------+------+
| id | name | class | age |
+----+------+-------+------+
| 4 | Kato | 2 | 15 |
+----+------+-------+------+
1 row in set (0.00 sec)
** <Practice 2> ** Update the class of the person with id = 8 to 7 in the meibo table above. If you can confirm the update with the select statement, set the class of the person with id = 8 back to 1.
select * from meibo where id=8;
update meibo set class=7 where id=8;
select * from meibo where id=8;
update meibo set class=1 where id=8;
select * from meibo where id=8;
+----+------+-------+------+
| id | name | class | age |
+----+------+-------+------+
| 8 | Sato | 1 | 14 |
+----+------+-------+------+
1 row in set (0.00 sec)
+----+------+-------+------+
| id | name | class | age |
+----+------+-------+------+
| 8 | Sato | 7 | 14 |
+----+------+-------+------+
1 row in set (0.00 sec)
+----+------+-------+------+
| id | name | class | age |
+----+------+-------+------+
| 8 | Sato | 1 | 14 |
+----+------+-------+------+
1 row in set (0.00 sec)
** <Practice question 3> (* Required question: Used in subsequent questions.) ** Add a new column height to the same table meibo as above. In addition, please update with 150 for people with id = 1 to 4, 155 for people with id = 5 to 6, and 160 for people with id = 7 to 8.
alter table meibo add height int;
update meibo set height=150 where id=1;
update meibo set height=150 where id=2;
update meibo set height=150 where id=3;
update meibo set height=150 where id=4;
update meibo set height=155 where id=5;
update meibo set height=155 where id=6;
update meibo set height=160 where id=7;
update meibo set height=160 where id=8;
select * from meibo;
+----+----------+-------+------+--------+
| id | name | class | age | height |
+----+----------+-------+------+--------+
| 1 | Yamada | 1 | 14 | 150 |
| 2 | Tanaka | 2 | 13 | 150 |
| 3 | Suzuki | 1 | 13 | 150 |
| 4 | Kato | 2 | 15 | 150 |
| 5 | Ito | 3 | 12 | 155 |
| 6 | Takeuchi | 2 | 16 | 155 |
| 7 | Kimura | 3 | 11 | 160 |
| 8 | Sato | 1 | 14 | 160 |
+----+----------+-------+------+--------+
8 rows in set (0.00 sec)
** <Practice 1> ** How many different ages are there on the same meibo table as above?
select count(distinct age) as ageCnt from meibo;
+--------+
| ageCnt |
+--------+
| 6 |
+--------+
1 row in set (0.00 sec)
** <Practice 2> ** Find the heights of the shortest and tallest people.
select min(height) as minHeight, max(height) as maxHeight from meibo;
+-----------+-----------+
| minHeight | maxHeight |
+-----------+-----------+
| 150 | 160 |
+-----------+-----------+
1 row in set (0.00 sec)
** <Practice 3> ** Extract records that are taller than 155 and have class 3.
select * from meibo where height>=155 and class=3;
+----+--------+-------+------+--------+
| id | name | class | age | height |
+----+--------+-------+------+--------+
| 5 | Ito | 3 | 12 | 155 |
| 7 | Kimura | 3 | 11 | 160 |
+----+--------+-------+------+--------+
2 rows in set (0.00 sec)
** <Practice 1> ** Find the average height for each class at the meibo table.
select class, avg(height) as avgHeight from meibo group by class;
| class | avgHeight |
+-------+-----------+
| 1 | 153.3333 |
| 2 | 151.6667 |
| 3 | 157.5000 |
+-------+-----------+
3 rows in set (0.00 sec)
** <Practice 2> ** In addition to the above, let's calculate the number of people in each class, the height of the smallest person and the height of the largest person in each class.
select class, avg(height) as avgHeight, min(height) as minHeight, max(height) as maxHeight from meibo group by class;
+-------+-----------+-----------+-----------+
| class | avgHeight | minHeight | maxHeight |
+-------+-----------+-----------+-----------+
| 1 | 153.3333 | 150 | 160 |
| 2 | 151.6667 | 150 | 155 |
| 3 | 157.5000 | 155 | 160 |
+-------+-----------+-----------+-----------+
3 rows in set (0.00 sec)
** <Practice 3> ** Ask for the same items as above, focusing on those over 13 years of age.
select class, avg(height) as avgHeight, min(height) as minHeight, max(height) as maxHeight from meibo where age>13 group by class;
+-------+-----------+-----------+-----------+
| class | avgHeight | minHeight | maxHeight |
+-------+-----------+-----------+-----------+
| 1 | 155.0000 | 150 | 160 |
| 2 | 152.5000 | 150 | 155 |
+-------+-----------+-----------+-----------+
2 rows in set (0.00 sec)
** <Practice 1> ** Create a new table called cardtb. However, the columns should be id, point, money (all int). In addition, insert the following data. (id,point,money) = (1,100,1000),(2,NULL,60),(3,50,500),(4,30,600),(5,10,10),(6,NULL,40),(7,100,1000),(8,2000,100)
create table cardtb
(id int primary key, point int, money int)
engine = MyISAM
default charset = utf8;
insert into cardtb (id,point,money)
values (1,100,1000),(2,NULL,60),(3,50,500),(4,30,600),(5,10,10),(6,NULL,40),(7,100,1000),(8,2000,100);
select * from cardtb;
+----+-------+-------+
| id | point | money |
+----+-------+-------+
| 1 | 100 | 1000 |
| 2 | NULL | 60 |
| 3 | 50 | 500 |
| 4 | 30 | 600 |
| 5 | 10 | 10 |
| 6 | NULL | 40 |
| 7 | 100 | 1000 |
| 8 | 2000 | 100 |
+----+-------+-------+
8 rows in set (0.00 sec)
** <Practice 2> ** Use id as a key to join the above table internally to the meibo table.
select * from meibo as a join cardtb as b on a.id=b.id;
+----+----------+-------+------+--------+----+-------+-------+
| id | name | class | age | height | id | point | money |
+----+----------+-------+------+--------+----+-------+-------+
| 1 | Yamada | 1 | 14 | 150 | 1 | 100 | 1000 |
| 2 | Tanaka | 2 | 13 | 150 | 2 | NULL | 60 |
| 3 | Suzuki | 1 | 13 | 150 | 3 | 50 | 500 |
| 4 | Kato | 2 | 15 | 150 | 4 | 30 | 600 |
| 5 | Ito | 3 | 12 | 155 | 5 | 10 | 10 |
| 6 | Takeuchi | 2 | 16 | 155 | 6 | NULL | 40 |
| 7 | Kimura | 3 | 11 | 160 | 7 | 100 | 1000 |
| 8 | Sato | 1 | 14 | 160 | 8 | 2000 | 100 |
+----+----------+-------+------+--------+----+-------+-------+
8 rows in set (0.00 sec)
** <Practice 3> ** Use id as a key in the meibo table to outer join the above table.
/*The output result is the same as Exercise 2*/
select * from meibo as a left join cardtb as b on a.id=b.id;
/* select * from meibo as a right join cardtb as b on a.id=b.id; */
+----+----------+-------+------+--------+------+-------+-------+
| id | name | class | age | height | id | point | money |
+----+----------+-------+------+--------+------+-------+-------+
| 1 | Yamada | 1 | 14 | 150 | 1 | 100 | 1000 |
| 2 | Tanaka | 2 | 13 | 150 | 2 | NULL | 60 |
| 3 | Suzuki | 1 | 13 | 150 | 3 | 50 | 500 |
| 4 | Kato | 2 | 15 | 150 | 4 | 30 | 600 |
| 5 | Ito | 3 | 12 | 155 | 5 | 10 | 10 |
| 6 | Takeuchi | 2 | 16 | 155 | 6 | NULL | 40 |
| 7 | Kimura | 3 | 11 | 160 | 7 | 100 | 1000 |
| 8 | Sato | 1 | 14 | 160 | 8 | 2000 | 100 |
+----+----------+-------+------+--------+------+-------+-------+
8 rows in set (0.00 sec)
** <Practice 1> ** In the meibo table, if the height is less than 155, name it "below_155", if it is exactly 155, name it "equal_155", and if it is greater than 155, name it "over_155" to display the table.
select *
,case
when height<155 then "below_155"
when height=155 then "equal_155"
when height>155 then "over_155"
else "others"
end as heightLevel
from meibo;
+----+----------+-------+------+--------+-------------+
| id | name | class | age | height | heightLevel |
+----+----------+-------+------+--------+-------------+
| 1 | Yamada | 1 | 14 | 150 | below_155 |
| 2 | Tanaka | 2 | 13 | 150 | below_155 |
| 3 | Suzuki | 1 | 13 | 150 | below_155 |
| 4 | Kato | 2 | 15 | 150 | below_155 |
| 5 | Ito | 3 | 12 | 155 | equal_155 |
| 6 | Takeuchi | 2 | 16 | 155 | equal_155 |
| 7 | Kimura | 3 | 11 | 160 | over_155 |
| 8 | Sato | 1 | 14 | 160 | over_155 |
+----+----------+-------+------+--------+-------------+
8 rows in set (0.00 sec)
** <Practice 2> ** Use the table and subquery concept above to find the number of people at each heightLevel.
select a.heightLevel, count(*) from
(select *
,case
when height<155 then "below_155"
when height=155 then "equal_155"
when height>155 then "over_155"
else "others"
end as heightLevel
from meibo) a group by a.heightLevel;
+-------------+----------+
| heightLevel | count(*) |
+-------------+----------+
| below_155 | 4 |
| equal_155 | 2 |
| over_155 | 2 |
+-------------+----------+
3 rows in set (0.00 sec)
** <Practice 3> ** Use the table and subquery ideas above to find the average age of each heightLevel.
select a.heightLevel, avg(age) from
(select *
,case
when height<155 then "below_155"
when height=155 then "equal_155"
when height>155 then "over_155"
else "others"
end as heightLevel
from meibo) a group by a.heightLevel;
+-------------+----------+
| heightLevel | avg(age) |
+-------------+----------+
| below_155 | 13.7500 |
| equal_155 | 14.0000 |
| over_155 | 12.5000 |
+-------------+----------+
3 rows in set (0.01 sec)
** <Practice 1> ** Let's take the id and name from table meibo2 and create v3 of the view.
create view v3 as select id, name from meibo2;
select * from v3;
+----+----------+
| id | name |
+----+----------+
| 1 | Yamada |
| 9 | Nagata |
| 10 | Sugino |
| 11 | Takayama |
| 12 | John |
+----+----------+
5 rows in set (0.00 sec)
** <Practice 2> ** Let's create a view v4 that internally joins the meibo table and the score table to retrieve the id and name.
create table TEST3 as select c.* from
(select a.* from meibo a join score b on a.id=b.id) c;
create view v4 as select id, name from TEST3;
select * from v4;
+----+----------+
| id | name |
+----+----------+
| 1 | Yamada |
| 2 | Tanaka |
| 3 | Suzuki |
| 5 | Ito |
| 6 | Takeuchi |
| 8 | Sato |
+----+----------+
6 rows in set (0.00 sec)
** <Practice 3> ** Let's add new data to the table created above. What happens?
insert into TEST3 (id) values (11);
insert into TEST3 (id, name) values (12, "NANASHI");
select * from TEST3;
select * from v4;
+----+----------+-------+------+--------+
| id | name | class | age | height |
+----+----------+-------+------+--------+
| 1 | Yamada | 1 | 14 | 150 |
| 2 | Tanaka | 2 | 13 | 150 |
| 3 | Suzuki | 1 | 13 | 150 |
| 5 | Ito | 3 | 12 | 155 |
| 6 | Takeuchi | 2 | 16 | 155 |
| 8 | Sato | 1 | 14 | 160 |
| 11 | NULL | NULL | NULL | NULL |
| 12 | NANASHI | NULL | NULL | NULL |
+----+----------+-------+------+--------+
8 rows in set (0.00 sec)
+----+----------+
| id | name |
+----+----------+
| 1 | Yamada |
| 2 | Tanaka |
| 3 | Suzuki |
| 5 | Ito |
| 6 | Takeuchi |
| 8 | Sato |
| 11 | NULL |
| 12 | NANASHI |
+----+----------+
8 rows in set (0.00 sec)
** 8.4.1 Comprehensive Problem 1 ** Use the database below to answer the following questions. (Please also refer to the notes at the following URL.) https://dev.mysql.com/doc/world-setup/en/
/// Omitted ///
(1) Select database world to see what tables are there and what columns each has.
#I don't know how to access the host server called ZDB, so ...
#For WSL, enter the following command from Terminal
cd ~
wget http://downloads.mysql.com/docs/world.sql.gz
gzip -d world.sql.gz
mysql -h"localhost" -u root -p < world.sql
/* (1) */
use world;
show tables;
show columns from city;
show columns from country;
show columns from countrylanguage;
+-----------------+
| Tables_in_world |
+-----------------+
| city |
| country |
| countrylanguage |
+-----------------+
3 rows in set (0.01 sec)
+-------------+----------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+----------+------+-----+---------+----------------+
| ID | int(11) | NO | PRI | NULL | auto_increment |
| Name | char(35) | NO | | | |
| CountryCode | char(3) | NO | MUL | | |
| District | char(20) | NO | | | |
| Population | int(11) | NO | | 0 | |
+-------------+----------+------+-----+---------+----------------+
5 rows in set (0.00 sec)
+----------------+---------------------------------------------------------------------------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------------+---------------------------------------------------------------------------------------+------+-----+---------+-------+
| Code | char(3) | NO | PRI | | |
| Name | char(52) | NO | | | |
| Continent | enum('Asia','Europe','North America','Africa','Oceania','Antarctica','South America') | NO | | Asia | |
| Region | char(26) | NO | | | |
| SurfaceArea | decimal(10,2) | NO | | 0.00 | |
| IndepYear | smallint(6) | YES | | NULL | |
| Population | int(11) | NO | | 0 | |
| LifeExpectancy | decimal(3,1) | YES | | NULL | |
| GNP | decimal(10,2) | YES | | NULL | |
| GNPOld | decimal(10,2) | YES | | NULL | |
| LocalName | char(45) | NO | | | |
| GovernmentForm | char(45) | NO | | | |
| HeadOfState | char(60) | YES | | NULL | |
| Capital | int(11) | YES | | NULL | |
| Code2 | char(2) | NO | | | |
+----------------+---------------------------------------------------------------------------------------+------+-----+---------+-------+
15 rows in set (0.00 sec)
+-------------+---------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------+---------------+------+-----+---------+-------+
| CountryCode | char(3) | NO | PRI | | |
| Language | char(30) | NO | PRI | | |
| IsOfficial | enum('T','F') | NO | | F | |
| Percentage | decimal(4,1) | NO | | 0.0 | |
+-------------+---------------+------+-----+---------+-------+
4 rows in set (0.00 sec)
(2) Display 5 rows for each table. Please check lightly what kind of data you have.
/* (2) */
select * from city limit 5;
select * from country limit 5;
select * from countrylanguage limit 5;
+----+----------------+-------------+---------------+------------+
| ID | Name | CountryCode | District | Population |
+----+----------------+-------------+---------------+------------+
| 1 | Kabul | AFG | Kabol | 1780000 |
| 2 | Qandahar | AFG | Qandahar | 237500 |
| 3 | Herat | AFG | Herat | 186800 |
| 4 | Mazar-e-Sharif | AFG | Balkh | 127800 |
| 5 | Amsterdam | NLD | Noord-Holland | 731200 |
+----+----------------+-------------+---------------+------------+
5 rows in set (0.00 sec)
+------+-------------+---------------+---------------------------+-------------+-----------+------------+----------------+---------+---------+-----------------------+----------------------------------------------+--------------------------+---------+-------+
| Code | Name | Continent | Region | SurfaceArea | IndepYear | Population | LifeExpectancy | GNP | GNPOld | LocalName | GovernmentForm | HeadOfState | Capital | Code2 |
+------+-------------+---------------+---------------------------+-------------+-----------+------------+----------------+---------+---------+-----------------------+----------------------------------------------+--------------------------+---------+-------+
| ABW | Aruba | North America | Caribbean | 193.00 | NULL | 103000 | 78.4 | 828.00 | 793.00 | Aruba | Nonmetropolitan Territory of The Netherlands | Beatrix | 129 | AW |
| AFG | Afghanistan | Asia | Southern and Central Asia | 652090.00 | 1919 | 22720000 | 45.9 | 5976.00 | NULL | Afganistan/Afqanestan | Islamic Emirate | Mohammad Omar | 1 | AF |
| AGO | Angola | Africa | Central Africa | 1246700.00 | 1975 | 12878000 | 38.3 | 6648.00 | 7984.00 | Angola | Republic | José Eduardo dos Santos | 56 | AO |
| AIA | Anguilla | North America | Caribbean | 96.00 | NULL | 8000 | 76.1 | 63.20 | NULL | Anguilla | Dependent Territory of the UK | Elisabeth II | 62 | AI |
| ALB | Albania | Europe | Southern Europe | 28748.00 | 1912 | 3401200 | 71.6 | 3205.00 | 2500.00 | Shqipëria | Republic | Rexhep Mejdani | 34 | AL |
+------+-------------+---------------+---------------------------+-------------+-----------+------------+----------------+---------+---------+-----------------------+----------------------------------------------+--------------------------+---------+-------+
5 rows in set (0.00 sec)
+-------------+------------+------------+------------+
| CountryCode | Language | IsOfficial | Percentage |
+-------------+------------+------------+------------+
| ABW | Dutch | T | 5.3 |
| ABW | English | F | 9.5 |
| ABW | Papiamento | F | 76.7 |
| ABW | Spanish | F | 7.4 |
| AFG | Balochi | F | 0.9 |
+-------------+------------+------------+------------+
5 rows in set (0.00 sec)
(3) Count the number of records in the city table. Also, check for duplicate data.
/* (3) */
select count(*) from city;
select count(distinct id) from city;
select count(distinct name, countrycode, district) from city;
+----------+
| count(*) |
+----------+
| 4079 |
+----------+
1 row in set (0.00 sec)
+---------------------+
| count( distinct id) |
+---------------------+
| 4079 |
+---------------------+
1 row in set (0.00 sec)
+---------------------------------------------+
| count(distinct name, countrycode, district) |
+---------------------------------------------+
| 4078 |
+---------------------------------------------+
1 row in set (0.02 sec)
** 8.4.2 Comprehensive Problem 2 ** Answer the following questions using the same database as above.
(1) For the countrylanguage table, calculate the number of languages (only those in the table, without duplication) around each CountryCode. Also, join the result to the city table to display all the city columns and the number of their languages. However, you only need to display the result in 5 lines.
create view v1 as select CountryCode, count(distinct Language) from countrylanguage group by CountryCode;
select * from
city a
join
v1 b
on a.CountryCode = b.CountryCode
limit 5;
+-----+----------------+-------------+----------+------------+-------------+--------------------------+
| ID | Name | CountryCode | District | Population | CountryCode | count(distinct Language) |
+-----+----------------+-------------+----------+------------+-------------+--------------------------+
| 129 | Oranjestad | ABW | – | 29034 | ABW | 4 |
| 1 | Kabul | AFG | Kabol | 1780000 | AFG | 5 |
| 2 | Qandahar | AFG | Qandahar | 237500 | AFG | 5 |
| 3 | Herat | AFG | Herat | 186800 | AFG | 5 |
| 4 | Mazar-e-Sharif | AFG | Balkh | 127800 | AFG | 5 |
+-----+----------------+-------------+----------+------------+-------------+--------------------------+
5 rows in set (0.00 sec)
(2) For the countrylanguage table, add up the Percentages with each CountryCode as the axis. However, please display only those whose total value exceeds 99.9.
create view v2 as select CountryCode, sum(Percentage) from countrylanguage group by CountryCode;
select CountryCode, `sum(Percentage)` from v2 where `sum(Percentage)` > 99.9;
+-------------+-----------------+
| CountryCode | sum(Percentage) |
+-------------+-----------------+
| BHS | 100.0 |
| BMU | 100.0 |
| BTN | 100.0 |
| CHL | 100.0 |
| CPV | 100.0 |
| CRI | 100.0 |
| CUB | 100.0 |
| DMA | 100.0 |
| DOM | 100.0 |
| DZA | 100.0 |
| ECU | 100.0 |
| ESH | 100.0 |
| FRO | 100.0 |
| GRD | 100.0 |
| GRL | 100.0 |
| GUY | 100.0 |
| HTI | 100.0 |
| IRL | 100.0 |
| JPN | 100.0 |
| KNA | 100.0 |
| KOR | 100.0 |
| LCA | 100.0 |
| LSO | 100.0 |
| MDV | 100.0 |
| NLD | 101.0 |
| POL | 100.0 |
| PRK | 100.0 |
| PSE | 100.0 |
| RWA | 100.0 |
| SLV | 100.0 |
| SMR | 100.0 |
| TUV | 100.0 |
| WSM | 100.1 |
+-------------+-----------------+
33 rows in set (0.00 sec)
(3) For the city table, divide the district initials into four patterns, one that starts with A, one that starts with B, one that starts with C, and others, and count the number of records for each.
select a.Initial, count(a.Initial) from
(select *
,case
when left(Name, 1) = "A" then "Initial A"
when left(Name, 1) = "B" then "Initial B"
when left(Name, 1) = "C" then "Initial C"
else "others"
end as "Initial"
from city) a group by Initial;
+-----------+------------------+
| Initial | count(a.Initial) |
+-----------+------------------+
| Initial A | 260 |
| Initial B | 317 |
| Initial C | 281 |
| others | 3221 |
+-----------+------------------+
4 rows in set (0.02 sec)
Recommended Posts