I recently started studying Python. To mark the occasion, I'd like to write something useful for others who are just getting started with Python, as I am. Since I have never written Python before, this time I will work through a simple data analysis while learning Python's syntax and what kinds of libraries are available.
Reference book: [Ready to use! Can be practiced in business! How to make AI / machine learning / deep learning apps with Python](https://www.amazon.co.jp/%E3%81%99%E3%81%90%E3%81%AB%E4%BD%BF%E3%81%88%E3%82%8B-%E6%A5%AD%E5%8B%99%E3%81%A7%E5%AE%9F%E8%B7%B5%E3%81%A7%E3%81%8D%E3%82%8B-Python%E3%81%AB%E3%82%88%E3%82%8B-AI%E3%83%BB%E6%A9%9F%E6%A2%B0%E5%AD%A6%E7%BF%92%E3%83%BB%E6%B7%B1%E5%B1%A4%E5%AD%A6%E7%BF%92%E3%82%A2%E3%83%97%E3%83%AA%E3%81%AE%E3%81%A4%E3%81%8F%E3%82%8A%E6%96%B9-%E3%82%AF%E3%82%B8%E3%83%A9%E9%A3%9B%E8%A1%8C%E6%9C%BA/dp/4802611641)
**Execution environment**
Google Colaboratory
- Commonly used Python libraries are already available, so you do not need to install them with pip.
- It's free, and if you have a Google account you can run code immediately without any installation work.
**Libraries used**
- urllib
- pandas
- matplotlib
First, prepare the data needed for the analysis.
To analyze the weather data, download the dataset from the URL below. You could download it manually, but here let's fetch it using a library called urllib.
#Import the urlretrieve function from urllib.request
from urllib.request import urlretrieve
#Prepare a variable called filename holding the file name tempreture.csv
filename = "tempreture.csv"
#Specify the url
url = "https://raw.githubusercontent.com/kujirahand/mlearn-sample/master/tenki2006-2016/kion10y.csv"
#Download the data from the url and save it under the file name set above
urlretrieve(url, filename)
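When a notebook cell is re-run, the file is downloaded again each time. As a minimal sketch (the helper name `download_if_missing` is my own and not from the reference book), the download can be skipped when the file is already there:

```python
import os
from urllib.request import urlretrieve


def download_if_missing(url, filename):
    """Download url to filename unless the file already exists.

    Hypothetical helper for illustration. Returns True if a download
    happened and False if the existing file was reused.
    """
    if os.path.exists(filename):
        return False
    urlretrieve(url, filename)
    return True
```

Calling `download_if_missing(url, filename)` with the variables defined above would fetch the CSV only on the first run.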
Pandas is a library for efficient data analysis in Python. Pandas makes it easy for you to do data analysis tasks such as loading data, displaying statistics, and graphing.
#pandas is conventionally imported under the alias pd, so use pd
import pandas as pd
#Use read_csv() to load the csv file downloaded earlier into a DataFrame named df
df = pd.read_csv(filename)
#Display the contents (in Colab, the last expression of a cell is shown as a table)
df
Running this shows that the data has 4018 rows x 6 columns.
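The row and column counts can also be checked in code rather than read off the displayed table. The sketch below uses a tiny in-memory CSV with made-up values (not the real dataset) and assumes the column names used in this article:

```python
import io

import pandas as pd

# Hypothetical sample rows; the values are invented for illustration
csv_text = """Year,Month,Day,temperature
2006,1,1,3.6
2006,1,2,4.0
2006,1,3,3.7
"""

sample = pd.read_csv(io.StringIO(csv_text))

# shape is a (rows, columns) tuple
n_rows, n_cols = sample.shape
print(n_rows, n_cols)

# head() shows the first rows, describe() shows summary statistics
print(sample.head(2))
print(sample["temperature"].describe())
```

The same `shape`, `head()`, and `describe()` calls should work on the `df` loaded from the downloaded file.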
#[Collect the past 10 years of data into a dictionary so it is easier to work with]
history = {}
#Get the index and data of each row, similar to the enumerate function
for i, row in df.iterrows():
    #Read the month, day, and temperature of the row into variables
    #(the keys must match the column names shown by read_csv)
    month, day, tempreture = (int(row['Month']), int(row['Day']), float(row['temperature']))
    #Build a key such as "12/25"
    key = str(month) + "/" + str(day)
    #Create an empty list the first time a key appears
    if key not in history:
        history[key] = []
    #Append the temperature to the list for that date
    history[key].append(tempreture)
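The "create the list if the key is new, then append" pattern above can also be written with `collections.defaultdict` from the standard library. This is only a sketch on a few made-up records; `group_by_date` is a hypothetical name, not something from the reference book:

```python
from collections import defaultdict


def group_by_date(records):
    """Group (month, day, temperature) tuples under "month/day" keys."""
    history = defaultdict(list)  # missing keys start as an empty list
    for month, day, temperature in records:
        history[f"{month}/{day}"].append(temperature)
    return dict(history)


# Hypothetical records: two years of Dec 25 and one Jan 1
records = [(12, 25, 5.0), (12, 25, 7.0), (1, 1, 3.0)]
grouped = group_by_date(records)
print(grouped)
```

With `defaultdict(list)` the explicit membership check is no longer needed, because looking up a missing key creates the empty list automatically.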
#[Find the average value]
average = {}
#Loop over history and get each key
for key in history:
    #Store the computed average under the same key in average
    average[key] = sum(history[key]) / len(history[key])
    result = average[key]
    # print("{0}: {1}".format(key, result))
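As an alternative sketch, pandas can do the grouping and the averaging in one step with `groupby` and `mean`. The data below is made up, and the column names are assumed to match the ones used in this article:

```python
import pandas as pd

# Hypothetical sample: two Dec 25 readings and one Jan 1 reading
sample = pd.DataFrame({
    "Month": [12, 12, 1],
    "Day": [25, 25, 1],
    "temperature": [5.0, 7.0, 3.0],
})

# Group by (Month, Day) and take the mean of each group
daily_mean = sample.groupby(["Month", "Day"])["temperature"].mean()

# The result is indexed by (month, day) pairs
print(daily_mean.loc[(12, 25)])
```

Writing the loop by hand is good practice for learning the syntax, while the `groupby` version is shorter once the idea is clear.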
import math
#Function to check the type (accept only strings)
def isString(date):
    return isinstance(date, str)
#Get the average temperature for the given date ("month/day") from the average dictionary
def getTempreture(date):
    if isString(date):
        return average[date]
    #For non-string input no lookup is done and None is returned
    return None
tempreture = getTempreture("12/25")
value = round(tempreture)
#Convert the int to a string and print it
print(str(value) + " degrees")
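If a non-string is passed, the lookup above quietly yields nothing and the later `round` call fails with a less obvious error. One possible variation, sketched here with a hypothetical function name and a made-up average value, is to raise an explicit error instead:

```python
def get_average_temperature(average, date):
    """Return the average temperature stored under a "month/day" key.

    Hypothetical variation of the lookup function, for illustration.
    """
    if not isinstance(date, str):
        raise TypeError("date must be a string such as '12/25'")
    if date not in average:
        raise KeyError(f"no data for {date}")
    return average[date]


# Made-up value, not taken from the real dataset
average = {"12/25": 6.4}
print(f"{round(get_average_temperature(average, '12/25'))} degrees")
```

An f-string is used for the output here, which avoids the manual `str()` conversion.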
#Import matplotlib to draw the graph
import matplotlib.pyplot as plt
#Group the temperature data by month
tempreture_per_month = df.groupby(['Month'])['temperature']
#Divide each month's sum by its number of data points to get the monthly average
average_tempreture = tempreture_per_month.sum() / tempreture_per_month.count()
#Draw the graph
average_tempreture.plot()
plt.show()
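Dividing the sum by the count is the same as taking the mean, so `mean()` can be used directly, and the axes object returned by `plot()` can be used to add labels. The following is a sketch on made-up data; the label texts are my own choice:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample: two readings each for January and February
sample = pd.DataFrame({
    "Month": [1, 1, 2, 2],
    "temperature": [4.0, 6.0, 7.0, 9.0],
})

# mean() gives the same result as sum() / count()
monthly_mean = sample.groupby("Month")["temperature"].mean()

# plot() returns a matplotlib Axes, which accepts labels and a title
ax = monthly_mean.plot(marker="o")
ax.set_xlabel("Month")
ax.set_ylabel("Average temperature")
ax.set_title("Average temperature by month")
plt.show()
```

Labeling the axes makes the graph easier to read when it is shared outside the notebook.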
I was able to draw the graph.
- I recommend that beginners start with Google Colaboratory because it saves the trouble of installation and other setup work.
- I was able to manipulate and plot data more easily than I expected, and I learned how powerful Python's libraries are.
Since this was my first time using Python, there are still many things I do not understand, but I will keep learning so that I can gradually take on more advanced analysis.