Learning record No. 21 (25th day)

Start studying: Saturday, December 7th

Teaching materials, etc.:
・ Miyuki Oshige, "Details! Python 3 Introductory Note" (Sotec, 2017): read 12/7 (Sat) - 12/19 (Thu)
・ Progate Python course (5 courses in total): 12/19 (Thu) - 12/21 (Sat), finished
・ Andreas C. Müller, Sarah Guido, "Introduction to Machine Learning with Python" (O'Reilly Japan, 2017): 12/21 (Sat) - 12/23 (Mon)
・ Kaggle: Real or Not? NLP with Disaster Tweets: submitted 12/28 (Sat); tuning until 1/3 (Fri)
・ **Wes McKinney, "Introduction to Data Analysis with Python" (Japanese edition of "Python for Data Analysis", O'Reilly Japan, 2018)**: reading 1/4 (Sat) - 1/13 (Mon)

"Introduction to Data Analysis with Python"

Sections read on January 13th

Chapter 11 Time Series Data

- Data observed at many points in time constitute a time series. Time series can be indexed in several ways, for example by timestamps, fixed periods, or intervals of time, and the methods used depend on which applies. pandas offers many tools for time series, which are especially useful for finance and log-data analysis.

- datetime, time, and calendar modules: you can format a datetime as a string with str or strftime. %Y is the 4-digit year, %y the 2-digit year, and so on. Use it like datetime.strftime('%Y-%m-%d').
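
A minimal sketch of the format codes above (the timestamp value is hypothetical):

```python
from datetime import datetime

dt = datetime(2020, 1, 13, 9, 30)                     # hypothetical timestamp
print(dt.strftime('%Y-%m-%d'))                        # '2020-01-13' (4-digit year)
print(dt.strftime('%y-%m-%d %H:%M'))                  # '20-01-13 09:30' (2-digit year)
parsed = datetime.strptime('2020-01-13', '%Y-%m-%d')  # parse a string back to datetime
```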

- Index references: date['2000'] selects the data for the corresponding year or date (partial string indexing). date_range generates an index over a specified range, and shift moves the data, optionally by a frequency offset.
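
A small sketch of these operations, assuming a hypothetical Series ts with a daily DatetimeIndex:

```python
import numpy as np
import pandas as pd

# Hypothetical daily series spanning the end of 2000 and the start of 2001.
ts = pd.Series(np.arange(5), index=pd.date_range('2000-12-29', periods=5, freq='D'))

print(ts['2000'])              # partial string indexing: only the rows in year 2000
print(ts.shift(1))             # move the values down one period (first becomes NaN)
print(ts.shift(1, freq='D'))   # move the index itself forward by a one-day offset
```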

・ Most time series work is handled in Coordinated Universal Time (UTC). Get a timezone object with pytz.timezone, localize naive timestamps with tz_localize, and convert to another timezone with tz_convert. You can also specify the timezone when creating a timestamp.
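
A sketch of that timezone workflow, using made-up sample timestamps:

```python
import pandas as pd
import pytz

tz = pytz.timezone('America/New_York')       # get a timezone object

rng = pd.date_range('2020-01-13 09:30', periods=3, freq='D')   # naive timestamps
ts = pd.Series(range(3), index=rng)
ts_utc = ts.tz_localize('UTC')               # attach a timezone (localize)
ts_tokyo = ts_utc.tz_convert('Asia/Tokyo')   # convert to another timezone

stamp = pd.Timestamp('2020-01-13 09:30', tz='Europe/London')   # tz set at creation
```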

- The frequency of a time series can be converted with the resample method: downsampling aggregates to a lower frequency, and upsampling is the reverse. In resample('5min', closed=...), closed determines whether the left or right bin edge is the inclusive one. The OHLC (open-high-low-close) aggregation returns the opening, closing, highest, and lowest value for each bin.
- Window functions: exponentially decaying weights can be applied to the data, or a window that is zero outside a finite interval. They help reduce noise and bridge gaps in the data. You can apply built-in or custom functions via rolling, expanding, ewm(span=...), and apply.
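
A sketch of resampling and window functions on a hypothetical one-minute series ts:

```python
import numpy as np
import pandas as pd

rng = pd.date_range('2020-01-13', periods=12, freq='min')   # one-minute data
ts = pd.Series(np.arange(12), index=rng)

# Downsample to 5-minute bins; `closed` chooses which bin edge is inclusive.
print(ts.resample('5min', closed='left').sum())
print(ts.resample('5min').ohlc())              # open / high / low / close per bin

# Window functions: fixed rolling window, exponentially weighted, custom apply.
print(ts.rolling(3).mean())
print(ts.ewm(span=3).mean())
print(ts.rolling(3).apply(lambda x: x.max() - x.min()))
```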

Chapter 12 Advanced pandas

・ pandas Categorical: using it can improve processing speed and memory usage.
- When running many analyses against a particular data set, categorical variables give a real performance improvement, and replacing string columns of a DataFrame with categorical representations also saves a lot of memory: astype('category').
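
A sketch of the memory saving, using a made-up repetitive string column:

```python
import pandas as pd

df = pd.DataFrame({'fruit': ['apple', 'orange', 'apple', 'apple'] * 1000})
print(df['fruit'].memory_usage(deep=True))   # object dtype: one string per row
df['fruit'] = df['fruit'].astype('category')
print(df['fruit'].memory_usage(deep=True))   # usually far smaller
print(df['fruit'].cat.categories)            # the unique category labels
```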

- Categorical methods cover adding categories, setting an ordering, removing categories, and so on: add_categories, as_ordered, remove_categories.
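
A sketch of those accessor methods on a small made-up categorical Series:

```python
import pandas as pd

s = pd.Series(['a', 'b', 'a']).astype('category')
s = s.cat.add_categories(['c'])       # register a category that has no data yet
s = s.cat.as_ordered()                # give the categories an ordering (a < b < c)
s = s.cat.remove_categories(['c'])    # drop the unused category again
print(s.cat.categories, s.cat.ordered)
```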

- When using machine learning tools, it is often necessary to convert categorical data to dummy-variable format (one-hot encoding), expressed as 0s and 1s. get_dummies performs the conversion.
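
A sketch of one-hot encoding with get_dummies on a made-up categorical column:

```python
import pandas as pd

s = pd.Series(['red', 'green', 'red', 'blue'], dtype='category')
print(pd.get_dummies(s, dtype=int))   # one 0/1 indicator column per category
```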

- groupby applies a common operation to each group defined by the specified keys. transform does the same thing while returning a result aligned with the original rows, and it accepts a lambda such as lambda x: x.mean(), e.g. g.transform(lambda x: x.mean()) where g is the groupby object. Group calculations such as standardization are also possible with transform, e.g. normalized = (df['A'] - g.transform('mean')) / g.transform('std'). The per-group aggregation may be computed more than once this way, but the benefit of vectorized operations usually outweighs that cost. (A sketch follows below.)
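
A sketch of group-wise transform, assuming a hypothetical DataFrame df with a key column and groupby object g:

```python
import pandas as pd

df = pd.DataFrame({'key': ['a', 'b', 'a', 'b'], 'A': [1.0, 2.0, 3.0, 4.0]})
g = df.groupby('key')['A']

means = g.transform('mean')              # group mean broadcast back to every row
same = g.transform(lambda x: x.mean())   # same result via a lambda

# Group-wise standardization (z-score within each group).
normalized = (df['A'] - g.transform('mean')) / g.transform('std')
print(normalized)
```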

Chapter 13 Introduction to Modeling Libraries in Python

- The point of contact between pandas and analysis libraries is usually a NumPy array. Use the .values attribute to convert a DataFrame to NumPy (it becomes an ndarray): data.values. To go back the other way, pass a two-dimensional ndarray and specify the column names: pd.DataFrame(data.values, columns=['one', 'two', 'three']).
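
A sketch of the round trip, with a made-up DataFrame named data:

```python
import pandas as pd

data = pd.DataFrame({'one': [1, 2], 'two': [3, 4], 'three': [5, 6]})
arr = data.values                                           # DataFrame -> 2-D ndarray
back = pd.DataFrame(arr, columns=['one', 'two', 'three'])   # ndarray -> DataFrame
```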

- When only some of the columns are needed, it is better to take .values after selecting them with loc: model_cols = ['x0', 'x1']; data.loc[:, model_cols].values. This extracts only **x0, x1** for **all rows** as an array.
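
A sketch of selecting model columns, assuming data also holds other columns such as a hypothetical y:

```python
import pandas as pd

data = pd.DataFrame({'x0': [1, 2, 3], 'x1': [0.1, 0.2, 0.3], 'y': [0, 1, 0]})
model_cols = ['x0', 'x1']
X = data.loc[:, model_cols].values   # all rows, only x0 and x1, as an ndarray
print(X.shape)                       # (3, 2)
```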

Replacing a column with dummy variables:

dummies = pd.get_dummies(data.category, prefix='category')
data_with_dummies = data.drop('category', axis=1).join(dummies)

# Create the dummies, remove the original column with drop, and attach them with join.
