I will update this post from time to time with Pandas-related content I pick up while learning Python.
Pandas: a library that provides functions to support data analysis.
```python
import pandas as pd
```
```python
# Read a CSV file into a DataFrame
csv_test_1 = pd.read_csv('hoge.csv')
```
```python
# Read an Excel file (reading .xlsx needs an engine such as openpyxl installed)
excel_data = pd.read_excel('hoge.xlsx')
```
```python
# Read a second CSV and stack its rows under the first; ignore_index=True renumbers the rows
csv_test_2 = pd.read_csv('hoge_2.csv')
csv_test = pd.concat([csv_test_1, csv_test_2], ignore_index=True)
csv_test.head()
```
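A minimal sketch of what `ignore_index=True` changes. The two small DataFrames are hypothetical stand-ins for `hoge.csv` and `hoge_2.csv`, so the example runs without the files:

```python
import pandas as pd

# Two small hypothetical tables with the same columns
csv_test_1 = pd.DataFrame({"id": [1, 2], "price": [100, 200]})
csv_test_2 = pd.DataFrame({"id": [3, 4], "price": [300, 400]})

# Stack the rows vertically; ignore_index=True renumbers the index 0..n-1
csv_test = pd.concat([csv_test_1, csv_test_2], ignore_index=True)

# Without ignore_index the original labels are kept, so 0 and 1 appear twice
stacked_keep_index = pd.concat([csv_test_1, csv_test_2])

print(csv_test)
print(stacked_keep_index.index.tolist())  # [0, 1, 0, 1]
```

Duplicate index labels are easy to miss and make later `.loc` lookups return several rows, which is why `ignore_index=True` is usually what you want when simply appending files.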
- When the key column has the same name in both tables, join on it with `on="id"`.

General form: `joined_table = pd.merge(table_1, table_2, on="key_column", how="method")`, where `how` is the join method, e.g. `"left"`, `"right"`, `"inner"` or `"outer"`.
```python
# Take only id, date and customer from b_data; how="left" keeps every row of a_data
join_data = pd.merge(a_data, b_data[["id", "date", "customer"]], on="id", how="left")
join_data.head()
```
- When the key columns have different names, specify the key of each table with `left_on="customer_name", right_on="Customer name"`.

```python
pd.merge(a_data, b_data, left_on="customer_name", right_on="Customer name", how="left")
```
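Both join patterns can be tried on tiny hypothetical tables. The data and the `extra`, `item`, `price` and `region` columns are made up for illustration; only the key names follow the snippets above:

```python
import pandas as pd

# Hypothetical tables; "id", "date" and "customer" follow the snippet above
a_data = pd.DataFrame({"id": [1, 2, 3], "item": ["A", "B", "C"]})
b_data = pd.DataFrame({
    "id": [1, 2, 4],
    "date": ["2019-01-01", "2019-01-02", "2019-01-03"],
    "customer": ["Sato", "Suzuki", "Tanaka"],
    "extra": [0, 0, 0],  # not needed, so it is dropped by selecting columns
})

# Same key name: join on "id"; id=3 has no match, so its date/customer become NaN
join_data = pd.merge(a_data, b_data[["id", "date", "customer"]], on="id", how="left")

# Different key names: name the key on each side explicitly
left = pd.DataFrame({"customer_name": ["Sato", "Kato"], "price": [100, 200]})
right = pd.DataFrame({"Customer name": ["Sato"], "region": ["Tokyo"]})
joined_by_name = pd.merge(left, right, left_on="customer_name", right_on="Customer name", how="left")

print(join_data)
print(joined_by_name)
```

With `left_on`/`right_on`, both key columns are kept in the result; drop the redundant one with `.drop(columns="Customer name")` if you do not need it.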
```python
pd.unique(test_data.item_name)       # distinct values
len(pd.unique(test_data.item_name))  # number of unique values
```
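A small sketch contrasting `len(pd.unique(...))` with `Series.nunique()`. The data is hypothetical and includes a missing value, because that is where the two counts differ:

```python
import pandas as pd

# Hypothetical item names, including one missing value
test_data = pd.DataFrame({"item_name": ["A", "B", "A", None, "B"]})

# pd.unique keeps values in order of first appearance; a missing value counts as one of them
unique_items = pd.unique(test_data.item_name)
print(unique_items)
print(len(unique_items))  # 3

# Series.nunique() counts distinct values directly and skips missing values by default
print(test_data["item_name"].nunique())              # 2
print(test_data["item_name"].nunique(dropna=False))  # 3
```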
```python
# Convert column "a" to datetime
test_data["a"] = pd.to_datetime(test_data["a"])
```
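```python
# Format a datetime column as a year-month string such as "201901" (payment_date must already be datetime)
time_data["payment_month"] = time_data["payment_date"].dt.strftime("%Y%m")
```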
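The two steps above usually go together: dates read from a CSV arrive as strings, and the `.dt` accessor only works after `pd.to_datetime`. A minimal sketch with hypothetical timestamps:

```python
import pandas as pd

# Hypothetical data whose dates were read from CSV as plain strings
time_data = pd.DataFrame({"payment_date": ["2019-01-15 10:30:00", "2019-02-03 09:00:00"]})

# Convert first; .dt raises an error on a column of plain strings
time_data["payment_date"] = pd.to_datetime(time_data["payment_date"])

# Format each timestamp as a year-month string
time_data["payment_month"] = time_data["payment_date"].dt.strftime("%Y%m")

print(time_data.dtypes)
print(time_data["payment_month"].tolist())  # ['201901', '201902']
```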
```python
pd.pivot_table(test_data, index='item_name', columns='payment_month', values=['price', 'quantity'], aggfunc='sum')
```
**pivot_table overview**

- index: the column whose values become the rows
- columns: the column whose values become the columns
- values: the columns to aggregate
- aggfunc: the aggregation method
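A sketch of the call above on a few hypothetical sales rows. `fill_value=0` is an addition not in the original snippet; it shows 0 instead of NaN for item/month combinations with no sales:

```python
import pandas as pd

# Hypothetical sales records: one row per transaction
test_data = pd.DataFrame({
    "item_name": ["A", "A", "B", "B", "A"],
    "payment_month": ["201901", "201902", "201901", "201901", "201901"],
    "price": [100, 150, 200, 200, 100],
    "quantity": [1, 2, 3, 1, 1],
})

# Rows = item_name, columns = (value column, payment_month), cells = sums
pivot = pd.pivot_table(
    test_data,
    index="item_name",
    columns="payment_month",
    values=["price", "quantity"],
    aggfunc="sum",
    fill_value=0,  # B has no sales in 201902, so that cell becomes 0
)
print(pivot)
```

Because `values` is a list, the result has two-level columns, so a single cell is addressed as `pivot.loc["A", ("price", "201901")]`.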
```python
print(len(test_data))  # number of rows
```
```python
# Show the first 5 rows
csv_test_1.head()
```
```python
# Show the first 5 values of a single column
csv_test_1["Column name"].head()
```
```python
# Keep item_name for the rows where the boolean mask flg_is_null is True
res = test_data.loc[flg_is_null, "item_name"]
```
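`flg_is_null` is not defined in the snippet. In this sketch it is assumed to be a missing-value mask on a hypothetical `price` column, which shows how a boolean Series selects rows in `.loc`:

```python
import pandas as pd

# Hypothetical data where one price is missing
test_data = pd.DataFrame({
    "item_name": ["A", "B", "C"],
    "price": [100, None, 300],
})

# Assumed definition: True where price is missing
flg_is_null = test_data["price"].isnull()

# .loc[row_mask, column] keeps only the rows where the mask is True
res = test_data.loc[flg_is_null, "item_name"]
print(res.tolist())  # ['B']
```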
```python
# New column "new" = a * b, computed row by row
test_data["new"] = test_data["a"] * test_data["b"]
```
```python
# Sum of column "a"
test_data["a"].sum()
```
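```python
# Total price per create_date (select the column before summing)
test_data.groupby("create_date")["price"].sum()
```
```python
# Totals of price and quantity per (create_date, item_name) pair
test_data.groupby(["create_date", "item_name"])[["price", "quantity"]].sum()
```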
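A sketch of both aggregations on hypothetical sales rows. Selecting the columns before calling `sum()` aggregates only those columns, so non-numeric columns such as dates or names are never summed:

```python
import pandas as pd

# Hypothetical sales records
test_data = pd.DataFrame({
    "create_date": ["2019-01-01", "2019-01-01", "2019-01-02"],
    "item_name": ["A", "A", "B"],
    "price": [100, 150, 200],
    "quantity": [1, 2, 3],
})

# Total price per day: a Series indexed by create_date
daily_price = test_data.groupby("create_date")["price"].sum()

# Totals per (day, item) pair for two columns: a DataFrame with a two-level index
daily_item = test_data.groupby(["create_date", "item_name"])[["price", "quantity"]].sum()

print(daily_price)
print(daily_item)
```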
```python
# Check that two columns have the same total (True or False)
test_data["a"].sum() == test_data["b"].sum()
```
```python
# Number of missing values per column
test_data.isnull().sum()
```
```python
# True for each column that has at least one missing value
test_data.isnull().any(axis=0)
```
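A quick way to see what the two missing-value checks return, using a hypothetical DataFrame. `axis=1` is added here for contrast: it checks each row instead of each column:

```python
import pandas as pd

# Hypothetical data with a few missing values
test_data = pd.DataFrame({
    "item_name": ["A", None, "C"],
    "price": [100, None, None],
    "quantity": [1, 2, 3],
})

# Count of missing values in each column: item_name 1, price 2, quantity 0
print(test_data.isnull().sum())

# Per column: does it contain at least one missing value?
print(test_data.isnull().any(axis=0))

# Per row: does the row contain at least one missing value?
print(test_data.isnull().any(axis=1))
```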
```python
# Summary statistics of the numeric columns
test_data.describe()
```
```python
# Earliest and latest create_date
test_data["create_date"].min()
test_data["create_date"].max()
```
```python
# Data type of each column
test_data.dtypes
```
- `describe()` shows the following statistics: number of non-missing values (count), mean (mean), standard deviation (std), minimum (min), quartiles (25%, 75%), median (50%), and maximum (max).
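A small sketch reading individual statistics out of `describe()`, using a hypothetical column of four prices. The result is itself a DataFrame, so each statistic is a row label:

```python
import pandas as pd

# Hypothetical numeric column
test_data = pd.DataFrame({"price": [100, 200, 300, 400]})

stats = test_data.describe()
print(stats)

# Read single statistics by row label
print(stats.loc["mean", "price"])  # 250.0
print(stats.loc["50%", "price"])   # 250.0
```

Note that `std` here is the sample standard deviation (divided by n - 1), and the quartiles are linearly interpolated, so the 25% value for these four prices is 175.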
Work memo: data cleansing

- Data processing: Pandas
- Visualization: Matplotlib
- Machine learning: scikit-learn