Overview

Clean the data in a csv file as preparation for data processing.
Covers removing whitespace, symbols, digits, and words; extracting words; writing to a specific data element; and csv output.
Uses Python 3.x, pandas, and re.
I wrote this up as a personal memo. It probably contains mistakes in terminology and insufficient explanations, so I will correct them as needed.

Actual code

0. Loading the library

This time we will use `pandas` and `re` (the module for regular expressions).

import pandas as pd
import re

1. Read data

df = pd.read_csv("filename.csv")
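To try the later steps without a file on disk, the read step can be sketched against in-memory text. The CSV content and the column names `name` and `code` below are made up for illustration, and passing `dtype=str` is an assumption that every column should be treated as text so the `.str` methods apply.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for "filename.csv"
csv_text = "name,code\nI have a pen.,A-001\nShe has a pen.,B-002\n"

# dtype=str keeps every column as text, so .str methods can be used later
df = pd.read_csv(io.StringIO(csv_text), dtype=str)

print(df.shape)
print(df.columns.tolist())
```

With a real file, the path string would simply replace the `io.StringIO(...)` argument.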

2. Delete unnecessary elements (blanks, symbols, numbers, words)

Delete unnecessary elements from every row of a column.

df['Column name'] = df['Column name'].str.replace(r'\d', '', regex=True)  # Delete digits (regex=True is required in recent pandas)
df['Column name'] = df['Column name'].str.replace('-', '', regex=False)  # Delete a symbol (here, the hyphen)
df['Column name'] = df['Column name'].str.replace('word', '', regex=False)  # Delete a specific word
df['Column name'] = df['Column name'].str.strip()  # Remove whitespace at the beginning and end
# The same steps can also be chained into a single statement
df['Column name'] = df['Column name'].str.replace(r'\d', '', regex=True).str.replace('-', '', regex=False).str.replace('word', '', regex=False).str.strip()
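As a rough check of the cleaning steps, here is a minimal self-contained sketch. The sample values and the column name `item` are hypothetical; the point is only to show the difference between a regular-expression pattern (`regex=True`) and a literal string (`regex=False`).

```python
import pandas as pd

# Hypothetical sample data; the column name "item" is made up for this sketch
df = pd.DataFrame({"item": [" 12-apple word ", "banana-34", None]})

df["item"] = (
    df["item"]
    .str.replace(r"\d", "", regex=True)    # \d is a pattern: remove every digit
    .str.replace("-", "", regex=False)     # literal hyphen
    .str.replace("word", "", regex=False)  # literal word
    .str.strip()                           # trim whitespace at both ends
)

print(df["item"].tolist())
```

Missing values pass through the `.str` methods unchanged as missing, so they do not raise an error here.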

3. Cut out words

Goal

Suppose the column `name` holds elements that each consist of multiple words. Example:

df['name'][0] = "I have a pen."
df['name'][1] = "She has a pen."

From these, the first word of each element is extracted and stored in a new column called `subject`. Example:

df['subject'][0] = "I"
df['subject'][1] = "She"

Code

temp = df['name'].str.split()  # Split each element into a list of words
subject = []  # Empty list to hold the extracted words
for item in temp:
    subject.append(item[0])  # Store the first word of each row (assumes no empty or missing rows)
df['subject'] = subject  # Add the list to the original dataframe as the column subject
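The loop above can also be written without an explicit list. The following is a sketch of one alternative, not the method used in this memo: it relies on the `.str[0]` accessor, and the rows containing `None` and an empty string are added here only to show how missing data behaves.

```python
import pandas as pd

# Sample rows from the text, plus two hypothetical problem rows
df = pd.DataFrame({"name": ["I have a pen.", "She has a pen.", None, ""]})

# .str.split() yields a list per row; .str[0] picks the first item of each list.
# Rows with no first word (missing or empty) become NaN instead of raising.
df["subject"] = df["name"].str.split().str[0]

print(df["subject"].tolist())
```

The loop version would fail on the last two rows, so this form may be more convenient when the column is not yet fully cleaned.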

4. Write to a specific data element

You can access a specific element by using `.at[]`.

df.at['Row name', 'Column name'] = "This is a test"  # Specify the row by its label
df.at[row_number, 'Column name'] = "This is a test"  # With the default integer index, the row number is the label
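A small sketch of writing to single cells follows. The row labels `row_a` and `row_b` are hypothetical. It also uses `.iat[]`, which is not covered above, as one way to write by integer position when the index is not numeric.

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["I have a pen.", "She has a pen."]},
    index=["row_a", "row_b"],  # hypothetical row labels
)

df.at["row_b", "name"] = "This is a test"  # label-based: row label, column label
df.iat[0, 0] = "Written by position"       # position-based: row 0, column 0

print(df)
```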


5. csv output

Finally, output the edited data frame to csv. Adding `encoding='utf_8_sig'` writes a byte order mark at the start of the file, which can prevent garbled characters, for example when the file is opened in Excel.

df.to_csv("filename_v2.csv", encoding='utf_8_sig')
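To see what `utf_8_sig` changes, the sketch below writes to a temporary directory and inspects the first bytes of the file. The sample data is made up, and `index=False` is an assumption added here to leave the row index out of the output; the call in this memo writes the index as well.

```python
import os
import tempfile
import pandas as pd

# Hypothetical data containing non-ASCII text
df = pd.DataFrame({"name": ["ペン", "pen"]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "filename_v2.csv")
    df.to_csv(path, encoding="utf_8_sig", index=False)  # index=False is an assumption
    with open(path, "rb") as f:
        raw = f.read()

# utf_8_sig prepends the UTF-8 byte order mark EF BB BF
has_bom = raw.startswith(b"\xef\xbb\xbf")
print(has_bom)
```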
