Create "operation log" CSV formatting tool in 5 days with Python Pandas PyInstaller

A record of making one tool in 5 days without punch lines or mountains

I want to do something new. What you can use at work anyway → I wonder if I should shape the OPLOG.

Oplog is our product term. It is an operation log that contains extra data such as date and time, operator, operation function in csv. Sometimes I get it when something goes wrong. "Operation log".

Pursuing your own happiness is the best motivation. For the time being, I'm happy just to sort by date and time. I decided to start at my own pace.

This is our calendar: Works Human Intelligence Advent Calendar 2020 that embodies Develop fun! Works Human Intelligence # 2 Advent Calendar 2020 that embodies Develop fun!

What to use

Candidates are around here

1. Go because I have never touched it

https://qiita.com/noborus/items/f253961cca6f4465f20c https://golang.org/

2. Data processing seems to be python, so Python

https://qiita.com/ysdyt/items/9ccca82fc5b504e7913a

3. Java normally

https://www.delftstack.com/ja/howto/java/how-to-sort-objects-in-arraylist-by-date-in-java/

That's all for the first day.

the 2nd day

https://www.python.org/downloads/ I installed Python 3.9 for the time being.

https://qiita.com/AI_Academy/items/b97b2178b4d10abe0adb I don't really want to use it for 5 hours, but let's take a look at it.

`test.py`


#This line is a comment. This line will not be executed.

print("Hello, Python") 

#This line is a comment. This line will not be executed.
#This line is a comment. This line will not be executed.

I got angry when I made and typed test.py at the command prompt

C:\workspace\Python\playground>test.py
SyntaxError: Non-UTF-8 code starting with '\x82' in file C:\Users\works\Desktop\workspace\Python\playground\test.py on line 1, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

Is it an amateur? Resave in UTF-8 and Hello, Python succeed from?

py -m pip install pandas

https://data-flair.training/blogs/install-pandas-on-windows/ Did you install Pandas now?

https://note.nkmk.me/python-pandas-to-csv/ I want to try this area for the time being, but before that, what is a good python editor? → I put VS code. https://code.visualstudio.com/

Without thinking

`firstpandas.py`


import pandas as pd

I tried to write, but I got an error

RuntimeError: The current Numpy installation ('C:\\Users\\works\\AppData\\Local\\Programs\\Python\\Python39\\lib\\site-packages\\numpy\\__init__.py') fails to pass a sanity check due to a bug in the windows runtime. See this issue for more information: https://tinyurl.com/y3dm3h86

Third day

RuntimeError: The current Numpy installation (Omission) fails to pass a sanity check due to a bug in the windows runtime. See this issue for more information: https://tinyurl.com/y3dm3h86

Investigate the cause of. See for more information below. https://developercommunity.visualstudio.com/content/problem/1207405/fmod-after-an-update-to-windows-2004-is-causing-a.html use numpy==1.19.3 works ？

https://qiita.com/bear_montblanc/items/b4b75dfd77da98076da5 If you google this I don't know much about what I believe → The error disappeared. Eh.

I want to sort CSV

https://note.nkmk.me/python-pandas-sort-values-sort-index/ Return to. As per the tutorial

`firstpandas.py`


import pandas as pd
df = pd.read_csv('sample_pandas_normal.csv', index_col=0)
print(df)

Then run.

C:\workspaces\playground>firstpandas.py 
         age state  point 
name 
Alice     24    NY     64 
Bob       42    CA     92 
Charlie   18    CA     70 
Dave      68    TX     70 
Ellen     24    CA     88 
Frank     30    NY     57

Oh, it was. Let's sort this. Is it okay to do this?

`firstpandas.py`


import pandas as pd 
df = pd.read_csv('sample_pandas_normal.csv', index_col=0) 
df.sort_values('age')
print(df)

As a result, it doesn't change. It's not good to do it with a sense. .. First of all, the official document is around here https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sort_values.html

once again. Writing from the conclusion, this gave me an image.

`firstpandas.py`


import pandas as pd

df = pd.read_csv('sample_pandas_normal.csv')
print(df)

df_s = df.sort_values('age')
print(df_s)

C:\workspaces\playground>firstpandas.py
      name  age state  point
0    Alice   24    NY     64
1      Bob   42    CA     92
2  Charlie   18    CA     70
3     Dave   68    TX     70
4    Ellen   24    CA     88
5    Frank   30    NY     57
      name  age state  point
2  Charlie   18    CA     70
0    Alice   24    NY     64
4    Ellen   24    CA     88
5    Frank   30    NY     57
1      Bob   42    CA     92
3     Dave   68    TX     70

It ’s better to review the Python grammar.

Python3 cheat sheet (basic) Python cheat sheet basic elements (@IT) Translated Pandas Official Cheat Sheet

Once here.

Try sorting by date

Now, I changed the CSV used a little and added the date column.

`sample_pandas_date.csv`


name,age,state,point,birthday
Alice,24,NY,64,1996/1/2
Bob,42,CA,92,1978/2/2
Charlie,18,CA,70,2002/3/4
Dave,68,TX,70,1952/1/1
Ellen,24,CA,88,1996/1/5
Frank,30,NY,57,1990/5/15

`firstpandas.py`


import pandas as pd

df = pd.read_csv('sample_pandas_date.csv')
print(df)

df_s = df.sort_values('birthday')
print(df_s)

What happens with this is below the result.

C:\workspaces\playground>firstpandas.py
      name  age state  point   birthday
0    Alice   24    NY     64   1996/1/2
1      Bob   42    CA     92   1978/2/2
2  Charlie   18    CA     70   2002/3/4
3     Dave   68    TX     70   1952/1/1
4    Ellen   24    CA     88   1996/1/5
5    Frank   30    NY     57  1990/5/15
      name  age state  point   birthday
3     Dave   68    TX     70   1952/1/1
1      Bob   42    CA     92   1978/2/2
5    Frank   30    NY     57  1990/5/15
0    Alice   24    NY     64   1996/1/2
4    Ellen   24    CA     88   1996/1/5
2  Charlie   18    CA     70   2002/3/4

It's becoming like that.

It's time to use the real operation log. When I put the actual CSV, I got an error. This is the end of the excitement.

C:\workspaces\playground>firstpandas.py
Traceback (most recent call last):
  File "C:\workspaces\playground\firstpandas.py", line 3, in <module>
    df = pd.read_csv('oplog20201112.csv')
  File "C:\Users\works\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\parsers.py", line 688, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "C:\Users\works\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\parsers.py", line 454, in _read
    parser = TextFileReader(fp_or_buf, **kwds)
  File "C:\Users\works\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\parsers.py", line 948, in __init__
    self._make_engine(self.engine)
  File "C:\Users\works\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\parsers.py", line 1180, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "C:\Users\works\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\parsers.py", line 2010, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas\_libs\parsers.pyx", line 537, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas\_libs\parsers.pyx", line 740, in pandas._libs.parsers.TextReader._get_header
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x83 in position 0: invalid start byte

Day 4

I got an error when I read the CSV

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x83 in position 0: invalid start byte

CSV is like this (masked)

`oplog20201112.csv`


User ID,client name,Windows login ID,Terminal ID,IP Address,MAC Address,Domain name,Login time,Logout time,Login status,action,Function name,Executable file name(Shell),argument(Command line),Execution time,Execution state
"all","client-name","works","client-name","xx.xx.xx.xx","xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx     xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx  xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx ","123-456","2020/11/12 13:18:56","2020/11/12 13:23:38","success","Program","Job monitoring","Job.exe","-context:*****","2020/11/12 13:19:23","success"
"all","client-name","works","client-name","xx.xx.xx.xx","xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx     xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx  xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx ","123-456","2020/11/12 13:18:56","2020/11/12 13:23:38","success","Program","Main menu","Companyxx.exe","-cfg","2020/11/12 13:18:56","success"
"all","client-name","works","client-name","xx.xx.xx.xx","xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx     xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx  xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx ","123-456","2020/11/12 13:18:56","2020/11/12 13:23:38","success","Program","System setting","Maintenance.exe","-context:*****","2020/11/12 13:19:19","success"
"all","client-name","works","client-name","xx.xx.xx.xx","xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx     xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx  xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx ","123-456","2020/11/12 13:18:56","2020/11/12 13:23:38","success","Batch job","mst13","svc.sh","userid/password","2020/11/12 13:19:32","success"
"all","client-name","works","client-name","xx.xx.xx.xx","xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx     xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx  xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx ","123-456","2020/11/12 13:18:56","2020/11/12 13:23:38","success","Batch job","mst13","test.sh","userid/password 0 0","2020/11/12 13:19:29","success"
"all","client-name","works","client-name","xx.xx.xx.xx","xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx     xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx  xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx ","123-456","2020/11/12 13:18:56","2020/11/12 13:23:38","success","Batch job","mst13","test.sh","userid/password 0 0","2020/11/12 13:19:30","success"
"all","client-name","works","client-name","xx.xx.xx.xx","xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx     xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx  xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx ","123-456","2020/11/12 13:18:56","2020/11/12 13:23:38","success","Batch job","mst13","out.sh","userid/password %JAVA% 0","2020/11/12 13:19:31","success"
"all","client-name","works","client-name","xx.xx.xx.xx","xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx     xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx  xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx ","123-456","2020/11/12 13:18:56","2020/11/12 13:23:38","success","Program","Job management","quevw.exe","-context:*****","2020/11/12 13:19:20","success"
"all","client-name","works","client-name","xx.xx.xx.xx","xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx     xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx  xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx ","123-456","2020/11/12 13:18:56","2020/11/12 13:23:38","success","Program","batchjob jobid:498298","test.sh","userid/password 0 0","2020/11/12 13:19:56","success"
"all","client-name","works","client-name","xx.xx.xx.xx","xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx     xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx  xx-xx-xx-xx-xx-xx xx-xx-xx-xx-xx-xx ","123-456","2020/11/12 13:18:56","2020/11/12 13:23:38","success","Program","batchjob jobid:498301","svc.sh","userid/password","2020/11/12 13:21:39","success"

Well, Japanese was included. Encoding required.

https://techacademy.jp/magazine/21128

It doesn't matter at all

Process data with pandas based on the csv file of the new Python corona

That's why it looks interesting.

Once the text was changed to UTF-8 with an editor for confirmation, the reading was successful. Oh, would you like to sort by date and time column as it is?

`firstpandas.py`


import pandas as pd

df = pd.read_csv('oplog20201112.csv')
print(df)

df_s = df.sort_values('Execution time')
print(df_s)

As a result, they sorted by execution time quite normally. So that's it. Let's save the CSV with a different name.

Return to this page. https://note.nkmk.me/python-pandas-to-csv/

`firstpandas.py`


import pandas as pd

df = pd.read_csv('oplog20201112.csv')
# print(df)

df_s = df.sort_values('Execution time')
# print(df_s)

df_s.to_csv('out.csv')

The encoding problem mentioned earlier is dealt with below. https://note.sngklab.jp/?p=435

`firstpandas.py`


import pandas as pd

df = pd.read_csv('oplog20201112.csv',encoding="SHIFT-JIS")
print(df)

df_s = df.sort_values('Execution time')
print(df_s)

df_s.to_csv('out.csv')

I want to bring the execution time column to the far left

Then how to move the column. https://note.nkmk.me/python-pandas-reindex/

As you can see, it looks like that.

`firstpandas.py`


import pandas as pd

df = pd.read_csv('oplog20201112.csv',encoding="SHIFT-JIS")
# print(df)
df_s = df.sort_values('Execution time')
# print(df_s)
df_s = df_s.reindex(columns=['Execution time', 
                             'Function name', 
                             'User ID', 
                             'client name', 
                             'Windows login ID', 
                             'Terminal ID', 
                             'Login time', 
                             'Logout time'])
df_s.to_csv('out.csv')

Tool distribution

I also want to see how to make it easier to use. There is something called pyintaller.

pyInstaller # Let's actually create it

set path=C:\Users\works\AppData\Local\Programs\Python\Python39\Scripts;%path%

Please note that is required first. At the command prompt

C:\workspaces\playground>pyinstaller firstpandas.py --onefile
67 INFO: PyInstaller: 4.0
67 INFO: Python: 3.9.0
69 INFO: Platform: Windows-10-10.0.19041-SP0
70 INFO: wrote C:\workspaces\playground\firstpandas.spec
(Omission)

Then, Exe was successfully completed.

https://news.mynavi.jp/article/python-28/

Create * .bat

`test.bat`


cd /d %~dp0

call firstpandas.exe

The same folder looks like this

There is a firstpandas.exe that just reads in.csv and makes it out.csv.

The file I actually get is not in.csv, so I wonder if I should change the csv received as an argument with * .bat to in.csv.

Column existence check

By the way, there were several types of file formats, so I changed them.

How to check if a column exists in Pandas https://stackoverflow.com/questions/24870306/how-to-check-if-a-column-exists-in-pandas

`firstpandas.py`


import pandas as pd

df = pd.read_csv('oplog20201112.csv',encoding="SHIFT-JIS")
# print(df)

if 'Execution time' in df:
    df_s = df.sort_values('Execution time')
    df_s = df_s.reindex(columns=['Execution time', 
                                 'Function name', 
                                 'User ID', 
                                 'client name', 
                                 'Windows login ID', 
                                 'Terminal ID', 
                                 'Login time', 
                                 'Logout time'])
if 'PRC_DATE' in df:
    df_s = df.sort_values('PRC_DATE')
    df_s = df_s.reindex(columns=['PRC_DATE', 
                                 'DETAIL1', 
                                 'USERID', 
                                 'TERM_ID'])


df_s.to_csv('out.csv')

File encoding

The following may appear.

UnicodeDecodeError: 'shift_jis' codec can't decode byte 0x87 in position 22224: illegal multibyte sequence

I thought,

Points to note when letting pandas read csv of excel output

https://minus9d.hatenablog.com/entry/2015/07/30/225841 https://stackoverflow.com/questions/6729016/decoding-shift-jis-illegal-multibyte-sequence

The encoding is further changed according to.

`firstpandas.py`


import pandas as pd

# df = pd.read_csv('in.csv',encoding="SHIFT-JIS")
df = pd.read_csv('in.csv',encoding="shift_jisx0213")
# print(df)

if 'Execution time' in df:
    df_s = df.sort_values('Execution time')
    df_s = df_s.reindex(columns=['Execution time', 
                                 'Function name', 
                                 'User ID', 
                                 'client name', 
                                 'Windows login ID', 
                                 'Terminal ID', 
                                 'Login time', 
                                 'Logout time'])
if 'PRC_DATE' in df:
    df_s = df.sort_values('PRC_DATE')
    df_s = df_s.reindex(columns=['PRC_DATE', 
                                 'DETAIL1', 
                                 'USERID', 
                                 'TERM_ID'])


df_s.to_csv('out.csv')

Other reference https://techacademy.jp/magazine/23367

Finally

`formatter.bat`


cd /d %~dp0

copy %1 in.csv

call ofmt.exe

echo "see out.csv!"

pause

It looks like this