My boss's remark that "we will take SEO measures" turned into a request to create a batch job that checks and visualizes the Google search rankings of our company and its competitors. I didn't even know the word SEO at first, but my seniors were using Elasticsearch and Kibana, so I decided to try them. There is no particular reason for implementing it in Python, other than that I like Python.
By the way, the companies surveyed this time were chosen fairly arbitrarily: I went with drone makers, simply because I've been interested in drones lately.
- Python: 3.8.3
- Selenium: 3.141.0
- Elasticsearch: 7.10.1-SNAPSHOT
- Kibana: 7.10.1
For Elasticsearch, I took the version from the `number` field of the JSON returned by `curl -XGET "http://localhost:9200"`. For Kibana, I could simply check with `kibana -V`.
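If you want to check the version programmatically instead, a minimal sketch like this works (assuming Elasticsearch is listening on localhost:9200 with security disabled; the root endpoint reports its version in `version.number`):

```python
import json
import urllib.request

# Query the Elasticsearch root endpoint and read the version field
with urllib.request.urlopen("http://localhost:9200") as res:
    info = json.load(res)

print(info["version"]["number"])  # e.g. "7.10.1-SNAPSHOT"
```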
### rank_observer.json

```json
{
"mappings": {
"properties": {
"keyword": {
"type": "keyword"
},
"ranking": {
"type": "integer"
},
"target_domain": {
"type": "keyword"
},
"get_date": {
"type": "date",
"format": "yyyy/MM/dd"
}
}
}
}
```
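For reference, here is a minimal sketch of how this mapping could be registered, assuming the elasticsearch-py 7.x client and the index name `sample_index` (the name rank_observer.py writes to below):

```python
import json
from elasticsearch import Elasticsearch

client = Elasticsearch("http://localhost:9200")

# Load the mapping file shown above
with open("rank_observer.json", "r", encoding="utf-8") as f:
    mapping = json.load(f)

# Create the index with that mapping (raises an error if it already exists)
client.indices.create(index="sample_index", body=mapping)
```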
I made three files. The flow of the program is: get the keywords you want to look up from keyword.txt, get the competitor domains you want to look up from domain.txt, let rank_observer.py grind through them, and automate the whole thing with cron. Something like that; the file layout I'm assuming is sketched below.
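(The `data/YYYY/MM/` hierarchy is what the `write()` function in rank_observer.py creates; `log.log` comes from the cron entry shown later.)

```
.
├── rank_observer.py     # main script
├── rank_observer.json   # the index mapping shown above
├── keyword.txt
├── domain.txt
└── data/
    ├── log.log          # cron output
    └── YYYY/
        └── MM/          # one CSV per keyword is appended here
```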
### keyword.txt

```text
programmable drone
drone company
```

→ I chose the keywords fairly arbitrarily, like this. The more keywords there are, the longer the run takes.
### domain.txt

```text
www.dji.com
www.parrot.com
www.yuneec.com
www.kespry.com
www.skydio.com
www.insitu.com
www.delair.aero
www.ehang.com
```

→ For the domains, I picked several from the top of "10 recommended drone makers"-style articles. The plan is to investigate how these manufacturers' (domains'?) search rankings change over time.
### rank_observer.py

```python
import datetime
import os
import smtplib
import time
import traceback
from lxml import html
from selenium import webdriver
import chromedriver_binary  # noqa: F401 -- puts the bundled chromedriver on PATH
from selenium.webdriver.chrome.options import Options
from elasticsearch import Elasticsearch


# Open Google in Chrome and search for the given word
def search(driver, word):
    driver.get("https://www.google.com")
    search_box = driver.find_element_by_name('q')
    search_box.send_keys(word)
    search_box.submit()
    return driver.page_source


# Parse a results page and return the list of result links
def analyze(source):
    path_to_link = "//div[@class='yuRUbf']/a/@href"
    root = html.fromstring(source)
    # xpath() returns a list of addresses
    return root.xpath(path_to_link)


# Follow the "Next" button at the bottom of the results page and return
# the source of the page it leads to; return 0 when there is no next page
def next_page_source(source, driver):
    path_to_next_page = "//td[@class='d6cvqb']/a[@id='pnnext']/@href"
    root = html.fromstring(source)
    address = root.xpath(path_to_next_page)
    if not address:  # xpath() returns an empty list, never None
        return 0
    driver.get("https://www.google.com" + str(address[0]))
    return driver.page_source


# Append one search result to a CSV file under ./data/YYYY/MM/
def write(keyword, line, rank):
    today = datetime.datetime.today()
    dir_path = "./data/{}/{:02d}/".format(today.year, today.month)
    os.makedirs(dir_path, exist_ok=True)
    with open(os.path.abspath(dir_path + keyword + ".csv"), "a") as f:
        f.write(str(rank) + "," + line + "\n")


# Get the keywords from the text file that lists what you want to look up
def get_keyword():
    with open("./keyword.txt", "r", encoding="utf-8") as f:
        return f.read().splitlines()


# Get the domains from the text file that lists what you want to look up
def get_domain():
    with open("./domain.txt", "r", encoding="utf-8") as f:
        return f.read().splitlines()


# Return True only for addresses that are on the domain list
def check_domain(address, domain):
    for d in domain:
        if d in address:
            return True
    return False


# Convert the raw links into dictionaries that associate domain with rank
# args:
#     address  -- list of links (list)
#     page_num -- current page number, zero-based (int)
#     keyword  -- keyword (str)
#     date     -- date string, yyyy/MM/dd (str)
#     domain   -- list of target domains (list)
def sophisticate_data(address, page_num, keyword, date, domain):
    address_list = []
    for i, content in enumerate(address):
        # Keep the link only if it belongs to one of the target domains
        if check_domain(content, domain):
            # rank = page number * 10 + position on the page + 1
            address_list.append({
                "keyword": keyword,
                "rank": i + page_num * 10 + 1,
                "domain": content,
                "date": date,
            })
    return address_list


# Use the functions above to get the ranking of the target domains per keyword
def parse():
    data = []
    # Set keywords and domains
    keyword = get_keyword()
    domain = get_domain()
    date = datetime.datetime.today().strftime("%Y/%m/%d")
    # How many result pages to walk through; there are about 10 items per page
    page_num = 5
    for kw in keyword:
        time.sleep(10)
        options = Options()
        options.add_argument('--headless')
        options.add_argument('--no-sandbox')
        options.add_argument('--disable-gpu')
        driver = webdriver.Chrome(options=options)
        # Source of the results page shown after searching for the keyword
        source = search(driver, kw)
        address = analyze(source)
        results = sophisticate_data(address, 0, kw, date, domain)
        # Load the remaining pages
        for i in range(1, page_num):
            next_source = next_page_source(source, driver)
            if next_source == 0:  # no "Next" button, so stop here
                break
            results.extend(sophisticate_data(analyze(next_source), i, kw, date, domain))
            source = next_source
            time.sleep(10)
        driver.quit()
        # Save to file
        for item in results:
            write(item["keyword"], str(item["domain"]), item["rank"])
        results = sorted(results, key=lambda x: x["rank"])
        data.extend(results)
    # From here on, Elasticsearch: index one document per result
    client = Elasticsearch("http://localhost:9200")
    for d in data:
        body = {
            "keyword": d["keyword"],
            "ranking": d["rank"],
            "target_domain": d["domain"],
            "get_date": d["date"],
        }
        client.index(index='sample_index', body=body)


# Send a notification email
def send_mail(exception, error=True):
    program = "rank_observer.py"
    FROM = '[email protected]'
    # Set the notification destination email address
    TO = ['[email protected]']
    if not error:
        SUBJECT = f"{exception}"
        TEXT = f"{exception}"
    else:
        SUBJECT = 'An error has been detected.'
        TEXT = (f'The following error was detected in {program} on the '
                f'monitoring server. Please check the logs.\n\n{exception}')
    message = "Subject: {}\n\n{}".format(SUBJECT, TEXT)
    s = smtplib.SMTP()
    s.connect()  # assumes an SMTP server listening on localhost:25
    s.sendmail(FROM, TO, message.encode("utf-8"))
    s.close()


# Run, with exception handling
try:
    parse()
except Exception:
    # Send an email only when there is an error
    send_mail(traceback.format_exc())
```
The readability of this code is questionable, but it got through my seniors' review once I added plenty of comments.
It sends an email when an error occurs. Honestly, I don't know whether it will run if you just copy it as is; it should be fine once you adjust the mail server settings and email addresses (see the sketch below).
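For reference, here is a rough sketch of sending through an authenticated SMTP server instead of a local one; the host, port, and credentials are placeholders, not anything from my actual setup:

```python
import smtplib

def send_mail_with_auth(subject, text,
                        host="smtp.example.com", port=587,
                        user="[email protected]", password="app-password"):
    # Minimal message: a Subject header, a blank line, then the body
    message = "Subject: {}\n\n{}".format(subject, text)
    with smtplib.SMTP(host, port) as s:
        s.starttls()             # upgrade the connection to TLS
        s.login(user, password)  # authenticate before sending
        s.sendmail(user, ["[email protected]"], message.encode("utf-8"))
```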
```
0 1 * * * sudo python3 ~/path/to/rank_observer.py >> ~/path/to/data/log.log 2>&1
```
↑ This is the contents of crontab, so the program runs at 1:00 a.m. every day. Exception stack traces and the like are appended as they occur to a log file with the silly name log.log.
1. elasticsearch.exceptions.SSLError [SSL: WRONG_VERSION_NUMBER]

The log makes it clear that this error occurred, but I've forgotten exactly how I fixed it. If I remember correctly, in
rank_observer.py:

```python
client = Elasticsearch("http://localhost:9200")
```
I deleted an SSL-related argument here. I believe it was `use_ssl=True`.
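Reconstructing it from memory (so take this as a guess; `use_ssl` was a connection option in the elasticsearch-py 7.x client), the change was roughly:

```python
from elasticsearch import Elasticsearch

# Before (my guess): use_ssl=True makes the client speak TLS to a plain-HTTP
# port, which is exactly what [SSL: WRONG_VERSION_NUMBER] complains about
# client = Elasticsearch("http://localhost:9200", use_ssl=True)

# After: plain HTTP, matching how Elasticsearch was actually running
client = Elasticsearch("http://localhost:9200")
```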
2. ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed
I could not get chromedriver-binary to install no matter how long I tried, but I managed to get through it with this article.
The code is probably redundant in places, so I should go back and reread "Readable Code" carefully. Next, I'd like to try another BI tool, or study stock prices on my own. I've omitted how to visualize things with Kibana and will just include a screenshot. You can see which manufacturers have been ranked high or low recently. This may become a stepping stone for SEO measures.