I automated punching the clock on Jobcan with Selenium and deployed it to Google Cloud Functions (GCF). Here I describe the points I got stuck on along the way.
About me: I work at a system integrator (SIer) in Tokyo. RPA engineer → support for PoC introductions of AI solutions, etc.
I'm also on Twitter ↓ https://twitter.com/m_schna2
The configuration is as follows. When I actually tried it, linking Google Drive and Cloud Functions took quite a while.
The ID, password, and URL are hardcoded in the source; change them to suit your environment.
main.py
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.chrome.options import Options
import jpholiday
import datetime
import requests, json
import os, time
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
import shutil

def punchClockWrapper(request):

    def isBizDay():
        # Return 1 on a business day, 0 on a weekend or Japanese public holiday
        dt_now = datetime.datetime.now(datetime.timezone.utc)
        dt_now = utc_to_jst(dt_now)
        today = dt_now.date()
        if today.weekday() >= 5 or jpholiday.is_holiday(today):
            return 0
        else:
            return 1

    def punchClock():
        # Constants
        url_jobcan = 'https://id.jobcan.jp/users/~~~~~~~~~~'
        id = '~~~~~~~~~~~~~'
        pw = '~~~~~~~~~~~~~'
        # Chrome options
        options = Options()
        options.add_argument('--headless')
        options.add_argument('--disable-gpu')
        options.add_argument('--window-size=1280x1696')
        options.add_argument('--no-sandbox')
        options.add_argument('--hide-scrollbars')
        options.add_argument('--enable-logging')
        options.add_argument('--log-level=0')
        options.add_argument('--v=99')
        options.add_argument('--single-process')
        options.add_argument('--ignore-certificate-errors')
        # Use the headless Chromium binary deployed alongside the code
        options.binary_location = os.getcwd() + "/headless-chromium"
        # Load the driver that runs Chrome (Selenium 3 style call)
        driver = webdriver.Chrome(os.getcwd() + "/chromedriver", options=options)
        # Open Jobcan
        driver.get(url_jobcan)
        # Enter the email address
        text = driver.find_element_by_id("user_email")
        text.send_keys(id)
        # Enter the password
        text = driver.find_element_by_id("user_password")
        text.send_keys(pw)
        # Click the login button
        btn = driver.find_element_by_name("commit")
        btn.click()
        # Click the punch button
        btn = driver.find_element_by_id("adit-button-push")
        btn.click()
        # Wait 8 seconds
        time.sleep(8)
        # Close the window
        driver.quit()

    def toSlack():
        WEB_HOOK_URL = "https://hooks.slack.com/services/~~~~~~~~~"
        dt_now = datetime.datetime.now(datetime.timezone.utc)
        dt_now = utc_to_jst(dt_now)
        dt_punch = dt_now.strftime('%Y-%m-%d %H:%M:%S')
        requests.post(WEB_HOOK_URL, data=json.dumps({
            'text': dt_punch + ' Punch completed',  # Notification content
            'username': '~~~~~~',  # Username
            'icon_emoji': ':smile_cat:',  # Icon
            'link_names': 1,  # Link names
        }))

    def utc_to_jst(timestamp_utc):
        timestamp_jst = timestamp_utc.astimezone(datetime.timezone(datetime.timedelta(hours=+9)))
        return timestamp_jst

    def writeLog(message):
        dt_now = datetime.datetime.now(datetime.timezone.utc)
        dt_now = utc_to_jst(dt_now)
        dt_punch = dt_now.strftime('%Y-%m-%d %H:%M:%S')
        with open(log_path, 'a') as f:
            print(dt_punch + ' ' + message, file=f)

    def downloadLog():
        f = drive.CreateFile(drive_param)
        f.GetContentFile(log_path)

    def uploadLog():
        f = drive.CreateFile(drive_param)
        f.SetContentFile(log_path)
        f.Upload()

    # Constants
    log_path = "/tmp/punch_clock.log"
    drive_param = {
        'parents': [{
            'id': '~~~~~~~~~~~'
        }],
        'id': '~~~~~~~~~~~',
        'title': 'punch_clock.log'
    }
    # Change directory (by default settings.yaml is read from the current directory)
    # credentials.json must be stored in a directory with read and write permissions
    os.chdir(os.path.dirname(os.path.abspath(__file__)))
    shutil.copy2("credentials.json", "/tmp/credentials.json")
    # Authentication
    gauth = GoogleAuth()
    gauth.CommandLineAuth()
    drive = GoogleDrive(gauth)
    # Main processing
    downloadLog()  # Download the log from Google Drive
    writeLog('Start processing')  # Start log
    flg = isBizDay()  # Business day check (business day: 1, holiday: 0)
    if flg == 1:
        punchClock()  # Punch the clock
        toSlack()  # Slack notification
        writeLog('Punch completed')  # End log
    else:
        writeLog('Skipped punching because it is a holiday')  # End log
    uploadLog()  # Upload the log to Google Drive
    return 'Done'
Selenium
This is the core of the punch clock. In the code, the fixed 8-second wait after punching worries me a little, but punching has never failed so far, so it is probably fine.
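One way to make the fixed wait less of a worry is to poll for a condition instead of always sleeping 8 seconds, which is the idea behind Selenium's WebDriverWait (imported in main.py but not used). The following is only a minimal pure-Python sketch of that idea. `wait_until` and `punch_confirmed` are hypothetical names that appear neither in Selenium nor in the original code.

```python
import time


def wait_until(predicate, timeout=8.0, interval=0.5):
    """Poll predicate() until it is truthy or `timeout` seconds have passed.

    Returns True if the condition was met, False if the timeout expired.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Hypothetical stand-in for "the page shows that the punch was accepted".
state = {"calls": 0}


def punch_confirmed():
    state["calls"] += 1
    return state["calls"] >= 3  # becomes true on the third check


result = wait_until(punch_confirmed, timeout=2.0, interval=0.01)
```

In the real function the predicate would inspect the page through the driver. Which element on the Jobcan page signals a successful punch is not covered in this article, so that part is left open.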
Slack is notified through the Incoming Webhook API. The setup steps are described at the following URL. https://slack.com/intl/ja-jp/help/articles/115005265063-Slack-%E3%81%A7%E3%81%AE-Incoming-Webhook-%E3%81%AE%E5%88%A9%E7%94%A8#incoming-webhook-u12398u35373u23450
To deploy, put the code and everything it needs (library modules, etc.) in the same folder and upload it with a command such as the following.
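As a rough illustration of what the notification amounts to, here is a sketch that builds the same kind of JSON POST with only the standard library and returns the HTTP status so a failed notification can be noticed. The function names and the default username are assumptions made for this example. The URL is the same elided placeholder as in main.py, and nothing is actually sent.

```python
import json
import urllib.request


def build_slack_request(webhook_url, text, username="punch-clock"):
    """Build (but do not send) a POST request for a Slack Incoming Webhook."""
    payload = {"text": text, "username": username}
    return urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def notify_slack(webhook_url, text, timeout=10):
    """Send the message and return the HTTP status code."""
    req = build_slack_request(webhook_url, text)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


# Placeholder URL; only the request object is built here.
req = build_slack_request(
    "https://hooks.slack.com/services/~~~~~~~~~", "Punch completed"
)
```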
gcloud functions deploy punchClockWrapper --runtime python37 --trigger-http --region asia-northeast1 --memory 512MB
For this part I referred to the URL below. https://blowup-bbs.com/gcp-cloud-functions-python3/
To append to a log on Google Drive, the flow is: download the log written so far → append → upload. Google Drive manages files by file ID, so uploading a file with the same name does not overwrite the existing one by default; it is stored as a separate file that happens to have the same name. Overwriting works if you specify both the parent folder ID and the log file ID, but I could not find a way to get the log file ID through normal GUI operations, so I ran the code below to look it up.
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
import pprint
#Authentication
gauth = GoogleAuth()
gauth.CommandLineAuth()
drive = GoogleDrive(gauth)
#file data acquisition
file_id = drive.ListFile({'q':'title="punch_clock.log"'}).GetList()[0]['id']
f = drive.CreateFile({'id':file_id})
#File metadata display
f.FetchMetadata()
pprint.pprint(f)
Incidentally, the folder ID can be read from the URL when you open the folder in a browser. The method is described at the link below. https://tonari-it.com/gas-google-drive-file-folder-id/
PyDrive is a wrapper library for the Google Drive API. Interactive authentication is required the first time, but a JSON file is created once authentication succeeds, and from the second time on that JSON is read without any prompt. This makes it possible to automate Google Drive operations from code. It takes very little code and is quite neat.
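If you would rather not copy the ID out of the address bar by hand, a small helper can extract it. This sketch assumes the folder URL has the shape https://drive.google.com/drive/folders/<folder_id>. The function name and the sample ID are made up for illustration.

```python
import re
from typing import Optional

# Assumed URL shape: https://drive.google.com/drive/folders/<folder_id>
_FOLDER_RE = re.compile(r"/folders/([A-Za-z0-9_-]+)")


def folder_id_from_url(url: str) -> Optional[str]:
    """Return the folder ID from a Drive folder URL, or None if not found."""
    match = _FOLDER_RE.search(url)
    return match.group(1) if match else None


# The ID below is a made-up example, not a real folder.
example = "https://drive.google.com/drive/folders/1AbC_dEf-123?usp=sharing"
folder_id = folder_id_from_url(example)
```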
#Authentication
gauth = GoogleAuth()
gauth.CommandLineAuth()
drive = GoogleDrive(gauth)
The behavior at authentication time is as follows: settings.yaml in the current directory is read → the path specified in save_credentials_file is checked → if the JSON exists there, authentication succeeds and the JSON is overwritten. If it does not exist, you are asked to authenticate with a code, and once the code is entered the JSON is created at the specified path. You need to keep this behavior in mind.
Naturally, you cannot authenticate interactively after deploying to GCF. So I authenticate locally in advance, create the JSON file, and put it in the folder to be deployed. However, in GCF the only writable directory is /tmp, and because the JSON is written (overwritten) even when authentication succeeds, save_credentials_file must point to a path of the form /tmp/~~~.json. That is why the code below copies credentials.json from the deploy directory to /tmp, and why the settings.yaml deployed alongside it specifies /tmp/credentials.json for save_credentials_file. In addition, PyDrive expects settings.yaml in the current directory, so the code first changes to the directory that contains the script.
# Change directory (by default settings.yaml is read from the current directory)
# credentials.json must be stored in a directory with read and write permissions
os.chdir(os.path.dirname(os.path.abspath(__file__)))
shutil.copy2("credentials.json","/tmp/credentials.json")
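The copy step can be wrapped in a small function so it is easy to try locally. This is a sketch under the assumptions described above (read-only deploy directory, writable /tmp). `stage_credentials` is a hypothetical helper, and temporary directories stand in for the real locations.

```python
import os
import shutil
import tempfile


def stage_credentials(src_dir, dst_dir="/tmp", name="credentials.json"):
    """Copy the deployed credentials file into a writable directory.

    Returns the destination path. The deployed source file is left untouched.
    """
    src = os.path.join(src_dir, name)
    dst = os.path.join(dst_dir, name)
    shutil.copy2(src, dst)
    return dst


# Temporary directories stand in for the deploy directory and /tmp.
with tempfile.TemporaryDirectory() as deploy_dir, \
        tempfile.TemporaryDirectory() as writable_dir:
    with open(os.path.join(deploy_dir, "credentials.json"), "w") as f:
        f.write('{"dummy": true}')
    staged = stage_credentials(deploy_dir, writable_dir)
    staged_exists = os.path.exists(staged)
    with open(staged) as f:
        staged_text = f.read()
```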
The clock on the worker that GCF starts is 9 hours behind Japan time (the default region is us-central1), so the timestamps in the log output and the Slack notification are corrected with the following function. (This time I deployed to asia-northeast1, but the time is still off. A 9-hour gap matches UTC, so presumably the runtime clock is UTC regardless of region.)
def utc_to_jst(timestamp_utc):
    timestamp_jst = timestamp_utc.astimezone(datetime.timezone(datetime.timedelta(hours=+9)))
    return timestamp_jst
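One detail worth noting: when astimezone() is called on a naive datetime (one without tzinfo), Python assumes it is in the system's local time, so the conversion above gives the right answer only when the machine's local timezone is UTC. A sketch that does not depend on the machine's timezone, assuming you are free to start from an aware UTC time (`JST` and `now_jst` are names introduced here):

```python
import datetime

JST = datetime.timezone(datetime.timedelta(hours=9), name="JST")


def now_jst():
    """Current time as a timezone-aware datetime in JST."""
    return datetime.datetime.now(datetime.timezone.utc).astimezone(JST)


def utc_to_jst(timestamp_utc):
    """Convert an aware UTC datetime to JST."""
    return timestamp_utc.astimezone(JST)


utc_noon = datetime.datetime(2020, 6, 1, 12, 0, tzinfo=datetime.timezone.utc)
jst_time = utc_to_jst(utc_noon)
print(jst_time.strftime("%Y-%m-%d %H:%M:%S"))  # prints 2020-06-01 21:00:00
```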
- Hardcoded ID, password, and URL: this is low priority, but I'd like to improve it someday.
- No restrictions on callers: at present, anyone who finds the URL that triggers the GCF can punch the clock for me! This looks quick to fix once I look into it, so I'll fix it when I have time!
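Both open issues could be approached roughly as follows. This is only a sketch and not something the deployed function does: the environment variable names and the `X-Punch-Token` header are hypothetical, and a shared secret is just one simple option.

```python
import hmac
import os


def load_settings(env=os.environ):
    """Read login settings from environment variables instead of source code.

    The variable names are hypothetical.
    """
    return {
        "url": env["JOBCAN_URL"],
        "id": env["JOBCAN_ID"],
        "pw": env["JOBCAN_PW"],
    }


def is_authorized(headers, expected_token):
    """Return True only if the caller sent the expected shared secret.

    `headers` is any mapping with .get(); the header name is hypothetical.
    """
    supplied = headers.get("X-Punch-Token", "")
    if not expected_token:
        return False  # refuse everything if no secret is configured
    # compare_digest avoids leaking the match position through timing.
    return hmac.compare_digest(
        supplied.encode("utf-8"), expected_token.encode("utf-8")
    )


settings = load_settings(
    {"JOBCAN_URL": "https://example.invalid", "JOBCAN_ID": "me", "JOBCAN_PW": "secret"}
)
allowed = is_authorized({"X-Punch-Token": "s3cret"}, "s3cret")
denied = is_authorized({}, "s3cret")
```

Inside punchClockWrapper, the check would run first against request.headers, returning early with an error response when it fails. Cloud Functions can also be deployed so that invocations require IAM authentication, which avoids handling a secret in the code at all.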
https://stackoverflow.com/questions/51487885/tmp-file-in-google-cloud-functions-for-python
https://blowup-bbs.com/gcp-cloud-functions-python3/
https://dev.classmethod.jp/articles/python-time-string-timezone/
https://qiita.com/wezardnet/items/34b89276cdfc66607727