I automated Jobcan time stamping with Selenium and deployed it to Google Cloud Functions, but it was quite difficult

Introduction

I automated Jobcan time stamping with Selenium and deployed it to Google Cloud Functions. In this post I describe the points where I got stuck along the way.

Brief self-introduction

Working at an SIer in Tokyo. RPA engineer → now supporting PoC introductions of AI solutions, etc.

I'm also on Twitter: https://twitter.com/m_schna2

Flow diagram

The system has the following configuration. When I actually built it, linking Google Drive and Cloud Functions took quite a while.

[Figure: flow diagram]

Full code

The ID, password, and URLs are hard-coded (masked as ~~~ below), so replace them with your own values.

main.py


from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.chrome.options import Options
import jpholiday
import datetime
import requests, json
import os, time
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
import shutil

def punchClockWrapper(request):
  def isBizDay():
    # Today's date in Japan time
    today = utc_to_jst(datetime.datetime.now()).date()
    # Saturday/Sunday (weekday() >= 5) or a Japanese public holiday -> not a business day
    if today.weekday() >= 5 or jpholiday.is_holiday(today):
      return 0
    else:
      return 1

  def punchClock():
    # Constants (replace with your own values)
    url_jobcan = 'https://id.jobcan.jp/users/~~~~~~~~~~'
    login_id = '~~~~~~~~~~~~~'
    login_pw = '~~~~~~~~~~~~~'

    # Chrome options for running headless inside Cloud Functions
    options = Options()
    options.add_argument('--headless')
    options.add_argument('--disable-gpu')
    options.add_argument('--window-size=1280x1696')
    options.add_argument('--no-sandbox')
    options.add_argument('--hide-scrollbars')
    options.add_argument('--enable-logging')
    options.add_argument('--log-level=0')
    options.add_argument('--v=99')
    options.add_argument('--single-process')
    options.add_argument('--ignore-certificate-errors')

    # Use the headless Chromium binary and chromedriver deployed alongside this file
    options.binary_location = os.getcwd() + "/headless-chromium"
    driver = webdriver.Chrome(os.getcwd() + "/chromedriver", options=options)
    wait = WebDriverWait(driver, 10)  # Wait up to 10 seconds for elements to appear

    try:
      # Open Jobcan
      driver.get(url_jobcan)

      # Enter the email address
      wait.until(EC.presence_of_element_located((By.ID, "user_email"))).send_keys(login_id)

      # Enter the password
      driver.find_element(By.ID, "user_password").send_keys(login_pw)

      # Click the login button
      driver.find_element(By.NAME, "commit").click()

      # Click the stamp button once the page after login has loaded
      wait.until(EC.element_to_be_clickable((By.ID, "adit-button-push"))).click()

      # Wait 8 seconds so the stamp request can finish before closing
      time.sleep(8)
    finally:
      # Always close the browser, even if one of the steps above fails
      driver.quit()

  def toSlack():
    WEB_HOOK_URL = "https://hooks.slack.com/services/~~~~~~~~~"
    dt_now = utc_to_jst(datetime.datetime.now())
    dt_punch = dt_now.strftime('%Y-%m-%d %H:%M:%S')
    requests.post(WEB_HOOK_URL, data=json.dumps({
      'text': dt_punch + ' Stamping completed',  # Notification text
      'username': '~~~~~~',  # Display name
      'icon_emoji': ':smile_cat:',  # Icon
      'link_names': 1,  # Turn @names into links
    }))

  def utc_to_jst(timestamp_utc):
    timestamp_jst = timestamp_utc.astimezone(datetime.timezone(datetime.timedelta(hours=+9)))
    return timestamp_jst

  def writeLog(message):
    dt_now = utc_to_jst(datetime.datetime.now())
    dt_punch = dt_now.strftime('%Y-%m-%d %H:%M:%S')
    with open(log_path, 'a') as f:
      print(dt_punch + '  ' + message, file=f)

  def downloadLog():
    f = drive.CreateFile(drive_param)
    f.GetContentFile(log_path)

  def uploadLog():
    f = drive.CreateFile(drive_param)
    f.SetContentFile(log_path)
    f.Upload()

  # Constants
  log_path = "/tmp/punch_clock.log"
  drive_param = {
    'parents': [{
        'id':'~~~~~~~~~~~'
    }],
    'id': '~~~~~~~~~~~',
    'title': 'punch_clock.log'
  }
  # Change to this file's directory: PyDrive reads settings.yaml from the current directory
  # credentials.json is rewritten on every authentication, so it must be in a writable directory (/tmp on GCF)
  os.chdir(os.path.dirname(os.path.abspath(__file__)))
  shutil.copy2("credentials.json", "/tmp/credentials.json")
  # Authentication
  gauth = GoogleAuth()
  gauth.CommandLineAuth()
  drive = GoogleDrive(gauth)
  # Main processing
  downloadLog()  # Download the log from Google Drive
  writeLog('Start processing')  # Start log
  flg = isBizDay()  # Business-day check (1: business day, 0: weekend or holiday)
  if flg == 1:
    punchClock()  # Stamp
    toSlack()  # Slack notification
    writeLog('Stamping completed')  # End log
  else:
    writeLog('Skipped stamping because today is a holiday')  # End log
  uploadLog()  # Upload the log to Google Drive
  return 'Done'
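The weekend/holiday check in isBizDay can be tried locally without deploying anything. Below is a minimal sketch that pulls the same logic into a standalone function. The name is_business_day and the idea of passing the holiday check in as a callable are my own, for illustration only (in the original code the check is jpholiday.is_holiday), and the sample holiday set is just a stand-in.

```python
import datetime

def is_business_day(day, is_holiday):
  """Return True if `day` is Monday-Friday and not a holiday.

  is_holiday: any callable taking a datetime.date (e.g. jpholiday.is_holiday).
  """
  # date.weekday(): Monday == 0 ... Saturday == 5, Sunday == 6
  if day.weekday() >= 5:
    return False
  return not is_holiday(day)

if __name__ == '__main__':
  # Stand-in holiday calendar for illustration only
  sample_holidays = {datetime.date(2020, 5, 4), datetime.date(2020, 5, 5), datetime.date(2020, 5, 6)}
  for d in range(4, 12):
    day = datetime.date(2020, 5, d)
    print(day, is_business_day(day, lambda x: x in sample_holidays))
```

Passing the holiday check in as a parameter keeps the weekday logic testable without the jpholiday package installed.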

What went smoothly

Selenium

This is the core stamping step. I was a little worried about the fixed 8-second wait after clicking the stamp button, but stamping has never failed, so it's probably fine.

Slack notification

Notifications are sent through Slack's Incoming Webhooks. The setup steps are described at the following URL: https://slack.com/intl/ja-jp/help/articles/115005265063-Slack-%E3%81%A7%E3%81%AE-Incoming-Webhook-%E3%81%AE%E5%88%A9%E7%94%A8#incoming-webhook-u12398u35373u23450
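For reference, an Incoming Webhook call is just an HTTP POST of a JSON body to the webhook URL. Here is a minimal sketch using only the standard library (urllib.request instead of the requests package used in main.py). build_slack_payload and post_to_slack are hypothetical helper names, and the webhook URL stays a placeholder.

```python
import json
import urllib.request

def build_slack_payload(text, username=None, icon_emoji=None):
  """Build the JSON body for a Slack Incoming Webhook."""
  payload = {'text': text}
  if username:
    payload['username'] = username
  if icon_emoji:
    payload['icon_emoji'] = icon_emoji
  return payload

def post_to_slack(webhook_url, payload):
  """POST the payload as JSON and return the response body (Slack replies 'ok' on success)."""
  req = urllib.request.Request(
    webhook_url,
    data=json.dumps(payload).encode('utf-8'),
    headers={'Content-Type': 'application/json'},
  )
  with urllib.request.urlopen(req, timeout=10) as resp:
    return resp.read().decode('utf-8')
```

Usage would look like post_to_slack("https://hooks.slack.com/services/~~~~~~~~~", build_slack_payload("Stamping completed", icon_emoji=":smile_cat:")).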

What was difficult

Deploy to Google Cloud Functions (GCF)

To deploy, put all the code and the required library modules in one folder and upload it with a command like the following.

gcloud functions deploy punchClockWrapper --runtime python37 --trigger-http --region asia-northeast1 --memory 512MB

For this step, I referred to the following URL: https://blowup-bbs.com/gcp-cloud-functions-python3/

Appending logs to Google Drive (using PyDrive)

To append to the log on Google Drive, I download the log written so far → append to it → upload it again. Google Drive manages files by file ID, so by default, uploading a file with the same name does not overwrite the existing one; it is uploaded as a separate file that happens to have the same name. You can overwrite it by specifying the ID of the parent folder and the ID of the log file, but as far as I could tell the log file's ID can't be found through normal GUI operations, so I ran the code below to check it.
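The download → append → upload cycle itself can be sketched independently of PyDrive by passing the two Drive operations in as callables. In main.py these correspond to downloadLog (GetContentFile) and uploadLog (SetContentFile + Upload on a file created with a fixed 'id'); append_log_cycle is a hypothetical name used only for this sketch.

```python
import datetime

def append_log_cycle(download, upload, log_path, message):
  """Download the existing log, append one timestamped line, and upload it again.

  download(path) / upload(path) stand in for the PyDrive calls; upload must
  target the same file ID so the remote file is overwritten, not duplicated.
  """
  download(log_path)  # 1. Fetch the log written so far into log_path
  stamp = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
  with open(log_path, 'a') as f:  # 2. Append a new line locally
    print(stamp + '  ' + message, file=f)
  upload(log_path)  # 3. Push the whole file back
```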

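from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
import pprint

# Authentication
gauth = GoogleAuth()
gauth.CommandLineAuth()
drive = GoogleDrive(gauth)

# Search for the log file by title (excluding files in the trash)
files = drive.ListFile({'q': "title='punch_clock.log' and trashed=false"}).GetList()
if not files:
  raise SystemExit('punch_clock.log was not found')
f = drive.CreateFile({'id': files[0]['id']})

# Display the file metadata ('id' is the file ID; 'parents' holds the parent folder ID)
f.FetchMetadata()
pprint.pprint(f)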

The folder ID, on the other hand, can be read from the URL when you open the folder in a browser. The method is described at the link below. https://tonari-it.com/gas-google-drive-file-folder-id/
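If you'd rather not copy the ID out of the address bar by hand, here is a minimal sketch that extracts it from a folder URL, assuming the https://drive.google.com/drive/folders/<ID> form (optionally with /u/0/ and a query string). folder_id_from_url is a hypothetical helper.

```python
from urllib.parse import urlparse

def folder_id_from_url(url):
  """Return the folder ID from a URL like https://drive.google.com/drive/folders/<ID>."""
  parts = urlparse(url).path.split('/')  # The query string (?usp=...) is not part of .path
  if 'folders' not in parts:
    raise ValueError('not a Drive folder URL: ' + url)
  idx = parts.index('folders')
  if idx + 1 >= len(parts) or not parts[idx + 1]:
    raise ValueError('no folder ID in URL: ' + url)
  return parts[idx + 1]
```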

Combining PyDrive and GCF

PyDrive is a wrapper library for the Google Drive API. The first time, you have to authenticate interactively, but once authentication succeeds a JSON credentials file is created, and from the second run on that JSON is used automatically, so you can automate Google Drive from code. It takes only a few lines and is very tidy.

#Authentication
gauth = GoogleAuth()
gauth.CommandLineAuth()
drive = GoogleDrive(gauth)

Be aware of how authentication behaves: PyDrive reads settings.yaml in the current directory → looks at the path given in save_credentials_file → if a JSON file exists there, authentication succeeds and the JSON is overwritten; if not, it asks you to authenticate with a code, and once you enter the code the JSON is created at that same path.

Of course, interactive authentication is impossible after deploying to GCF. So I authenticate locally in advance, create the JSON file, and put it in the folder to deploy. However, in GCF the only writable directory is /tmp, and since the JSON is written (overwritten) even when authentication succeeds, save_credentials_file must point to a path like /tmp/~~~.json. That is why the code below copies credentials.json from the deployment directory to /tmp, and the settings.yaml deployed with it specifies /tmp/credentials.json in save_credentials_file. Also, PyDrive reads settings.yaml from the current directory, so I change the current directory to the directory of the executing file.
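The copy step could be wrapped in a small helper. This is a sketch under my own assumptions: prepare_credentials is a hypothetical name, and unlike the original code it copies only when the destination does not exist yet, so a warm instance keeps the token PyDrive has already rewritten in /tmp (files in /tmp persist for the lifetime of an instance). The os.chdir step is still needed because settings.yaml is read from the current directory.

```python
import os
import shutil

def prepare_credentials(src_dir, tmp_dir='/tmp', name='credentials.json'):
  """Copy credentials.json from the read-only deploy dir into a writable dir.

  Returns the destination path, which should match save_credentials_file
  in settings.yaml. An existing copy is left untouched.
  """
  dst = os.path.join(tmp_dir, name)
  if not os.path.exists(dst):
    shutil.copy2(os.path.join(src_dir, name), dst)
  return dst
```

In the function it would be called as prepare_credentials(os.path.dirname(os.path.abspath(__file__))).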

# Change to this file's directory: PyDrive reads settings.yaml from the current directory
# credentials.json is rewritten on every authentication, so it must be in a writable directory (/tmp on GCF)
os.chdir(os.path.dirname(os.path.abspath(__file__)))
shutil.copy2("credentials.json", "/tmp/credentials.json")

The time is off by default

The clock inside the GCF runtime is set to UTC, which is 9 hours behind Japan time, so the times in the log output and the Slack notification are corrected with the following function. (This time I deployed to asia-northeast1, but the time was still off: the deployment region does not change the runtime's time zone.)

def utc_to_jst(timestamp_utc):
  timestamp_jst = timestamp_utc.astimezone(datetime.timezone(datetime.timedelta(hours=+9)))
  return timestamp_jst
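A variant that does not rely on the runtime's local time zone at all: take the current time as an aware UTC datetime and convert it to a fixed UTC+9 offset (Japan does not observe daylight saving time, so a fixed offset is enough). JST and now_jst are names introduced for this sketch.

```python
import datetime

# Japan Standard Time as a fixed UTC+9 offset
JST = datetime.timezone(datetime.timedelta(hours=9), 'JST')

def utc_to_jst(timestamp_utc):
  """Convert an aware datetime (e.g. in UTC) to JST."""
  return timestamp_utc.astimezone(JST)

def now_jst():
  """Current time in JST, independent of the machine's local time zone."""
  return datetime.datetime.now(datetime.timezone.utc).astimezone(JST)

if __name__ == '__main__':
  # 00:30 UTC corresponds to 09:30 the same day in JST
  t = datetime.datetime(2020, 5, 12, 0, 30, tzinfo=datetime.timezone.utc)
  print(utc_to_jst(t).strftime('%Y-%m-%d %H:%M:%S %Z'))
```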

Future tasks

- Hard-coded ID, password, and URL: it's low priority, but I'd like to fix this someday.

- No restriction on callers: at present, anyone who finds the URL that triggers the function can stamp my time card! It should be quick to fix once I look into it, so I'll do it when I have time!
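For the first task, one common approach is to read the values from environment variables instead of hard-coding them; gcloud functions deploy accepts --set-env-vars KEY=VALUE,... for this. A minimal sketch; the variable names JOBCAN_URL, JOBCAN_ID and JOBCAN_PW and the function load_jobcan_settings are hypothetical.

```python
import os

REQUIRED_KEYS = ('JOBCAN_URL', 'JOBCAN_ID', 'JOBCAN_PW')

def load_jobcan_settings(env=os.environ):
  """Return (url, login_id, password) from environment variables.

  Raises RuntimeError listing any variable that is missing or empty.
  """
  missing = [k for k in REQUIRED_KEYS if not env.get(k)]
  if missing:
    raise RuntimeError('missing environment variables: ' + ', '.join(missing))
  return env['JOBCAN_URL'], env['JOBCAN_ID'], env['JOBCAN_PW']
```

Failing fast on a missing variable makes a misconfigured deployment show up in the function's logs instead of as a confusing login failure in Selenium.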

Reference URLs

https://stackoverflow.com/questions/51487885/tmp-file-in-google-cloud-functions-for-python
https://blowup-bbs.com/gcp-cloud-functions-python3/
https://dev.classmethod.jp/articles/python-time-string-timezone/
https://qiita.com/wezardnet/items/34b89276cdfc66607727
