Introduction

Well, I got tired of clicking around with the mouse in a business system (attendance management), so I started automating it.

Things that commonly come up in crawling

The system is built for humans clicking around with a mouse, not for operation by a machine. It assumes nobody will press a button they haven't seen yet, so if you want a script to behave like a person, you have to control the timing: act only once a button can actually be pressed or a field can actually accept input.

Naturally, Selenium provides a mechanism for this.

Documentation: waits, Waits, Wait (a translation of the above)

The second and third documents show two approaches. An implicit wait sets one blanket timeout for every element lookup instead of waiting for a specific condition, so it's tempting to think you might as well use time.sleep and not rely on Selenium at all. Admittedly it's easy, since it saves typing and you can't forget to add waits, but if response times vary with network conditions or PC load, you may not get the expected result, so explicit waits are really the better choice. Note, though, that [explicit waits poll internally at a fixed interval (0.5 s by default)](https://seleniumhq.github.io/selenium/docs/api/py/_modules/selenium/webdriver/support/wait.html#WebDriverWait), so the processing load is slightly higher.
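At its core an explicit wait is just a polling loop. Below is a minimal, simplified sketch in plain Python to show the idea; it is not Selenium's own code, and the name poll_until and its parameters are hypothetical. WebDriverWait behaves similarly: it calls the condition every poll interval (0.5 s by default) until it returns something truthy, treating certain exceptions (NoSuchElementException by default) as "not yet", and gives up when the timeout expires.

```python
import time


def poll_until(condition, timeout=10.0, poll=0.5, ignored=(LookupError,)):
    """Call condition() every `poll` seconds until it returns something truthy.

    Hypothetical, simplified sketch of an explicit wait (not Selenium's code).
    Exceptions listed in `ignored` are treated as "not ready yet".
    """
    end = time.monotonic() + timeout
    while True:
        try:
            value = condition()
            if value:
                return value  # condition met: hand back its result
        except ignored:
            pass  # e.g. the element isn't there yet, keep polling
        if time.monotonic() > end:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)  # this repeated polling is the extra load


# Simulated page: the "button" only shows up on the third check
attempts = iter([False, False, "button"])
print(poll_until(lambda: next(attempts), timeout=2, poll=0.01))  # button
```

Compared with a fixed time.sleep, this returns as soon as the condition holds and only waits the full timeout when something is actually wrong.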

How to wait

The samples in the documents above look like this:

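from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

#The first one: any callable that takes the driver works as a condition
#(older Selenium versions wrote this as x.find_element_by_id("someId"))
element = WebDriverWait(driver, 10).until(lambda x: x.find_element(By.ID, "someId"))

#2nd and 3rd: use a helper from expected_conditions
try:
    element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.ID, "myDynamicElement"))
    )
finally:
    #Clean up whether or not the wait timed out
    driver.quit()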

Customize waiting conditions

The documentation also describes how to write custom conditions. The helpers defined in expected_conditions, such as presence_of_element_located, work the same way. That approach is worth it when the condition is a bit more complex, but for something as simple as the example shown, there is an easier way.

You can create a custom wait condition using a class with a __call__ method that returns False when the condition doesn't match.

In other words, all you need is something callable that checks the condition, so a lambda works too. (The very first document mentions this.)
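For comparison, here is a sketch of the class-based form the documentation describes, modeled on its element_has_css_class example. Two details are my own assumptions rather than the documentation's: the class list is split so that "foo" doesn't accidentally match "foobar", and a missing class attribute (None) is treated as empty.

```python
class element_has_css_class:
    """Custom wait condition for WebDriverWait(driver, 10).until(...).

    Returns the element once it carries css_class; otherwise returns
    False so the wait keeps polling.
    """

    def __init__(self, locator, css_class):
        self.locator = locator  # e.g. (By.ID, "myNewInput")
        self.css_class = css_class

    def __call__(self, driver):
        element = driver.find_element(*self.locator)
        # get_attribute may return None; split to compare whole class names
        classes = (element.get_attribute("class") or "").split()
        return element if self.css_class in classes else False
```

Usage would look like `WebDriverWait(driver, 10).until(element_has_css_class((By.ID, "myNewInput"), "myCSSClass"))`. The class form pays off when the condition needs its own state; for a one-off check, a plain function plus a lambda is enough.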

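import sys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.common.exceptions import TimeoutException

def proc(driver, by, name, cname):
    #The custom check: return False while the condition doesn't hold,
    #return the element once it does
    element = driver.find_element(by, name)
    #get_attribute may return None; split so partial class names don't match
    if cname in (element.get_attribute("class") or "").split():
        return element
    return False

#Assumes `driver` is an existing WebDriver instance
wait = WebDriverWait(driver, 10)
try:
    element = wait.until(lambda drv: proc(drv, By.ID, 'myNewInput', 'myCSSClass'))
except TimeoutException:
    print("timeout..")
    sys.exit()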

An example using a lambda is also given in [this document](https://seleniumhq.github.io/selenium/docs/api/py/webdriver_support/selenium.webdriver.support.wait.html?highlight=webdriverwait#selenium.webdriver.support.wait.WebDriverWait). That page links to the source, which I find very useful. The point is that the condition doesn't have to be just "the element was found"; you can also require, for example, that the element has a certain attribute.

The main subject?

In the first sample I checked for an element by its ID, but the element you want to check might instead be identified by class or name, and what you need to wait for changes depending on the page and its structure. I wanted to avoid having to think about identifiers like By.CLASS_NAME or By.NAME every time (at least when writing the code). Of course, when crawling you still have to think about how to identify the element: an ID makes it easy, while a class makes it harder to pin down uniquely. So, for the time being, I wrote a small wait function based on the example above. I haven't tested it thoroughly.

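from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.common.exceptions import (
    InvalidSelectorException, NoSuchElementException, TimeoutException)

def wait(drv, sec, selector):
    def find(method, by):
        #find_element raises instead of returning None, and a selector that
        #isn't valid XPath raises too, so treat both as "not found yet"
        try:
            return method(by, selector)
        except (NoSuchElementException, InvalidSelectorException):
            return None

    def chk():
        elem = find(drv.find_element, By.ID)
        if elem:
            return elem
        elem = find(drv.find_element, By.CLASS_NAME)
        if elem:
            return elem
        elem = find(drv.find_element, By.XPATH)
        if elem:
            return elem
        elem = find(drv.find_elements, By.ID)
        if elem:
            return elem
        elem = find(drv.find_elements, By.CLASS_NAME)
        if elem:
            return elem
        return False

    try:
        return WebDriverWait(drv, sec).until(lambda _: chk())
    except TimeoutException:
        print(f"wait timeout.. {selector} not found")
        return None

elem = wait(driver, 10, "elem_name")
if not elem:
    print("wow, unknown error.")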

It works, more or less, but I can't stand that chk is redundant and calls find_element over and over. On top of that, the result may be either a single element or a list. So it seems better to rethink the approach a little.
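Staying with the same approach, much of the redundancy disappears if you loop over the strategies and only use find_elements, which returns an empty list instead of raising when nothing matches, so the result is always a list. A minimal sketch; find_any and STRATEGIES are hypothetical names, and the strings are the values of Selenium's By.ID, By.CLASS_NAME and By.XPATH constants.

```python
# Values of Selenium's By.ID, By.CLASS_NAME and By.XPATH constants
STRATEGIES = ("id", "class name", "xpath")


def find_any(driver, selector, strategies=STRATEGIES):
    """Try each locator strategy in turn; return the first non-empty list.

    Returns False when nothing matches, so WebDriverWait keeps polling.
    """
    for by in strategies:
        try:
            found = driver.find_elements(by, selector)
        except Exception:
            # e.g. the selector isn't valid XPath; in real code, narrow this
            # to selenium.common.exceptions.InvalidSelectorException
            continue
        if found:
            return found  # always a list of elements
    return False
```

It would be used as `WebDriverWait(driver, 10).until(lambda drv: find_any(drv, "elem_name"))`. It still issues one request per strategy, though.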

Stack Overflow has an example solution that defines a custom class: you pass the checks in as a list, and the wait succeeds as soon as any one of them matches. That's pretty smart, so let's tidy it up around that idea so it works properly.

Final form

A wait implementation along those lines

from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait

class AnyEc:
    """Use with WebDriverWait to combine several checks with OR.

    Each check is a (fn, param) pair: fn(param) is called on every poll
    and the first truthy result is returned.
    """
    def __init__(self, *args):
        #Accept (fn, param) pairs directly or lists of them, and flatten
        self.ecs = []
        for v in args:
            if isinstance(v, list):
                self.ecs += v
            else:
                self.ecs.append(v)

    def __call__(self, driver):
        #driver is unused: each fn is already bound to a driver
        for fn, param in self.ecs:
            r = fn(param)
            if r:
                return r
        return False

def wait_any(drv, sec, *args):
    try:
        return WebDriverWait(drv, sec).until(AnyEc(*args))
    except TimeoutException:
        print(f"wait timeout.. {args} not found")
        return False

How to use

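from functools import partial
from selenium import webdriver
from selenium.webdriver.common.by import By

def make_css_selector(key):
    #Candidate CSS selectors: key as an id, as a raw selector,
    #as a name attribute, or as a class name
    value = []
    value += ['[id="%s"]' % key]
    value += ['#%s' % key]
    value += [key]
    value += ['[name="%s"]' % key]
    value += [".%s" % key]
    return value

#Usage sample

#URL to access
url = 'https://ja.stackoverflow.com/'
#The tag you want to find
target = 'question-mini-list h3'
#Build the candidate selectors
val = make_css_selector(target)

driver = webdriver.Chrome()
driver.get(url)

#Bind the finder only after the driver exists
#(older Selenium versions had driver.find_elements_by_css_selector)
fn = [(partial(driver.find_elements, By.CSS_SELECTOR), x) for x in val]

try:
    #Wait until any selector matches; time out after 10 seconds
    elem = wait_any(driver, 10, fn)
    #wait_any returns False on timeout, so guard the loop
    for e in elem or []:
        print(e.text)
finally:
    #quit() closes all windows and ends the session
    driver.quit()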

It doesn't look that smart after all :sweat_smile:

Digression

Originally I wanted to merge everything into a single XPath with or so it could be done in one shot, but converting to XPath was a hassle, so I gave up :stuck_out_tongue_winking_eye:

By the way, let's take a look at the source of find_element. https://seleniumhq.github.io/selenium/docs/api/py/_modules/selenium/webdriver/remote/webdriver.html#WebDriver.find_element

        if self.w3c:
            if by == By.ID:
                by = By.CSS_SELECTOR
                value = '[id="%s"]' % value
            elif by == By.TAG_NAME:
                by = By.CSS_SELECTOR
            elif by == By.CLASS_NAME:
                by = By.CSS_SELECTOR
                value = ".%s" % value
            elif by == By.NAME:
                by = By.CSS_SELECTOR
                value = '[name="%s"]' % value
        return self.execute(Command.FIND_ELEMENT, {
            'using': by,
            'value': value})['value']

In other words, in W3C mode nearly everything ends up as a CSS_SELECTOR. So I figured that as long as I didn't need XPath, a single find should be enough, but I couldn't get it to work and gave up there.
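An untested idea for doing it in one find: a CSS selector list. Selectors joined with commas match an element if any one of them matches, so the candidates (built the same way find_element rewrites By.ID, By.CLASS_NAME and By.NAME above) can be sent in a single find_elements(By.CSS_SELECTOR, ...) call. make_css_selector_group is a hypothetical helper. One caveat: if any part of the list is invalid CSS, the whole list is rejected.

```python
def make_css_selector_group(key):
    """Join candidate CSS selectors into one selector list ("a, b, c")."""
    candidates = [
        '[id="%s"]' % key,    # what By.ID becomes under W3C
        '.%s' % key,          # what By.CLASS_NAME becomes
        '[name="%s"]' % key,  # what By.NAME becomes
        key,                  # the key as a raw selector / tag name
    ]
    return ", ".join(candidates)


print(make_css_selector_group("myNewInput"))
# [id="myNewInput"], .myNewInput, [name="myNewInput"], myNewInput
```

Inside a wait this would become `WebDriverWait(driver, 10).until(lambda drv: drv.find_elements(By.CSS_SELECTOR, make_css_selector_group("elem_name")))`. Matches come back in document order, regardless of which candidate produced them.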

Want a general-purpose wait in Python Selenium?