This post is also a continuation of scraping.
If you have finished installing Selenium, you can continue.
First, load the library. The examples assume Google Chrome as the browser.
from selenium import webdriver
# Driver settings
chromedriver = "full path to your driver"
driver = webdriver.Chrome(executable_path=chromedriver)
The location where you saved the web driver differs from person to person, so rewrite the path accordingly. This is how you launch Google Chrome.
If you get an error message, check that the versions of the web driver and of Chrome match. You may also need to grant execute permission to the web driver, so read the error details and respond accordingly.
At this point you can control the browser and perform various operations.
Once opened, the browser stays open until you close it. Opening many instances consumes resources, so don't forget to quit them.
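One way to guarantee the browser is always closed is to wrap the work in try/finally and call `quit()`. A minimal sketch, where `make_driver` and `task` are hypothetical placeholders for your own driver factory and scraping logic:

```python
def run_with_driver(make_driver, task):
    # Create the driver, run the task, and always quit the browser
    # so no Chrome processes are left behind consuming resources.
    driver = make_driver()
    try:
        return task(driver)
    finally:
        driver.quit()
```

With Selenium you would pass something like `lambda: webdriver.Chrome(executable_path=chromedriver)` as `make_driver`.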
You can also run the browser in headless mode when using Selenium.
Headless mode runs the browser in the background without visibly launching a window.
It is a very convenient mode because it saves resources and lets you use Selenium on Linux servers that have no display.
To write it, create a variable for the browser's option settings, add the headless option, and pass the variable to the web-driver constructor.
Option variable = webdriver.ChromeOptions()
Option variable.add_argument('--headless')
Driver variable = webdriver.Chrome(options=Option variable)
from selenium import webdriver
#Driver settings
chromedriver = "full path to your driver"
#Option setting
options = webdriver.ChromeOptions()
options.add_argument('--headless')
#Driver call
driver = webdriver.Chrome(executable_path=chromedriver,
options=options)
We operate the browser through the variable to which Selenium was assigned when it was called.
Since we assigned it to the variable name driver earlier, we will refer to it as the driver variable from now on.
To access a website, execute:
Driver variable.get(URL)
driver.get(URL)
As a test, let's access my homepage.
driver.get('http://www.otupy.net')
Each time you run it with a URL, the browser accesses that site. It takes a moment for the whole page to be displayed, so it is better to wait a little before performing any subsequent operations.
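That wait can be sketched as a small helper; the function name and default wait time here are my own, and Selenium's `WebDriverWait` is a more robust alternative to a fixed sleep:

```python
import time

def get_and_settle(driver, url, wait_seconds=3.0):
    # Open the page, then pause so slow-loading content can finish
    # rendering before any subsequent operations run.
    driver.get(url)
    time.sleep(wait_seconds)
```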
You can scroll within the page by running JavaScript. You can execute a script with `execute_script`.
Driver variable.execute_script(JavaScript)
Pass the JavaScript part as a string. Use
window.scrollBy(0, Y)
or
window.scrollTo(0, Y)
to set the scroll position.
window.scrollBy(0, window.innerHeight); scrolls down one screen, and window.scrollTo(0, document.body.scrollHeight); scrolls to the bottom of the page.
Let's scroll.
#Scroll a little
driver.execute_script("window.scrollBy(0, window.innerHeight);")
#Scroll to the bottom
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
Now you can scroll the browser around.
To interact with a site, you first need to find the element you want to operate on, such as an input field on the page.
There are many ways to find an element. With the
Driver variable.find_element_by_XXXX
methods you can search by the value of each attribute.
If an element is found, it is returned as a data type called WebElement.
** Search by id attribute **
Driver variable.find_element_by_id(value of the id attribute)
** Search by name attribute **
Driver variable.find_element_by_name(value of the name attribute)
** Search by class name **
Driver variable.find_element_by_class_name(class name)
** Search by tag name **
Driver variable.find_element_by_tag_name(tag name)
** Search by link text **
Driver variable.find_element_by_link_text(link text)
** Search by CSS selector **
Driver variable.find_element_by_css_selector(CSS selector)
** Search by XPath **
Driver variable.find_element_by_xpath(XPath)
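Putting a search method to use, a typical flow finds an element and then acts on it. A minimal sketch, assuming a hypothetical page whose search box has the name attribute `q` (note that in Selenium 4 the `find_element_by_*` methods were replaced by `find_element(By.NAME, ...)`):

```python
def fill_search_box(driver, query):
    # Locate the input field by its name attribute, then type into it.
    box = driver.find_element_by_name("q")  # "q" is a hypothetical name
    box.send_keys(query)
    return box
```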
You must find an element before you can operate on it.
Once you have found an element with one of the methods above, assign it to an element variable; you can then search within it or operate on it:
Element variable.find_element_by_XXXX()
Element variable.operation method
** Click an element **
Element variable.click()
** Enter characters into an element **
Element variable.send_keys(characters)
** Press keys on an element **
First load the Keys library.
from selenium.webdriver.common.keys import Keys
Then find the element and use send_keys to enter the keys.
Element variable.send_keys(Keys.SPECIAL_KEY)
The keys that can be handled are as follows.
Key | Keys constant |
---|---|
Enter key | Keys.ENTER |
ALT key(Combined with normal key) | Keys.ALT,"Key" |
← key | Keys.LEFT |
→ key | Keys.RIGHT |
↑ key | Keys.UP |
↓ key | Keys.DOWN |
Ctrl key(Combined with normal key) | Keys.CONTROL,"Key" |
Delete key | Keys.DELETE |
HOME key | Keys.HOME |
END key | Keys.END |
ESCAPE key | Keys.ESCAPE |
equal | Keys.EQUALS |
COMMAND key | Keys.COMMAND |
F1 key | Keys.F1 |
shift key(Combined with normal key) | Keys.SHIFT,"Key" |
Page down key | Keys.PAGE_DOWN |
Page up key | Keys.PAGE_UP |
Space bar | Keys.SPACE |
Return key | Keys.RETURN |
tab key | Keys.TAB |
You can get the source code of the page as a string.
Driver variable.page_source
driver.page_source
After acquiring it, you can analyze it with a library such as BeautifulSoup.
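BeautifulSoup gives a convenient API for this; as a dependency-free illustration, even the standard library's `html.parser` can pull a value such as the page title out of the source. The HTML string below stands in for a real `driver.page_source`:

```python
from html.parser import HTMLParser

class TitleExtractor(HTMLParser):
    """Collect the text inside the <title> tag of an HTML page."""
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data

# In a real run this would be: page_source = driver.page_source
page_source = "<html><head><title>Example</title></head><body></body></html>"
parser = TitleExtractor()
parser.feed(page_source)
```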
Selenium is convenient because it lets you easily obtain information that normal scraping techniques cannot reach.
If you are having trouble getting data, try Selenium. Once you can use it, you will be able to collect far more data.
25 days until you become an engineer
Otsu py's HP: http://www.otupy.net/
Youtube: https://www.youtube.com/channel/UCaT7xpeq8n1G_HcJKKSOXMw
Twitter: https://twitter.com/otupython