- When using Selenium on EC2, fetching the data takes a long time.
- It ran smoothly at first, but as the script was executed repeatedly, processing became heavier and heavier.
- If you wait long enough, you eventually get an error saying that Chrome cannot be reached.
Error:
```
$ python3 filename.py
selenium.common.exceptions.WebDriverException: Message: chrome not reachable
```
Multiple Chrome browser processes and driver processes had been launched and left running. (Because Chrome was running headless, this could not be seen on screen.)
- Stop the running Chrome browser and driver processes.
- Call object.quit() on the WebDriver object at the end of the script.
```shell
# Show all running processes whose command line includes "chrome"
$ ps aux | grep chrome

# Many lines like the following mean that multiple WebDriver (chromedriver) processes are running.
Username pts/0 Sl 04:09 0:00 chromedriver --port=49671

# Many lines like the following mean that multiple browser processes are running.
Username pts/0 S 04:32 0:00 /opt/google/chrome/chrome --type=broker
```
About ps aux | grep chrome:
- ps: displays the currently running processes
  - aux are options:
    - a: show processes of all users
    - u: show the user name, start time, and other user-oriented columns
    - x: also show processes that have no controlling terminal
- | (pipe): passes the output of the preceding command to the next command
  - grep text: keeps only the lines that contain text
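If you want to watch for leftover processes between runs, the same check can be scripted. The sketch below is a hypothetical Python helper: count_chrome_processes, report, and WATCHED are names introduced here and are not part of Selenium. It assumes the standard Linux ps aux column layout, in which COMMAND is the 11th column. It matches on the executable name rather than on the substring "chrome", so the grep line itself is not counted.

```python
import os
import subprocess
from collections import Counter

# Executable names to look for (an assumption; adjust to your setup).
WATCHED = {"chrome", "chromedriver", "chrome-sandbox"}


def count_chrome_processes(ps_output: str) -> Counter:
    """Count lines of `ps aux` output whose executable name is in WATCHED."""
    counts = Counter()
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 10)  # COMMAND is the 11th column and may contain spaces
        if len(fields) < 11:
            continue
        executable = os.path.basename(fields[10].split()[0])
        # "grep --color=auto chrome" has executable "grep", so it is not counted
        if executable in WATCHED:
            counts[executable] += 1
    return counts


def report() -> None:
    """Run `ps aux` and print one count per watched executable."""
    output = subprocess.run(["ps", "aux"], capture_output=True, text=True, check=True).stdout
    counts = count_chrome_processes(output)
    for name in sorted(WATCHED):
        print(f"{name}: {counts[name]}")
```

You could call report() at the start of each run. If the chromedriver count keeps growing from one run to the next, earlier runs are not shutting down their drivers.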
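```shell
killall chrome
killall chromedriver  # the driver process appears as "chromedriver" in ps, so "killall webdriver" would not match it
```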
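If you would rather do this cleanup from Python than with killall, one option is to read the PIDs from ps aux and send each matching process SIGTERM, which is also the signal killall sends by default. This is a minimal sketch with hypothetical helper names (find_pids, terminate, kill_leftovers). Like killall, it can only signal processes owned by your user unless it runs as root.

```python
from __future__ import annotations

import os
import signal
import subprocess


def find_pids(ps_output: str, names: set[str]) -> list[int]:
    """Return PIDs from `ps aux` output whose executable basename is in `names`."""
    pids = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 10)  # PID is column 2, COMMAND is column 11
        if len(fields) == 11 and os.path.basename(fields[10].split()[0]) in names:
            pids.append(int(fields[1]))
    return pids


def terminate(pids: list[int]) -> None:
    """Send SIGTERM (killall's default signal) to each PID."""
    for pid in pids:
        try:
            os.kill(pid, signal.SIGTERM)
        except (ProcessLookupError, PermissionError):
            # Already exited (e.g. a child that ended with its parent),
            # or owned by another user.
            pass


def kill_leftovers() -> None:
    """Roughly what `killall chrome` followed by `killall chromedriver` does."""
    ps = subprocess.run(["ps", "aux"], capture_output=True, text=True, check=True).stdout
    terminate(find_pids(ps, {"chrome", "chromedriver"}))
```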
Then check the running processes again.
```shell
$ ps aux | grep chrome
# Only the following line remained (it is the grep command itself); all other processes had ended.
Username pts/0 S+ 04:35 0:00 grep --color=auto chrome
```
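Assuming the WebDriver is created as follows, call quit() when you are done with it:
```python
from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument("--headless")
browser = webdriver.Chrome(options=options)
browser.quit()  # closes every browser window and stops chromedriver
```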
If the variable that holds the WebDriver has a different name, such as driver, use that name instead. Example: driver.quit()
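Putting quit() at the end of the file only helps if execution actually reaches that line. If the script raises an exception partway through, quit() is skipped and the browser and chromedriver keep running. That is one way processes can pile up over repeated runs. A minimal sketch that guarantees the call wraps try/finally in a small context manager. Here, quitting is a hypothetical name introduced for illustration. contextlib.closing would not work as a substitute, because it calls close(), and Selenium's close() only closes the current window rather than ending the driver session.

```python
from contextlib import contextmanager


@contextmanager
def quitting(driver):
    """Yield `driver`, then always call driver.quit(), even if the body raises."""
    try:
        yield driver
    finally:
        driver.quit()  # ends the session and stops chromedriver


# Usage with Selenium (assumes `options` is a ChromeOptions object):
# from selenium import webdriver
# with quitting(webdriver.Chrome(options=options)) as browser:
#     browser.get("https://example.com")
#     ...  # scraping code; quit() still runs if this raises
```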
Another article suggested stopping the sandbox and adding --no-sandbox as an option, but in my case no sandbox process had been started, so this was unnecessary.
If a sandbox process appears among the running processes, it may also help to try the following.
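```shell
killall chrome-sandbox
```
```python
from selenium import webdriver
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("--disable-setuid-sandbox")
```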
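If you do need the sandbox flags, one way to keep them optional is to build the argument list in one place. The helper below, build_chrome_args, is a hypothetical name used for illustration. The flags are the two shown above plus --headless, since the browser in this article runs headless. Disabling the sandbox removes a layer of isolation, so this sketch leaves it off by default.

```python
def build_chrome_args(headless: bool = True, disable_sandbox: bool = False) -> list:
    """Return Chrome command-line flags for a server without a display (hypothetical helper)."""
    args = []
    if headless:
        args.append("--headless")
    if disable_sandbox:
        # Only add these if a sandbox process is actually causing trouble;
        # they turn off Chrome's process isolation.
        args += ["--no-sandbox", "--disable-setuid-sandbox"]
    return args


# Usage with Selenium (not executed here):
# from selenium import webdriver
# options = webdriver.ChromeOptions()
# for arg in build_chrome_args(disable_sandbox=True):
#     options.add_argument(arg)
```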
The original article is here
By the way, a sandbox is an isolated environment in which a program runs without being able to affect programs outside it. (It is used, for example, to run programs that may contain viruses.)