① Use Chrome.
② Run it headless.
③ Use JavaScript to get each page's width and height, resize the window to match, and take the capture.
④ If the site may time out unexpectedly, restart Chrome.
Capture requests arrive without warning, so you have to make sure that no capture is missed. To that end, the program checks at the end that the capture file actually exists in the specified folder. If the page load times out, it restarts Chrome and carries on with the capture. It's the quickest and simplest way to write the program, but it works pretty well.
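Step ③ depends on the page reporting its own size. The code in this post reads `document.body.scrollWidth` and `scrollHeight`. On some layouts the `html` element (`document.documentElement`) can report a larger size than `body`, so a common variant takes the maximum of both. The following is a minimal sketch of that variant, not the method used in this post. `FULL_SIZE_JS`, `full_page_size` and `fit_window_to_page` are hypothetical names, and it assumes a Selenium `driver`, whose `execute_script` returns a JavaScript array as a Python list.

```python
# JavaScript run inside the page. Selenium wraps the script in a function,
# so `return` works, and a returned JS array arrives in Python as a list.
FULL_SIZE_JS = """
const b = document.body;
const e = document.documentElement;
return [
    Math.max(b.scrollWidth, e.scrollWidth, e.clientWidth),
    Math.max(b.scrollHeight, e.scrollHeight, e.clientHeight)
];
"""


def full_page_size(driver):
    """Return (width, height) of the whole document, as reported by the page."""
    width, height = driver.execute_script(FULL_SIZE_JS)
    return int(width), int(height)


def fit_window_to_page(driver):
    """Resize the window so one screenshot covers the whole page (meant for headless Chrome)."""
    width, height = full_page_size(driver)
    driver.set_window_size(width, height)
    return width, height
```

Usage: call `fit_window_to_page(driver)` after `driver.get(...)` and before `driver.save_screenshot(...)`.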
■ Environment
・Windows 10
・Python 3.8.3
Get a capture of the entire web page (Python)
import datetime
import os
import time
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
URL = "https://testtesttest.com"
SAVE_DIR = "./Folder name/"
options = Options()
options.add_argument('--hide-scrollbars')  # Hide the scroll bars so they don't appear in the capture
options.add_argument('--incognito')        # Incognito (private) mode
options.add_argument('--headless')         # Headless: no browser window appears (add this once testing is finished)
driver = webdriver.Chrome(options=options)  # This form works when chromedriver is on the PATH
driver.set_page_load_timeout(30)            # Raise TimeoutException if a page takes more than 30 s to load
try:
    # Move to the URL to be captured. Sites with many images and ads may time out unexpectedly.
    driver.get(URL)
except TimeoutException:
    # Restart Chrome and try once more
    driver.quit()
    driver = webdriver.Chrome(options=options)
    driver.set_page_load_timeout(30)
    driver.get(URL)
# Wait up to 15 seconds for the page body to be present
WebDriverWait(driver, 15).until(EC.presence_of_all_elements_located((By.TAG_NAME, "body")))
# Get the page width and height, resize the window to match, then capture
page_width = driver.execute_script('return document.body.scrollWidth')
page_height = driver.execute_script('return document.body.scrollHeight')
driver.set_window_size(page_width, page_height)
# Use the current time in the file name
zikan = datetime.datetime.now().strftime('%Y%m%d_%H%M%S')
filename = "file name" + "_" + zikan + ".png"  # Chrome screenshots are PNG data, so use .png
# Take the capture (create the folder first if it doesn't exist)
os.makedirs(SAVE_DIR, exist_ok=True)
driver.save_screenshot(SAVE_DIR + filename)
# Look for the capture file for up to 5 seconds
start = time.time()
while time.time() - start <= 5:
    if os.path.exists(SAVE_DIR + filename):
        break  # Exit when the file is found
    time.sleep(1)
else:
    # Runs only if the loop ended without finding the file: handle the missing capture here
    print("Capture not found:", SAVE_DIR + filename)
driver.quit()
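The file check at the end of the code can be pulled out into a reusable function. This is a minimal sketch; `wait_for_file` is a hypothetical helper name, and requiring a non-empty file (not just an existing one) is my own assumption, so that a zero-byte file is not counted as a successful capture. It uses `time.monotonic()` so the deadline is not affected by changes to the system clock.

```python
import os
import time


def _is_ready(path):
    """True if `path` is an existing, non-empty regular file."""
    try:
        return os.path.isfile(path) and os.path.getsize(path) > 0
    except OSError:  # e.g. the file vanished between the two checks
        return False


def wait_for_file(path, timeout=5.0, interval=0.5):
    """Poll until `path` exists and is non-empty, giving up after `timeout` seconds.

    Returns True if the file appeared in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        if _is_ready(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Usage: `if not wait_for_file(os.path.join("Folder name", filename)):` followed by whatever should happen when the capture is missing, such as restarting Chrome and capturing again.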
When running this from VBA, many Excel files are often open at the same time, so the save destination is written as a full path rather than a relative path (a relative path is resolved against the current directory, which is not necessarily the folder of the workbook running the macro). In my experience, sites with many images and ads frequently time out without warning. Even so, with the restart handled by my own function OnceMoreGet, the process kept going without stopping.
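The same full-path advice carries over to the Python version: resolving the folder to an absolute path once, and building the timestamped file name from it, removes any dependence on the working directory. This is a minimal sketch; `capture_path`, its parameters and the `.png` default are illustrative assumptions, not part of the original code.

```python
import datetime
import os


def capture_path(folder, prefix, now=None, ext=".png"):
    """Build an absolute, timestamped path such as <folder>/<prefix>_YYYYmmdd_HHMMSS.png."""
    if now is None:
        now = datetime.datetime.now()
    stamp = now.strftime("%Y%m%d_%H%M%S")
    directory = os.path.abspath(folder)    # absolute, so a later chdir() does not change it
    os.makedirs(directory, exist_ok=True)  # make sure the target folder exists
    return os.path.join(directory, f"{prefix}_{stamp}{ext}")
```

Usage: `driver.save_screenshot(capture_path("Folder name", "file name"))`.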
Get a capture of the entire web page (VBA)
Sub CaptureWholePage() 'The Sub name is arbitrary
    Dim driver As New ChromeDriver
    Dim tate As Long
    Dim yoko As Long
    Dim Target As String
    driver.AddArgument "headless"         'Headless
    driver.AddArgument "disable-gpu"      'An option that was needed for a while; it may be unnecessary now, but it's added just in case
    driver.AddArgument "incognito"        'Incognito (private) mode
    driver.AddArgument "hide-scrollbars"  'Hide the scroll bars
    'Set the page-load wait and the other timeouts to 30 seconds
    driver.Timeouts.PageLoad = 30000
    driver.Timeouts.Server = 30000
    driver.Timeouts.ImplicitWait = 30000
    driver.Timeouts.Script = 30000
    driver.Start
    'For light sites that don't time out, a plain Get is enough:
    'driver.Get "URL of the site"
    'OnceMoreGet (below) restarts Chrome if a timeout occurs while loading the page
    If Not OnceMoreGet(driver, "URL of the site") Then
        'If the page still fails to load after the restart, handle it here
    End If
    'Capture the whole page
    tate = driver.ExecuteScript("return document.body.scrollHeight")
    yoko = driver.ExecuteScript("return document.body.scrollWidth")
    driver.Window.SetSize yoko, tate
    Target = "Enter the file name here with the full path"
    'Taking the screenshot can raise an error, so error handling here is advisable
    driver.TakeScreenshot.SaveAs Target
    'Use Dir to check that the capture file was actually created (give up after 5 seconds)
    Dim timeout As Date
    timeout = DateAdd("s", 5, Now)
    Dim found As String
    Do
        found = Dir(Target)
        If found = "" And Now > timeout Then
            'Handle the case where the capture file was not found
            Exit Do
        End If
        DoEvents
    Loop Until found <> ""
    driver.Quit
End Sub
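'Loads url; if an error such as a timeout occurs, restarts Chrome and tries once more
Function OnceMoreGet(driver As ChromeDriver, url As String) As Boolean
    Dim retried As Boolean
    On Error GoTo ErrorHandler
    driver.Get url
    OnceMoreGet = True
    Exit Function
ErrorHandler:
    If retried Then
        OnceMoreGet = False  'The retry failed as well
        Exit Function
    End If
    retried = True
    'Restart Chrome after the error
    driver.Quit
    Call WaitFor(3)          'WaitFor: the author's own helper that pauses for 3 seconds (not shown in this post)
    driver.Start
    Resume                   'Runs driver.Get url again
End Function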
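For the Python version, the idea behind OnceMoreGet can be written as a small helper that takes a function for starting a new browser. This is a minimal sketch, not a port of the author's code: `get_with_restart`, `make_driver`, `retries` and `pause` are hypothetical names. It catches `Exception` only to stay free of a Selenium import; in real code you would narrow it to `selenium.common.exceptions.WebDriverException`, of which `TimeoutException` is a subclass.

```python
import time


def get_with_restart(driver, make_driver, url, retries=1, pause=3.0):
    """Open `url`; on failure, quit the browser, start a new one and try again.

    Returns (driver, ok): the driver to keep using (possibly a new one)
    and whether the page loaded. Errors raised by make_driver() propagate.
    """
    for attempt in range(retries + 1):
        try:
            driver.get(url)
            return driver, True
        except Exception:  # e.g. a page-load timeout
            if attempt == retries:
                return driver, False
            try:
                driver.quit()
            except Exception:
                pass  # the old browser may already be gone
            time.sleep(pause)
            driver = make_driver()
    return driver, False  # only reached when retries < 0
```

Usage: `driver, ok = get_with_restart(driver, lambda: webdriver.Chrome(options=options), URL)`. Returning the driver matters because after a restart the caller must switch to the new browser instance.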
I'm an amateur writing this on my own, so I'd appreciate any advice or comments.