A friend who reads manga on a pirated manga site called manga1001.com told me, "There are so many explicit ads I can't look at the site in public, and I get a warning when I use Adblock." So I thought: let's get rid of them!
Also, if you try anything similar to this article, please be careful. You may be breaking the law.
All the script does is grab the `src` of each `img` element and display the images again as plain `img` tags in a new page. I'm using Chrome Canary so that it's okay if it breaks.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import chromedriver_binary
from time import sleep
#Path to output the generated HTML file
output_path = '/Users/hoge/fuga/'
#Webdriver options
options = Options()
#Specify the path of Google Chrome Canary
options.binary_location = '/Applications/Google Chrome Canary.app/Contents/MacOS/Google Chrome Canary'
#Specify the size of the window
options.add_argument('window-size=1600,900')
#Ask for the URL of the page you want to remove ads from
url = input('enter url: ')
#Launch Chrome
driver = webdriver.Chrome(options=options)
driver.get(url)
#Wait a moment for the page to execute JavaScript
sleep(3)
#Get title
title = driver.find_elements_by_class_name('entry-title')[0].text
#Get WebElement of img element as an array
contents = driver.find_elements_by_css_selector('.entry-content figure img')
#Assign a character string to be displayed as HTML to the output variable output
output = '''
<!DOCTYPE html>
<html>
<head>
<style>
body{
background-color:#333;
}
img{
display: block;
margin: 10px auto;
width: 100%;
max-width: 600px;
box-shadow: 0 0 10px black;
}
</style>
</head>
<body>
'''
#Add the src attribute of the acquired img element to output as an image
for content in contents:
    output += '<img src="{}"/>'.format(content.get_attribute('src'))
#Add closing tag to output
output += '</body></html>'
#Create an HTML file with the title name and write the output
with open('{0}{1}.html'.format(output_path, title), 'w', encoding='utf-8') as f:
    f.write(output)
#Open the created HTML file
driver.get('file://{0}{1}.html'.format(output_path, title))
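As a side note, newer Selenium releases (4.x) removed the find_elements_by_* helpers, so if the script above raises an AttributeError on those calls, the two lookups can be rewritten with the By locator API. This is just a minimal sketch assuming the same class name and CSS selector as above:

from selenium.webdriver.common.by import By

#Same lookups as above, written with the Selenium 4 locator API
title = driver.find_elements(By.CLASS_NAME, 'entry-title')[0].text
contents = driver.find_elements(By.CSS_SELECTOR, '.entry-content figure img')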
I was able to neatly pull just the content out of that cluttered site. To repeat, I'm not going to use this myself, and I didn't give the program to my friend. I just wanted to write a scraper! lol