I tried scraping a pirated manga site to get rid of its ads

Background

A friend who reads manga on a pirated manga site called manga1001.com told me, "There are a lot of extreme ads that I can't look at in public, and I get a warning when I use Adblock." So I thought: let's get rid of them!

Caution

If you do something similar to what this article describes, please be careful. You may be breaking the law.

Method

  1. Enter the URL of any page on manga1001.com
  2. Open that page in Chrome
  3. Get the src of each `img` element
  4. Create an HTML file
  5. Write each obtained src into it as an `img` element
  6. Open the HTML file
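
Steps 4 and 5 can be tried on their own, without a browser. The following is only a minimal sketch, not the script used in this article: `build_page` and `PAGE_HEAD` are hypothetical names, and escaping each URL with `html.escape` and skipping empty values are my own assumptions about what is sensible here.

```python
from html import escape

# Hypothetical minimal page header; the real script also adds some CSS.
PAGE_HEAD = """<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
</head>
<body>
"""


def build_page(image_urls):
    """Return an HTML page that shows each URL as an <img> element."""
    tags = [
        # Escape &, <, > and quotes so the URL cannot break out of the attribute.
        '<img src="{}"/>'.format(escape(url, quote=True))
        for url in image_urls
        if url  # skip images whose src was missing (None or empty string)
    ]
    return PAGE_HEAD + "\n".join(tags) + "\n</body></html>"
```

Calling `build_page` with a list of image URLs returns the whole page as one string, which can then be written to a file as in step 4.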

Source code

I'm using Chrome Canary so that it doesn't matter if something breaks.

from time import sleep

import chromedriver_binary  # importing this adds chromedriver to PATH
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

# Directory where the generated HTML file is written (must end with a slash)
output_path = '/Users/hoge/fuga/'

# WebDriver options
options = Options()
# Path to the Google Chrome Canary binary
options.binary_location = '/Applications/Google Chrome Canary.app/Contents/MacOS/Google Chrome Canary'
# Window size
options.add_argument('window-size=1600,900')

# Ask for the URL of the page you want to remove ads from
url = input('enter url: ')

# Launch Chrome and load the page
driver = webdriver.Chrome(options=options)
driver.get(url)

# Wait a moment for the page to execute its JavaScript
sleep(3)

# Get the title
# (find_elements_by_* was removed in Selenium 4.3, so use find_elements with By)
title = driver.find_elements(By.CLASS_NAME, 'entry-title')[0].text

# Get the img elements as a list of WebElements
contents = driver.find_elements(By.CSS_SELECTOR, '.entry-content figure img')

# Start building the HTML to display in the variable output
output = '''
<!DOCTYPE html>
<html>
<head>
<style>
body{
  background-color:#333;
}
img{
  display: block;
  margin: 10px auto;
  width: 100%;
  max-width: 600px;
  box-shadow: 0 0 10px black;
}
</style>
</head>
<body>
'''

# Append the src attribute of each acquired img element to output as an <img> tag
for content in contents:
    output += '<img src="{}"/>'.format(content.get_attribute('src'))

# Append the closing tags to output
output += '</body></html>'

# Create an HTML file named after the title and write output to it
with open('{0}{1}.html'.format(output_path, title), 'w', encoding='utf-8') as f:
    f.write(output)

# Open the created HTML file
driver.get('file://{0}{1}.html'.format(output_path, title))
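
One thing the script above doesn't handle: the page title goes straight into the file name, so a title containing `/` or another reserved character could make `open` fail or write to an unexpected place. Here is a minimal sketch of one way around that. `safe_filename` and `output_file` are hypothetical helpers that are not part of the original script, and the set of characters replaced (the ones Windows rejects, plus control characters) is my own assumption.

```python
import re
from pathlib import Path

# Characters that are unsafe in file names on at least one major OS
# (assumption: the Windows reserved set plus ASCII control characters).
UNSAFE_CHARS = re.compile(r'[\\/:*?"<>|\x00-\x1f]')


def safe_filename(title, fallback="untitled"):
    """Replace unsafe characters with '_' and trim spaces and dots at both ends."""
    cleaned = UNSAFE_CHARS.sub("_", title).strip(" .")
    # Fall back to a fixed name if nothing usable is left.
    return cleaned or fallback


def output_file(output_dir, title):
    """Return the path of the HTML file for the given title inside output_dir."""
    return Path(output_dir) / "{}.html".format(safe_filename(title))
```

In the script, the last two steps could then become something like `path = output_file(output_path, title)`, `path.write_text(output, encoding='utf-8')`, and `driver.get(path.resolve().as_uri())`. `Path.as_uri` also percent-encodes spaces and non-ASCII characters in the title, which the hand-built `file://` string does not.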

What was made

(Animated GIF: manga1001.com_scraping.gif)

Impressions

I managed to scrape the content of a cluttered site into a clean page. To be clear, I'm not going to use this myself, and I didn't give the program to my friend. I just wanted to scrape something! lol
