Easy web scraping with Python and Ruby

Web scraping: fetching the HTML of a website, then extracting and formatting specific data from it.
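As a rough illustration of the "extract and format" part of that definition, here is a minimal sketch that uses only Python's standard library `html.parser` module (not one of the libraries introduced in this post). The `LinkCollector` class and the `SAMPLE_HTML` string are made up for the example; a real scraper would download the HTML first.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects (href, text) pairs for every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the <a> tag we are currently inside
        self._text = []      # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text while we are inside a link.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


# Hypothetical page content, inlined so the sketch runs offline.
SAMPLE_HTML = """
<html><body>
  <p>See <a href="/docs">the docs</a> or <a href="/blog">our <b>blog</b></a>.</p>
</body></html>
"""

collector = LinkCollector()
collector.feed(SAMPLE_HTML)

# "Format" step: turn the extracted pairs into plain-text lines.
report = [f"{text} -> {href}" for href, text in collector.links]
print("\n".join(report))
```

Writing the parser by hand like this gets tedious quickly, which is why dedicated libraries are attractive.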

This time, I will introduce one library each for Python and Ruby.

Python: BeautifulSoup4

In Python, Beautiful Soup is a very handy library for this.

Installation

pip install beautifulsoup4

How to use

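# Python 3: urllib2 from Python 2 became urllib.request
from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen("http://example.com")
# => Of course, you can also read from a file.

# Name the parser explicitly so Beautiful Soup does not have to guess one
soup = BeautifulSoup(html, "html.parser")

# Lots of useful methods!
soup.find_all('td')
soup.find("head").find("title")
soup.title.find_parents()  # every ancestor of the <title> tag
soup.title.find_parent()   # the direct parent of the <title> tag
soup.descendants           # a generator attribute, not a method

# It seems that you can also rename tags, change attribute values, and add or delete them!
tag = soup.a
tag.string = "New link text."
tag
# => <a href="...">New link text.</a>  (the href is left unchanged)

soup = BeautifulSoup("<a>Foo</a>", "html.parser")
soup.a.append("Bar")
soup.a
# => <a>FooBar</a>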
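The calls above depend on what a live page happens to contain, so here is a self-contained sketch of the same searching, navigating, and modifying steps run against an inline document. The `SAMPLE_HTML` content and the variable names are made up for illustration, and the standard-library `html.parser` backend is assumed.

```python
from bs4 import BeautifulSoup

# Made-up sample document so the sketch runs without network access.
SAMPLE_HTML = """
<html>
  <head><title>Price list</title></head>
  <body>
    <table>
      <tr><td>Apple</td><td>120</td></tr>
      <tr><td>Orange</td><td>80</td></tr>
    </table>
    <a href="/next">Next page</a>
  </body>
</html>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Searching: find_all returns every match, find returns the first one.
cells = [td.get_text() for td in soup.find_all("td")]
title = soup.find("head").find("title").get_text()

# Navigating: walk upwards from a tag to its ancestors.
first_cell = soup.find("td")
parent_name = first_cell.find_parent().name
ancestor_names = [p.name for p in first_cell.find_parents()]

# Modifying: replace the link text and change an attribute in place.
link = soup.a
link.string = "Go on"
link["href"] = "/page/2"

print(cells)
print(title, parent_name, ancestor_names)
print(link)
```

Note that `find_parent` and `find_parents` are called on a tag inside the document; calling them on the top-level `soup` object finds nothing, because it has no parent.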

I had never used Python before, but it was a lot of fun to use.

Ruby: nokogiri

Installation

gem install nokogiri

Or, if you use Bundler, add it to your Gemfile:

source 'https://rubygems.org'
gem 'nokogiri'

and then run:

bundle

How to use

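require 'open-uri'
require 'nokogiri'

charset = nil
# URI.open comes from open-uri; a bare open() no longer accepts URLs in Ruby 3
html = URI.open("http://example.com") do |f|
  charset = f.charset  # character encoding reported by the server
  f.read               # the block returns the response body
end

doc = Nokogiri::HTML.parse(html, nil, charset)

doc.title
# Print the text of every h2 and h3 heading
doc.xpath('//h2 | //h3').each do |heading|
  puts heading.content
end

# Reading from a local file works the same way
html = File.open('data.html', encoding: 'utf-8') { |file| file.read }
# A block passed to parse receives the parse options, not the document,
# so query the returned document instead
doc = Nokogiri::HTML.parse(html)
doc.xpath('//td').each do |td|
  pp td.content
end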
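For comparison, the two queries above can be sketched in Python as well. Beautiful Soup has no XPath support, so this version passes tag names to `find_all` instead. The `SAMPLE_HTML` string is a made-up stand-in for both the fetched page and `data.html`.

```python
from bs4 import BeautifulSoup

# Made-up stand-in for the fetched page and for data.html.
SAMPLE_HTML = """
<html><body>
  <h1>Report</h1>
  <h2>Summary</h2>
  <p>Some text.</p>
  <h3>Details</h3>
  <table><tr><td>alpha</td><td>beta</td></tr></table>
</body></html>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Rough counterpart of doc.xpath('//h2 | //h3'): a list of tag names
# matches any of them, and results come back in document order.
headings = [h.get_text() for h in soup.find_all(["h2", "h3"])]

# Rough counterpart of doc.xpath('//td').
cells = [td.get_text() for td in soup.find_all("td")]

print(headings)
print(cells)
```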

Personally, I ended up preferring Ruby after all.

References

Scraping with Python and Beautiful Soup - Qiita
http://qiita.com/itkr/items/513318a9b5b92bd56185

kondou.com - Beautiful Soup 4.2.0 Doc. Japanese translation (2013-11-19 last updated)
http://kondou.com/BS4/#

Ruby scraping with Nokogiri [Tutorial for beginners] - Sake, Tears, Ruby, Rails
http://morizyun.github.io/blog/ruby-nokogiri-scraping-tutorial/
