Overview

These are my study notes, based on O'Reilly Japan's "Data visualization starting with Python and JavaScript".

Retrieving web data using the requests library

Requests is a Python library that makes HTTP communication easy to handle.

Setup

Install requests

pip install requests

On Python 2.7.9 or earlier, an SSL warning may occur. In that case, installing an updated SSL library should resolve it.

pip install --upgrade ndg-httpsclient
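
After installing, it can be useful to confirm which version pip actually put in place. The following is a minimal sketch using the standard library's importlib.metadata (available from Python 3.8); the helper name installed_version is my own and is not part of requests.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing.

    `installed_version` is a hypothetical helper name used only for this note.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints a version string if requests is installed, otherwise None.
print(installed_version("requests"))
```

If this prints None, the pip install above probably ran against a different Python environment than the one running the script.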

Examples of using the requests library

Downloading a Wikipedia page (getting the HTML page and inline JavaScript)

>>> import requests
>>> response = requests.get("https://ja.wikipedia.org/wiki/Python")
>>> 
>>> # Get a list of the attributes of the response object
>>> dir(response)
['__attrs__', '__bool__', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__enter__', '__eq__', '__exit__', '__format__', '__ge__', '__getattribute__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__nonzero__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setstate__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_content', '_content_consumed', '_next', 'apparent_encoding', 'close', 'connection', 'content', 'cookies', 'elapsed', 'encoding', 'headers', 'history', 'is_permanent_redirect', 'is_redirect', 'iter_content', 'iter_lines', 'json', 'links', 'next', 'ok', 'raise_for_status', 'raw', 'reason', 'request', 'status_code', 'text', 'url']
>>>
>>> # Get the HTTP status code from the response object
>>> response.status_code
200
>>>
>>> # The text property of the response object holds the HTML page, including inline JavaScript
>>> response.text
'<!DOCTYPE html>\n<html class="client-nojs" lang="ja" dir="ltr">\n<head>\n<meta charset="UTF-8"/>\n<title>Python - Wikipedia</title>\n<script>document.documentElement.className = document.documentElement.className.replace( /(^|\\s)client-nojs(\\s|$)/, "$1client-js$2" );</script>\n<script>(window.RLQ=window.RLQ||[]).push(function(){mw.config.set({"wgCanonicalNamespace":"","wgCanonicalSpecialPageName":false,"wgNamespaceNumber":0,"wgPageName":"Python","wgTitle":"Python","wgCurRevisionId":65321720,"wgRevisionId":65321720,"wgArticleId":993,"wgIsArticle":true,"wgIsRedirect":false,"wgAction":"view","wgUserName":null,"wgUserGroups":["*"],"wgCategories":["Programming language","Object-oriented language","Scripting language","Open Source","Python"],"wgBreakFrames
...
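
The interactive session above assumes the request succeeds. In a script, it may be safer to set a timeout and check the status before using the body. Here is a minimal sketch; the function name fetch_html and the 10-second default timeout are my own choices, not something taken from the book.

```python
import requests


def fetch_html(url, timeout=10.0):
    """Return the decoded body of `url`.

    Raises requests.HTTPError for 4xx/5xx responses, and other
    requests.RequestException subclasses for network problems or timeouts.
    `fetch_html` is a hypothetical helper name for this note.
    """
    # Without a timeout, requests.get can wait indefinitely for a response.
    response = requests.get(url, timeout=timeout)
    # Turn 4xx/5xx status codes into exceptions instead of checking status_code by hand.
    response.raise_for_status()
    # .text decodes the body using the encoding stored in response.encoding.
    return response.text
```

Calling fetch_html("https://ja.wikipedia.org/wiki/Python") inside a try block that catches requests.RequestException should give the same text as response.text above, while reporting failures instead of silently returning an error page.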

Retrieving JSON data

>>> import requests
>>> response = requests.get("https://www.oreilly.co.jp/books/9784873118086/biblio.json")
>>> 
>>> # Get the JSON data
>>> data = response.json()
>>> data
{'title': 'Data visualization starting with Python and JavaScript', 'picture_large': 'http://www.oreilly.co.jp/books/images/picture_large978-4-87311-808-6.jpeg', 'picture': 'http://www.oreilly.co.jp/books/images/picture978-4-87311-808-6.gif', 'picture_small': 'http://www.oreilly.co.jp/books/images/picture_small978-4-87311-808-6.gif', 'authors': ['Kyran Dale\u3000by', 'Takeshi Shimada\u3000translated by', 'Tetsuya Kinoshita\u3000translation'], 'released': '2017-08-25', 'pages': 500, 'price': 4104, 'ebook_price': 3283, 'original': 'Data Visulalization with Python and JavaScript', 'original_url': 'http://shop.oreilly.com/product/0636920037057.do', 'isbn': '978-4-87311-808-6'}
>>> 
>>> # Get the keys
>>> data.keys()
dict_keys(['title', 'picture_large', 'picture', 'picture_small', 'authors', 'released', 'pages', 'price', 'ebook_price', 'original', 'original_url', 'isbn'])
>>> 
>>> # Get the title
>>> data["title"]
'Data visualization starting with Python and JavaScript'
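
Indexing with data["title"] raises KeyError if the key is missing, and response.json() raises a ValueError subclass if the body is not valid JSON. The sketch below shows one way to guard against both. The helper names fetch_json and summarize are my own, and the sample dictionary only copies a few values from the output above so that the example runs without network access.

```python
import requests


def fetch_json(url, timeout=10.0):
    """Return the parsed JSON body of `url`, or None if the body is not JSON.

    `fetch_json` is a hypothetical helper name for this note.
    """
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    try:
        return response.json()
    except ValueError:
        # response.json() raises a ValueError subclass for an invalid JSON body.
        return None


def summarize(book):
    """Build a one-line summary from a biblio.json-style dict (hypothetical helper)."""
    # dict.get returns a default instead of raising KeyError for missing keys.
    title = book.get("title", "(no title)")
    released = book.get("released", "unknown date")
    pages = book.get("pages")
    page_text = f"{pages} pages" if pages is not None else "page count unknown"
    return f"{title} ({released}, {page_text})"


# A few values copied from the output above; no request is made here.
sample = {
    "title": "Data visualization starting with Python and JavaScript",
    "released": "2017-08-25",
    "pages": 500,
}
print(summarize(sample))
```

With real data, summarize(fetch_json(url)) would work the same way, as long as the None case is checked first.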

References

Data visualization starting with Python and JavaScript https://www.oreilly.co.jp/books/9784873118086/

Requests: HTTP for Humans http://requests-docs-ja.readthedocs.io/en/latest/user/quickstart/

Next time, I will study how to use data from a Web API.