While researching how to get stock prices, I came across a service called Quandl, so I tried it out.
Quandl
Quandl is a search engine for numerical data such as financial and economic data. It can search data from many sources and display the results as graphs and tables. The data can also be downloaded in JSON, CSV, and other formats, or imported into services such as Plotly.
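If you download a dataset as CSV instead of going through the library, you can read the file with the standard csv module alone. The following is a minimal sketch. It assumes the file has a Date column followed by numeric columns, like the GOOG/NASDAQ_GOOG table shown later. SAMPLE_CSV and parse_price_csv are hypothetical names, and the sample rows are copied from that table.

```python
import csv
import io

# Hypothetical CSV text. The header layout is an assumption modeled on the
# GOOG/NASDAQ_GOOG table in this post (Volume left empty, shown there as NaN).
SAMPLE_CSV = """Date,Open,High,Low,Close,Volume
2004-08-19,49.96,51.98,47.93,50.12,
2004-08-20,50.69,54.49,50.20,54.10,
"""


def parse_price_csv(text):
    """Parse CSV text into a list of dicts, converting numeric cells to float.

    Empty cells become None (the pandas-based library shows them as NaN).
    """
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        parsed = {"Date": record["Date"]}
        for key, value in record.items():
            if key == "Date":
                continue
            parsed[key] = float(value) if value else None
        rows.append(parsed)
    return rows
```

With the library installed you would normally just get a DataFrame. This is only useful when you want to avoid the pandas dependency.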
Some datasets are paid, but most can be obtained free of charge.
Libraries are available for various languages (https://www.quandl.com/help/libraries), and you can use the API to search for and retrieve data.
Besides individual datasets, there are pages that collect related data together. Click Data > Data Browser at the upper left of the screen to browse by country or data type. For example, there are the following pages.
The Quandl library can be installed with pip (it requires NumPy and pandas).
$ pip install Quandl
Import the Quandl module and pass a Quandl Code to the get function to get the data as a pandas DataFrame. The Quandl Code is displayed in the upper right corner of the screen when you look up a dataset.
>>> import Quandl
>>> df = Quandl.get('GOOG/NASDAQ_GOOG')
>>> df[:5]
             Open   High    Low  Close  Volume
Date
2004-08-19  49.96  51.98  47.93  50.12     NaN
2004-08-20  50.69  54.49  50.20  54.10     NaN
2004-08-23  55.32  56.68  54.47  54.65     NaN
2004-08-24  55.56  55.74  51.73  52.38     NaN
2004-08-25  52.43  53.95  51.89  52.95     NaN
[5 rows x 5 columns]
Without a Quandl account, you are limited to 50 API calls per day. If you register an account and also pass the auth token shown on your account page, you can use the API without that limit.
>>> df = Quandl.get('GOOG/NASDAQ_GOOG', authtoken='YOUR_AUTH_TOKEN')
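To avoid hard-coding the token in scripts, one option is to read it from an environment variable and pass authtoken only when it is set. This is a sketch. QUANDL_AUTH_TOKEN and quandl_get_kwargs are hypothetical names I chose, not something the library looks up on its own.

```python
import os

# Hypothetical variable name chosen for this sketch; the Quandl library
# does not read it by itself.
TOKEN_ENV_VAR = "QUANDL_AUTH_TOKEN"


def quandl_get_kwargs(env=None):
    """Return keyword arguments for Quandl.get, adding authtoken only if set.

    Without a token the call still works, but it falls under the anonymous
    daily limit.
    """
    env = os.environ if env is None else env
    token = env.get(TOKEN_ENV_VAR, "").strip()
    return {"authtoken": token} if token else {}
```

It would then be used as Quandl.get('GOOG/NASDAQ_GOOG', **quandl_get_kwargs()).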
You can fetch multiple datasets at once by passing a list of Quandl Codes, as in the example below. This is convenient when you want to compare data.
>>> df = Quandl.get(["WORLDBANK/JPN_IT_NET_USER_P2",
... "WORLDBANK/KOR_IT_NET_USER_P2",
... "WORLDBANK/CHN_IT_NET_USER_P2"])
>>> df.columns = ["Japan", "Korea", "China"]
>>> df[:5]
               Japan     Korea     China
Date
1990-12-31  0.020294  0.023265  0.000000
1991-12-31  0.040438  0.046124       NaN
1992-12-31  0.096678  0.098404       NaN
1993-12-31  0.401278  0.249947  0.000169
1994-12-31  0.799684  0.311359  0.001168
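The NaN entries above are what you would expect if the datasets were combined with an outer join on the date. Every date that appears in any series gets a row, and series without a value for that date are left empty. The sketch below illustrates that idea in plain Python using a few values from the table. It illustrates the concept under that assumption and is not the library's implementation. outer_join_by_date is a hypothetical helper.

```python
def outer_join_by_date(series_by_name):
    """Combine {name: {date: value}} into (date, [values...]) rows.

    Rows cover the union of all dates (an outer join). A series with no
    value for a date gets None, which pandas would display as NaN.
    Dates are ISO strings, so sorting them as strings sorts them in time.
    """
    names = list(series_by_name)
    all_dates = sorted(set().union(*(s.keys() for s in series_by_name.values())))
    return [
        (date, [series_by_name[name].get(date) for name in names])
        for date in all_dates
    ]


# Values copied from the table above; China has no 1991 value there.
rows = outer_join_by_date({
    "Japan": {"1990-12-31": 0.020294, "1991-12-31": 0.040438},
    "China": {"1990-12-31": 0.0},
})
```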
You can search for data with the search function. The verbose argument defaults to True. When it is True, the top four results are printed to standard output.
>>> dataset = Quandl.search("Internet User", source="WORLDBANK", verbose=False)
Search results are split into pages, so when there are many results you need to specify the page number and call search repeatedly. Note that requesting too many pages may get you throttled.
>>> import itertools as it
>>> dataset = []
>>> # start from page 1, the search function's default page
>>> for page in it.count(1):
...     d = Quandl.search("Internet User", page=page, source="WORLDBANK", verbose=False)
...     if not d:  # an empty page means there are no more results
...         break
...     dataset.extend(d)
...
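One way to reduce the risk of throttling is to pause between requests and cap the number of pages. The sketch below wraps a paging loop like the one above in a function that takes any page-fetching callable. The one-second delay, the 50-page cap, and starting from page 1 are illustrative assumptions, and fetch_all_pages is a hypothetical name.

```python
import time


def fetch_all_pages(search_page, first_page=1, max_pages=50, delay=1.0,
                    sleep=time.sleep):
    """Collect results from a paginated search until an empty page appears.

    search_page(page) must return a list, empty when there are no more
    results. The pause between requests and the page cap are assumptions
    meant to lower the chance of being throttled, not documented limits.
    """
    results = []
    for page in range(first_page, first_page + max_pages):
        batch = search_page(page)
        if not batch:
            break
        results.extend(batch)
        sleep(delay)  # wait before requesting the next page
    return results
```

It could be called as fetch_all_pages(lambda p: Quandl.search("Internet User", page=p, source="WORLDBANK", verbose=False)). Passing the fetcher in as an argument also makes the loop easy to test without network access.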
This is basically the same as searching in a browser, but if you want to collect many Quandl Codes at once, the API may be easier.
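For example, something like the following could pull the Quandl Codes out of the collected search results so they can be passed to get as a list. It assumes each result is a dict with a "code" key, which is my understanding of the Python library's search output. Check a real result before relying on it. extract_codes is a hypothetical helper.

```python
def extract_codes(results):
    """Return the unique Quandl Codes in search results, in first-seen order.

    Assumes each result is a dict with a "code" key (an assumption about
    the search output); entries without one are skipped.
    """
    seen = set()
    codes = []
    for item in results:
        code = item.get("code")
        if code and code not in seen:
            seen.add(code)
            codes.append(code)
    return codes
```

The resulting list can then be passed straight to Quandl.get, as in the multi-dataset example earlier.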
See the documentation for details on the other API functions.
For options that the Python library does not support, see the API documentation. You can still use them by opening the API URL directly with urllib2 or a similar library.
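In Python 3, urllib2 was merged into urllib.request, so a direct request could look like the sketch below. The endpoint layout in BASE_URL (dataset code followed by a format extension) is an assumption based on the older API, so verify it against the API documentation. dataset_url and fetch_json are hypothetical helpers, and any query parameters you pass are simply URL-encoded.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint layout (dataset code + ".json"/".csv"); verify it
# against the current API documentation before use.
BASE_URL = "https://www.quandl.com/api/v1/datasets"


def dataset_url(code, fmt="json", **params):
    """Build a URL such as BASE_URL/GOOG/NASDAQ_GOOG.json?key=value."""
    url = f"{BASE_URL}/{code}.{fmt}"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return url


def fetch_json(url, timeout=30):
    """Fetch a URL with urllib.request and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

fetch_json makes a network request, so only dataset_url can be checked offline.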