[Python] I tried collecting data using the Wikipedia API

Introduction

I didn't really need this, since I already have a lot of Wikipedia data locally, but when I wanted just a little more data I came across the wikipedia API, so this is a record of that time.

Environment

Operable OS (works on both Windows and Mac)
┗ macOS Catalina 10.15.7
┗ Windows 10
Python 3.8.3

Installation

Just this one command:

pip install wikipedia
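If you want to make sure the install worked, one call from the REPL is enough (a minimal sketch; "Python" is just an example search word, and the default language is English):

import wikipedia

# Quick sanity check: list a few search results for an example word
print(wikipedia.search("Python", results=3))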

Collect the summary part of Wikipedia

When you give it a search word, it automatically searches for articles related to that word. You run it with **python3 wikipedia_data.py search word**. The execution result, that is, the Wikipedia article data, is appended to wikipedia.txt.

If there is a problem with your search word, you will get **wikipedia.exceptions.DisambiguationError: "search word" may refer to:** followed by a list of candidate titles, so searching again with one of those candidates will work.
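You can also catch that exception in code and look at the candidates directly (a minimal sketch; the options attribute of DisambiguationError holds the suggested titles, and "マーキュリー" is just an example word that I assume lands on a disambiguation page):

import wikipedia

wikipedia.set_lang("ja")
try:
    print(wikipedia.summary("マーキュリー"))
except wikipedia.exceptions.DisambiguationError as e:
    # e.options is the list of candidate titles Wikipedia suggests
    print("Candidates:", e.options)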

In rare cases you may get a long error; given the nature of the API, this is probably a transient communication failure. So if you get an error other than the above, ignore it and just try again until it succeeds.
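If you would rather automate that retry than rerun the script by hand, a simple loop is enough (a minimal sketch; the retry count and wait time are arbitrary values I picked):

import time
import wikipedia

def summary_with_retry(word, retries=3, wait=5):
    # Retry a few times, since occasional errors are usually transient
    for attempt in range(retries):
        try:
            return wikipedia.summary(word)
        except wikipedia.exceptions.DisambiguationError:
            raise  # not transient: the caller should pick one of the candidates
        except Exception:
            if attempt == retries - 1:
                raise
            time.sleep(wait)  # wait a bit before the next attempt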

wikipedia_data.py


import sys
import wikipedia

# Set the language to Japanese ("ja" is the correct language code; "jp" is not)
wikipedia.set_lang("ja")
# Open the output text file in append mode
f = open('wikipedia.txt', 'a')

args = sys.argv
word = args[1]
# Search for articles matching the search word
words = wikipedia.search(word)

if not words:
    print("No match")
else:
    # If the search word hits, save the summary of the top result
    line = wikipedia.summary(words[0])
    f.write(line.rstrip())
    print("success!")

f.write("\n" + "endline" + "\n")
f.close()

How to use the wikipedia API

Official English tutorial ↓ https://wikipedia.readthedocs.io/en/latest/code.html

It is a bit hard to digest on its own, so I have briefly extracted and summarized the parts I think I will actually use. (I think knowing this much is enough, but I have left a lot out, so if you want to master the library, please read the tutorial yourself.)

| method | overview |
|---|---|
| wikipedia.search("search word", results=10) | Returns a list of up to 10 search results for the search word |
| wikipedia.summary("search word", sentences=0) | Returns the summary of the article for the search word (sentences limits the number of sentences; 0 returns the whole summary) |
| wikipedia.page("search word") | Returns the entire article for the search word as an object |

If you add .content to the returned object, you can get the entire article as text data.
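Putting these together, a short sketch (the search word "機械学習" is just an example I chose):

import wikipedia

wikipedia.set_lang("ja")

# Up to 10 candidate article titles for the search word
titles = wikipedia.search("機械学習", results=10)
print(titles)

# Summary of the top hit, limited to two sentences
print(wikipedia.summary(titles[0], sentences=2))

# The whole article as an object; .content holds the full body text
page = wikipedia.page(titles[0])
print(page.content[:200])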
At the end

Thank you for your hard work this time as well. I don't know of an easy way to get a large amount of Wikipedia data this way, but if you only want a few dozen entries, this method may be good. If anyone knows how to do that, please let me know in the comments. I write articles as they come, so I don't know what I'll write next, but I will write something again. Well then.
