How to get a list of links from a Wikipedia page

[Web Scraping with Python](https://www.amazon.co.jp/Python%E3%81%AB%E3%82%88%E3%82%8BWeb%E3%82%B9%E3%82%AF%E3%83%AC%E3%82%A4%E3%83%94%E3%83%B3%E3%82%B0-Ryan-Mitchell/dp/4873117615) by Ryan Mitchell includes an example that extracts the links contained in an article on a Wikipedia page. The book's sample appears to target English Wikipedia, so I modified it slightly to work with Japanese Wikipedia.
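On Japanese Wikipedia, article paths in `href` attributes are percent-encoded UTF-8. Printing them raw therefore gives strings like `%E3%83%86...`. The script below passes each path through `urllib.parse.unquote` to make it readable. Here is a minimal standalone illustration using the path of the article the script fetches. The variable names are only for this example.

```python
from urllib.parse import unquote

# Percent-encoded path of the article fetched by the script
# (テイルズ オブ イノセンス, "Tales of Innocence")
encoded = "/wiki/%E3%83%86%E3%82%A4%E3%83%AB%E3%82%BA_%E3%82%AA%E3%83%96_%E3%82%A4%E3%83%8E%E3%82%BB%E3%83%B3%E3%82%B9"

# unquote() decodes %XX escapes as UTF-8 by default
decoded = unquote(encoded)
print(decoded)  # /wiki/テイルズ_オブ_イノセンス
```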

Execution environment

OS: OS X El Capitan (10.11.5), Python: 3.5.1

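```python
# coding: utf-8

import re
from bs4 import BeautifulSoup
from urllib.request import urlopen
from urllib.parse import unquote

# Japanese Wikipedia article "テイルズ オブ イノセンス" (Tales of Innocence), percent-encoded
url = "https://ja.wikipedia.org/wiki/%E3%83%86%E3%82%A4%E3%83%AB%E3%82%BA_%E3%82%AA%E3%83%96_%E3%82%A4%E3%83%8E%E3%82%BB%E3%83%B3%E3%82%B9"

html = urlopen(url)
bsObj = BeautifulSoup(html, 'html.parser')

# Article links start with /wiki/ and contain no colon, which excludes
# namespace pages such as Category:, File: and Help:
pattern = re.compile("^(/wiki/)((?!:).)*$")

# Search only the article body (div#bodyContent), not the site navigation
for link in bsObj.find('div', {'id': 'bodyContent'}).find_all('a', href=pattern):
    if 'href' in link.attrs:
        # Decode percent-encoded titles (e.g. %E3%83%86...) into readable Japanese
        print(unquote(link.attrs['href']))
```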
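The BeautifulSoup call above does two things:

- It limits the search to `<div id="bodyContent">`.
- It keeps only `href`s matching `^(/wiki/)((?!:).)*$`. Here `((?!:).)*` consumes the rest of the path one character at a time and refuses any colon, so namespace pages such as `Category:` or `File:` are skipped.

The sketch below illustrates that logic under some stated assumptions. It swaps BeautifulSoup for the standard library's `html.parser.HTMLParser`, so it runs without bs4 and without network access. It also collects the links into a de-duplicated list instead of printing them.

- `BodyLinkParser`, `extract_article_links` and the sample HTML are hypothetical names made up for this example.
- It assumes `bodyContent` is a `div`.
- It does not repair malformed markup the way BeautifulSoup does.

```python
import re
from html.parser import HTMLParser
from typing import List
from urllib.parse import unquote

# Same filter as the script: /wiki/ followed by characters that are never ':'
ARTICLE_LINK = re.compile("^(/wiki/)((?!:).)*$")


class BodyLinkParser(HTMLParser):
    """Hypothetical helper: collects article links inside <div id="bodyContent">."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # > 0 while inside the bodyContent div
        self.links = []  # type: List[str]

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self.depth > 0:
                self.depth += 1  # nested div inside bodyContent
            elif attrs.get("id") == "bodyContent":
                self.depth = 1  # entering bodyContent
        elif tag == "a" and self.depth > 0:
            href = attrs.get("href")  # None for a valueless attribute
            if href and ARTICLE_LINK.match(href):
                self.links.append(unquote(href))

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1  # reaching 0 means we left bodyContent


def extract_article_links(page: str) -> List[str]:
    """Return decoded article links from bodyContent, without duplicates, in page order."""
    parser = BodyLinkParser()
    parser.feed(page)
    parser.close()
    seen = set()
    unique = []
    for link in parser.links:
        if link not in seen:  # keep only the first occurrence
            seen.add(link)
            unique.append(link)
    return unique


# Hypothetical sample page: one link outside bodyContent, a namespace link,
# an external link and a duplicate inside it
sample = """
<div id="mw-navigation"><a href="/wiki/Main_Page">Main</a></div>
<div id="bodyContent">
  <div class="inner"><a href="/wiki/%E3%83%86%E3%82%A4%E3%83%AB%E3%82%BA">a</a></div>
  <a href="/wiki/Category:Games">category</a>
  <a href="https://example.org/wiki/X">external</a>
  <a href="/wiki/%E3%83%86%E3%82%A4%E3%83%AB%E3%82%BA">duplicate</a>
  <a href="/wiki/Python">Python</a>
</div>
<a href="/wiki/Footer">footer</a>
"""

links = extract_article_links(sample)
print(links)
```

The explicit `seen` set keeps the first-seen order on Python 3.5 as well. On that version a plain `dict` does not guarantee insertion order.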
