Jupyter Tips 3
This is a continuation of Jupyter Tips 2.
To prepare, first run the following cell:
jupyter_notebook
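import random
import re
import urllib.request  # a bare "import urllib" does not reliably load the request submodule
import IPython.core.getipython
from itertools import takewhile

# Download the list page from Japanese Wikipedia.
with urllib.request.urlopen('https://ja.wikipedia.org/wiki/'
        '%E4%B8%96%E7%95%8C%E4%B8%80%E3%81%AE%E4%B8%80%E8%A6%A7') as fp:
    s0 = fp.read().decode('utf8')
# Drop full-width spaces and <img> tags.
s = re.sub('<img.*?/>', '', s0.replace('\u3000', ''))
# Unwrap inline tags repeatedly until the text stops shrinking (handles nesting).
le = 0
while le != len(s):
    le = len(s)
    s = re.sub(r'<(a|b|i|span|strong|sup)[^>]*>(.*?)</\1>', r'\2', s)
# Remove footnote markers such as [12].
s = re.sub(r'\[[0-9]*]', '', s)
ll = re.findall('<li>(.*?)</li>', s[10000:])
# NOTE: the page is in Japanese, so the English literals below (which appear to
# have been translated along with the article) may need their Japanese equivalents.
ll = [i for i in takewhile(lambda s: not s.startswith('Guinness World Records'), ll)
      if 'Most' in i or "World's best" in i or len(i) > 15]

def G_impl(s, lst=ll):
    # Line magic body: print one randomly chosen entry.
    print(random.choice(lst))

ip = IPython.core.getipython.get_ipython()
if ip:
    ip.register_magic_function(G_impl, magic_name='G')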
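The tag-stripping loop in the cell above is easier to follow on a small input. The following is only a minimal sketch: the `sample` string and the helper name `strip_inline_tags` are made up for illustration and stand in for the downloaded page. It shows why the substitution is repeated. A single pass unwraps only the outermost tag of a nested pair, so the loop runs until the length stops changing.

```python
import re

# Hypothetical sample markup, standing in for the downloaded page.
sample = ('<li><b>Tallest <a href="/wiki/x">tower</a></b>[1] '
          'in <i>the world</i></li>')

def strip_inline_tags(text):
    """Unwrap inline tags until the length stops changing, then drop [n] markers."""
    prev_len = -1
    while prev_len != len(text):
        prev_len = len(text)
        # One pass unwraps only the outermost matching tag of a nested pair.
        text = re.sub(r'<(a|b|i|span|strong|sup)[^>]*>(.*?)</\1>', r'\2', text)
    return re.sub(r'\[[0-9]*]', '', text)

cleaned = strip_inline_tags(sample)
items = re.findall('<li>(.*?)</li>', cleaned)
print(items)
```

Here the first pass removes `<b>` and `<i>` but leaves the inner `<a>`, which the second pass removes.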
jupyter_notebook
G
>>>
Language with the fewest syllables in the world (one of them) → Hawaiian, 35 syllables.
jupyter_notebook
G
>>>
Language with the most irregular verbs in the world → French, 570 verbs.
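The `G` command works because a plain function was registered as a line magic. As a rough sketch of the same pattern without the network access, the following registers a hypothetical `pick` magic over a made-up tuple of strings. The names `pick_impl` and `pick` and the sample items are assumptions for illustration, not part of the original cell. Outside IPython, `get_ipython()` returns `None` and the registration is skipped.

```python
import random

def pick_impl(line, items=("alpha", "beta", "gamma")):
    """Line magic body: IPython passes the rest of the input line as `line`."""
    choice = random.choice(items)
    print(choice)
    return choice

try:
    from IPython import get_ipython
    ip = get_ipython()  # None when not running inside IPython/Jupyter
except ImportError:
    ip = None

if ip:
    # Expose the function as a line magic named %pick.
    ip.register_magic_function(pick_impl, magic_kind='line', magic_name='pick')
```

In a notebook, `%pick` then prints one of the items. With automagic enabled (the IPython default) and no variable of the same name, typing `pick` alone should also work, which is how `G` is called above.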
That's all.