Hello. It's been hot lately. When it's hot, crawling pages and picking out elements by hand gets tiresome. Let's get it over with quickly using Python's lassie.
lassie
https://github.com/michaelhelmick/lassie
Web Content Retrieval for Humans™
So it says.
installation
pip install lassie
It installs straight from pip.
usage
>>> import lassie
>>> fetched = lassie.fetch('http://www.youtube.com/watch?v=tkjbEqnp_0U')
That's all there is to it.
The result looks like this.
>>> from pprint import pprint
>>> pprint(fetched)
{'description': u'Recording of the July 12, 2013 webcast, "What\'s New in Riak 1.4"',
'images': [{'src': u'http://b.vimeocdn.com/ts/445/011/445011693_640.jpg',
'type': u'og:image'}],
'title': u"What's New in Riak 1.4",
'url': u'http://vimeo.com/71448923',
'videos': [{'height': 400,
'src': u'http://vimeo.com/moogaloop.swf?clip_id=71448923',
'type': u'application/x-shockwave-flash',
'width': 640},
{'height': 400,
'src': u'https://player.vimeo.com/video/71448923',
'width': 640}]}
The title, description, keywords (not shown in this result), images, videos, and so on are extracted and returned. It's a nice touch that the thumbnail candidates come back tagged with a type.
Looks like this will make the hot summer a little easier to get through.
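Since each image entry carries a type, choosing a thumbnail comes down to a small dictionary lookup. Below is a minimal sketch, not something lassie provides: `pick_thumbnail` and its `preferred_types` parameter are hypothetical names made up for illustration, and the sample dict is hand-written in the shape of the result above rather than fetched live.

```python
# Sample shaped like the result printed above (trimmed); not fetched live.
fetched = {
    'title': "What's New in Riak 1.4",
    'url': 'http://vimeo.com/71448923',
    'images': [
        {'src': 'http://b.vimeocdn.com/ts/445/011/445011693_640.jpg',
         'type': 'og:image'},
    ],
    'videos': [
        {'height': 400,
         'src': 'http://vimeo.com/moogaloop.swf?clip_id=71448923',
         'type': 'application/x-shockwave-flash',
         'width': 640},
        {'height': 400,
         'src': 'https://player.vimeo.com/video/71448923',
         'width': 640},
    ],
}


def pick_thumbnail(result, preferred_types=('og:image',)):
    """Hypothetical helper: return the src of the first image whose type
    matches one of preferred_types (checked in order), falling back to the
    first image of any type, or None when there are no images."""
    images = result.get('images', [])
    for wanted in preferred_types:
        for image in images:
            if image.get('type') == wanted:
                return image.get('src')
    return images[0].get('src') if images else None


print(pick_thumbnail(fetched))
```

Which type values to prefer is an assumption here; only og:image appears in the result above, so adjust the tuple to whatever types show up in your own results.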
dependencies (from setup.py)
install_requires=[
'requests==1.2.3',
'beautifulsoup4==4.2.1',
'html5lib==1.0b3'
],
All familiar names.
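Because these are exact pins, it can be handy to see how they compare with what is already in your environment before installing. Here is a rough sketch using only the standard library (`importlib.metadata`, Python 3.8+). The pin strings are copied from the setup.py excerpt above; `compare_pins` is a hypothetical helper name, not part of lassie.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins as listed in the setup.py excerpt above.
PINS = ['requests==1.2.3', 'beautifulsoup4==4.2.1', 'html5lib==1.0b3']


def compare_pins(pins):
    """Hypothetical helper: map each package name to a
    (pinned_version, installed_version_or_None) tuple."""
    report = {}
    for pin in pins:
        name, _, pinned = pin.partition('==')
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (pinned, installed)
    return report


for name, (pinned, installed) in compare_pins(PINS).items():
    print(name, 'pinned:', pinned, 'installed:', installed)
```

This only reports the two versions side by side; it does not try to decide whether they are compatible.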
lassie [Noun] [Countable noun] << Scottish dialect >> Daughter, girl; young lady (⇔ laddie). [LASS+‐IE]
(Meaning of lassie, Weblio English-Japanese dictionary)
I see.
Hmm, or maybe it's this:
"Lassie" is an English term of endearment for a "young lady" or "girl", and of course Lassie is a female collie.
[Lassie - Wikipedia](http://ja.wikipedia.org/wiki/%E5%90%8D%E7%8A%AC%E3%83%A9%E3%83%83%E3%82%B7%E3%83%BC)
I didn't know that. I learned something today.