- Extracts links containing "mp4" from an HTML file.
- Uses Beautiful Soup.
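from bs4 import BeautifulSoup  # Beautiful Soup 4 (pip install beautifulsoup4)

open_name = input('Open html file: ')
save_name = input('Save file name: ')

with open(open_name, encoding='utf-8') as f:  # assumes the HTML file is UTF-8
    html = f.read()

soup = BeautifulSoup(html, 'html.parser')
with open(save_name, 'w', encoding='utf-8') as f2:
    # href=True skips <a> tags without an href, which would otherwise raise a TypeError
    for link in soup.find_all('a', href=True):
        href = link['href']
        if 'mp4' in href:  # keep only links whose URL contains "mp4"
            f2.write(href + '\n')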
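If Beautiful Soup is not installed, the same extraction can be sketched with Python's built-in `html.parser` module. This is an alternative approach rather than the method above; the class name `Mp4LinkParser` and the function `extract_mp4_links` are hypothetical names chosen for illustration. It keeps the same rule as the loop above: an `<a>` tag's `href` is kept if it contains the substring "mp4".

```python
from html.parser import HTMLParser


class Mp4LinkParser(HTMLParser):
    """Collect href values of <a> tags whose URL contains "mp4"."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; attrs is a list of (name, value) pairs.
        if tag != "a":
            return
        for name, value in attrs:
            # value is None for a bare attribute such as <a href>, so check it first.
            if name == "href" and value and "mp4" in value:
                self.links.append(value)


def extract_mp4_links(html):
    """Return the matching href values in document order."""
    parser = Mp4LinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    sample = '<a href="a.mp4">A</a><a href="b.html">B</a><a>no href</a>'
    print(extract_mp4_links(sample))  # ['a.mp4']
```

Returning a list instead of writing straight to a file makes the extraction easy to test; writing the result out is then just a loop over the list.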
I based this on the Stack Overflow answer linked below. If there is a better way to do this, please let me know.
Reference: python - how can I get href links from html code - Stack Overflow https://stackoverflow.com/questions/3075550/how-can-i-get-href-links-from-html-code/3075568#3075568
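One possible refinement: checking `"mp4" in href` also matches URLs such as `mp4-guide.html`. If the goal is links to `.mp4` files specifically (an assumption about intent), the URL's path extension can be checked instead. The helper `is_mp4_url` below is a hypothetical name; it uses only `urllib.parse.urlparse` and `posixpath.splitext` from the standard library.

```python
from posixpath import splitext
from urllib.parse import urlparse


def is_mp4_url(href):
    """Return True if the URL's path ends in .mp4 (case-insensitive)."""
    path = urlparse(href).path  # ignores any ?query string or #fragment
    return splitext(path)[1].lower() == ".mp4"


hrefs = [
    "videos/clip.mp4",
    "https://example.com/movie.MP4?t=10",
    "mp4-guide.html",
    "player.php?file=clip.mp4",
]
print([h for h in hrefs if is_mp4_url(h)])
# ['videos/clip.mp4', 'https://example.com/movie.MP4?t=10']
```

This is a trade-off rather than a strict improvement: a link like `player.php?file=clip.mp4` is kept by the substring test but dropped here. To use it, replace the `'mp4' in href` condition in the extraction loop with `is_mp4_url(href)`.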