The English study app built in the article below requires audio files of English words. https://qiita.com/Fuminori_Souma/private/0706716fdebf08572c6c
Downloading the audio files manually is time-consuming and tedious, so I decided to download them automatically by web scraping.
The audio files are downloaded from weblio, to whom I am grateful.
get_sound_file.py
import sys
import tkinter
import time
import re
import urllib.request
from tkinter import messagebox
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.action_chains import ActionChains
class Frame(tkinter.Frame):

    def __init__(self, master=None):
        tkinter.Frame.__init__(self, master)
        self.master.title('Get the audio file')
        self.master.geometry("400x300")

        # Label settings
        text_1 = tkinter.Label(self, text='Enter the words whose audio files you want in the text box below.')
        text_1.pack(pady='7')
        text_2 = tkinter.Label(self, text='* When entering multiple words, separate them with ",".')
        text_2.pack()

        # Text widget (multi-line entry) settings
        self.ent_words = tkinter.Text(self, height=15)
        self.ent_words.pack(padx='30')

        # Push button settings (command= already handles the left click, so no bind() call is needed)
        bttn_start = tkinter.Button(self, text='start', command=self.start_get_file)
        bttn_start.pack(pady='7')
    def checkAlnum(self, word):  # Check whether the entered text contains anything other than letters
        alnum = re.compile(r'^[a-zA-Z]+$')  # Compile the regular expression
        result = alnum.match(word) is not None  # match() returns a Match object on success, otherwise None
        return result

    def delete_symbols(self, word):  # Remove the separators (commas and spaces) from the string
        # return word.replace(',', '').replace('.', '').replace('-', '').replace(' ', '')
        return word.replace(',', '').replace(' ', '')
    def get_mp3(self, word, driver):  # Open the weblio page and get the mp3 file
        save_dir = 'C:/Users/fumin/OneDrive'  # Audio file download destination

        # Enter the word in the search text box and press the search button
        driver.find_element(By.XPATH, '//*[@id="searchWord"]').clear()  # Clear the text box
        driver.find_element(By.XPATH, '//*[@id="searchWord"]').send_keys(word)
        driver.find_element(By.XPATH, '//*[@id="headFixBxTR"]/input').click()
        time.sleep(5)

        # The audio file exists (= the "play" button exists)
        if driver.find_elements(By.XPATH, '//*[@id="audioDownloadPlayUrl"]/i'):
            # Press the "play" button to open the mp3 file in a new window
            driver.find_element(By.XPATH, '//*[@id="audioDownloadPlayUrl"]/i').click()
            time.sleep(5)
            # Switch to the newly opened window showing the mp3 file
            handles = driver.window_handles
            driver.switch_to.window(handles[1])
            # Download the mp3 file
            urllib.request.urlretrieve(driver.current_url, save_dir + '/' + word + '.mp3')
            driver.close()
            # Switch back to the original window
            driver.switch_to.window(handles[0])
            return 'OK'
        else:  # The audio file does not exist (= the "play" button does not exist)
            return 'NG'
    def start_get_file(self):
        reslist = {}  # Result ('OK' or 'NG') for each word
        words = self.ent_words.get('1.0', 'end')  # Get the word list entered in the text box

        if self.checkAlnum(self.delete_symbols(words)):  # Valid input (only letters, "," and spaces)
            # Split the input on commas into a list of words, skipping empty entries
            ww = [x.strip() for x in words.split(',') if x.strip()]

            # Open the browser (passing the driver path positionally is the Selenium 3 style)
            drv = webdriver.Chrome("C:/Users/fumin/pybraries/chromedriver_ver79/chromedriver")
            time.sleep(10)

            # Open the page to operate (weblio)
            drv.get("https://ejje.weblio.jp/")
            time.sleep(10)

            nglist = []  # NG words (words for which no mp3 file exists)
            for w in ww:  # Get the mp3 file for each word
                reslist[w] = self.get_mp3(w, drv)
                if reslist[w] == 'NG':  # Add words that have no mp3 file to the NG list
                    nglist.append(w)

            drv.quit()  # Close the browser when the download process is complete

            if nglist:  # Some words had no audio file
                messagebox.showinfo('', 'I downloaded the audio files of all words except the following.\n\n' + ', '.join(nglist))
            else:
                messagebox.showinfo('', 'I downloaded the audio files of all the entered words.')
        else:  # Invalid input (something other than letters, "," and spaces was entered)
            messagebox.showinfo('', 'Characters other than letters and "," were entered. Please remove them and try again.')
if __name__ == '__main__':
    # Frame settings
    root = Frame()
    root.pack()
    root.mainloop()
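The input check in start_get_file validates the whole string at once, so a newline between two words rejects the entire input. As a rough sketch (the `parse_words` helper is hypothetical and not part of the script above), each entry could instead be checked on its own after splitting on commas:

```python
import re

WORD_RE = re.compile(r'[a-zA-Z]+')  # same character set as checkAlnum


def parse_words(text):
    """Split comma-separated input into (valid_words, invalid_entries)."""
    valid, invalid = [], []
    for entry in text.split(','):
        entry = entry.strip()      # drop surrounding spaces and newlines
        if not entry:
            continue               # ignore empty entries such as a trailing comma
        if WORD_RE.fullmatch(entry):
            valid.append(entry)
        else:
            invalid.append(entry)
    return valid, invalid
```

With this shape, the error dialog could list exactly which entries were rejected instead of refusing everything.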
I don't want to put a burden on weblio's site, so I deliberately slowed the script down considerably. As a result, the download speed is not much different from doing it by hand. (I think the value lies in the automation itself, not in the speed.)
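The idea of pacing requests can also be expressed in one place. A minimal sketch, assuming a hypothetical `Throttle` helper (not in the original script) that guarantees a minimum interval between successive requests:

```python
import time


class Throttle:
    """Ensure at least `min_interval` seconds pass between calls to wait()."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock    # injectable so the class can be tested without real delays
        self._sleep = sleep
        self._last = None      # time of the previous request, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `wait()` before each request to weblio would space the requests out by at least the chosen interval. Note that this only limits the request rate; it does not wait for a page to finish loading, which the `time.sleep` calls in the script also take care of.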
Every time the mp3 file is opened, the audio plays. I therefore wanted to mute the sound only while the mp3 file is playing, but I couldn't operate the volume bar of the mp3 file. For a moment I considered setting the volume of the PC itself to 0, but then I realized that any music I was listening to during the download would be cut off too, so I gave up on the idea.
To download the mp3 file, I first planned to right-click and select "Save Audio As", but the context menu that appears on right-click does not seem to be accessible from Selenium, so I used urllib to download the mp3 file instead. I'm glad the download worked in the end, but I'll be in trouble if I ever need a right-click in the future.
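The urllib step can be isolated in a small function. A sketch, assuming a hypothetical `download_audio` helper (not in the original script) and assuming the URL passed in points directly at the mp3 data, as `driver.current_url` does in the script once the new window is open:

```python
import shutil
import urllib.request
from pathlib import Path


def download_audio(url, word, save_dir):
    """Save the resource at `url` as <save_dir>/<word>.mp3 and return the path."""
    dest = Path(save_dir) / (word + '.mp3')
    dest.parent.mkdir(parents=True, exist_ok=True)  # create the folder if needed
    # Stream the response to disk instead of reading it all into memory
    with urllib.request.urlopen(url, timeout=30) as response, open(dest, 'wb') as out:
        shutil.copyfileobj(response, out)
    return dest
```

Unlike `urlretrieve`, this variant sets a timeout (30 seconds is an arbitrary choice here), so a stalled connection would not hang the GUI indefinitely.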
The pages below were a great help. Thank you very much.
Contents | Link destination |
---|---|
How to download the file | https://stackoverflow.com/questions/48736437/how-to-download-this-video-using-selenium |
Confirmation of element existence | https://ja.stackoverflow.com/questions/30895/xpath%E3%81%A7%E8%A6%81%E7%B4%A0%E3%81%AE%E5%AD%98%E5%9C%A8%E3%82%92%E7%A2%BA%E8%AA%8D%E3%81%99%E3%82%8B%E6%96%B9%E6%B3%95 |
About right-clicking on Selenium | https://stackoverflow.com/questions/20316864/how-to-perform-right-click-using-selenium-chromedriver |
If something here is wrong, or there is a better way to do it, or you have any questions, please point it out. I will be moved to tears of joy.