Synopsis

The English study app created in the article below requires audio files of English words. https://qiita.com/Fuminori_Souma/private/0706716fdebf08572c6c

Downloading the audio files manually is time-consuming and laborious, so I decided to download them automatically by web scraping.

The audio files are downloaded from weblio, to which I am grateful.

The script is deliberately slow (probably no faster than doing it by hand) so as not to burden weblio.

Source file

`get_sound_file.py`


import sys
import tkinter
import time
import re
import urllib.request
from tkinter import messagebox
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.action_chains import ActionChains

class Frame(tkinter.Frame):

    def __init__(self, master=None):
        tkinter.Frame.__init__(self, master)
        self.master.title('Get the audio file')
        self.master.geometry("400x300")

        # Label settings
        text_1 = tkinter.Label(self, text=u'Enter the word for which you want to get an audio file in the text box below.')
        text_1.pack(pady='7')
        text_2 = tkinter.Label(self, text=u'* When entering multiple words, separate them with ",".')
        text_2.pack()

        # Text widget settings (a multi-line version of Entry)
        self.ent_words = tkinter.Text(self, height=15)
        self.ent_words.pack(padx='30')

        # Push button settings
        bttn_start = tkinter.Button(self, text=u'start', command=self.start_get_file)
        bttn_start.bind("<Button-1>")  # Left click (Button-2 is wheel click, Button-3 is right click); no handler is bound here, command= does the work
        bttn_start.pack(pady='7')

    def checkAlnum(self, word):  # Check whether the entered text contains anything other than letters
        alnum = re.compile(r'^[a-zA-Z]+$')  # Compile the regular expression
        result = alnum.match(word) is not None  # match() returns a Match object if the pattern matches, otherwise None
        return result

    def delete_symbols(self, word):  # Remove the separators (commas and spaces) from the string
        # return word.replace(',', '').replace('.', '').replace('-', '').replace(' ', '')
        return word.replace(',', '').replace(' ', '')

    def get_mp3(self, word, driver):  # Open the weblio page and get the mp3 file

        dir = 'C:/Users/fumin/OneDrive'  # Audio file download destination

        # Enter the word in the search text box and press the search button
        # (find_element(By.XPATH, ...) replaces find_element_by_xpath, which newer Selenium releases no longer provide)
        driver.find_element(By.XPATH, "//*[@id=\"searchWord\"]").clear()  # Clear the text box
        driver.find_element(By.XPATH, "//*[@id=\"searchWord\"]").send_keys(word)
        driver.find_element(By.XPATH, "//*[@id=\"headFixBxTR\"]/input").click()
        time.sleep(5)

        # An audio file exists (i.e. the "play" button is present)
        if driver.find_elements(By.XPATH, "//*[@id=\"audioDownloadPlayUrl\"]/i"):

            # Click the "play" button to open the mp3 file in a new window
            driver.find_element(By.XPATH, "//*[@id=\"audioDownloadPlayUrl\"]/i").click()
            time.sleep(5)

            # Switch to the newly opened window showing the mp3 file
            handles = driver.window_handles
            driver.switch_to.window(handles[1])

            # Download the mp3 file
            urllib.request.urlretrieve(driver.current_url, (dir + '/' + word + '.mp3'))
            driver.close()

            # Switch back to the original window
            driver.switch_to.window(handles[0])

            return 'OK'

        else:  # No audio file exists (i.e. the "play" button is absent)

            return 'NG'


    def start_get_file(self):

        reslist = {}  # Result ('OK' or 'NG') for each word, initialized as an empty dict

        words = self.ent_words.get('1.0', 'end')  # Get the word list entered in the text box

        if self.checkAlnum(self.delete_symbols(words)):  # Input is valid (nothing other than letters and "," was entered)

            ww = [x.strip() for x in words.split(',')]  # Split the input on commas and store the words in a list

            # Open the browser
            drv = webdriver.Chrome("C:/Users/fumin/pybraries/chromedriver_ver79/chromedriver")
            time.sleep(10)

            # Open the page to operate on (weblio)
            drv.get("https://ejje.weblio.jp/")
            time.sleep(10)

            nglist = []  # NG words (words for which no mp3 file exists)

            for w in ww:  # Get the mp3 files
                reslist[w] = self.get_mp3(w, drv)

                if reslist[w] == 'NG':  # Collect the words that have no mp3 file
                    nglist.append(w)

            drv.quit()  # Close the browser and end the driver session once all words are processed

            if nglist:  # Some words had no audio file
                messagebox.showinfo('', 'I downloaded the audio files of all words except the following.\n\n' + ', '.join(nglist))
            else:
                messagebox.showinfo('', 'I downloaded the audio files of all the entered words.')

        else:   # Input is invalid (something other than letters and "," was entered)
            messagebox.showinfo('', 'Something other than letters and "," was entered. Please remove it and try again.')


if __name__ == '__main__':

    # Frame settings
    root = Frame()
    root.pack()
    root.mainloop()
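
The input handling in the listing above (remove the separators, check that only letters remain, then split on commas) can be tried without the GUI or a browser. What follows is a minimal sketch, not the code the app uses: `parse_words` is a hypothetical helper name, and unlike the original it validates each entry separately and drops empty entries, so a trailing comma or a line break between words does not reject the whole input.

```python
import re

# Hypothetical helper for illustration; not part of get_sound_file.py.
_WORD = re.compile(r'[a-zA-Z]+')


def parse_words(text):
    """Split comma-separated input into words.

    Returns (words, invalid): entries made only of ASCII letters, and
    entries that contain anything else. Empty entries are dropped.
    """
    words, invalid = [], []
    for entry in text.split(','):
        entry = entry.strip()  # Remove surrounding spaces and line breaks
        if not entry:
            continue  # Skip empty entries such as the one after a trailing comma
        if _WORD.fullmatch(entry):
            words.append(entry)
        else:
            invalid.append(entry)
    return words, invalid
```

With a helper like this, the error dialog could name the offending entries instead of only reporting that something invalid was entered.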

Remarks

Putting a load on weblio's site would not be right, so I slowed the script down considerably. As a result, the download speed is not much different from doing it by hand. (I think the value is in the automation, not the speed.)
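
The listing slows itself down with fixed `time.sleep(5)` and `time.sleep(10)` calls. One possible alternative, shown here only as a sketch, is to enforce a minimum interval between requests so that time already spent on other work counts toward the pause. The class name `Throttle` and its parameters are assumptions made for this example; the 5-second value in the usage note is borrowed from the listing.

```python
import time


class Throttle:
    """Enforce a minimum interval between successive calls to wait().

    Hypothetical helper for illustration; not part of get_sound_file.py.
    """

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # Seconds required between calls
        self._clock = clock               # Injectable so the class can be tested
        self._sleep = sleep
        self._last = None                 # Time of the previous call, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)  # Sleep only for the time still owed
                now = self._clock()
        self._last = now
```

Under these assumptions, creating `throttle = Throttle(5)` once and calling `throttle.wait()` before each search would keep requests at least five seconds apart.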

Task

Every time an mp3 file is opened, the audio is played. I therefore tried to mute the sound only while the mp3 file was open, but I could not operate the volume bar of the mp3 player. I briefly considered setting the volume of the PC itself to 0, but then any music I was listening to during the download would be cut off as well, so I gave up on that idea.
To download the mp3 file, I first planned to right-click and select "Save Audio As", but the context menu that appears on right-click does not seem to be accessible from Selenium. So I used urllib to download the mp3 file instead. I'm glad the download worked in the end, but I will be stuck if I ever need to right-click in the future.
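
The urllib approach described above can be isolated into a small function. This is a sketch under assumptions, not the article's exact code: `download_audio` is a hypothetical name, and it relies on the same `urllib.request.urlretrieve` call that the listing applies to `driver.current_url`. Whether a given site serves the file to a plain urllib request is not something this sketch can guarantee.

```python
import os
import urllib.request


def download_audio(url, dest_dir, word):
    """Save the resource at url as <dest_dir>/<word>.mp3 and return the path.

    Hypothetical helper for illustration; not part of get_sound_file.py.
    """
    os.makedirs(dest_dir, exist_ok=True)          # Create the destination folder if needed
    dest = os.path.join(dest_dir, word + '.mp3')  # File is named after the word
    urllib.request.urlretrieve(url, dest)         # Fetch the URL and write it to dest
    return dest
```

In `get_mp3`, the call would take `driver.current_url` as the `url` argument after switching to the window that shows the mp3 file.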

Other references

The pages below were a great help. Thank you very much.

Contents	Link destination
How to download the file	https://stackoverflow.com/questions/48736437/how-to-download-this-video-using-selenium
Confirmation of element existence	https://ja.stackoverflow.com/questions/30895/xpath%E3%81%A7%E8%A6%81%E7%B4%A0%E3%81%AE%E5%AD%98%E5%9C%A8%E3%82%92%E7%A2%BA%E8%AA%8D%E3%81%99%E3%82%8B%E6%96%B9%E6%B3%95
About right-clicking on Selenium	https://stackoverflow.com/questions/20316864/how-to-perform-right-click-using-selenium-chromedriver

Finally

If you notice anything like "this part is wrong", "not like that", or "you should do it this way", or if you have any questions, please point it out. I will be moved to tears.

[Python] I created an app that automatically downloads the audio file of each word used for the English study app.