Nice to meet you. This is my first article. I am currently a second-year master's student in information technology, and I will join an IT company in April of this year. I haven't published anything publicly until now, but I plan to start posting little by little, so thank you for your support!
My memorable first post is a very basic introduction to "scraping". I have always been interested in natural language processing, and I had been wondering whether I could apply it to "law". Then I came across a research report and thought that maybe I could do it too. So, for a start, I decided to pull text from the court website!
・ Python 3.7.7 ・ Windows 10 Pro ・ PyCharm 2019.3.3 (IDE)
Note: This time, the script simply displays the PDFs linked from the page in the browser. I will keep improving it to make the legal field more approachable.
law.py
from selenium import webdriver

# Raw string so the backslashes in the Windows path are not treated as escapes
driver = webdriver.Chrome(r'C:\chromedriver_win32\chromedriver')
driver.get('https://www.courts.go.jp/app/hanrei_jp/search1')

# Find the search box by its name attribute and search for "GPS"
search_bar = driver.find_element_by_name("filter[text1]")
search_bar.send_keys("GPS")
search_bar.submit()

# Build the XPath of each result link by substituting the tr[] index with format(),
# then click the link to display the page
for i in range(1, 11):
    x_path = "//*[@id='main-contents']/div[2]/div/div[3]/div[5]/table/tbody/tr[{0}]/td[2]/a".format(i)
    driver.find_element_by_xpath(x_path).click()
➀ Access the court case search page. ② Find the HTML element of the search box and search with the keyword "GPS".
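The index substitution in the loop of law.py can be pulled out into a small helper, which makes it easier to check the generated XPaths without opening a browser. This is only a minimal sketch: the names `ROW_LINK_XPATH` and `result_link_xpaths` are hypothetical, and the template is simply copied from law.py, so it depends on the page layout staying the same.

```python
# Template copied from law.py; {0} is the 1-based index of the result row (tr).
ROW_LINK_XPATH = (
    "//*[@id='main-contents']/div[2]/div/div[3]/div[5]"
    "/table/tbody/tr[{0}]/td[2]/a"
)


def result_link_xpaths(rows=10):
    """Return the XPath of the link in each of the first `rows` result rows."""
    return [ROW_LINK_XPATH.format(i) for i in range(1, rows + 1)]


if __name__ == "__main__":
    # Print the first three XPaths to confirm that only the tr[] index changes.
    for x_path in result_link_xpaths(3):
        print(x_path)
```

In law.py, the loop could then iterate over `result_link_xpaths()` and pass each string to `driver.find_element_by_xpath(...)` as before.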
This was my first time using selenium, and even this much took me about an hour. (Is that inefficient?) The last part was especially hard: the HTML structure is so complicated that I struggled to work out how to extract the tags for the PDF files.
From here, I would like to add the following functions: ➀ paging through the results to display all PDFs, ② downloading the PDFs, and ➂ letting the user set the search word freely. If these are possible, it seems that natural language processing could be applied, so I would like to continue taking on the challenge.
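As a rough idea of how ② and ➂ might look, here is a minimal sketch using only the standard library. It is not something I have run against the court site: the function names `parse_keyword`, `pdf_filename`, and `download_pdf` are hypothetical, and it assumes the PDF URL has already been obtained (for example from the `href` attribute of each result link) and can be fetched with a plain HTTP request.

```python
import argparse
import os
import posixpath
import shutil
import urllib.request
from urllib.parse import urlparse


def parse_keyword(argv=None):
    """Read the search word from the command line; fall back to "GPS"."""
    parser = argparse.ArgumentParser(description="Search court cases by keyword.")
    parser.add_argument("keyword", nargs="?", default="GPS",
                        help="search word (default: GPS)")
    return parser.parse_args(argv).keyword


def pdf_filename(url):
    """Derive a local file name from the last path segment of the URL."""
    name = posixpath.basename(urlparse(url).path)
    return name or "download.pdf"


def download_pdf(url, dest_dir="."):
    """Save the file at `url` into `dest_dir` and return the saved path."""
    path = os.path.join(dest_dir, pdf_filename(url))
    with urllib.request.urlopen(url) as response, open(path, "wb") as out:
        shutil.copyfileobj(response, out)
    return path
```

With this, `search_bar.send_keys(parse_keyword())` would replace the hard-coded "GPS", and `download_pdf(url)` could be called for each PDF URL that is found.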
Thank you for reading my first post!
・ https://stackoverrun.com/ja/q/11884507 ・ https://ai-inter1.com/python-selenium/ ・ https://www.seleniumqref.com/api/python/element_get/Python_find_element_by_xpath.html