I tried automatically sending new coronavirus literature to LINE with Python

Overview

This article covers how to extract literature on the novel coronavirus that was newly registered in a medical literature database the day before, and send it to LINE automatically. The core of the method is extracting documents that match given keywords from a database called PubMed.

When there is a new paper, the notification looks like this: Finished image (with papers)

When there is none, it looks like this: Finished image (no papers)

Environment

Python 3.6.5
beautifulsoup4==4.9.0
requests==2.23.0
urllib3==1.25.9

Database and keyword selection

This time, we will use PubMed as the medical literature database. PubMed is a database created by the NCBI (National Center for Biotechnology Information), part of the NLM (National Library of Medicine). It lets you search documents published in major medical journals around the world.

Next, the keywords: while researching the novel coronavirus, I found that the words "coronavirus" and "Covid-19" were used most often. So this time I decided to extract literature that contains either "coronavirus" or "Covid-19".

PubMed API

To extract documents from PubMed, I used PubMed's APIs. Several APIs are available, but I used ESearch and EFetch. For more information, please refer to the documentation.

ESearch overview

ESearch lets you get a list of article IDs that match a search expression. The base URL is:

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=

Append a search expression after "term=" and the IDs matching it will be returned.

For example, try "coronavirus".

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=coronavirus

If you enter the above URL in your browser, you will see a result like this: ESearch retmax=20

The paper IDs were obtained successfully. Count is the total number of documents that match the search expression, and retmax is the maximum number of matching IDs that are displayed. retmax defaults to 20 and can be raised as high as 100,000.
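
As a rough sketch of how to read these values programmatically (the URL is the one above; note that html.parser lowercases tag names, so <Count> is found as "count"):

import urllib.request
from bs4 import BeautifulSoup

# Fetch the ESearch result for "coronavirus" and inspect Count and RetMax
url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=coronavirus"
response = urllib.request.urlopen(url)
bs = BeautifulSoup(response, "html.parser")

print(bs.find("count").text)   # total number of matching documents
print(bs.find("retmax").text)  # number of IDs returned (default: 20)
print([i.text for i in bs.find_all("id")][:3])  # the first few article IDs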

For example, to change retmax to 100, add "retmax=100" to the URL.

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=coronavirus&retmax=100

If you enter the above URL in your browser, it looks like this: ESearch retmax=100. The number of displayed IDs has increased to 100.

Conditions like "retmax" can be added to control which literature is extracted. This time, in addition to "retmax", we will use "field", "mindate", and "maxdate".

In "field", you can select the search location from "title" or "abstract". With "mindate" and "maxdate", you can decide from when to when the document is targeted by the date when the document was registered in PubMed. For example, if you want to search the literature from April 2019 to April 2020 by title only,

&field=title&mindate=2019/4/1&maxdate=2020/4/31

Add.
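
As a side note, the same parameters can also be assembled with urllib.parse.urlencode instead of string concatenation; a minimal sketch, using the example values above ("/" in the dates is percent-encoded, which the server decodes normally):

from urllib.parse import urlencode

# Build an ESearch URL from a parameter dict
base = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
params = {
    "db": "pubmed",
    "term": "coronavirus",
    "field": "title",
    "mindate": "2019/4/1",
    "maxdate": "2020/4/30",
}
url = base + "?" + urlencode(params)
print(url)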

ESearch code

First, create the URL to look up the IDs of articles matching the search expression. This time we use a search expression that joins "coronavirus" and "covid-19" with OR.

URL creation for ESearch


import time

def make_url(yesterday, query):
    """
    Create the URL for an ESearch request.
    Arguments: date, search expression
    Return value: str-type URL
    """
    # ESearch base URL
    baseURL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term="

    # Limit the search scope to titles and abstracts
    field = "field=title/abstract"

    # Raise the maximum number of IDs returned to 1000
    retmax = "retmax=1000"

    # Target only yesterday's literature
    mindate = "mindate={}".format(yesterday)
    maxdate = "maxdate={}".format(yesterday)

    # Join the pieces with "&"
    url = "&".join([baseURL + query, field, retmax, mindate, maxdate])

    # Wait so as not to hammer the NCBI servers
    time.sleep(5)
    return url

Once the URL is created, use it to get the list of IDs. Beautiful Soup makes it easy to pull out just the IDs.

Obtain the article IDs from ESearch


from bs4 import BeautifulSoup
import urllib.request

def get_id(url):
    """
    Obtain the paper IDs.
    Arguments: ESearch URL
    Return value: list of id tags
    """
    # Request the ID list from ESearch
    response = urllib.request.urlopen(url)

    # Extract only the <Id> tags
    bs = BeautifulSoup(response, "html.parser")
    ids = bs.find_all("id")

    return ids
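
To see the two functions working together, a quick check like the following should do (the date is a hypothetical value in the "YYYY/M/D" format the script uses):

# Hypothetical one-off check of make_url and get_id
url = make_url("2020/5/1", "coronavirus+OR+covid-19")
ids = get_id(url)
print(len(ids))                    # number of papers registered that day
print([i.text for i in ids[:3]])   # the first few IDs as plain strings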

EFetch overview

Use EFetch to obtain information such as the title and abstract from an article ID. The base URL is:

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=

Append a paper's ID after "id=" to get that paper's information.

EFetch code

Paper information is obtained for each of the IDs returned by ESearch.

Get the title and URL of the paper


from bs4 import BeautifulSoup
import urllib.request
import time

def get_summary(id):
    """
    Get a summary of the paper.
    Arguments: id tag
    Return value: title, article URL
    """
    # EFetch base URL
    searchURL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id="

    search_url = searchURL + id.text + "&retmode=xml"
    summary = urllib.request.urlopen(search_url)
    summary_bs = BeautifulSoup(summary, "html.parser")

    # The article URL can be built from the article ID
    article_URL = "https://pubmed.ncbi.nlm.nih.gov/{}/".format(id.text)

    # Extract the title of the paper (guard against records without one)
    title = summary_bs.find("articletitle")
    title = title.text if title is not None else "(no title)"

    # Wait so as not to hammer the NCBI servers
    time.sleep(5)
    return title, article_URL
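
Combined with get_id, the per-paper loop then looks roughly like this (a sketch, assuming url was built with make_url as above):

# Print the title and URL for each paper found
for id in get_id(url):
    title, article_URL = get_summary(id)
    print(title)
    print(article_URL)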

LINE Notify code

Finally, the retrieved paper information is sent out. Using LINE Notify, you can send a message from Python to LINE. I referred to this article.

Send to LINE


import requests

def output_line(line_access_token, message):
    """
    Send a notification to LINE.
    Arguments: access token, notification content
    Return value: None
    """
    line_url = "https://notify-api.line.me/api/notify"
    line_headers = {'Authorization': 'Bearer ' + line_access_token}
    payload = {'message': message}
    r = requests.post(line_url, headers=line_headers, params=payload)
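
As a quick smoke test, you can call the function directly; "YOUR_ACCESS_TOKEN" is a placeholder for a real LINE Notify token. If nothing arrives, checking r.status_code in output_line is a good first step, since LINE Notify returns HTTP 200 on success.

# Should make one notification appear in the LINE talk linked to the token
output_line("YOUR_ACCESS_TOKEN", "test message")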

Whole code



from bs4 import BeautifulSoup
from datetime import date, timedelta
import urllib.request
import requests
import time

def main():
    """
    Main processing.
    """
    # LINE access token
    line_access_token = 'LINE access token'

    # Get yesterday's date as a "YYYY/M/D" string
    yesterday = date.today() - timedelta(days=1)
    yesterday = "/".join([str(yesterday.year), str(yesterday.month), str(yesterday.day)])

    # Search expression
    query = "coronavirus+OR+covid-19"

    # Build the ESearch URL
    URL = make_url(yesterday, query)

    # Get the paper IDs
    ids = get_id(URL)

    # When there is no new paper
    if not ids:
        message = "There are no new Covid-19 papers"
        output_line(line_access_token, message)

    # When there are new papers
    else:
        for id in ids:
            # Get the title and URL of the paper
            title, article_URL = get_summary(id)

            # Send a notification to LINE
            message = """{}
            {}""".format(title, article_URL)
            output_line(line_access_token, message)

def make_url(yesterday, query):
    """
    Create the URL for an ESearch request.
    Arguments: date, search expression
    Return value: str-type URL
    """
    # ESearch base URL
    baseURL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term="

    # Limit the search scope to titles and abstracts
    field = "field=title/abstract"

    # Raise the maximum number of IDs returned to 1000
    retmax = "retmax=1000"

    # Target only yesterday's literature
    mindate = "mindate={}".format(yesterday)
    maxdate = "maxdate={}".format(yesterday)

    # Join the pieces with "&"
    url = "&".join([baseURL + query, field, retmax, mindate, maxdate])

    # Wait so as not to hammer the NCBI servers
    time.sleep(5)
    return url
    
def get_id(url):
    """
    Obtain the paper IDs.
    Arguments: ESearch URL
    Return value: list of id tags
    """
    # Request the ID list from ESearch
    response = urllib.request.urlopen(url)

    # Extract only the <Id> tags
    bs = BeautifulSoup(response, "html.parser")
    ids = bs.find_all("id")

    return ids

def get_summary(id):
    """
    Get a summary of the paper.
    Arguments: id tag
    Return value: title, article URL
    """
    # EFetch base URL
    searchURL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id="

    search_url = searchURL + id.text + "&retmode=xml"
    summary = urllib.request.urlopen(search_url)
    summary_bs = BeautifulSoup(summary, "html.parser")

    # The article URL can be built from the article ID
    article_URL = "https://pubmed.ncbi.nlm.nih.gov/{}/".format(id.text)

    # Extract the title of the paper (guard against records without one)
    title = summary_bs.find("articletitle")
    title = title.text if title is not None else "(no title)"

    # Wait so as not to hammer the NCBI servers
    time.sleep(5)
    return title, article_URL
        
def output_line(line_access_token, message):
    """
    Send a notification to LINE.
    Arguments: access token, notification content
    Return value: None
    """
    line_url = "https://notify-api.line.me/api/notify"
    line_headers = {'Authorization': 'Bearer ' + line_access_token}
    payload = {'message': message}
    r = requests.post(line_url, headers=line_headers, params=payload)

if __name__ == "__main__":
    main()

Finally, by running this script with cron, the titles and URLs of new papers are sent to LINE automatically every day.
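
For example, a crontab entry like the following (both paths are placeholders; adjust them to your environment) runs the script every morning at 8:00:

# m h dom mon dow command
0 8 * * * /usr/bin/python3 /path/to/pubmed_line.py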
