Researchers often care about how many times the papers they read or write have been cited. Citation counts are easy to check on Google Scholar and similar services. As a leading indicator of citations, there is also the number of readers in the reference management software Mendeley. As far as I can tell, this number cannot be seen without opening Mendeley. So, as scraping practice, I wrote a script that retrieves the number of Mendeley readers.
[1] Web scraping with Python
[2] List of precautions for web scraping
Environment: Windows, Python 3
The full script is below, followed by an explanation.
a.py
# Modules
import requests

# Constants
Mendeley = 'https://www.mendeley.com/catalogue/'
PaperID = []
PaperID.append("5a856ac7-0d75-3560-8824-9f9061f3eb50/")

# Functions
def SandwitchedText(text_source, text_1, text_2):
    return text_source.split(text_1)[1].split(text_2)[0]

for a in PaperID:
    r = requests.get(Mendeley + a)
    text = r.text
    print("Title : " + SandwitchedText(text, "\"title\":\"", "\",\"detail"))
    print("readers : " + SandwitchedText(text, "readers:", ":"))
    print("citations : " + SandwitchedText(text, "citations:", ":"))
- requests is a package that can be used for scraping [1]. Follow the rules for web scraping when you use it [2].
- Specify the paper you want data for by its URL. This article uses the famous paper on high-temperature superconductivity as its example. The variable Mendeley holds the fixed part of the URL, and the part that differs for each paper goes in the list PaperID.
- SandwitchedText is a function that returns the part of the string text_source that lies between text_1 and text_2.
- requests.get(url).text returns the source of the page at that URL. The script stores it as a string in the variable text.
- Finally, the script extracts the paper's title, the number of Mendeley readers, and the number of citations from the page source and prints them to the console. To adapt this, look through the page source to find where the information you need appears, then use SandwitchedText to extract it.
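One thing to watch with the split-based extraction is that it raises an IndexError when a marker is not in the page, for example if the page layout changes. The following is a minimal sketch of a more defensive variant that returns None instead. The name text_between and the sample string are made up for illustration and are not taken from the real Mendeley page source.

```python
from typing import Optional


def text_between(source: str, start: str, end: str) -> Optional[str]:
    """Return the text between the first `start` and the next `end`.

    Returns None if either marker is missing.
    """
    _, found_start, rest = source.partition(start)
    if not found_start:
        return None
    value, found_end, _ = rest.partition(end)
    if not found_end:
        return None
    return value


# Invented fragment for illustration only, not real Mendeley page source.
sample = '{"title":"An example paper","detail":"..."}'
example_title = text_between(sample, '"title":"', '","detail')
example_missing = text_between(sample, '"readers":', ',')
print(example_title)
print(example_missing)
```

With this variant, the calling code can check for None and print a message such as "not found" rather than stopping partway through the list of papers.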
If you add more papers to the list, you can get information on several papers at once. It would be a little smarter to specify each paper by its title instead of by its URL.
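As a rough sketch of handling several papers in one run, the loop could pause between requests and keep going when a page does not have the expected layout. The function collect_titles, its fetch parameter, the delay of one second, and the IDs and page contents below are all assumptions made for illustration. A stand-in for the download is used so the sketch runs without network access.

```python
import time
from typing import Callable, Dict, Iterable, List

BASE_URL = "https://www.mendeley.com/catalogue/"  # same prefix as in a.py


def collect_titles(
    paper_ids: Iterable[str],
    fetch: Callable[[str], str],
    delay_seconds: float = 1.0,
) -> List[Dict[str, str]]:
    """Fetch each paper page in turn and pull out its title."""
    results = []
    for index, paper_id in enumerate(paper_ids):
        if index > 0:
            time.sleep(delay_seconds)  # pause between requests to be polite
        text = fetch(BASE_URL + paper_id)
        try:
            title = text.split('"title":"')[1].split('","detail')[0]
        except IndexError:
            # Marker not found, e.g. the page layout differs from what we expect.
            title = "(title not found)"
        results.append({"id": paper_id, "title": title})
    return results


# Stand-in for a real download; the IDs and page contents are invented.
fake_pages = {
    BASE_URL + "id-1/": '{"title":"First paper","detail":"..."}',
    BASE_URL + "id-2/": "page with an unexpected layout",
}
rows = collect_titles(["id-1/", "id-2/"], fake_pages.__getitem__, delay_seconds=0)
for row in rows:
    print(row["id"], ":", row["title"])
```

To run this against the real site, fetch could be something like `lambda url: requests.get(url, timeout=10).text`, with the real entries of PaperID passed in as paper_ids.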