In python's cURL library "PycURL" How to receive HTML or json response after login authentication.
The point is how to save a cookie and send a request with the cookie.
This is very helpful. How to send various HTTP methods from installation. thank you for helping me. HTTP requests with PycURL
To log in and save the cookie, do the following: (For simplicity, exception handling, UserAgent, timeout, etc. are completely ignored.)
login.py
import pycurl
import io
curl = pycurl.Curl()
curl.setopt(pycurl.URL,'https://www.xxx.com/login.php' )
curl.setopt(pycurl.POST, 1)
curl.setopt(pycurl.HTTPPOST, [('email', 'qqq@ppp'), ('pass', 'qqqppp')])
curl.setopt(pycurl.COOKIEJAR,'cookie.txt')
#The following two lines are the description to prevent the response from being displayed on the command line. Not the essence.
b = io.BytesIO()
curl.setopt(pycurl.WRITEFUNCTION, b.write)
curl.perform()
The important thing here is "Save the login cookie in'./cookie.txt'" To add the option, write as follows.
curl.setopt(pycurl.COOKIEJAR,'cookie.txt')
Later, we will attach this cookie and send a request to pages that require login authentication.
The request destination and parameter name are
You should refer to the source of the login page or check the communication history (chrome verification → Network tab).
Also, if you need more parameters to log in, you can just add more elements to the list.
curl.setopt(pycurl.URL,'https://www.xxx.com/login.php' )
curl.setopt(pycurl.POST, 1)
curl.setopt(pycurl.HTTPPOST, [('para1', 'aaa'), ('para2', 'bbb'), ('para3', 'ccc'), ...])
It is assumed that you will receive the HTML of the main page of xxx.com that requires login.
request_page.py
import pycurl
import io
curl = pycurl.Curl()
curl.setopt(pycurl.URL,'https://www.xxx.com/main' )
curl.setopt(pycurl.COOKIEFILE,'cookie.txt')
b = io.BytesIO()
curl.setopt(pycurl.WRITEFUNCTION, b.write)
try:
curl.perform()
ret = b.getvalue()
http_code = curl.getinfo(pycurl.HTTP_CODE)
except Exception as e:
ret = str(e)
The important thing here is To send a request with the cookie.txt generated after logging in,
curl.setopt(pycurl.COOKIEFILE,'cookie.txt')
Is to describe.
** pycurl.COOKIEJAR ** when saving cookies ** ** pycurl.COOKIEFILE ** when using cookies
b = io.BytesIO()
curl.setopt(pycurl.WRITEFUNCTION, b.write)
ret = b.getvalue()
With this, the HTML of the main page is stored in str type in ret, and it can be processed like other variables. Of course you can also request json. In that case, the response received is str type, so it is decoded by json module etc.
Recommended Posts