What I want to do

The Network tab of Chrome's developer tools (opened with Ctrl + Shift + I on Windows) is a useful tool that shows a timeline of the resources the browser fetches and lets you simulate different connection speeds.

In this article, I simply collect the list of URLs of the files shown in the Network tab, using Python + Selenium.

Environment

- Chrome 79.0.3945.45 beta
- Python 3.7.3
- selenium 3.141.0
- chromedriver-binary 79.0.3945.36.0
- Debian GNU/Linux 9 (Docker container)

Implementation

The code up to the point where Selenium fetches the page is shown below. Set options such as headless mode as appropriate. The page is fetched with driver.get(); the following article was very helpful for the basics.

- Automatic operation of Chrome with Python + Selenium

`netlogs.py`


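import chromedriver_binary  # adds the bundled chromedriver to PATH
from selenium.webdriver import Chrome, ChromeOptions, DesiredCapabilities

# enable the performance log (it carries the Network.* events)
caps = DesiredCapabilities.CHROME.copy()  # copy so the shared default dict is not modified
caps["goog:loggingPrefs"] = {"performance": "ALL"}
# caps["loggingPrefs"] = {"performance": "ALL"}  # name needed in some environments

# options
# _headers is assumed to be a dict defined elsewhere that holds the User-Agent string
options = ChromeOptions()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
options.add_argument('--disable-gpu')
options.add_argument('--ignore-certificate-errors')
options.add_argument('--user-agent=' + _headers["User-Agent"])

# get driver
driver = Chrome(options=options, desired_capabilities=caps)
driver.implicitly_wait(5)
driver.get("https://qiita.com/")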

The log that contains the URLs is named performance, so DesiredCapabilities is set up to enable that log [^ 1], and the capabilities are passed in when the driver is created [^ 2].

The capability name depends on the environment. In some cases it did not work unless the name was "loggingPrefs" instead of "goog:loggingPrefs". This probably depends on the ChromeDriver version: as far as I know, ChromeDriver 75 and later run in W3C mode by default and expect the vendor-prefixed name.
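
One way to cope with both spellings is to pick the key from the ChromeDriver major version. The following is only a minimal sketch under assumptions: the helper names `logging_prefs_key` and `build_logging_caps` are hypothetical, and the cutoff at version 75 is my understanding of when W3C mode became the default, not something verified in this article.

```python
def logging_prefs_key(chromedriver_major: int) -> str:
    """Return the capability name assumed to carry the logging preferences.

    Assumption: ChromeDriver 75+ expects the vendor-prefixed
    "goog:loggingPrefs"; older versions take plain "loggingPrefs".
    """
    return "goog:loggingPrefs" if chromedriver_major >= 75 else "loggingPrefs"


def build_logging_caps(chromedriver_major: int) -> dict:
    """Build a capabilities fragment that turns on the performance log."""
    return {logging_prefs_key(chromedriver_major): {"performance": "ALL"}}
```

The fragment could then be merged into the capabilities with something like `caps.update(build_logging_caps(79))` before the driver is created.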

`netlogs.py`


time.sleep(2)

Here the script waits for the page to finish loading. The textbook approach seems to be driver.implicitly_wait(), but I could not get the data I wanted that way, so I added a sleep. Please let me know if there is a smarter way...
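
A possible alternative to a fixed sleep is to poll for a condition. Note that implicitly_wait() only affects element lookups, not page loading, which may be why it did not help here. The helper below is a generic sketch using only the standard library; the name `wait_until` and its default timeout and interval are assumptions for illustration.

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.2):
    """Poll predicate() until it returns a truthy value or the timeout elapses.

    Returns True if the predicate succeeded, False if the timeout was reached.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

With Selenium it might be used as `wait_until(lambda: driver.execute_script("return document.readyState") == "complete")`. Requests that scripts fire after the load event are not covered by this condition, so a short extra sleep may still be needed.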

`netlogs.py`


netLog = driver.get_log("performance")

The log returned by driver.get_log("performance") is a list of dicts, and the "message" value of each item is a JSON string. It looks like the following.

`performance`


[
    {'level': 'INFO', 'message': '{
            "message": {
                "method": "Page.frameResized",
                "params": {}
            },
            "webview": "***"
        }', 'timestamp': ***
    },
    {'level': 'INFO', 'message': '{

    ...
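
To make the nesting concrete, here is a small sketch that decodes one entry. The sample entry is hypothetical and only mimics the shape shown above; the function name `decode_entry` is also an assumption.

```python
import json

# Hypothetical sample shaped like one item of driver.get_log("performance").
sample_entry = {
    "level": "INFO",
    "message": json.dumps({
        "message": {"method": "Page.frameResized", "params": {}},
        "webview": "***",
    }),
    "timestamp": 0,
}


def decode_entry(entry):
    """Return the inner DevTools event (a dict with "method" and "params")."""
    # The outer "message" is a JSON string; the event sits under its own "message" key.
    return json.loads(entry["message"])["message"]


event = decode_entry(sample_entry)
print(event["method"])  # prints Page.frameResized
```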

We will extract only the necessary parts from the acquired performance log.

`netlogs.py`


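def process_browser_log_entry(entry):
    # 'message' is a JSON string; the DevTools event is under its 'message' key
    response = json.loads(entry['message'])['message']
    return response

# decode every entry, then keep only the Network.response* events
events = [process_browser_log_entry(entry) for entry in netLog]
events = [event for event in events if 'Network.response' in event['method']]

# collect the URL of every event that carries params.response.url
detected_url = []
for item in events:
    if "response" in item["params"]:
        if "url" in item["params"]["response"]:
            detected_url.append(item["params"]["response"]["url"])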

From the decoded "message" values, only the events whose "method" name contains Network.response (for example Network.responseReceived) are kept. The resulting `events` is a list of items like the following. Then, for each item that has "url" under "params" => "response", the URL is extracted and stored in `detected_url`.

`network.response`


[
    {
        "method": "Network.responseReceivedExtraInfo",
        "params": {
            "blockedCookies": [],
            "headers": {
                "cache-control": "max-age=0, private, must-revalidate",
                "content-encoding": "gzip",
                "content-type": "text/html; charset=utf-8",
                "date": "Sat, 23 Nov 2019 07:41:40 GMT",
                "etag": "W/\"***\"",
                "referrer-policy": "strict-origin-when-cross-origin",
                "server": "nginx",
                "set-cookie": "***",
                "status": "200",
                "strict-transport-security": "max-age=2592000",
                "x-content-type-options": "nosniff",
                "x-download-options": "noopen",
                "x-frame-options": "SAMEORIGIN",
                "x-permitted-cross-domain-policies": "none",
                "x-request-id": "***",
                "x-runtime": "***",
                "x-xss-protection": "1; mode=block"
            },
            "requestId": "***"
        }
    },
    {
    ...
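
As the sample shows, Network.responseReceivedExtraInfo events also pass the substring filter but have no "response" key, so they never contribute a URL. A stricter variant could match Network.responseReceived exactly and drop duplicate URLs. This is a sketch, not the original code: the function name `extract_response_urls` and the sample events are hypothetical.

```python
def extract_response_urls(events):
    """Collect unique URLs from Network.responseReceived events, in order."""
    urls = []
    seen = set()
    for event in events:
        if event.get("method") != "Network.responseReceived":
            continue  # e.g. Network.responseReceivedExtraInfo has no "response"
        url = event.get("params", {}).get("response", {}).get("url")
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


# Hypothetical events for illustration only.
sample_events = [
    {"method": "Network.responseReceivedExtraInfo",
     "params": {"headers": {}, "requestId": "1"}},
    {"method": "Network.responseReceived",
     "params": {"requestId": "1", "response": {"url": "https://example.com/"}}},
    {"method": "Network.responseReceived",
     "params": {"requestId": "2", "response": {"url": "https://example.com/app.js"}}},
    {"method": "Network.responseReceived",
     "params": {"requestId": "3", "response": {"url": "https://example.com/app.js"}}},
]
print(extract_response_urls(sample_events))
```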

Full code

`netlogs.py`


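import json
import time

import chromedriver_binary  # adds the bundled chromedriver to PATH
from selenium.webdriver import Chrome, ChromeOptions, DesiredCapabilities

# enable the performance log (it carries the Network.* events)
caps = DesiredCapabilities.CHROME.copy()  # copy so the shared default dict is not modified
caps["goog:loggingPrefs"] = {"performance": "ALL"}

# _headers is assumed to be a dict defined elsewhere that holds the User-Agent string
options = ChromeOptions()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
options.add_argument('--disable-gpu')
options.add_argument('--ignore-certificate-errors')
options.add_argument('--user-agent=' + _headers["User-Agent"])

driver = Chrome(options=options, desired_capabilities=caps)
driver.implicitly_wait(5)
driver.get("https://qiita.com/")

# give the page time to finish its requests
time.sleep(2)

netLog = driver.get_log("performance")

def process_browser_log_entry(entry):
    # 'message' is a JSON string; the DevTools event is under its 'message' key
    response = json.loads(entry['message'])['message']
    return response

events = [process_browser_log_entry(entry) for entry in netLog]
events = [event for event in events if 'Network.response' in event['method']]

# collect the URL of every event that carries params.response.url
detected_url = []
for item in events:
    if "response" in item["params"]:
        if "url" in item["params"]["response"]:
            detected_url.append(item["params"]["response"]["url"])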
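
The same events carry more than the URL. As a sketch of how to get closer to what the Network tab shows, the function below also picks up the status code and MIME type of each response. The function name `summarize_responses` and the sample events are hypothetical; "status" and "mimeType" are fields of the response object in Network.responseReceived events.

```python
def summarize_responses(events):
    """Return (status, mimeType, url) tuples for Network.responseReceived events."""
    rows = []
    for event in events:
        if event.get("method") != "Network.responseReceived":
            continue
        response = event.get("params", {}).get("response", {})
        rows.append((response.get("status"), response.get("mimeType"), response.get("url")))
    return rows


# Hypothetical events for illustration only.
sample_events = [
    {"method": "Network.responseReceived",
     "params": {"response": {"url": "https://example.com/", "status": 200,
                             "mimeType": "text/html"}}},
    {"method": "Network.loadingFinished", "params": {"requestId": "1"}},
    {"method": "Network.responseReceived",
     "params": {"response": {"url": "https://example.com/missing.png", "status": 404,
                             "mimeType": "text/html"}}},
]
for row in summarize_responses(sample_events):
    print(row)
```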

Another method

It seems that the same information can also be obtained by executing a script in the page [^ 3].

`netlogs_js.py`


scriptToExecute = "var performance = window.performance || window.mozPerformance || window.msPerformance || window.webkitPerformance || {}; var network = performance.getEntries() || {}; return JSON.stringify(network);"
netData = driver.execute_script(scriptToExecute)
netJson = json.loads(str(netData))

detected_url = []
for item in netJson:
    detected_url.append(item["name"])

This method also gave me a list of URLs.

However, the file I wanted was sometimes missing from the result, so it does not feel like a stable method to me. (I have not verified this properly.)
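
One thing to watch for with this approach: performance.getEntries() returns every entry type, including paint entries whose "name" is a label such as "first-paint" rather than a URL. The browser also keeps only a limited number of resource entries in its buffer, which may be one reason files go missing, though I have not confirmed that. The sketch below keeps only navigation and resource entries; the function name and the sample data are hypothetical.

```python
import json

# Hypothetical JSON string shaped like JSON.stringify(performance.getEntries()).
net_data = json.dumps([
    {"name": "https://example.com/", "entryType": "navigation"},
    {"name": "https://example.com/style.css", "entryType": "resource"},
    {"name": "first-paint", "entryType": "paint"},
])


def urls_from_performance_entries(raw_json):
    """Keep only entries whose name is a URL (navigation and resource entries)."""
    entries = json.loads(raw_json)
    return [e["name"] for e in entries
            if e.get("entryType") in ("navigation", "resource")]


print(urls_from_performance_entries(net_data))
```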

Please point out if there is a better way!

[^ 1]: I referred to this (almost a copy): [Selenium - python. how to capture network traffic's response [duplicate]](https://stackoverflow.com/questions/52633697/selenium-python-how-to-capture-network-traffics-response)

Get information equivalent to the Network tab of Chrome developer tools with Python + Selenium
