This is an updated version of the following article:
Create a python script to check if the link at the specified URL is valid
The procedure is the same as in that article.
Grep
Extract as many of the strings that follow https:// ... as possible from the files under the current directory, and save them as input.txt.
grep -r "https://" * > test.txt
cat test.txt | sed -e 's/.*https//' | sed "s/^/http/g" > test2.txt
cat test2.txt | sed 's/>//g' | sed 's/"//g' | sed 's/)//g' | sed 's/;//g' | sed 's/]//g' | cut -d' ' -f 1 > input.txt
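The extraction step above can also be sketched in Python. This is only an illustration, not the article's method: the name `extract_urls` and the regular expression are assumptions. Unlike the sed pipeline, which deletes the characters `>`, `"`, `)`, `;` and `]` and rewrites the scheme to http, this sketch treats those characters as the end of a URL and keeps the scheme unchanged.

```python
import re

# Assumed pattern: a URL runs until whitespace or one of the
# characters that the sed pipeline strips (> " ) ; ]).
URL_PATTERN = re.compile(r'https?://[^\s>");\]]+')

def extract_urls(text):
    """Return every URL-like string found in text, in order of appearance."""
    return URL_PATTERN.findall(text)

if __name__ == '__main__':
    sample = 'SRC_URI = "https://example.com/a.tar.xz;name=a" (see https://example.org/docs)'
    for url in extract_urls(sample):
        print(url)
```

Writing one extracted URL per line to input.txt would give the same file layout that check_url.py expects.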
Check whether each URL in input.txt can be accessed.
$ python3 check_url.py
# Output
# Accessible → OK
# Inaccessible → NotFound
# Note: the grep result may not be what you intended, so review it.
NotFound:http://www.kernel.org/pub/linux/kernel/v5.x/linux-${PV}.tar.xz
OK:http://facebook.github.io/watchman/
...
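The `${PV}` entry in the output above is an example of a grep result that was never a real URL. A minimal sketch of a pre-filter that flags such lines before they are checked could look like the following. The function name `looks_checkable` and the specific rules (no `${` placeholder, an http or https scheme, a non-empty host) are assumptions chosen for illustration, not part of the original script.

```python
from urllib.parse import urlsplit

def looks_checkable(url):
    """Heuristic: True if url has an http(s) scheme, a host,
    and no unexpanded ${...} placeholder."""
    url = url.strip()
    if '${' in url:
        # An unexpanded variable such as ${PV} means the line is a template
        return False
    try:
        parts = urlsplit(url)
    except ValueError:
        # urlsplit rejects some malformed inputs (e.g. a broken IPv6 host)
        return False
    return parts.scheme in ('http', 'https') and bool(parts.netloc)

if __name__ == '__main__':
    for line in ['http://facebook.github.io/watchman/',
                 'http://www.kernel.org/pub/linux/kernel/v5.x/linux-${PV}.tar.xz']:
        print(looks_checkable(line), line)
```

Lines rejected by a filter like this still deserve a manual look, since the heuristic only catches the most obvious cases.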
check_url.py
# -*- coding: utf-8 -*-
import urllib.request, urllib.error

# Start out.txt fresh with a header line
with open('out.txt', 'w') as txt:
    txt.write("check result\n")

def checkURL(url):
    """Return True if the URL can be opened, False otherwise."""
    try:
        f = urllib.request.urlopen(url)
        f.close()
        return True
    except Exception:
        # Any failure (HTTP error, DNS failure, malformed URL) counts as not found
        return False

if __name__ == '__main__':
    with open("./input.txt") as f:
        for line in f:
            url = line.strip()  # drop the trailing newline
            if not url:
                continue  # skip blank lines
            ret = checkURL(url)
            if ret:
                result = "OK:"
            else:
                result = "NotFound:"
            ret_text = result + url
            print(ret_text)
            # Record only the URLs that failed
            if not ret:
                with open('out.txt', 'a') as txt:
                    txt.write(ret_text + "\n")
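`checkURL` above reports every failure as NotFound, whether the server returned 404 or the host name did not resolve. As a hedged sketch, a variant could report the reason instead. The name `describe_url`, the `opener` parameter (added here only so the function can be exercised without network access) and the 10-second timeout are all assumptions, and the list of handled exceptions is not exhaustive.

```python
import urllib.request
import urllib.error

def describe_url(url, opener=urllib.request.urlopen, timeout=10):
    """Return a short status string for url instead of a bare True/False."""
    try:
        with opener(url, timeout=timeout) as response:
            return "OK:%d" % response.status
    except urllib.error.HTTPError as e:
        # The server answered, but with an error status such as 404
        return "HTTPError:%d" % e.code
    except urllib.error.URLError as e:
        # No usable answer: DNS failure, refused connection, and so on
        return "URLError:%s" % e.reason
    except ValueError as e:
        # Malformed input, for example a line without a scheme
        return "Invalid:%s" % e

if __name__ == '__main__':
    # This input fails before any network access because it has no scheme
    print(describe_url("not-a-url"))
```

`HTTPError` is caught before `URLError` because it is a subclass of it; reversing the order would hide the status code.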
URLs that could not be accessed (NotFound) are written to the following file:
cat out.txt
Create a python script to check if the link at the specified URL is valid (https://qiita.com/seigot/items/534ca3089d217200a1d6)
Extract the character string to the right of the specified character string from the input character string