As an exercise for learning Pythonista, I wrote a program that fetches and processes a web page.
The Morioka City Bus Location System's Bus Operation Information page (http://gps.iwatebus.or.jp/pc/index.htm) lets you search the timetables of buses operating in Morioka City. It is a convenient web service: if you know the names of the bus stops where you get on and off, you can look up the timetable. On Safari on the iPhone, however, I found it unexpectedly inconvenient (a personal impression), because searching for and selecting the bus stops was tedious and the displayed text was small. As a workaround, I looked up the codes of the bus stops I often use and made a Workflow app that uses the API to fetch, process, and display the data. I decided to try the same thing with Pythonista.
As a study, I made a program that takes the bus stop codes, fetches the web data from the API, and processes and displays it. (The bus stop codes can be found in the source of the page for selecting the boarding stop. I converted them to CSV, but I will refrain from publishing that file.)
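As a rough illustration of how the stop codes turn into a request, here is a minimal sketch that builds the query URL from a pair of codes. It uses the Python 3 standard library rather than the Python 2 code in the listings, and the function name `build_timetable_url` is made up for this example. The parameter names (`jjg`, `jtr`, `kjg`, `ktr`) and the codes 241 and 383 are taken from the listings; `jjg` and `kjg` are simply kept at 1 as they appear there, since their meaning is not explained in this article.

```python
from urllib.parse import urlencode

# Endpoint taken from the listings in this article.
BASE_URL = 'http://gps.iwatebus.or.jp/bls/pc/jikoku_jk.jsp'


def build_timetable_url(ride_code, goff_code):
    """Return the timetable URL for a boarding/alighting stop pair.

    build_timetable_url is a hypothetical helper, not part of the
    original program.
    """
    params = [
        ('jjg', 1), ('jtr', ride_code),  # boarding stop
        ('kjg', 1), ('ktr', goff_code),  # alighting stop
    ]
    return BASE_URL + '?' + urlencode(params)


# 241 and 383 are the two stop codes used in the listings.
url = build_timetable_url(241, 383)
print(url)
```

Passing the parameters as a list of pairs keeps them in the same order as the hand-built string in the listings.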
The source only works with Pythonista's Python 2.7 series. In the new Pythonista 3, urllib2 could not be imported, so it did not run. (Addition: it now works thanks to information I received in a comment, so I have added that source further below.)
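The import failure comes from `urllib2` existing only in Python 2; in Python 3 the same `urlopen` function lives in `urllib.request`. One common way around this, shown here only as a sketch and not as the approach this article ends up taking, is to try the Python 3 location first and fall back to the Python 2 one.

```python
try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2

# Either way, urlopen(url).read() returns the raw bytes of the page,
# which still have to be decoded (Shift-JIS for the page used here).
print(callable(urlopen))
```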
bistable.py
#!/usr/bin/python
# -*- coding: utf-8 -*-
import urllib2
import re
import codecs

ps = re.compile(r'<.*?>')     # HTML tags
pnbsp = re.compile(r'&nbsp')  # &nbsp entities (the trailing semicolon is left on purpose)
prt = re.compile(r'^\n')      # a leading newline
URL1 = 'http://gps.iwatebus.or.jp/bls/pc/jikoku_jk.jsp?'
ride = "jjg=1&jtr=241"  # Code for the stop in front of Morioka Station
goff = "kjg=1&ktr=383"  # Code for the stop in front of Morioka Hachimangu
getURL = URL1 + ride + '&' + goff
r = urllib2.urlopen(getURL)
webdata = r.read()  # Load the web data
# NOTE: the results of decode() and replace() below are not assigned, so the
# data is unchanged; the loops work on the raw Shift-JIS bytes and rely on
# the semicolons still being present.
webdata.decode('sjis')
wdata = ps.sub('', webdata)
wdata = pnbsp.sub('', wdata)
wdata = prt.sub('', wdata)
wdata.replace(" ", "")
wdata.replace(";", "\r")
alldata = wdata.split("\r")
od = ""
hr = ""
for ld in alldata:
    ld.decode('sjis')
    wd = ld.lstrip()
    if (len(wd) > 1):  # Do not process lines with only one character
        if wd == '<!--' or wd == '-->':  # Ignore leftover multi-line comment tags
            pass
        elif (len(wd) == 2):  # Two-character lines are likely to be the hour
            if not wd == ';;':  # Hour data, unless the line is just two semicolons
                hr = '[' + wd + ']:'  # Store the hour
        elif wd[0] == ";":  # Lines starting with a semicolon are "minute" data
            od = od + hr + wd.replace(";", '') + "\r"  # Concatenate the hour and the minutes
        else:
            od = od + wd.replace(";", '') + "\r"  # Anything else: remove the semicolons and append
alldata = od.split("\r")  # Make a list split on line breaks
od = ""
for ld in alldata:
    if (len(ld) > 0):
        if ld[0:5] == '[24]:':  # When the 24:00 hour has no minute data, "Saturday" and "holiday" end up stuck to it, so split them off
            if ld[7] in '0123456789':  # If what follows [24]: is numeric, treat it as minute data
                od = od + ld + "\r\n"
            else:
                od = od + ld[5:len(ld)] + "\r\n"  # No minute data, so drop the [24]: prefix
        elif ('[;' in ld):  # Ignore garbage lines produced along the way
            pass
        elif ('[' in ld) and len(ld) == 6:  # Ignore garbage lines produced along the way
            pass
        else:
            od = od + ld + "\r\n"
print od.decode('sjis')  # Decode the result from Shift-JIS and write it to standard output
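The two main steps of the listing (strip the markup, then attach each "minute" line to the most recent hour) can be sketched on a tiny example. This is a simplified Python 3 illustration, not the program above: the HTML sample is made up and is not the real page, the helper names `strip_markup` and `label_minutes` are hypothetical, and the entity pattern is assumed to be `&nbsp`, as the comment in busTimeTable3.py suggests.

```python
import re

TAG = re.compile(r'<.*?>')    # any HTML tag
NBSP = re.compile(r'&nbsp')   # the entity without its semicolon


def strip_markup(html):
    """Remove tags and '&nbsp', leaving the ';' as a minute marker."""
    return NBSP.sub('', TAG.sub('', html))


def label_minutes(lines):
    """Prefix each ';MM' line with the most recent two-character hour."""
    out = []
    hour = ''
    for line in lines:
        line = line.strip()
        if len(line) == 2 and not line.startswith(';'):
            hour = '[' + line + ']:'       # remember the hour
        elif line.startswith(';') and len(line) > 2:
            out.append(hour + line.replace(';', ''))
    return out


# Made-up sample, only shaped like "hour cell, then minute cells".
sample = ('<td>07</td>\n<td>&nbsp;05</td>\n<td>&nbsp;35</td>\n'
          '<td>08</td>\n<td>&nbsp;10</td>')

text = strip_markup(sample)
timetable = label_minutes(text.splitlines())
print(timetable)  # ['[07]:05', '[07]:35', '[08]:10']
```

Leaving the semicolon in place after removing `&nbsp` is what lets the second step tell minute lines apart from hour lines.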
Postscript: Pythonista 3 also bundles the 2.x series libraries, so I was planning to look for a way to switch later, but a commenter told me how to run Python 2.x on Pythonista 3. I also reviewed the source, refactored it, and added comments, so here is the version that works with Pythonista 3.
busTimeTable3.py
#!python2
# -*- coding: utf-8 -*-
import urllib2
import re
import codecs

URL1 = 'http://gps.iwatebus.or.jp/bls/pc/jikoku_jk.jsp?'
ride = "jjg=1&jtr=241"  # Bus stop code for Morioka Station
goff = "kjg=1&ktr=383"  # Bus stop code for the stop in front of Hachimangu
getURL = URL1 + ride + '&' + goff
r = urllib2.urlopen(getURL)
webdata = r.read()  # Load the web data
# NOTE: the results of decode() and replace() below are not assigned, so the
# data is unchanged; the loops work on the raw Shift-JIS bytes and rely on
# the semicolons still being present.
webdata.decode('sjis')  # The data is Shift-JIS, so decode it
wdata = re.sub(r'<.*?>', '', webdata)  # Erase HTML tags
wdata = re.sub(r'&nbsp', '', wdata)  # Erase &nbsp (leave the semicolon on purpose)
wdata.replace(' ', '')  # Erase whitespace
wdata.replace(';', '\r')  # Convert semicolons to newlines
alldata = wdata.split('\r')  # Make a list split on line breaks
od = ""
hr = ""
for ld in alldata:
    ld.decode('sjis')
    wd = ld.lstrip()
    if (len(wd) > 1):  # Do not process lines with only one character
        if wd == '<!--' or wd == '-->':  # Ignore leftover multi-line comment tags
            pass
        elif (len(wd) == 2):  # Two-character lines are likely to be the hour
            if not wd == ';;':  # Hour data, unless the line is just two semicolons
                hr = '[' + wd + ']:'  # Store the hour
        elif wd[0] == ";":  # Lines starting with a semicolon are "minute" data
            od = od + hr + wd.replace(";", '') + "\r"  # Concatenate the hour and the minutes
        else:
            od = od + wd.replace(";", '') + "\r"  # Anything else: remove the semicolons and append
alldata = od.split("\r")  # Make a list split on line breaks
od = ""
for ld in alldata:
    if (len(ld) > 0):
        if ld[0:5] == '[24]:':  # When the 24:00 hour has no minute data, "Saturday" and "holiday" end up stuck to it, so split them off
            if ld[7] in '0123456789':  # If what follows [24]: is numeric, treat it as minute data
                od = od + ld + "\r\n"
            else:
                od = od + ld[5:len(ld)] + "\r\n"  # No minute data, so drop the [24]: prefix
        elif ('[;' in ld):  # Ignore garbage lines produced along the way
            pass
        elif ('[' in ld) and len(ld) == 6:  # Ignore garbage lines produced along the way
            pass
        else:
            od = od + ld + "\r\n"
print od.decode('sjis')  # Decode the result from Shift-JIS and write it to standard output
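One detail of the listings is easy to miss: `decode` and `replace` return new objects rather than changing the data in place, so a call whose result is not assigned leaves the data as it was. The following Python 3 sketch shows this with made-up sample values; it is only an illustration of the language behaviour, not part of the program.

```python
# Made-up sample: a stop name encoded as Shift-JIS bytes.
raw = '盛岡駅前'.encode('sjis')

raw.decode('sjis')             # result discarded: raw is still bytes
still_bytes = isinstance(raw, bytes)

text = raw.decode('sjis')      # assign the result to keep the decoded text

cells = 'a b;c'
cells.replace(';', '\n')       # result discarded: cells is unchanged
replaced = cells.replace(';', '\n')

print(still_bytes, text, repr(cells), repr(replaced))
```

In the listings this means the processing runs on the undecoded bytes with the semicolons intact, and only the final `print` line actually decodes the text.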