I'm leaving these notes for those who come after me.
-- There are few materials on Python 3
-- There are few Japanese materials on Bottle
For these reasons, I wasted a lot of time on things that should have been simple ...
-- First, I decided to use Python for my research. Python with NumPy and SciPy is well suited to research on information retrieval.
-- Python 3 uses UTF-8 as its default encoding, so handling Japanese is easier than in Python 2.
-- Since this is research on the Web, I thought it should be implemented as a web application.
-- I don't use a DB, so a micro-framework is the best fit.
-- Flask and Bottle are the well-known Python micro-frameworks, but Flask doesn't support Python 3. (As of May 23, 2013. It will probably be supported soon.)
Addendum: As of November 29, 2013, Flask is compatible with Python 3! Yeah!
See this article for sample code to search using the Custom Search API. http://qiita.com/items/92febaf8bbea541b1e36
However, that example is written for Python 2, so some modifications are needed.
The code that actually fetches the JSON of the search results is defined as a function called simple_search(query) in googlesearch.py.
googlesearch.py
import json
import urllib.error
import urllib.parse
import urllib.request


def simple_search(query):
    # Replace this with your own API key.
    API_KEY = 'AIzaSyBdhBWUc5W3Aco3YGPwOlS_rYM0LENl_jo'
    NUM = 1  # number of result pages to fetch (10 results per page)
    url = 'https://www.googleapis.com/customsearch/v1?'
    params = {
        'key': API_KEY,
        'q': query,
        'cx': '013036536707430787589:_pqjad5hr1a',
        'alt': 'json',
        'lr': 'lang_ja',
    }
    items = []  # stays empty if every request fails
    start = 1   # 1-based index of the first result on the page
    for i in range(NUM):
        params['start'] = start
        request_url = url + urllib.parse.urlencode(params)
        try:
            response = urllib.request.urlopen(request_url)
            json_body = json.loads(response.read().decode('utf-8'))
            items.extend(json_body.get('items', []))
        except (urllib.error.URLError, ValueError) as e:
            print('Error:', e)
        start += 10  # move on to the next page
    return items
Put this file in the same directory as app.py. Note that this code fetches only one page, that is, 10 search results. To fetch more, the for loop has to run more than once, with params['start'] advanced by 10 on each pass.
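As a rough sketch of that paging, the following only builds the request URLs and does not call the API. The names build_page_urls, num_pages, and page_size are made up for this illustration, and the key and cx values are placeholders.

```python
import urllib.parse

BASE_URL = 'https://www.googleapis.com/customsearch/v1?'


def build_page_urls(query, num_pages, page_size=10):
    """Build one request URL per result page (hypothetical helper)."""
    urls = []
    for page in range(num_pages):
        params = {
            'key': 'YOUR_API_KEY',          # placeholder, not a real key
            'cx': 'YOUR_SEARCH_ENGINE_ID',  # placeholder
            'q': query,
            'alt': 'json',
            'lr': 'lang_ja',
            # 'start' is 1-based, so the pages begin at 1, 11, 21, ...
            'start': 1 + page * page_size,
        }
        urls.append(BASE_URL + urllib.parse.urlencode(params))
    return urls


urls = build_page_urls('python', 3)
for u in urls:
    print(u)
```

Each URL could then be passed to urllib.request.urlopen in turn, collecting the 'items' of every response into one list.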
A note for Python 3: some standard library functions have moved since Python 2.
-- urllib.urlopen(url) is now urllib.request.urlopen(url).
-- urllib.urlencode(params) is now urllib.parse.urlencode(params).
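A small example of the Python 3 form, using only urllib.parse so that it runs without network access. The parameter values are just for illustration.

```python
import urllib.parse

# In Python 3, urlencode lives in urllib.parse and percent-encodes
# non-ASCII text as UTF-8 by default.
params = {'q': '検索', 'lr': 'lang_ja'}
query_string = urllib.parse.urlencode(params)
print(query_string)

# parse_qs reverses the encoding, which is a handy sanity check.
decoded = urllib.parse.parse_qs(query_string)
print(decoded)
```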
Also, use your own API_KEY. You can get the API key for your Google account at https://code.google.com/apis/console/.
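One way to avoid writing your own key into the source file is to read it from an environment variable. This is only a sketch: load_api_key and the variable name GOOGLE_API_KEY are assumptions of mine, not something the API requires.

```python
import os


def load_api_key(name='GOOGLE_API_KEY'):
    """Read the API key from an environment variable (hypothetical name)."""
    key = os.environ.get(name)
    if not key:
        raise RuntimeError('Set the %s environment variable first' % name)
    return key
```

In googlesearch.py, API_KEY = load_api_key() could then replace the hard-coded string.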
With Bottle, the processing for each URL and HTTP request, that is, the controller code, goes in app.py.
This time, I used Mako as the template engine for the view part. Put the templates used by Mako in the static/templates directory.
app.py
from bottle import route, run, static_file, request
from mako.template import Template
import googlesearch

template = Template(filename='static/templates/index.tmpl')


# Serve CSS, JS, and images from the static directory.
@route('/static/<path:path>', name='static')
def static(path):
    return static_file(path, root='static')


# Show the empty search form when /results is opened directly.
@route('/results')
def results_get():
    return template.render(items=[])


# Run the search for the submitted query and render the results.
@route('/results', method='POST')
def results():
    # decode() is needed to receive Japanese queries correctly.
    query = request.forms.decode().get('query')
    items = googlesearch.simple_search(query)
    return template.render(items=items)


@route('/')
def greet():
    return template.render(items=[])


run(host='localhost', port=1234, debug=True)
The key point is this line in the function for @route('/results', method='POST'):
query = request.forms.decode().get('query')
If you want to search in Japanese, you need decode(). I found this method on Stack Overflow. When there are few Japanese materials, it is sad how much time and effort it takes to read English manuals and search Stack Overflow for this kind of processing. English-speaking programmers tend not to think about multi-byte characters, which makes things hard.
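My understanding of why decode() is needed, which I have not checked against every Bottle version: under Python 3, form values arrive as bytes that were decoded as latin-1, so UTF-8 text is garbled until it is re-encoded and decoded again. The sketch below reproduces that round trip with plain strings and no Bottle at all; the variable names are only for illustration.

```python
original = '検索'

# What the browser sends: the query as UTF-8 bytes.
raw_bytes = original.encode('utf-8')

# Reading those bytes as latin-1 yields mojibake instead of Japanese.
garbled = raw_bytes.decode('latin-1')

# Re-encoding as latin-1 and decoding as UTF-8 recovers the text.
recovered = garbled.encode('latin-1').decode('utf-8')

print(repr(garbled))
print(recovered)
```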
The search results are displayed by the @route('/results', method='POST') function. There is probably a way to search with GET instead of POST, but I don't know how to do it, and POST seems good enough. Next, the view side.
static/templates/index.tmpl
#coding: utf-8
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Subtask Search</title>
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<!-- Loading Bootstrap -->
<link href="static/css/bootstrap.css" rel="stylesheet">
<!-- Loading Flat UI -->
<link href="static/css/flat-ui.css" rel="stylesheet">
<link rel="shortcut icon" href="static/images/favicon.ico">
<!-- HTML5 shim, for IE6-8 support of HTML5 elements. All other JS at the end of file. -->
<!--[if lt IE 9]>
<script src="static/js/html5shiv.js"></script>
<![endif]-->
</head>
<body>
<div class="container">
<div class="demo-headline">
<a href="/">
<h1 class="demo-logo">
Subtask Search
</h1>
</a>
</div> <!-- /demo-headline -->
<div class="span4 offset4">
<form action="/results" method="post">
<input type="text" name="query" value placeholder="Input your task" class="span4 offset4" />
<input type="submit" value="Search" />
</form>
</div>
<div class="span8 offset2">
<ul class="unstyled">
% for item in items:
<li>
<a href="${item['link']}">
${item['title']}
</a>
</li>
% endfor
</ul>
</div>
</div> <!-- /container -->
<!-- Load JS here for greater good =============================-->
<script src="static/js/jquery-1.8.2.min.js"></script>
<script src="static/js/jquery-ui-1.10.0.custom.min.js"></script>
<script src="static/js/jquery.dropkick-1.0.0.js"></script>
<script src="static/js/custom_checkbox_and_radio.js"></script>
<script src="static/js/custom_radio.js"></script>
<script src="static/js/jquery.tagsinput.js"></script>
<script src="static/js/bootstrap-tooltip.js"></script>
<script src="static/js/jquery.placeholder.js"></script>
<script src="http://vjs.zencdn.net/c/video.js"></script>
<script src="static/js/application.js"></script>
<!--[if lt IE 8]>
<script src="static/js/icon-font-ie7.js"></script>
<script src="static/js/icon-font-ie7-24.js"></script>
<![endif]-->
</body>
</html>
I used Flat UI to make this index.tmpl look nice. The CSS and JS files are downloaded from Flat UI and placed in the static directory.
The only part where Mako is actually at work is this:
% for item in items:
<li>
<a href="${item['link']}">
${item['title']}
</a>
</li>
% endfor
In app.py, the line
return template.render(items=items)
passes the result of simple_search to Mako as items and renders this view.
That's all it takes.
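To make the shape of items concrete, here is a small sketch of the data the template loops over. The JSON body is a hand-written sample, not real API output, and it contains only the two fields the template uses.

```python
import json

# Hand-written sample (an assumption for illustration, not real API output).
sample_body = '''
{"items": [
  {"title": "Example A", "link": "http://example.com/a"},
  {"title": "Example B", "link": "http://example.com/b"}
]}
'''

# simple_search returns the 'items' list of the decoded JSON body.
items = json.loads(sample_body)['items']

# The template reads item['title'] and item['link'] for each entry.
pairs = [(item['title'], item['link']) for item in items]
for title, link in pairs:
    print(title, link)
```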
It's hard to handle multi-byte characters. It should have been easier with Python3, but it's still hard. There are few Japanese materials, so it's even harder. I want everyone to share more know-how.
I put the entire application at https://github.com/katryo/google_simple_search. Please take a look.