Installing mecab-python used to be a hassle, so I packaged the whole setup in a Dockerfile.
The REST API part is implemented in Flask, following an existing reference. The source is available on GitHub. It is also handy if you want to try out docker-compose, although with only one container, as in this case, there is not much benefit.
[Added 2016-10-07] Added a section on updating the dictionary file. [Added 2018-03-11] Added a front end.
Dockerfile
This is the finished Dockerfile.
FROM ubuntu:16.04
RUN apt-get update \
&& apt-get install python3 python3-pip curl git sudo cron -y \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
WORKDIR /opt
RUN git clone https://github.com/taku910/mecab.git
WORKDIR /opt/mecab/mecab
RUN ./configure --enable-utf8-only \
&& make \
&& make check \
&& make install \
&& ldconfig
WORKDIR /opt/mecab/mecab-ipadic
RUN ./configure --with-charset=utf8 \
&& make \
&& make install
WORKDIR /opt
RUN git clone --depth 1 https://github.com/neologd/mecab-ipadic-neologd.git
WORKDIR /opt/mecab-ipadic-neologd
RUN ./bin/install-mecab-ipadic-neologd -n -y
COPY . /opt/api
WORKDIR /opt/api
RUN pip3 install -r requirements.txt
CMD ["python3", "server.py"]
[2016-10-07 postscript]
.
├── README.md
├── docker-compose.yml
└── flask-mecab
    ├── Dockerfile
    ├── requirements.txt
    └── server.py
docker-compose.yml
api:
  build: ./flask-mecab
  volumes:
    - "./flask-mecab:/opt/api"
  ports:
    - "5000:5000"
  restart: always
MeCab's output format can be customized, but the default is
Surface type\tPart of speech,Part of speech subclassification 1,Part of speech subclassification 2,Part of speech subclassification 3,Inflected form,Utilization type,prototype,reading,pronunciation
so each line of the output string is split on '\t' and ',', and zip() pairs each element with its field name.
server.py
import MeCab


def mecab_parse(sentence, dic='ipadic'):
    dic_dir = "/usr/local/lib/mecab/dic/"
    if dic == 'neologd':
        dic_name = 'mecab-ipadic-neologd'
    else:
        dic_name = dic
    m = MeCab.Tagger('-d ' + dic_dir + dic_name)
    # Output format (default)
    fields = ['Surface type', 'Part of speech', 'Part of speech subclassification 1', 'Part of speech subclassification 2', 'Part of speech subclassification 3', 'Inflected form', 'Utilization type', 'prototype', 'reading', 'pronunciation']
    # Drop the trailing 'EOS' line and the empty string that follows it
    lines = m.parse(sentence).split('\n')[:-2]
    # Split each line into the surface form and its comma-separated features
    return [dict(zip(fields, (lambda x: [x[0]] + x[1].split(','))(p.split('\t')))) for p in lines]
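To make the split-and-zip step concrete without a MeCab installation, here is a minimal sketch that applies the same logic to a hand-written string in MeCab's default layout. The sample text and the helper name parse_raw are assumptions made up for illustration; they are not part of server.py.

```python
# Field names in MeCab's default output order (same keys as server.py).
FIELDS = ['Surface type', 'Part of speech',
          'Part of speech subclassification 1',
          'Part of speech subclassification 2',
          'Part of speech subclassification 3',
          'Inflected form', 'Utilization type',
          'prototype', 'reading', 'pronunciation']


def parse_raw(raw):
    """Turn raw MeCab-style text into a list of dicts (hypothetical helper)."""
    results = []
    # The text ends with 'EOS\n', so the last two pieces of split('\n')
    # are 'EOS' and '' and are dropped.
    for line in raw.split('\n')[:-2]:
        surface, features = line.split('\t')
        results.append(dict(zip(FIELDS, [surface] + features.split(','))))
    return results


# Hand-written sample in the default layout (an assumption, not real output).
SAMPLE = "型\t名詞,接尾,一般,*,*,*,型,ガタ,ガタ\nEOS\n"
print(parse_raw(SAMPLE)[0]['Surface type'])
```

Note that zip() stops at the shorter of its inputs, so a line with fewer features than field names simply yields a dict with fewer keys rather than raising an error.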
See also the execution example below.
Because port forwarding and similar settings live in docker-compose.yml, starting the service takes only the following commands.
$ git clone https://github.com/matsulib/mecab-service.git
$ cd mecab-service/
$ sudo docker-compose up -d
HTTP request
POST /mecab/v1/parse-ipadic
POST /mecab/v1/parse-neologd
Request header
Content-Type: application/json
Request body
{
"sentence":String
}
$ curl -X POST http://localhost:5000/mecab/v1/parse-ipadic \
-H "Content-type: application/json" \
-d '{"sentence": "Functional programming"}' | jq .
{
"dict": "ipadic",
"message": "Success",
"results": [
{
"prototype": "function",
"Part of speech": "noun",
"Part of speech subclassification 1": "General",
"Part of speech subclassification 2": "*",
"Part of speech subclassification 3": "*",
"Utilization type": "*",
"Inflected form": "*",
"pronunciation": "Kansu",
"Surface type": "function",
"reading": "Kansu"
},
{
"prototype": "Mold",
"Part of speech": "noun",
"Part of speech subclassification 1": "suffix",
"Part of speech subclassification 2": "General",
"Part of speech subclassification 3": "*",
"Utilization type": "*",
"Inflected form": "*",
"pronunciation": "Play",
"Surface type": "Mold",
"reading": "Play"
},
{
"prototype": "programming",
"Part of speech": "noun",
"Part of speech subclassification 1": "Change connection",
"Part of speech subclassification 2": "*",
"Part of speech subclassification 3": "*",
"Utilization type": "*",
"Inflected form": "*",
"pronunciation": "programming",
"Surface type": "programming",
"reading": "programming"
}
],
"status": 200
}
mecab-ipadic-neologd is a dictionary that is strong on proper nouns, so the same sentence comes back as a single proper noun.
$ curl -X POST http://localhost:5000/mecab/v1/parse-neologd \
-H "Content-type: application/json" \
-d '{"sentence": "Functional programming"}' | jq .
{
"dict": "neologd",
"message": "Success",
"results": [
{
"prototype": "Functional programming",
"Part of speech": "noun",
"Part of speech subclassification 1": "Proper noun",
"Part of speech subclassification 2": "General",
"Part of speech subclassification 3": "*",
"Utilization type": "*",
"Inflected form": "*",
"pronunciation": "Kansugata programming",
"Surface type": "Functional programming",
"reading": "Kansugata programming"
}
],
"status": 200
}
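The same requests can be issued from Python. The following is a minimal sketch using only the standard library; the base URL is taken from the curl examples, while the function names build_request and parse are hypothetical and chosen just for this illustration.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000/mecab/v1"  # from the curl examples


def build_request(sentence, dic="ipadic"):
    """Build (but do not send) a POST request for /parse-<dic>."""
    body = json.dumps({"sentence": sentence}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + "/parse-" + dic,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse(sentence, dic="ipadic"):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(sentence, dic)) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage (requires the container started by `docker-compose up -d`):
#   for token in parse("Functional programming", "neologd")["results"]:
#       print(token["Surface type"], token["Part of speech"])
```

Building the request separately from sending it makes it easy to inspect the URL, headers, and body without a running server.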
The image built from this Dockerfile is mecabservice_api, shown below. It is huge compared to the base ubuntu image.
$ sudo docker images
REPOSITORY          TAG       IMAGE ID        CREATED             SIZE
mecabservice_api    latest    785bc1295e46    About an hour ago   3.375 GB
ubuntu              16.04     c73a085dc378    4 days ago          127.1 MB
Let's use the history subcommand to see where the image bloated. The dictionary files appear to be very large, and perhaps that cannot be helped.
$ sudo docker history mecabservice_api
IMAGE          CREATED             CREATED BY                                      SIZE        COMMENT
785bc1295e46   About an hour ago   /bin/sh -c #(nop) CMD ["python3" "server.py"]   0 B
00339a58f77e   About an hour ago   /bin/sh -c pip3 install -r requirements.txt     5.852 MB
a397235a8e30   About an hour ago   /bin/sh -c #(nop) WORKDIR /opt/api              0 B
436ef40f928d   About an hour ago   /bin/sh -c #(nop) COPY dir:a4cb3a57cc1f117b07   2.445 kB
12456a11160d   3 hours ago         /bin/sh -c ./bin/install-mecab-ipadic-neologd   2.307 GB
18478e2f8d71   3 hours ago         /bin/sh -c #(nop) WORKDIR /opt/mecab/mecab-ip   0 B
74e6a0bffa98   3 hours ago         /bin/sh -c git clone --depth 1 https://github   114.2 MB
db6a7c716f21   3 hours ago         /bin/sh -c #(nop) WORKDIR /opt/mecab            0 B
6980af0a5afd   3 hours ago         /bin/sh -c ./configure --with-charset=utf8      106 MB
db32f32c58e6   3 hours ago         /bin/sh -c #(nop) WORKDIR /opt/mecab/mecab-ip   0 B
5745385f2342   3 hours ago         /bin/sh -c ./configure --enable-utf8-only       11.16 MB
8c601b1fac00   3 hours ago         /bin/sh -c #(nop) WORKDIR /opt/mecab/mecab      0 B
d730397e47eb   3 hours ago         /bin/sh -c git clone https://github.com/taku9   378.8 MB
2abd825af064   3 hours ago         /bin/sh -c apt-get update && apt-get instal     325 MB
d5abec2370fb   3 hours ago         /bin/sh -c #(nop) WORKDIR /opt                  0 B
c73a085dc378   4 days ago          /bin/sh -c #(nop) CMD ["/bin/bash"]             0 B
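To see at a glance which layers dominate, the non-zero sizes from the history output can be ranked with a few lines of Python. This is only a rough sketch: the sizes are copied by hand from the listing, and it assumes Docker prints decimal units (1 GB = 1000 MB).

```python
# Non-zero sizes from the SIZE column of the `docker history` listing,
# keyed by image ID; comments repeat the (truncated) command shown there.
LAYER_SIZES = {
    "00339a58f77e": "5.852 MB",   # pip3 install -r requirements.txt
    "436ef40f928d": "2.445 kB",   # COPY
    "12456a11160d": "2.307 GB",   # ./bin/install-mecab-ipadic-neologd
    "74e6a0bffa98": "114.2 MB",   # git clone --depth 1 ...
    "6980af0a5afd": "106 MB",     # ./configure --with-charset=utf8
    "5745385f2342": "11.16 MB",   # ./configure --enable-utf8-only
    "d730397e47eb": "378.8 MB",   # git clone https://github.com/taku9...
    "2abd825af064": "325 MB",     # apt-get update && apt-get install
}

# Assumption: decimal units, as printed by Docker.
UNITS = {"B": 1, "kB": 10**3, "MB": 10**6, "GB": 10**9}


def to_bytes(text):
    """Convert a string such as '2.307 GB' to a number of bytes."""
    value, unit = text.split()
    return float(value) * UNITS[unit]


total = sum(to_bytes(s) for s in LAYER_SIZES.values())
ranked = sorted(LAYER_SIZES.items(), key=lambda kv: to_bytes(kv[1]), reverse=True)
for image_id, size in ranked:
    share = 100 * to_bytes(size) / total
    print("{}  {:>9}  {:5.1f}%".format(image_id, size, share))
```

Under these assumptions the neologd installation layer alone accounts for roughly 70% of the listed layers, and adding the 127.1 MB ubuntu base image to the total gives about 3.375 GB, which agrees with the docker images output.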
I noticed that the mecab-ipadic-neologd dictionary is updated regularly on the server:
- This dictionary is updated automatically on the development server
  - It is updated at least twice a week
    - Monday and Thursday
The official documentation also explains how to apply updates to an already-installed dictionary:
There is already a useful option if you want to update automatically with cron (explanation omitted). For example, if you write the following two lines in crontab, the dictionary files in the user directory (-p [/path/to/user/directory]) are updated with user privileges (-u) at 3:00 a.m. every Tuesday and Friday, without confirming the analysis results (-y).
00 03 * * 2 ./bin/install-mecab-ipadic-neologd -n -y -u -p /path/to/user/directory > /path/to/log/file
00 03 * * 5 ./bin/install-mecab-ipadic-neologd -n -y -u -p /path/to/user/directory > /path/to/log/file
Assuming the running container is named mecabservice_api_1 and the update logs should go to /opt/log/mecab, you can configure this by attaching to the container with bash as follows.
$ sudo docker exec -it mecabservice_api_1 /bin/bash
# mkdir -p /opt/log/mecab
# (crontab -l 2>/dev/null; echo "PATH=$PATH") | crontab -
# (crontab -l 2>/dev/null; echo '00 18 * * 1 /opt/mecab-ipadic-neologd/bin/install-mecab-ipadic-neologd -n -y >/opt/log/mecab/neologd_`date -I`.log 2>/opt/log/mecab/neologd_`date -I`.err') | crontab -
# (crontab -l 2>/dev/null; echo '00 18 * * 4 /opt/mecab-ipadic-neologd/bin/install-mecab-ipadic-neologd -n -y >/opt/log/mecab/neologd_`date -I`.log 2>/opt/log/mecab/neologd_`date -I`.err') | crontab -
# /etc/init.d/cron restart
# exit
Here the times are given in UTC so that the update runs at 3:00 a.m. JST on Tuesdays and Fridays. I got stuck for a while because mecab is not on cron's default PATH. Also, at first a small bug in the shell script prevented the above settings from working, but I filed an Issue and a Pull Request, and it started working as soon as that was merged (*^^)v
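The JST-to-UTC conversion behind those crontab entries can be checked with a short script. This is a minimal sketch that assumes cron inside the container runs in UTC, as described above; the function name jst_to_utc_cron is hypothetical.

```python
from datetime import datetime, timedelta, timezone

JST = timezone(timedelta(hours=9))  # Japan Standard Time, no DST


def jst_to_utc_cron(weekday, hour, minute=0):
    """Return the 'MM HH * * D' cron fields in UTC for a weekly JST time.

    weekday uses cron numbering: 0 = Sunday ... 6 = Saturday.
    """
    # 2018-01-07 was a Sunday; it serves only as a reference point.
    local = datetime(2018, 1, 7, hour, minute, tzinfo=JST) + timedelta(days=weekday)
    utc = local.astimezone(timezone.utc)
    cron_day = (utc.weekday() + 1) % 7  # Python: Monday = 0; cron: Sunday = 0
    return "{:02d} {:02d} * * {}".format(utc.minute, utc.hour, cron_day)


print(jst_to_utc_cron(2, 3))  # Tuesday 03:00 JST -> 00 18 * * 1
print(jst_to_utc_cron(5, 3))  # Friday 03:00 JST  -> 00 18 * * 4
```

Because 3:00 a.m. JST falls on the previous day in UTC, the weekday field shifts back by one, which is why the crontab uses 1 and 4 rather than 2 and 5.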
Using docker-compose with only one service felt a little lonely, so I added a web application to docker-compose that calls the above REST API from a front end. Please refer to GitHub for the front-end source. I got stuck on communication between containers and on the CORS settings.
docker-compose.yml
api:
  build: ./flask-mecab
  volumes:
    - "./flask-mecab:/opt/api"
  ports:
    - "5000:5000"
  restart: always
front:
  build: ./flask-mecab-front
  volumes:
    - "./flask-mecab-front:/opt/front"
  ports:
    - "5001:5001"
  restart: always
  links:
    - api
  environment:
    FLASK_MECAB_URI: "http://api:5000/mecab/v1"
After startup, open http://localhost:5001/ in your browser.
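The front container learns where the API lives from the FLASK_MECAB_URI environment variable, and the host name api resolves through the links entry. The actual front-end code is on GitHub and may differ; the following is just a minimal sketch of how such a variable could be consumed, with a hypothetical helper named api_endpoint.

```python
import os

# Default mirrors the value set in docker-compose.yml.
DEFAULT_URI = "http://api:5000/mecab/v1"


def api_endpoint(dic, environ=os.environ):
    """Return the parse endpoint URL for the given dictionary name."""
    # Strip a trailing slash so the joined URL never contains '//'.
    base = environ.get("FLASK_MECAB_URI", DEFAULT_URI).rstrip("/")
    return "{}/parse-{}".format(base, dic)


# A plain dict stands in for the container environment here.
print(api_endpoint("neologd", {"FLASK_MECAB_URI": "http://api:5000/mecab/v1"}))
```

Reading the base URL from the environment keeps the front end unchanged when the API is moved, for example to localhost during development outside Docker.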