This is a record of knock 51, "Cut out words", from "Chapter 6: Processing English texts" of Language processing 100 knocks 2015 (.tohoku.ac.jp/nlp100/#ch6). Technically, it is almost the same as the previous knock. It is a simple one that takes fewer than 10 lines of code.
Link | Remarks |
---|---|
051.Cut out words.ipynb | GitHub link to the answer program |
100 amateur language processing knocks:51 | The source from which many parts of the code were copied |
Type | Version | Notes |
---|---|---|
OS | Ubuntu18.04.01 LTS | Running in a virtual machine |
pyenv | 1.2.16 | I use pyenv because I sometimes use multiple Python environments |
Python | 3.8.1 | I use Python 3.8.1 on pyenv. Packages are managed with venv |
This chapter gives an overview of basic natural language processing techniques through English text processing with Stanford Core NLP.
Stanford Core NLP, Stemming, Part-of-speech tagging, Named entity recognition, Coreference resolution, Dependency parsing, Phrase structure analysis, S-expressions
Perform the following processing on the English text (nlp.txt).
Treating whitespace as word delimiters, take the output of knock 50 as input and output one word per line. In addition, output a blank line at the end of each sentence.
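import re

# Read the output of knock 50 (one sentence per line) and write one word per line.
with open('./050.result.txt') as file_in, \
     open('./051.result.txt', 'w') as file_out:
    for line in file_in:
        # Skip empty lines in the input.
        if line != '\n':
            line = re.sub(r'''
                    [\.|;|:|\?|!|,]*  # zero or more of . ; : ? ! , (the | is also matched literally)
                    \s                # one whitespace character (including the trailing newline)
                    ''', '\n', line, flags=re.VERBOSE)
            # print() adds one more newline, which produces the blank line after each sentence.
            print(line, file=file_out)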
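As in the previous knock, the processing uses regular expressions. This time, each whitespace character is replaced with a line break. It is simpler than last time because no lookahead or lookbehind assertions are needed. Any punctuation (. ; : ? ! ,) immediately before the whitespace is removed along with it. Because `\s` also matches the newline at the end of each input line and `print` appends another one, a blank line appears after each sentence.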
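To see what the substitution does to a single line, here is a minimal sketch that applies the same idea to one string. The function name `cut_words` and the sample sentence are my own and only for illustration. The character class is written without the `|` separators, since inside `[]` a `|` is just a literal character.

```python
import re

# Zero or more punctuation marks followed by exactly one whitespace character.
PATTERN = re.compile(r'''
    [.;:?!,]*   # zero or more of . ; : ? ! ,
    \s          # one whitespace character (space, tab, newline)
    ''', re.VERBOSE)


def cut_words(line):
    """Replace each whitespace character (and any punctuation before it) with a newline."""
    return PATTERN.sub('\n', line)


# Hypothetical sample line; the trailing newline is kept, as it is when iterating over a file.
sample = 'From Wikipedia, the free encyclopedia.\n'
print(cut_words(sample))
```

The comma after "Wikipedia" and the period before the final newline are both dropped, and the extra newline from `print` gives the blank line at the end of the sentence.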
When the program is executed, the following result (excerpt from the first 20 lines) is output.
051.result.txt (excerpt from the first 20 lines)
Natural
language
processing

From
Wikipedia
the
free
encyclopedia

Natural
language
processing
(NLP)
is
a
field
of
computer
science
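For comparison, the task statement (whitespace as the word delimiter, one word per line) can also be sketched without regular expressions, using `str.split` and `str.rstrip`. This is not the approach used in this article, and it is not exactly equivalent: it collapses consecutive whitespace and strips trailing punctuation from every word. `words_per_line` is a hypothetical helper name and the sample sentence is made up for illustration.

```python
def words_per_line(sentence):
    """Return the words of one sentence with trailing punctuation removed."""
    words = []
    for token in sentence.split():       # no argument: split on any run of whitespace
        word = token.rstrip('.;:?!,')    # drop trailing . ; : ? ! ,
        if word:                         # skip tokens that were punctuation only
            words.append(word)
    return words


# Hypothetical sample sentence.
sample = 'Natural language processing (NLP) is a field of computer science.\n'
for word in words_per_line(sample):
    print(word)
print()  # blank line marking the end of the sentence
```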