This is my record of [No. 43 "Extracting clauses containing nouns related to clauses containing verbs"](http://www.cl.ecei.tohoku.ac.jp/nlp100/#sec43) from ["Chapter 5: Dependency analysis"](http://www.cl.ecei.tohoku.ac.jp/nlp100/#ch5) of Language Processing 100 Knocks 2015. Compared with the previous knock there is no big difference: it only adds conditions on the source and destination clauses that are output.
Link | Remarks |
---|---|
043.Extract clauses containing nouns related to clauses containing verbs.ipynb | Answer program GitHub link |
100 amateur language processing knocks:43 | The source from which I copied and pasted many parts |
CaboCha official | The CaboCha page to look at first |
I installed CRF++ and CaboCha so long ago that I have forgotten how I installed them. Since neither package has been updated in a long time, I have not rebuilt the environment. I only remember being frustrated when I tried to use CaboCha on Windows. I think I could not get it to work on 64-bit Windows (my memory is vague, and the problem may have been my own technical skill).
Type | Version | Notes |
---|---|---|
OS | Ubuntu 18.04.01 LTS | Running as a virtual machine |
pyenv | 1.2.16 | I use pyenv because I sometimes use multiple Python environments |
Python | 3.8.1 | Python 3.8.1 on pyenv. Packages are managed with venv |
MeCab | 0.996-5 | Installed with apt-get |
CRF++ | 0.58 | Installed so long ago that I forgot how (probably `make install`) |
CaboCha | 0.69 | Installed so long ago that I forgot how (probably `make install`) |
Apply the dependency parser CaboCha to "I Am a Cat" and get hands-on experience with dependency trees and syntactic analysis.
Class, Dependency parsing, CaboCha, Clause, Dependency, Case, Functional verb constructions, Dependency path, [Graphviz](http://www.graphviz.org/)
Use CaboCha to run dependency analysis on the text (neko.txt) of Natsume Soseki's novel "I Am a Cat", and save the result in a file called neko.txt.cabocha. Use this file to implement programs that address the following questions.
When a clause containing a noun depends on a clause containing a verb, extract the pair in tab-delimited format. However, do not output symbols such as punctuation marks.
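As a reminder of what the input looks like, here is a minimal sketch that classifies the lines of a small lattice-format sample. The sample text is hand-written for illustration (the scores are made up, and it assumes the MeCab IPA dictionary feature layout, where the base form is the seventh feature); `classify` and `header_pattern` are hypothetical names that do not appear in the answer program.

```python
import re

# Hand-written sample in CaboCha lattice format (an assumption for
# illustration; the scores at the end of the header lines are made up).
SAMPLE = (
    "* 0 1D 0/1 0.000000\n"
    "猫\t名詞,一般,*,*,*,*,猫,ネコ,ネコ\n"
    "が\t助詞,格助詞,一般,*,*,*,が,ガ,ガ\n"
    "* 1 -1D 0/0 0.000000\n"
    "鳴く\t動詞,自立,*,*,五段・カ行イ音便,基本形,鳴く,ナク,ナク\n"
    "EOS\n"
)

separator = re.compile(r'\t|,')
# Header line: "* <chunk index> <destination index>D ..."
header_pattern = re.compile(r'\*\s\d+\s(-?\d+)')


def classify(line):
    """Return a (kind, payload) tuple for one line of lattice output."""
    if line == 'EOS\n':
        return ('eos', None)
    header = header_pattern.match(line)
    if header:
        # Destination chunk index; -1 means "depends on nothing"
        return ('chunk', int(header.group(1)))
    cols = separator.split(line.rstrip('\n'))
    # (surface form, part of speech, base form)
    return ('morph', (cols[0], cols[1], cols[7]))


parsed = [classify(line) for line in SAMPLE.splitlines(keepends=True)]
for item in parsed:
    print(item)
```

Each clause starts with a header line that carries the destination index, followed by one line per morpheme, and the sentence ends with `EOS`.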
```python
import re

# Delimiter: tab or comma
separator = re.compile('\t|,')

# Dependency header line ("* <index> <destination>D ...")
dependancy = re.compile(r'''(?:\*\s\d+\s) # Not captured
                            (-?\d+)       # Number (destination clause)
                          ''', re.VERBOSE)


class Morph:
    def __init__(self, line):
        # Split on tabs and commas
        cols = separator.split(line)

        self.surface = cols[0]  # Surface form (surface)
        self.base = cols[7]     # Base form (base)
        self.pos = cols[1]      # Part of speech (pos)
        self.pos1 = cols[2]     # Part-of-speech subcategory 1 (pos1)


class Chunk:
    def __init__(self, morphs, dst):
        self.morphs = morphs
        self.srcs = []   # List of source clause index numbers
        self.dst = dst   # Destination clause index number

        self.verb = False
        self.noun = False
        self.phrase = ''

        for morph in morphs:
            # Build the clause text from everything except symbols.
            # CaboCha emits Japanese POS tags:
            # 記号 = symbol, 動詞 = verb, 名詞 = noun
            if morph.pos != '記号':
                self.phrase += morph.surface
            if morph.pos == '動詞':
                self.verb = True
            if morph.pos == '名詞':
                self.noun = True


# Fill in the source indices and add the Chunk list to the sentence list
def append_sentence(chunks, sentences):
    # Register each clause as a source of its destination clause
    for i, chunk in enumerate(chunks):
        if chunk.dst != -1:
            chunks[chunk.dst].srcs.append(i)
    sentences.append(chunks)
    return sentences, []


morphs = []
chunks = []
sentences = []

with open('./neko.txt.cabocha') as f:
    for line in f:
        dependancies = dependancy.match(line)

        # Neither EOS nor a dependency header: a morpheme line
        if not (line == 'EOS\n' or dependancies):
            morphs.append(Morph(line))

        # EOS or a dependency header, with morphemes pending
        elif len(morphs) > 0:
            chunks.append(Chunk(morphs, dst))
            morphs = []

        # Dependency header: remember the destination index
        if dependancies:
            dst = int(dependancies.group(1))

        # EOS with clauses pending: close the sentence
        if line == 'EOS\n' and len(chunks) > 0:
            sentences, chunks = append_sentence(chunks, sentences)

for i, sentence in enumerate(sentences):
    for chunk in sentence:
        if chunk.dst != -1 and \
           chunk.noun and \
           sentence[chunk.dst].verb:
            print('{}\t{}'.format(chunk.phrase, sentence[chunk.dst].phrase))

    # Limit the output because there is a lot of it
    if i > 50:
        break
```
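The way `append_sentence` fills in `srcs` from `dst` can be seen in isolation with a minimal sketch. `MiniChunk` and `link_sources` are hypothetical stand-ins for the real `Chunk` class and `append_sentence`, and the `dst` values are assigned by hand for illustration rather than taken from real CaboCha output.

```python
from dataclasses import dataclass, field


@dataclass
class MiniChunk:
    """Hypothetical stand-in for Chunk: only the fields needed here."""
    phrase: str
    dst: int                                  # destination index, -1 = none
    srcs: list = field(default_factory=list)  # indices of clauses pointing here


def link_sources(chunks):
    """Register every clause as a source of its destination clause."""
    for i, chunk in enumerate(chunks):
        if chunk.dst != -1:
            chunks[chunk.dst].srcs.append(i)
    return chunks


# Hand-assigned dependencies: clauses 0 and 1 both depend on clause 2
chunks = link_sources([
    MiniChunk('吾輩は', 2),
    MiniChunk('猫で', 2),
    MiniChunk('ある', -1),
])

for i, c in enumerate(chunks):
    print(i, c.phrase, c.dst, c.srcs)
```

This knock only follows `dst`, so `srcs` is not used in the output loop; it is carried over from the earlier knocks.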
I changed the Chunk class from the previous knock so that instance attributes record whether the clause contains a noun and whether it contains a verb. Since the morphemes are now processed in a for loop anyway, I stopped building the clause string with a list comprehension.
```python
class Chunk:
    def __init__(self, morphs, dst):
        self.morphs = morphs
        self.srcs = []   # List of source clause index numbers
        self.dst = dst   # Destination clause index number

        self.verb = False
        self.noun = False
        self.phrase = ''

        for morph in morphs:
            # Build the clause text from everything except symbols.
            # CaboCha emits Japanese POS tags:
            # 記号 = symbol, 動詞 = verb, 名詞 = noun
            if morph.pos != '記号':
                self.phrase += morph.surface
            if morph.pos == '動詞':
                self.verb = True
            if morph.pos == '名詞':
                self.noun = True
```
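For comparison, here is a minimal sketch of the same three values computed both ways: with a single for loop, and with comprehension-style expressions. `MiniMorph` is a hypothetical stand-in for the `Morph` class, and the POS tags assume the Japanese names used by the MeCab IPA dictionary (名詞 = noun, 動詞 = verb, 記号 = symbol).

```python
from collections import namedtuple

# Hypothetical stand-in for Morph with only the fields needed here
MiniMorph = namedtuple('MiniMorph', 'surface pos')

morphs = [
    MiniMorph('猫', '名詞'),
    MiniMorph('が', '助詞'),
    MiniMorph('、', '記号'),
]

# Loop version: one pass collects the text and both flags
phrase, noun, verb = '', False, False
for m in morphs:
    if m.pos != '記号':
        phrase += m.surface
    if m.pos == '動詞':
        verb = True
    if m.pos == '名詞':
        noun = True

# Comprehension version: three separate passes over the morphemes
phrase2 = ''.join(m.surface for m in morphs if m.pos != '記号')
noun2 = any(m.pos == '名詞' for m in morphs)
verb2 = any(m.pos == '動詞' for m in morphs)

print(phrase, noun, verb)
print(phrase2, noun2, verb2)
```

Both versions give the same result; the loop simply walks the morphemes once instead of three times, which is why the string building moved into it.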
All that remains is to narrow down the output with an `if` conditional.
```python
for i, sentence in enumerate(sentences):
    for chunk in sentence:
        if chunk.dst != -1 and \
           chunk.noun and \
           sentence[chunk.dst].verb:
            print('{}\t{}'.format(chunk.phrase, sentence[chunk.dst].phrase))
```
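The condition can also be tried on a small hand-built sentence without running CaboCha. In this sketch, `TaggedChunk` and `noun_to_verb_pairs` are hypothetical names, and the `dst`, `noun` and `verb` values are assigned by hand for illustration rather than produced by a parser.

```python
from dataclasses import dataclass


@dataclass
class TaggedChunk:
    """Hypothetical stand-in for Chunk with hand-assigned flags."""
    phrase: str
    dst: int            # destination index, -1 = none
    noun: bool = False  # clause contains a noun
    verb: bool = False  # clause contains a verb


def noun_to_verb_pairs(sentence):
    """Yield (source, destination) texts for noun clauses that depend on verb clauses."""
    for chunk in sentence:
        if chunk.dst != -1 and chunk.noun and sentence[chunk.dst].verb:
            yield chunk.phrase, sentence[chunk.dst].phrase


# Hand-assigned values, for illustration only
sentence = [
    TaggedChunk('どこで', 1, noun=True),
    TaggedChunk('生れたか', 4, verb=True),
    TaggedChunk('とんと', 4),
    TaggedChunk('見当が', 4, noun=True),
    TaggedChunk('つかぬ', -1, verb=True),
]

pairs = list(noun_to_verb_pairs(sentence))
for src, dst in pairs:
    print('{}\t{}'.format(src, dst))
```

The clause without a noun and the clause whose destination is -1 are both skipped, so only two pairs remain.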
When the program is executed, the following results are output. Because there are many, only the first part is shown.
Output result
Where was born
I don't get it
I have no idea
Crying where you did
I remember only what I was
I saw
For the first time here
I saw something
I will ask you later
Catch us
Placed on the palm
Sue was lifted
When fluffy
I just felt
Calm down on
I saw my face
Would be the beginning of things
I thought it was
Feeling remains
It still remains
Should be decorated with the first hair
The face is slippery
I met after that
I also met a cat
I met once
The center is protruding
Blow from inside
Blow smoke
I was sore and weak
Human drinking
I knew that
I knew about it
Sit behind
Sit in your heart
I started driving at high speed
Does the student move?
Will it move or will it move
Will only I move?
I don't know if it works
Turn your eyes
I feel sick
There is a sound
Out of the eyes
The fire broke out
I remember until then
I remember but I don't know
I don't know the rest
I don't know
Notice
There is no student
Lots of
I can't see my brother
I can't even see a single
I even hid my mother
I hid myself
Unlike the place
I can't even open my eyes
I was abandoned
It was abandoned from above
It was suddenly abandoned
It was abandoned inside
When you crawl out with your thoughts
When you crawl out Sasahara
On the other side
There is a pond
I saw
Sit in front
It doesn't make sense
I wonder if the student will come again
Will you come to meet me
Do it with meow
No one comes
Cross over the pond
The wind crosses
Takes a day
It's dark
I'm hungry
It has decreased very much
With food
There is up to
Make a decision
Started to go around the pond
Started to turn to the left
Put up with that
If you put up with it and crawl
If you forcibly crawl
It came out by the thing
I went to the place
If you crawl here
The bamboo fence collapsed
I sneaked through the hole
I sneaked into the house
I may have starved to death with something.
If the bamboo fence was not torn
I may have starved to death
I may have starved to death on the roadside.
What was the shadow
There is a hole
Until today
I will visit
Visit the calico
It is a passage
Although I sneaked into the mansion
It gets dark in my house
I'm hungry
It's raining
I couldn't do it after cleaning up
I can no longer grace
Go towards
Go towards
Thinking from now on
The time has passed
Crawl inside
I encountered it here
I encountered
Should see humans
I encountered an opportunity
The first thing I met
This is squeezed out
Seen from the student
If you look at it
When I see me
Suddenly grab it
Grab the cervical muscle
I squeezed out to the table