MeCab's constrained analysis (also called partial analysis) is used when you already know some of a sentence's morpheme boundaries or morpheme information in advance. natto-py, a Python binding for MeCab, provides three ways to perform constrained parsing.
--partial / -p option
Specify the --partial or -p option when creating the MeCab instance. The input text passed to parse describes the constraints in the following format.
Sentence fragment: a plain fragment of the sentence. It is analyzed normally, as if there were no constraint, except that no morpheme is allowed to straddle a fragment boundary. Be sure to end the fragment with \n (a line feed).
Morpheme fragment: a fragment whose analysis is fixed, written in the format surface\tfeature pattern\n.
Finally, add a \n to the end of the input as a whole.
from natto import MeCab

text = """庭\tほげ
に
はにわ\tほげ
にわとり\tほげ
がいる。
"""

with MeCab("--partial") as nm:
    print(nm.parse(text))
庭	ほげ
に	助詞,格助詞,一般,*,*,*,に,ニ,ニ
はにわ	ほげ
にわとり	ほげ
が	助詞,格助詞,一般,*,*,*,が,ガ,ガ
いる	動詞,自立,*,*,一段,基本形,いる,イル,イル
。	記号,句点,*,*,*,*,。,。,。
EOS
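If you build this kind of constrained input often, it can help to generate the string programmatically rather than hand-writing the tabs and newlines. The following is a minimal sketch; build_partial_input is a hypothetical helper written for this post, not part of natto-py, and it simply emits the format described above.

from natto import MeCab

def build_partial_input(fragments):
    # Each fragment is either a plain string (sentence fragment) or a
    # (surface, feature) tuple (morpheme fragment). A trailing newline
    # terminates the input, as the --partial format requires.
    lines = []
    for frag in fragments:
        if isinstance(frag, tuple):
            lines.append("{}\t{}".format(frag[0], frag[1]))
        else:
            lines.append(frag)
    return "\n".join(lines) + "\n"

text = build_partial_input(
    [("庭", "ほげ"), "に", ("はにわ", "ほげ"), ("にわとり", "ほげ"), "がいる。"]
)

with MeCab("--partial") as nm:
    print(nm.parse(text))  # same output as above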
The example above simply prints the analysis result to standard output. For finer-grained constraints, use the morpheme boundary constraint (boundary_constraints) or part-of-speech constraint (feature_constraints) features described next.
If you know the word boundaries in advance, you can pass them to the boundary_constraints keyword argument as a compiled regular expression or a pattern string. Each span of the input that matches the pattern is treated as a single morpheme during analysis.
text = "There is a chicken in the haniwa."
patt = "Chicken|Haniwa|garden"
with MeCab() as nm:
#Get information for each MeCabNode by specifying a morpheme boundary constraint
for n in nm.parse(text, boundary_constraints=patt, as_nodes=True):
if not (n.is_bos() or n.is_eos()):
print("{}:\t{}". format(n.surface, n.feature))
# BOS/EOS nodes are omitted
庭:	名詞,一般,*,*,*,*,*
に:	助詞,格助詞,一般,*,*,*,に,ニ,ニ
はにわ:	名詞,一般,*,*,*,*,はにわ,ハニワ,ハニワ
にわとり:	名詞,一般,*,*,*,*,にわとり,ニワトリ,ニワトリ
が:	助詞,格助詞,一般,*,*,*,が,ガ,ガ
いる:	動詞,自立,*,*,一段,基本形,いる,イル,イル
。:	記号,句点,*,*,*,*,。,。,。
For details, see the Python documentation for the re module (re — Regular expression operations), in particular re.finditer (https://docs.python.org/3/library/re.html#re.finditer).
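Since boundary_constraints also accepts a compiled regular expression, the example above can be written with re.compile as well; here is a minimal sketch of that variant:

import re

from natto import MeCab

text = "庭にはにわにわとりがいる。"
# Compile the boundary pattern once; each non-overlapping match is treated
# as a single morpheme, just as with the string form of the pattern.
patt = re.compile("にわとり|はにわ|庭")

with MeCab() as nm:
    for n in nm.parse(text, boundary_constraints=patt, as_nodes=True):
        if not (n.is_bos() or n.is_eos()):
            print("{}:\t{}".format(n.surface, n.feature))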
The feature_constraints keyword argument lets you specify the feature (for example, a part-of-speech classification) to assign to particular morphemes. Build a tuple of (morpheme, feature) pairs and pass it to the parse method as follows:
feat = (("Chicken","Hoge"), ("Haniwa","HogeHoge"), ("garden","更にHoge"))
with MeCab() as nm:
#Get information for each MeCabNode by specifying part-speech constraints for some morphemes
for n in nm.parse(text, feature_constraints=feat, as_nodes=True):
if not (n.is_bos() or n.is_eos()):
print("{}:\t{}". format(n.surface, n.feature))
# BOS/EOS nodes are omitted
庭:	更にほげ
に:	助詞,格助詞,一般,*,*,*,に,ニ,ニ
はにわ:	ほげほげ
にわとり:	ほげ
が:	助詞,格助詞,一般,*,*,*,が,ガ,ガ
いる:	動詞,自立,*,*,一段,基本形,いる,イル,イル
。:	記号,句点,*,*,*,*,。,。,。
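The node loop does not have to print, of course; the same parse call can feed any downstream processing. Below is a small sketch that collects (surface, feature) pairs into a list, reusing the same text and feat values as above:

from natto import MeCab

text = "庭にはにわにわとりがいる。"
feat = (("にわとり", "ほげ"), ("はにわ", "ほげほげ"), ("庭", "更にほげ"))

with MeCab() as nm:
    # Collect (surface, feature) pairs, skipping the BOS/EOS nodes.
    results = [
        (n.surface, n.feature)
        for n in nm.parse(text, feature_constraints=feat, as_nodes=True)
        if not (n.is_bos() or n.is_eos())
    ]

print(results[0])  # ('庭', '更にほげ'), matching the output shown above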
That's all.