A token can have multiple dependency sources

Part 2: Parsing the COTOHA API syntax analysis results in Java

While testing the "API demo" (https://api.ce-cotoha.com/demo), I noticed the analysis result for "My wife and my daughter went on a trip."

I assumed the demo clearly showed the complete answer, but the API actually returns the following:


{
      "id" : 2,
      "form" : "daughter",
      "kana" : "Musume",
      "lemma" : "daughter",
      "pos" : "noun",
      "dependency_labels" : [ {
        "token_id" : 0,
        "label" : "conj"
      }, {
        "token_id" : 3,
        "label" : "case"
      } ],
      "attributes" : { }
    }

This is what is returned, and it looks like the red line at the bottom of the figure.

Since the JSON attribute is named "dependency_labels", it is easy to infer that a token can have more than one. Looking only at the demo, however, it appears that there is only one, so I thought some care is needed here. It also seemed to me that the demo does not fully convey the appeal of the API.
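A minimal sketch of how a client might walk every entry of "dependency_labels" instead of assuming a single one. The classes `DependencyLabel` and `Token` are hypothetical stand-ins written for this illustration, not part of any official COTOHA client, and the values are copied from the "daughter" token above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of one token of the parse response (not an official client). */
final class DependencyLabelsDemo {

    /** One entry of the "dependency_labels" array. */
    static final class DependencyLabel {
        final int tokenId;   // "token_id": id of the token on the other end of the relation
        final String label;  // "label": relation name such as "conj" or "case"

        DependencyLabel(int tokenId, String label) {
            this.tokenId = tokenId;
            this.label = label;
        }
    }

    /** A token together with all of its dependency labels. */
    static final class Token {
        final int id;
        final String form;
        final List<DependencyLabel> dependencyLabels;

        Token(int id, String form, List<DependencyLabel> dependencyLabels) {
            this.id = id;
            this.form = form;
            this.dependencyLabels = dependencyLabels;
        }
    }

    /** Formats every dependency label of a token, not only the first one. */
    static List<String> describe(Token token) {
        List<String> lines = new ArrayList<>();
        for (DependencyLabel dep : token.dependencyLabels) {
            lines.add(token.id + " -" + dep.label + "-> " + dep.tokenId);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Values copied from the "daughter" token (id 2) shown above.
        Token daughter = new Token(2, "daughter", Arrays.asList(
                new DependencyLabel(0, "conj"),
                new DependencyLabel(3, "case")));
        describe(daughter).forEach(System.out::println);
    }
}
```

Iterating over the list keeps both relations visible, which is exactly the information the demo view appeared to hide.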

Only one document can be sent

Only one "document" (for example, the several log entries of one call in a call center) can be specified per API call. When processing a large number of documents, I would like to process several at once rather than one at a time, so I thought this point should also be noted. (For example, concatenating the call logs of different customers for syntactic analysis is inappropriate.) I hope the next version will be able to handle multiple documents.
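Until then, a client has to loop over its documents itself. The following is only a sketch under assumptions: `parseOneDocument` is a hypothetical function standing in for "one request to the parse API", and the real HTTP call, authentication, and rate limiting are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

final class PerDocumentAnalysis {

    /**
     * Analyzes each document with its own call and keeps the results in input order.
     * parseOneDocument is a hypothetical stand-in for a single parse API request.
     */
    static List<String> analyzeAll(List<String> documents,
                                   Function<String, String> parseOneDocument) {
        List<String> results = new ArrayList<>();
        for (String document : documents) {
            // Documents are never concatenated: each one gets its own request.
            results.add(parseOneDocument.apply(document));
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> callLogs = Arrays.asList("Log of customer A.", "Log of customer B.");
        // Fake analyzer for illustration; a real one would call the API over HTTP.
        List<String> results = analyzeAll(callLogs, doc -> "parsed:" + doc);
        results.forEach(System.out::println);
    }
}
```

Passing the request as a function keeps the loop testable without network access and makes it easy to add throttling or retries around the single call later.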

Behavior when analyzing multiple sentences

When I send "My wife and my daughter went on a trip. My son and I ate grilled meat.", the following response is returned.


{
  "result" : [ {
    "chunk_info" : {
      "id" : 0,
      "head" : 1,
      "dep" : "P",
      "chunk_head" : 0,
      "chunk_func" : 1,
      "links" : [ ]
    },
    "tokens" : [ {
      "id" : 0,
      "form" : "wife",
      "kana" : "Yome",
      "lemma" : "wife",
      "pos" : "noun",
      "features" : [ ],
      "common_noun_semantic" : [ 49, 76, 88 ],
      "proper_noun_semantic" : [ ],
      "declinable_word_semantic" : [ ],
      "dependency_labels" : [ {
        "token_id" : 1,
        "label" : "cc"
      } ],
      "attributes" : { }
    }, {
      "id" : 1,
      "form" : "to",
      "kana" : "To",
      "lemma" : "to",
      "pos" : "case particle",
      "features" : [ "continuative" ],
      "common_noun_semantic" : [ ],
      "proper_noun_semantic" : [ ],
      "declinable_word_semantic" : [ ],
      "attributes" : { }
    } ]
  }, {
    "chunk_info" : {
      "id" : 1,
      "head" : 7,
      "dep" : "D",
      "chunk_head" : 0,
      "chunk_func" : 1,
      "links" : [ {
        "link" : 0,
        "label" : "other"
      } ]
    },
    "tokens" : [ {
      "id" : 2,
      "form" : "daughter",
      "kana" : "Musume",
      "lemma" : "daughter",
      "pos" : "noun",
      "features" : [ ],
      "common_noun_semantic" : [ 49, 59, 88 ],
      "proper_noun_semantic" : [ ],
      "declinable_word_semantic" : [ ],
      "dependency_labels" : [ {
        "token_id" : 0,
        "label" : "conj"
      }, {
        "token_id" : 3,
        "label" : "case"
      } ],
      "attributes" : { }
    }, {
      "id" : 3,
      "form" : "wa",
      "kana" : "Wa",
      "lemma" : "wa",
      "pos" : "continuative particle",
      "features" : [ ],
      "common_noun_semantic" : [ ],
      "proper_noun_semantic" : [ ],
      "declinable_word_semantic" : [ ],
      "attributes" : { }
    } ]
  }, {
    "chunk_info" : {
      "id" : 2,
      "head" : 3,
      "dep" : "D",
      "chunk_head" : 0,
      "chunk_func" : 1,
      "links" : [ ]
    },
    "tokens" : [ {
      "id" : 4,
      "form" : "trip",
      "kana" : "Ryoko",
      "lemma" : "trip",
      "pos" : "noun",
      "features" : [ "action" ],
      "common_noun_semantic" : [ 1658, 1659, 1660 ],
      "proper_noun_semantic" : [ ],
      "declinable_word_semantic" : [ 18 ],
      "dependency_labels" : [ {
        "token_id" : 5,
        "label" : "case"
      } ],
      "attributes" : { }
    }, {
      "id" : 5,
      "form" : "ni",
      "kana" : "Ni",
      "lemma" : "ni",
      "pos" : "case particle",
      "features" : [ "continuative" ],
      "common_noun_semantic" : [ ],
      "proper_noun_semantic" : [ ],
      "declinable_word_semantic" : [ ],
      "attributes" : { }
    } ]
  }, {
    "chunk_info" : {
      "id" : 3,
      "head" : 7,
      "dep" : "P",
      "chunk_head" : 0,
      "chunk_func" : 2,
      "links" : [ {
        "link" : 2,
        "label" : "purpose"
      } ],
      "predicate" : [ "past" ]
    },
    "tokens" : [ {
      "id" : 6,
      "form" : "go",
      "kana" : "I",
      "lemma" : "go",
      "pos" : "verb stem",
      "features" : [ "IKU" ],
      "common_noun_semantic" : [ 2053, 2132 ],
      "proper_noun_semantic" : [ ],
      "declinable_word_semantic" : [ 15, 20, 29, 32, 5 ],
      "dependency_labels" : [ {
        "token_id" : 4,
        "label" : "nmod"
      }, {
        "token_id" : 7,
        "label" : "aux"
      }, {
        "token_id" : 8,
        "label" : "aux"
      }, {
        "token_id" : 9,
        "label" : "punct"
      } ],
      "attributes" : { }
    }, {
      "id" : 7,
      "form" : "Tsu",
      "kana" : "Tsu",
      "lemma" : "Tsu",
      "pos" : "verb inflection ending",
      "features" : [ ],
      "common_noun_semantic" : [ ],
      "proper_noun_semantic" : [ ],
      "declinable_word_semantic" : [ ],
      "attributes" : { }
    }, {
      "id" : 8,
      "form" : "Ta",
      "kana" : "Ta",
      "lemma" : "Ta",
      "pos" : "verb suffix",
      "features" : [ "terminal" ],
      "common_noun_semantic" : [ ],
      "proper_noun_semantic" : [ ],
      "declinable_word_semantic" : [ ],
      "attributes" : { }
    }, {
      "id" : 9,
      "form" : "。",
      "kana" : "",
      "lemma" : "。",
      "pos" : "period",
      "features" : [ ],
      "common_noun_semantic" : [ ],
      "proper_noun_semantic" : [ ],
      "declinable_word_semantic" : [ ],
      "attributes" : { }
    } ]
  }, {
    "chunk_info" : {
      "id" : 4,
      "head" : 7,
      "dep" : "D",
      "chunk_head" : 0,
      "chunk_func" : 1,
      "links" : [ ]
    },
    "tokens" : [ {
      "id" : 10,
      "form" : "I",
      "kana" : "Watashi",
      "lemma" : "I",
      "pos" : "noun",
      "features" : [ "synonymous" ],
      "common_noun_semantic" : [ 37, 8 ],
      "proper_noun_semantic" : [ ],
      "declinable_word_semantic" : [ ],
      "dependency_labels" : [ {
        "token_id" : 11,
        "label" : "cc"
      } ],
      "attributes" : { }
    }, {
      "id" : 11,
      "form" : "to",
      "kana" : "To",
      "lemma" : "to",
      "pos" : "case particle",
      "features" : [ "continuative" ],
      "common_noun_semantic" : [ ],
      "proper_noun_semantic" : [ ],
      "declinable_word_semantic" : [ ],
      "attributes" : { }
    } ]
  }, {
    "chunk_info" : {
      "id" : 5,
      "head" : 7,
      "dep" : "D",
      "chunk_head" : 0,
      "chunk_func" : 1,
      "links" : [ ]
    },
    "tokens" : [ {
      "id" : 12,
      "form" : "son",
      "kana" : "Musuko",
      "lemma" : "son",
      "pos" : "noun",
      "features" : [ ],
      "common_noun_semantic" : [ 48, 58, 87 ],
      "proper_noun_semantic" : [ ],
      "declinable_word_semantic" : [ ],
      "dependency_labels" : [ {
        "token_id" : 13,
        "label" : "case"
      } ],
      "attributes" : { }
    }, {
      "id" : 13,
      "form" : "wa",
      "kana" : "Wa",
      "lemma" : "wa",
      "pos" : "continuative particle",
      "features" : [ ],
      "common_noun_semantic" : [ ],
      "proper_noun_semantic" : [ ],
      "declinable_word_semantic" : [ ],
      "attributes" : { }
    } ]
  }, {
    "chunk_info" : {
      "id" : 6,
      "head" : 7,
      "dep" : "D",
      "chunk_head" : 0,
      "chunk_func" : 1,
      "links" : [ ]
    },
    "tokens" : [ {
      "id" : 14,
      "form" : "grilled meat",
      "kana" : "Yakiniku",
      "lemma" : "grilled meat",
      "pos" : "noun",
      "features" : [ ],
      "common_noun_semantic" : [ 843, 852 ],
      "proper_noun_semantic" : [ ],
      "declinable_word_semantic" : [ ],
      "dependency_labels" : [ {
        "token_id" : 15,
        "label" : "case"
      } ],
      "attributes" : { }
    }, {
      "id" : 15,
      "form" : "wo",
      "kana" : "Wo",
      "lemma" : "wo",
      "pos" : "case particle",
      "features" : [ "continuative" ],
      "common_noun_semantic" : [ ],
      "proper_noun_semantic" : [ ],
      "declinable_word_semantic" : [ ],
      "attributes" : { }
    } ]
  }, {
    "chunk_info" : {
      "id" : 7,
      "head" : -1,
      "dep" : "O",
      "chunk_head" : 0,
      "chunk_func" : 1,
      "links" : [ {
        "link" : 1,
        "label" : "agent"
      }, {
        "link" : 3,
        "label" : "manner"
      }, {
        "link" : 4,
        "label" : "coagent"
      }, {
        "link" : 5,
        "label" : "agent"
      }, {
        "link" : 6,
        "label" : "object"
      } ],
      "predicate" : [ "past" ]
    },
    "tokens" : [ {
      "id" : 16,
      "form" : "eat",
      "kana" : "Tabe",
      "lemma" : "eat",
      "pos" : "verb stem",
      "features" : [ "A" ],
      "common_noun_semantic" : [ 1581, 1590 ],
      "proper_noun_semantic" : [ ],
      "declinable_word_semantic" : [ 2, 23 ],
      "dependency_labels" : [ {
        "token_id" : 2,
        "label" : "nsubj"
      }, {
        "token_id" : 6,
        "label" : "advcl"
      }, {
        "token_id" : 10,
        "label" : "nmod"
      }, {
        "token_id" : 12,
        "label" : "nsubj"
      }, {
        "token_id" : 14,
        "label" : "dobj"
      }, {
        "token_id" : 17,
        "label" : "aux"
      }, {
        "token_id" : 18,
        "label" : "punct"
      } ],
      "attributes" : { }
    }, {
      "id" : 17,
      "form" : "Ta",
      "kana" : "Ta",
      "lemma" : "Ta",
      "pos" : "verb suffix",
      "features" : [ "terminal" ],
      "common_noun_semantic" : [ ],
      "proper_noun_semantic" : [ ],
      "declinable_word_semantic" : [ ],
      "attributes" : { }
    }, {
      "id" : 18,
      "form" : "。",
      "kana" : "",
      "lemma" : "。",
      "pos" : "period",
      "features" : [ ],
      "common_noun_semantic" : [ ],
      "proper_noun_semantic" : [ ],
      "declinable_word_semantic" : [ ],
      "attributes" : { }
    } ]
  } ],
  "status" : 0,
  "message" : ""
}

Looking at "go" (token 6) here, it has four dependency targets: 4, 7, 8, and 9. It should actually have six, including 0 and 2, but the API returns only four.

The result I originally expected

You get the expected result if you send a single sentence instead of sending several sentences at once.
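As a workaround, the text could be split into sentences on the client side and each sentence sent in its own request. The sketch below is a deliberately naive splitter written for illustration (the class name `SentenceSplitter` is hypothetical): it breaks after "。", ".", "!" and "?", so names or abbreviations that contain these characters would be split incorrectly.

```java
import java.util.ArrayList;
import java.util.List;

final class SentenceSplitter {

    /** Splits text after each sentence-ending mark and trims the pieces. */
    static List<String> split(String text) {
        List<String> sentences = new ArrayList<>();
        // The lookbehind keeps the terminator attached to its sentence.
        for (String part : text.split("(?<=[。.!?])")) {
            String sentence = part.trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        String text = "My wife and my daughter went on a trip. My son and I ate grilled meat.";
        // Each element would then be sent to the parse API separately.
        split(text).forEach(System.out::println);
    }
}
```

Note that token ids in each response would then start again at 0, so any offsets into the original text have to be tracked by the caller.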

Dependency targets of "go" when only "My wife and my daughter went on a trip." was sent

{
      "id" : 6,
      "form" : "go",
      "kana" : "I",
      "lemma" : "go",
      "pos" : "verb stem",
      "features" : [ "IKU" ],
      "common_noun_semantic" : [ 2053, 2132 ],
      "proper_noun_semantic" : [ ],
      "declinable_word_semantic" : [ 15, 20, 29, 32, 5 ],
      "dependency_labels" : [ {
        "token_id" : 0,
        "label" : "nmod"
      }, {
        "token_id" : 2,
        "label" : "nsubj"
      }, {
        "token_id" : 4,
        "label" : "nmod"
      }, {
        "token_id" : 7,
        "label" : "aux"
      }, {
        "token_id" : 8,
        "label" : "aux"
      }, {
        "token_id" : 9,
        "label" : "punct"
      } ],
      "attributes" : { }
    }

Is the syntax analysis somewhat unreliable when I send multiple sentences, or is there a problem with the way I call the API? This needs to be investigated.
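One way to start that investigation is to check whether dependency labels cross a sentence boundary. In the two-sentence response, "eat" (token 16) lists tokens 2 and 6 among its "dependency_labels", although both belong to the first sentence (tokens 0 to 9). The sketch below flags such entries; the class and method names are hypothetical, and the sentence range is supplied by hand rather than read from the response.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class CrossSentenceCheck {

    /**
     * Returns the dependent token ids that lie outside the head token's sentence,
     * given as an inclusive range of token ids [sentenceStart, sentenceEnd].
     */
    static List<Integer> dependentsOutside(List<Integer> dependentIds,
                                           int sentenceStart, int sentenceEnd) {
        List<Integer> outside = new ArrayList<>();
        for (int id : dependentIds) {
            if (id < sentenceStart || id > sentenceEnd) {
                outside.add(id);
            }
        }
        return outside;
    }

    public static void main(String[] args) {
        // token_id values of "eat" (token 16); its own sentence spans tokens 10..18.
        List<Integer> eatDependents = Arrays.asList(2, 6, 10, 12, 14, 17, 18);
        System.out.println(dependentsOutside(eatDependents, 10, 18)); // prints [2, 6]
    }
}
```

If this list is non-empty for a multi-sentence request, the missing targets of "go" may simply have been attached to a head in the following sentence, which would point at the analysis rather than at the calling code.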

Dependency targets of "go" when another sentence is appended after "My wife and my daughter went on a trip." and sent

{
      "id" : 6,
      "form" : "go",
      "kana" : "I",
      "lemma" : "go",
      "pos" : "verb stem",
      "features" : [ "IKU" ],
      "common_noun_semantic" : [ 2053, 2132 ],
      "proper_noun_semantic" : [ ],
      "declinable_word_semantic" : [ 15, 20, 29, 32, 5 ],
      "dependency_labels" : [ {
        "token_id" : 4,
        "label" : "nmod"
      }, {
        "token_id" : 7,
        "label" : "aux"
      }, {
        "token_id" : 8,
        "label" : "aux"
      }, {
        "token_id" : 9,
        "label" : "punct"
      } ],
      "attributes" : { }
    }

The spec says "sentence: the sentence to analyze", which can be read as meaning a single sentence. However, splitting text into sentences is itself a natural language processing task. So if only a single sentence is supported as input, I think that is a problem for the API specification. In text mining, unlike in academia, I think it is rare to process only a single sentence.

For example, Stanford NLP also returns sentence boundaries as annotations. (For instance, "I watched Morning Musume's live performance." must be analyzed correctly; in Japanese the group's name is written with a trailing "。", which must not be treated as the end of a sentence.)

If I were a paying user, I would contact support. If I were responsible for delivery, I would strongly request a fix. If I were the development lead, I would have it fixed immediately ^_^;;

Links

COTOHA API Portal

[JAVA] Comments on the COTOHA syntax analysis API