Replace named entities in a text file with their labels (using GiNZA)

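I struggled with this, so I am posting what I ended up with for now. There may be a better way, but if you are a beginner like me, I hope it is a useful reference. The script reads `input.txt`, runs GiNZA's named entity recognition on it, and writes `output.txt` with each named entity replaced by its label.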

The environment is Python 3.6.9 on Ubuntu 18.04.4.

`change_NER.py`


# coding:utf-8
import spacy

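nlp = spacy.load('ja_ginza')                        #Load the GiNZA Japanese model

with open('input.txt', 'r', encoding='utf-8') as f:
    data = f.read()                                 #Read the whole file as one string

doc = nlp(data)                                     #Run the pipeline; entities are in doc.ents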

with open('output.txt', 'w', encoding='utf-8') as f:

    text = list(data)                               #Store each character in a list
    entity = [ent.label_ for ent in doc.ents]       #Named entity labels
    start = [ent.start_char for ent in doc.ents]    #Character offset where each entity starts
    end = [ent.end_char for ent in doc.ents]        #Character offset just past the end of each entity
    num = 0                                         #Index of the next entity to replace
    skip_until = 0                                  #Characters before this offset belong to an entity

    for i in range(len(text)):
        if num < len(start) and i == start[num]:    #Also safe when no entities were found
            f.write(entity[num])                    #Write the label in place of the entity
            skip_until = end[num]
            num += 1

        elif i < skip_until:
            continue                                #Skip the remaining characters of the entity

        else:
            f.write(text[i])                        #Ordinary character: copy it unchanged
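
The same replacement can also be done with string slices instead of a character-by-character loop. The following is only a minimal sketch, not the method used in the script above: `replace_spans` and the `Span` alias are hypothetical names introduced here, and the sample sentence, offsets, and labels are made up for illustration rather than taken from GiNZA output. It assumes each span is a `(start_char, end_char, label)` tuple, and it skips any span that overlaps the previous one.

```python
from typing import Iterable, List, Tuple

# Hypothetical alias: (start_char, end_char, label)
Span = Tuple[int, int, str]


def replace_spans(text: str, spans: Iterable[Span]) -> str:
    """Return text with each [start, end) character range replaced by its label."""
    pieces: List[str] = []
    pos = 0  # Offset of the first character not yet copied
    for start, end, label in sorted(spans):
        if start < pos:
            continue  # Assumption: skip a span that overlaps the previous one
        pieces.append(text[pos:start])  # Unchanged text before the span
        pieces.append(label)            # Label in place of the span's text
        pos = end
    pieces.append(text[pos:])           # Unchanged text after the last span
    return "".join(pieces)


if __name__ == "__main__":
    # Made-up sample; the labels are illustrative, not GiNZA's label set
    sample = "Alice met Bob in Paris."
    spans = [(0, 5, "PERSON"), (10, 13, "PERSON"), (17, 22, "GPE")]
    print(replace_spans(sample, spans))  # PERSON met PERSON in GPE.
```

With spaCy, the spans could be built from the attributes already used above, for example `[(ent.start_char, ent.end_char, ent.label_) for ent in doc.ents]`, and the returned string written to `output.txt` in a single `f.write` call.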
