This article uses the **ConcordanceIndex** class of NLTK. NLTK (Natural Language Toolkit) is one of the libraries for natural language processing in Python.

```python
import re              # regular expression handling
import zipfile         # working with zip files
import urllib.request  # downloading data from the web
import os.path         # manipulating path names
import glob            # getting file path names

def download(URL):
    # Download the zip file
    zip_file = re.split(r'/', URL)[-1]
    urllib.request.urlretrieve(URL, zip_file)
    dir_name = os.path.splitext(zip_file)[0]

    # Unzip the file and save its contents
    with zipfile.ZipFile(zip_file) as zip_object:
        zip_object.extractall(dir_name)
    os.remove(zip_file)

    # Return the path of the extracted text file
    path = os.path.join(dir_name, '*.txt')
    file_list = glob.glob(path)
    return file_list[0]
```
```python
def convert(download_text):
    # Read the file (Aozora Bunko texts are encoded in Shift_JIS)
    with open(download_text, 'rb') as f:
        data = f.read()
    text = data.decode('shift_jis')

    # Extract the body text
    text = re.split(r'\-{5,}', text)[2]          # drop the Aozora Bunko header
    text = re.split(r'底本：', text)[0]           # drop the trailer (source-book note)
    text = re.split(r'［＃改ページ］', text)[0]    # cut at the page-break marker

    # Remove noise
    text = re.sub(r'《.+?》', '', text)    # ruby (reading) annotations
    text = re.sub(r'［＃.+?］', '', text)  # editorial annotations
    text = re.sub(r'｜', '', text)         # ruby base markers
    text = re.sub(r'\r\n', '', text)       # line breaks
    text = re.sub(r'\u3000', '', text)     # full-width spaces
    text = re.sub(r'「', '', text)         # opening quotation marks
    text = re.sub(r'」', '', text)         # closing quotation marks
    text = re.sub(r'、', '', text)         # commas
    text = re.sub(r'。', '', text)         # full stops
    return text
```
```python
URL = 'https://www.aozora.gr.jp/cards/000081/files/43737_ruby_19028.zip'
download_file = download(URL)
text = convert(download_file)
print(text)
```
The code above downloads the zipped text from Aozora Bunko with the download function and passes it to the convert function to extract only the body.
The ConcordanceIndex class is designed for English text, so we use MeCab to convert the Japanese text into a **space-separated (wakati-gaki) format**, with a single space between words.

```
!apt install aptitude
!aptitude install mecab libmecab-dev mecab-ipadic-utf8 git make curl xz-utils file -y
!pip install mecab-python3==0.7
```
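If the installation succeeded, MeCab should be able to segment a short sentence right away. A quick sanity check (the sample sentence here is an arbitrary illustration, not from the original article):

```python
import MeCab

# Verify that MeCab works in -Owakati (word-segmentation) mode.
# The sample sentence is arbitrary; expect space-separated words.
print(MeCab.Tagger("-Owakati").parse("今日はいい天気です"))
```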
Instantiate the MeCab.Tagger class with the word-segmentation output mode -Owakati, then split the text into words with the parse method.

```python
import MeCab

mecab = MeCab.Tagger("-Owakati")
words = mecab.parse(text).split()
```
Use join to concatenate the words, with a single-byte space as the delimiter.

```python
doc = ' '.join(words)
print(doc)
```
We use nltk here, but it will not work unless you also download a tokenizer called punkt. Tokenize doc with **NLTK and convert it to the Text format**.

```python
import nltk
nltk.download('punkt')

text_ = nltk.Text(nltk.word_tokenize(doc))
```
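As a side note, the nltk.Text object created here also has its own concordance method, so a quick KWIC check is possible without building a ConcordanceIndex instance yourself; a minimal sketch:

```python
# Quick KWIC view straight from the Text object
# (concordance() is standard nltk.Text API).
text_.concordance('ジョバンニ', width=40, lines=5)
```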
Create an instance of the ConcordanceIndex class with text_ as the input text, and display the **output in KWIC format for the keyword**.

```python
word = 'ジョバンニ'  # Giovanni

# Create an instance and specify the input text
c = nltk.text.ConcordanceIndex(text_)

# Display KWIC output for the keyword
c.print_concordance(word, width=40, lines=50)
```
The print_concordance method for displaying KWIC output lets you specify the **display width** with width and the **maximum number of lines** with lines.

All 196 matched positions, which were the original goal of the search, can be displayed with the offsets method.

```python
print(c.offsets(word))
```
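Since offsets returns token positions, you can also rebuild a small KWIC display yourself. A minimal sketch, assuming a context window of 5 tokens on each side (the window size is an arbitrary choice for illustration):

```python
# Rebuild a simple KWIC view from the token offsets.
# The 5-token context window is arbitrary.
for i in c.offsets(word)[:5]:
    left = ' '.join(text_[max(0, i - 5):i])
    right = ' '.join(text_[i + 1:i + 6])
    print(f'{left} [{text_[i]}] {right}')
```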