Convert sentences to vectors with gensim

I worked through the "From Strings to Vectors" chapter of the gensim tutorial.

The stoplist step removes stop words, which add little to the analysis.

What is a stop word? A word excluded from search targets to improve search accuracy, because it is so common that it matches too many results. Function words such as particles and auxiliary verbs ("ha", "no", "desu", "masu" in Japanese; "the", "of", "is" in English) almost always qualify. ...

Source: Hatena

`sample.py`



import logging
logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)

from gensim import corpora, models, similarities

documents = ["Human machine interface for lab abc computer applications",
          "A survey of user opinion of computer system response time",
          "The EPS user interface management system",
          "System and human system engineering testing of EPS",
          "Relation of user perceived response time to error measurement",
          "The generation of random binary unordered trees",
          "The intersection graph of paths in trees",
          "Graph minors IV Widths of trees and well quasi ordering",
          "Graph minors A survey"]

          
# remove common words and tokenize
stoplist = set('for a of the and to in'.split())
texts = [[word for word in document.lower().split() if word not in stoplist]
         for document in documents]

# remove words that appear only once
from collections import defaultdict
frequency = defaultdict(int)

# print(texts)

for text in texts:
    for token in text:
        frequency[token] += 1

texts = [[token for token in text if frequency[token] > 1]
         for text in texts]

# from pprint import pprint   # pretty-printer
# pprint(texts)

dictionary = corpora.Dictionary(texts)
# print(dictionary)

# print the token-to-id mapping
# print(dictionary.token2id)

# convert each document to a bag-of-words vector: a list of (token_id, count) pairs
corpus = [dictionary.doc2bow(text) for text in texts]
print(corpus)
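
To make the printed result easier to read, here is a minimal pure-Python sketch of what a bag-of-words conversion does. It is not gensim's implementation: the helper names `build_token2id` and `to_bow` and the toy data are made up for illustration, and the ids gensim assigns to tokens may differ from the first-appearance order used here.

```python
from collections import Counter


def build_token2id(texts):
    """Give each distinct token an integer id, in order of first appearance."""
    token2id = {}
    for text in texts:
        for token in text:
            if token not in token2id:
                token2id[token] = len(token2id)
    return token2id


def to_bow(tokens, token2id):
    """Return sorted (token_id, count) pairs; tokens without an id are skipped."""
    counts = Counter(token2id[t] for t in tokens if t in token2id)
    return sorted(counts.items())


# Toy tokenized corpus (hypothetical, much smaller than the one in sample.py)
toy_texts = [["human", "interface", "computer"],
             ["graph", "minors", "graph"]]

toy_token2id = build_token2id(toy_texts)
# "unseen" has no id, so it does not appear in the vector
toy_bow = to_bow(["graph", "graph", "human", "unseen"], toy_token2id)
print(toy_bow)  # [(0, 1), (3, 2)]
```

Each pair reads as "the token with this id occurs this many times in the document", which is the same shape as the lists printed by `print(corpus)` above.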

Official tutorial: https://radimrehurek.com/gensim/tut1.html
