Find the Levenshtein Distance with python

Regarding the editing distance, it is a little old, but the article by Naoya Ito is helpful. Simply put, it's a way to express the closeness of two strings as a number.

Reference: Levenshtein Distance-Naoya's Hatena Diary

Preparation

There was a package called python-Levenshtein, so let's put it in.

$ sudo pip install python-Levenshtein

Try

Let's write a code like this.

#!/usr/bin/env python
# coding: utf8

import Levenshtein

string1 = "Yasuji Inoue"
string2 = "Yasuji Inoue"

string1 = string1.decode('utf-8')
string2 = string2.decode('utf-8')

print Levenshtein.distance(string1, string2)

$ python levenshtein.py
1

Japanese is also OK. If you replace one character, it will be the correct character, so the editing distance will be 1.

I'm not good at typing the letters python, and when I notice it, it becomes pyhton. The editing distance between pyhton and python is 2. (Because it will be the same if you swap the two letters)

bonus

Looking at the Documentation, it seems that you can also calculate the Jaro-Winkler distance and so on.

Bonus 2

If you register it as a MySQL stored like the one below, it will look like ORDER BY LEVENSHTEIN (title, "Hogehoge"). It is convenient because it will be displayed in the order of the letters. However, the index does not work, so if you are searching all records and the number of records is large, the query will be quite heavy.

https://github.com/fza/mysql-doctrine-levenshtein-function

Relation

PHP-[Multibyte support] Find the Levenshtein distance-Qiita

Recommended Posts

Find the Levenshtein Distance with python

Write a python program to find the editing distance [python] [Levenshtein distance]

Find the maximum Python

Find the mood value with python (Rike Koi)

Find the shortest path with the Python Dijkstra's algorithm

Find the difference in Python

Call the API with python3.

Find the maximum python (improved)

I tried to find the entropy of the image with python

[Python] Find the second smallest value.

Get the weather with Python requests

Get the weather with Python requests 2

Hit the Etherpad-lite API with Python

Install the Python plugin with Netbeans 8.0.2

I liked the tweet with python. ..

I calculated "Levenshtein distance" using Python

Master the type with Python [Python 3.9 compatible]

Find image similarity with Python + OpenCV

Find out the mystery change of Pokédex description by Levenshtein distance

Make the Python console covered with UNKO

Find the maximum value python (fixed ver)

Behind the flyer: Using Docker with Python

Check the existence of the file with python

Find the SHA256 value with R (with bonus)

[Python] Get the variable name with str

Search the maze with the python A * algorithm

Let's read the RINEX file with Python ①

Working with OpenStack using the Python SDK

Download files on the web with Python

Learn the design pattern "Singleton" with Python

[Python] Automatically operate the browser with Selenium

Learn the design pattern "Facade" with Python

The road to compiling to Python 3 with Thrift

FizzBuzz with Python3

Scraping with Python

Statistics with python

Scraping with Python

Twilio with Python

Integrate with Python

Play with 2016-Python

AES256 with python

Tested with Python

python starts with ()

with syntax (Python)

Find the general terms of the Tribonacci sequence with linear algebra and Python

Bingo with python

Zundokokiyoshi with python

Excel with Python

Microcomputer with Python

Cast with python

I tried "smoothing" the image with Python + OpenCV

[Python] Get the files in a folder with Python

Load the network modeled with Rhinoceros in Python ③

Prepare the execution environment of Python3 with Docker

Find the second derivative with JAX automatic differentiation

2016 The University of Tokyo Mathematics Solved with Python

I tried "differentiating" the image with Python + OpenCV

[Note] Export the html of the site with python.

Find a position above the threshold with NumPy

The easiest way to synthesize speech with python

Try to solve the man-machine chart with Python