Explanation of edit distance and implementation in Python

Algorithm description

The edit distance (Levenshtein distance) is the number of steps it takes to change one string to another. That step is ・ Delete one character ・ Insert any single character -Replace one character with another Is one of

Let's first understand the Levenshtein distance algorithm

For example, in the case of the strings cafe and coffee cafe→caffe→coffe→coffee Something like this

Let's make a table first

c o f f e e
c
a
f
e

Let's enter the initial value When the target string is empty (the target string will be increased little by little), of course, the Levenshtein distance is equal to the length of the string. So

c o f f e e
0 1 2 3 4 5 6
c 1 *
a 2
f 3
e 4

The production algorithm starts from "*****" in (3,3) All remaining cells will be the ** minimum ** below this • If the top character of the column in this cell is the same as the leftmost character in the row in this cell, then the value in this cell is equal to ** the value in the upper left cell **, so Otherwise ** the value in the upper left cell + 1 ** ・ Value in the left cell +1 ・ Value of the upper cell +1

Here, in the case of (3,3) ・ Since the top character "c" in the column at (3,3) and the leftmost character "c" in the row are the same, it becomes 0. ・ Because it becomes (2,3), it becomes 1. ・ Because it becomes (3,2), it becomes 1. (3,3) becomes 0 because it takes the minimum of the above three values.

c o f f e e
0 1 2 3 4 5 6
c 1 0
a 2
f 3
e 4

Fill all cells like this

c o f f e e
0 1 2 3 4 5 6
c 1 0 1 2 3 4 5
a 2 1 1 2 3 4 5
f 3 2 2 1 2 3 4
e 4 3 3 2 2 2 3

Get this table. The value at the bottom right of this table is the Levenshtein distance.

Implementation by Python 3.5.2

If you put this algorithm into a program

levenshtein.py


class Levenshtein:
#Start the array here and enter the initial value
    def initArray(self,str1,str2):
        distance = []
        for i in range(len(str1)+1):
            distance.append([0]*(len(str2)+1))
            distance[i][0] = i
        for j in range(len(str2)+1):
            distance[0][j] = j
        return distance
#Put a value in a cell
    def editDist(self,str1,str2,distance):
        dist = [0]*3
        for i in range(1,len(str1)+1):
            for j in range(1,len(str2)+1):
                dist[0] = distance[i-1][j-1] if str1[i-1]==str2[j-1] else distance[i-1][j-1]+1
                dist[1] = distance[i][j-1]+1
                dist[2] = distance[i-1][j]+1
                distance[i][j]=min(dist)
        return distance[i][j]

    def __init__(self,str1,str2):
        self.str1 = str1
        self.str2 = str2
        Levenshtein.distance = self.initArray(str1,str2)
        Levenshtein.dist = self.editDist(str1,str2,Levenshtein.distance)

if __name__ == '__main__':
    str1 = 'coffee'
    str2 = 'cafe'
    leven = Levenshtein(str1,str2)
    print(leven.dist)

Recommended Posts

Explanation of edit distance and implementation in Python
Explanation and implementation of SocialFoceModel
Overview of generalized linear models and implementation in Python
Explanation and implementation of PRML Chapter 4
Explanation and implementation of ESIM algorithm
Sorting algorithm and implementation in Python
Implementation of life game in Python
Explanation and implementation of simple perceptron
Implementation of original sorting in Python
[Reinforcement learning] Explanation and implementation of Ape-X in Keras (failure)
Explanation of CSV and implementation example in each programming language
Implementation module "deque" in queue and Python
Explanation and implementation of Decomposable Attention algorithm
Project Euler # 1 "Multiples of 3 and 5" in Python
Edit fonts in Python
Explanation and implementation of the XMPP protocol used in Slack, HipChat, and IRC
RNN implementation in python
ValueObject implementation in Python
Implementation of TRIE tree with Python and LOUDS
Implementation of particle filters in Python and application to state space models
SVM implementation in python
"Linear regression" and "Probabilistic version of linear regression" in Python "Bayesian linear regression"
Full-width and half-width processing of CSV data in Python
Calculation of standard deviation and correlation coefficient in Python
Merge sort implementation / complexity analysis and experiment in Python
Python --Explanation and usage summary of the top 24 packages
Difference between Ruby and Python in terms of variables
Logical symbols learned in marriage (and implementation examples in Python)
[python] Calculation of months and years of difference in datetime
Variational Bayesian inference implementation of topic model in python
A reminder about the implementation of recommendations in Python
Explanation of NoReverseMatch error in "python django super introduction"
Sample of getting module name and class name in Python
Summary of date processing in Python (datetime and dateutil)
Stack and Queue in Python
Python implementation of particle filters
Neural network implementation in python
Unittest and CI in Python
Maxout description and implementation (Python)
Source installation and installation of Python
Applied practice of try/except and dictionary editing and retrieval in Python
[For beginners] Summary of standard input in Python (with explanation)
Reference order of class variables and instance variables in "self. Class variables" in Python
Comparison of how to use higher-order functions in Python 2 and 3
[With simple explanation] Scratch implementation of deep Boltzmann machine with Python ②
[With simple explanation] Scratch implementation of deep Boltzmann machine with Python ①
[Python] Strengths and weaknesses of DataFrame in terms of time required
[Tips] Problems and solutions in the development of python + kivy
Environment construction of python and opencv
Pixel manipulation of images in Python
The story of Python and the story of NaN
MIDI packages in Python midi and pretty_midi
Introduction and Implementation of JoCoR-Loss (CVPR2020)
Difference between list () and [] in Python
Difference between == and is in python
Installation of SciPy and matplotlib (Python)
View photos in Python and html
Introduction and implementation of activation function
HMM parameter estimation implementation in python
Python implementation of self-organizing particle filters
Mixed normal distribution implementation in python