Write a good-performing function to find the Levenshtein edit distance that represents the distance between strings containing "full-width"
However, the cost of addition, deletion, and replacement is all set to 1.
The Levenshtein distance (Levenshtein distance) or editing distance (Henshukyori) is a numerical value that indicates how different two character strings are in information theory. Specifically, it is given as the minimum number of steps required to transform one character string into another by inserting, deleting, or replacing characters. To give an example of how to find a practical distance, transforming "kitten" into "sitting" requires at least three steps as shown below, so Levenshtein between two words. The distance is 3. kitten sitten (replace “k” with “s”) sittin (replace "e" with "i") sitting (insert “g” to finish) (From wikipedia)
def levenshtein(s1, s2):
if len(s1) < len(s2):
return levenshtein(s2, s1)
# len(s1) >= len(s2)
if len(s2) == 0:
return len(s1)
previous_row = range(len(s2) + 1)
for i, c1 in enumerate(s1):
current_row = [i + 1]
for j, c2 in enumerate(s2):
insertions = previous_row[j + 1] + 1 # j+1 instead of j since previous_row and current_row are one character longer
deletions = current_row[j] + 1 # than s2
substitutions = previous_row[j] + (c1 != c2)
current_row.append(min(insertions, deletions, substitutions))
previous_row = current_row
return previous_row[-1]
Quoted from below http://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Levenshtein_distance#Python
Recommended Posts