⭐ 欢迎来到虫虫下载站! | 📦 资源下载 📁 资源专辑 ℹ️ 关于我们
⭐ 虫虫下载站

📄 distance.cpp

📁 可自动识别自然语言,句子匹配等功能,达到人工智能识别
💻 CPP
字号:
/////////////////////////////////////////////////////////////////////////////////////////////////
//
// Levenshtein Distance
// @author: Gonzales Cenelia
// homepage: www.ai-search.4t.com
//
// The "Levenshtein distance" is a measure of the similarity between two strings,
// this algorithm is also refered to as "edit distance". The "Levenshtein distance"
// was named after the russian scientist "Vladimir Levenshtein", who has discovered
// it back in 1965. The smaller the distance between two strings, the closer are
// these strings syntacticaly. The "Levenshtein distance" is computed by
// calculating the minimum number of operations that has to be made to transform
// one string to another one,usualy this operations are: replace,insert or delete a character
// example: we can change the word: "mathematics" to "mathematician" by changing one character
// and by inserting two more characters at the end.(we can replace "s" by "i" and 
// also insert "a" and "n" after that). The total number of operations that was needed in this
// case to change "mathematics" to "mathematician" was 3 operations and since it is
// also the smallest number of operation that can be use to transform one of this strings
// to the other one, that value is also a measure of the "Levenshtein distance" between
// these two strings. 
// There has been many application of the "Levenshtein distance", here is a few of them: 
// Spell Checking, Speech Recognition, Pattern Recognition etc. //****************
//
//****************************************************************************
#include "distance.h"


// finds the minimum of tree integers
int _min(int a, int b, int c) {
	return min(min(a, b), c);
}

// allocates a 2D array of integers
int **create_matrix(int Row, int Col) {
	int **array = new int*[Row];
	for(int i = 0; i < Row; ++i) {
		array[i] = new int[Col];
	}
	return array;
}

// deallocates memory
int **delete_matrix(int **array, int Row, int Col) {
	for(int i = 0; i < Row; ++i) {
		delete array[i];
	}
	delete [] array;
	return array;
}

// computes the Levenshtein distance between two strings
// "x" represent the pattern and "y" represent the text
// "m" is the pattern length and "n" is the text length
int LD(const char *x, unsigned int m, const char *y, unsigned int n) {
	// if the length of the second string is zero
	// then the distance between the two strings will
	// be equal to the length of the first string
	// and vis-versa
	// if the length of both strings is equal to zero
	// then the distance between this two strings will
	// simply be zero
	if (n == 0) {
		return m;
	} 
	else if (m == 0) {
		return n;
	}

	// creating a matrix of m+1 rows and n+1 columns
	int **matrix = create_matrix(m + 1, n + 1);

	// initialising the first row of the matrix
	for(unsigned int i = 0; i <= n; ++i) {
		matrix[0][i] = i; 
	}

	// initialising the first column of the matrix
	for(i = 0; i <= m; ++i) {
		matrix[i][0] = i; 
	}

	// complementary variables for computing the "Levenshtein distance"
	unsigned int above_cell, left_cell, diagonal_cell, cost;

	// starting the main process for computing 
	// the distance between the two strings "x" and "y"
	for(i = 1; i <= m; ++i) {
		for(unsigned int j = 1; j <= n; ++j) {
			// if the current two characters
			// of both strings are the same
			// then, the corresponding cost value
			// will be zero,otherwise it will be 1
			if (x[i-1] == y[j-1]) {
				cost = 0;
			} 
			else {
				cost = 1;
			}
			// current cell of the matrix: matrix[i][j]

			// finds the above cell to the current cell
			above_cell = matrix[i-1][j];

			// finds the left cell to the current cell
			left_cell = matrix[i][j-1];

			// finds the diagonally above cell to the current cell
			diagonal_cell = matrix[i-1][j-1];

			// computes the current value of the "edit distance" and place
			// the result into the current matrix cell
			matrix[i][j] = _min(above_cell + 1, left_cell + 1, diagonal_cell + cost);
		}
	}
	// placing the final result into a variable
	unsigned int result = matrix[m][n];
	// freeing memory that has been used
	// for the "matrix variable"
	delete_matrix(matrix, m + 1, n + 1);
	// returning result of the search
	return result;
}

⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -