JP2820183B2 - String comparison method

Info

Publication number: JP2820183B2
Application number: JP4230516A
Authority: JP
Inventors: 博志太田
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1992-08-28
Filing date: 1992-08-28
Publication date: 1998-11-05
Anticipated expiration: 2013-11-05
Also published as: JPH0683871A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to a character string comparison method for determining the degree of similarity between two character strings.

[0002]

[Prior Art] As a method of determining the degree of similarity between a compared character string a of length m and a comparison character string b of length n, it is known to compute the cost (hereinafter called the distance), known as the Levenshtein distance, required to convert character string b into character string a.

[0003] In outline, a character string can be transformed in three ways: by adding, deleting, or changing a character. A cost is assigned to each of these transformation types, and the sum of the costs of the required transformations is the distance between character strings a and b.

[0004] A distance of 0 means that the two character strings are equal; the larger the value, the more the two character strings differ.

[0005] This method is described below with reference to the drawings, taking as an example the determination of the degree of similarity between the compared character string "aXbYc" and the comparison character string "abc". To simplify the description, the costs of addition, deletion, and change are taken to be 1, 2, and 3, respectively.

[0006] FIG. 6 is a flowchart of the conventional method. The cost array initialization step prepares an array M whose row size is the length of the compared character string "aXbYc" plus 1, that is, 6, and whose column size is the length of the comparison character string "abc" plus 1, that is, 4. The array element M[0, 0] is set to 0, and each array element M[0, j] (0 < j < 6) in row 0 is then set to the value of M[0, j-1] plus the addition cost of 1.

[0007] Next, each array element M[i, 0] (0 < i < 4) in column 0 is set to the value of M[i-1, 0] plus the deletion cost of 2. FIG. 8 shows the array M after the initialization step.

[0008] The cost array calculation step of FIG. 6 is then executed. FIG. 7 is a flowchart of this step. Within the loop controlled by the row and column control variable initialization step S8, the row control determination step S9, and the column control determination step S10 of FIG. 7, the value of each array element M[i, j] (0 < i < 4, 0 < j < 6) is computed in ascending order of array position.

[0009] The character comparison step S11 determines whether the i-th character of the compared character string "aXbYc" is equal to the j-th character of the comparison character string "abc". If they are equal, cost calculation 1 (step S12) stores in array element M[i, j] the minimum of three values: the value of M[i-1, j-1]; the value of M[i, j-1] plus the addition cost of 1; and the value of M[i-1, j] plus the deletion cost of 2.

[0010] If they differ, cost calculation 2 (step S13) stores in array element M[i, j] the minimum of three values: the value of M[i-1, j-1] plus the change cost of 3; the value of M[i, j-1] plus the addition cost of 1; and the value of M[i-1, j] plus the deletion cost of 2. FIG. 9 shows the cost array M after the values of all array elements have been computed in this way.

[0011] The distance L2 in FIG. 9 is the degree of similarity between the compared character string "aXbYc" and the comparison character string "abc"; in this case it is 4.
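The conventional procedure of paragraphs [0006] to [0011] can be sketched in Python as follows. This is an illustrative sketch, not code from the patent: the function name `weighted_distance` and its parameter names are hypothetical, and the array is indexed following the convention of the claim (first index over the compared character string a, second index over the comparison character string b). That convention is an assumption, chosen because it reproduces the distance of 4 stated for this example.

```python
def weighted_distance(a, b, add_cost=1, del_cost=2, change_cost=3):
    """Weighted edit distance via the cost array M (hypothetical helper)."""
    m, n = len(a), len(b)
    M = [[0] * (n + 1) for _ in range(m + 1)]

    # Initialization: row 0 accumulates the addition cost,
    # column 0 accumulates the deletion cost.
    for j in range(1, n + 1):
        M[0][j] = M[0][j - 1] + add_cost
    for i in range(1, m + 1):
        M[i][0] = M[i - 1][0] + del_cost

    # Fill the remaining elements in ascending index order.
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            # Equal characters add 0 on the diagonal, otherwise the change cost.
            diag = M[i - 1][j - 1] + (0 if a[i - 1] == b[j - 1] else change_cost)
            M[i][j] = min(diag, M[i][j - 1] + add_cost, M[i - 1][j] + del_cost)

    return M[m][n]


print(weighted_distance("aXbYc", "abc"))  # 4
```

The whole array is kept, rather than only two rows, so that the sketch stays close to the array M described in the text.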

[0012]

[Problems to Be Solved by the Invention] Because the conventional character string comparison method ignores the continuity of characters within a character string, the degree of similarity between the compared character string "aXbYc" and the comparison character string "abc", for example, comes out equal to that between the compared character string "abcXY" and the comparison character string "abc". The continuity of the characters a, b, and c is thus not reflected in the similarity determination.
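The equality of the two distances can be checked by hand. The worked arithmetic below is an illustration rather than part of the original text: the symbols D and c_del are notation introduced here, and it assumes the example costs (addition 1, deletion 2, change 3) with the two extra characters X and Y charged at the deletion cost, following the claim's initialization of column 0. In both cases the characters a, b, and c are matched at cost 0 and only X and Y contribute:

```latex
\begin{align*}
D(\texttt{aXbYc}, \texttt{abc}) &= 3 \cdot 0 + 2 \cdot c_{\mathrm{del}} = 2 \cdot 2 = 4,\\
D(\texttt{abcXY}, \texttt{abc}) &= 3 \cdot 0 + 2 \cdot c_{\mathrm{del}} = 2 \cdot 2 = 4.
\end{align*}
```

The position of X and Y relative to a, b, and c does not enter the sum, which is why the distance alone cannot distinguish the two cases.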

[0013] An object of the present invention is therefore to provide a character string comparison method that can determine the degree of similarity between character strings more accurately.

[0014]

[Means for Solving the Problems] The present invention provides a character string comparison method of the following kind. Costs are set for the addition, deletion, and change of a character, and a cost array M with a row size of m+1 and a column size of n+1 is prepared for a compared character string a of length m and a comparison character string b of length n. A cost array initialization step sets each array element M[i, 0] (0 ≤ i ≤ m) in column 0 to i × the deletion cost and each array element M[0, j] (0 ≤ j ≤ n) to j × the addition cost. A cost calculation step then computes, in order, each array element M[i, j] (0 < i ≤ m, 0 < j ≤ n) as the minimum of three values: the value of M[i-1, j-1] plus 0 if the i-th character of the compared character string a equals the j-th character of the comparison character string b, or plus the change cost if they differ; the value of M[i, j-1] plus the addition cost; and the value of M[i-1, j] plus the deletion cost. The cost of converting character string b into character string a obtained by this cost calculation step is taken as the degree of similarity between character strings a and b. The method is characterized by further comprising a partial match array initialization step, which prepares a partial match array M' of the same size as the cost array M and initializes all of its array elements to 0, and a partial match calculation step, which sets array element M'[i, j] (0 < i ≤ m, 0 < j ≤ n) to the value of M'[i-1, j-1] plus 1 when the i-th character of the compared character string a equals the j-th character of the comparison character string b. The maximum partial match length obtained by the partial match calculation step is used, together with the cost of converting character string b into character string a, as the degree of similarity between the character strings.
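The steps of this paragraph can be restated compactly as recurrences. The block below is a restatement for readability, not text from the patent: the symbols c_add, c_del, c_chg, a_i, b_j, D, and L are notation introduced here. The value 0 for M' when the characters differ follows from the initialization of every element of M' to 0.

```latex
\begin{align*}
M[i,0] &= i \cdot c_{\mathrm{del}} \quad (0 \le i \le m), \qquad
M[0,j] = j \cdot c_{\mathrm{add}} \quad (0 \le j \le n),\\
M[i,j] &= \min\bigl(M[i-1,j-1] + \delta_{ij},\;
                    M[i,j-1] + c_{\mathrm{add}},\;
                    M[i-1,j] + c_{\mathrm{del}}\bigr)
          \quad (0 < i \le m,\ 0 < j \le n),\\
\delta_{ij} &= \begin{cases} 0 & \text{if } a_i = b_j,\\
                             c_{\mathrm{chg}} & \text{if } a_i \ne b_j, \end{cases}\\
M'[i,j] &= \begin{cases} M'[i-1,j-1] + 1 & \text{if } a_i = b_j,\\
                         0 & \text{if } a_i \ne b_j, \end{cases}
          \quad (0 < i \le m,\ 0 < j \le n),\\
D &= M[m,n], \qquad L = \max_{i,j} M'[i,j].
\end{align*}
```

The pair (D, L) is what the paragraph calls the degree of similarity: D is the conversion cost and L is the maximum partial match length.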

[0015]

[Operation] To measure the continuity of characters, the character string comparison method of the present invention thus comprises a partial match array initialization step that initializes a partial match array M' for storing intermediate results, and a partial match calculation step that sets array element M'[i, j] to the value of M'[i-1, j-1] plus 1 when the i-th character of the compared character string a equals the j-th character of the comparison character string b. The largest value among the array elements of the partial match array M' obtained by the partial match calculation step is the maximum length of a substring common to the compared character string a and the comparison character string b.
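The partial match array on its own can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function name `partial_match_array` and the variable name `Mp` (standing for M') are hypothetical, and the first index is taken to run over the compared character string a, as in the claim.

```python
def partial_match_array(a, b):
    """Build the partial match array M' (hypothetical helper).

    Mp[i][j] is the length of the run of equal characters ending at the
    i-th character of a and the j-th character of b; it stays 0 where
    the two characters differ.
    """
    m, n = len(a), len(b)
    Mp = [[0] * (n + 1) for _ in range(m + 1)]  # all elements start at 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                Mp[i][j] = Mp[i - 1][j - 1] + 1
    return Mp


Mp = partial_match_array("abcXY", "abc")
longest = max(max(row) for row in Mp)
print(longest)  # 3
```

For "aXbYc" and "abc" the same function yields a largest element of 1, because no two matching characters are adjacent in both strings.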

[0016]

[Embodiment] The present invention will now be described with reference to the drawings. FIG. 1 is a flowchart of the present invention, and FIG. 2 is a flowchart of the calculation step of FIG. 1.

[0017] The description takes as an example the comparison of the compared character string "abcXY" with the comparison character string "abc". To simplify the description, the costs of addition, deletion, and change are taken to be 1, 2, and 3, respectively.

[0018] In the initialization step shown in FIG. 1, a cost array M, whose row size is the length of the compared character string "abcXY" plus 1, that is, 6, and whose column size is the length of the comparison character string "abc" plus 1, that is, 4, and a partial match array M' of the same size as the cost array M are initialized as follows in cost array initialization step 1 and partial match array initialization step 2.

[0019] The array element M[0, 0] is set to 0, and each array element M[0, j] (0 < j < 6) in row 0 is then set to the value of M[0, j-1] plus the addition cost of 1.

[0020] Next, each array element M[i, 0] (0 < i < 4) in column 0 is set to the value of M[i-1, 0] plus the deletion cost of 2. All array elements of the array M' are initialized to 0. FIG. 3 shows the cost array M after the initialization step.

[0021] The calculation step shown in FIG. 1 is then executed. This calculation step consists of cost calculation step 3 and partial match calculation step 4. Within the loop controlled by the row and column control variable initialization step S1, the row control determination step S2, and the column control determination step S3 of the flowchart shown in FIG. 2, the values of each array element M[i, j] (0 < i < 4, 0 < j < 6) and each array element M'[i, j] (0 < i < 4, 0 < j < 6) are computed in ascending order of the array subscripts.

[0022] Next, the character comparison step S4 determines whether the i-th character of the compared character string "abcXY" is equal to the j-th character of the comparison character string "abc". If they are equal, cost calculation 1 (step S5) stores in array element M[i, j] the minimum of three values: the value of M[i-1, j-1]; the value of M[i, j-1] plus the addition cost of 1; and the value of M[i-1, j] plus the deletion cost of 2. In addition, the partial match calculation (step S4) stores in array element M'[i, j] the value of M'[i-1, j-1] plus 1.

[0023] If they differ, cost calculation 2 (step S6) stores in array element M[i, j] the minimum of three values: the value of M[i-1, j-1] plus the change cost of 3; the value of M[i, j-1] plus the addition cost of 1; and the value of M[i-1, j] plus the deletion cost of 2.

[0024] FIG. 4 shows the cost array M, and FIG. 5 the partial match array M', after the values of the array elements have been computed in this way.

[0025] The distance D1 shown in FIG. 4 is the conversion cost between the compared character string "abcXY" and the comparison character string "abc", and the maximum match length L1 shown in FIG. 5 is the maximum length of a substring common to the compared character string and the comparison character string.

[0026] The distance D1 and the maximum match length L1 obtained here together constitute the degree of similarity between the character strings.
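The embodiment as a whole, in which one loop fills both arrays and the result is the pair of distance and maximum match length, can be sketched in Python as follows. This is an illustrative sketch rather than the patent's implementation: the function name `compare_strings`, its parameter names, and the variable `Mp` (standing for M') are hypothetical, and the indexing follows the claim (first index over the compared character string a), which is an assumption.

```python
def compare_strings(a, b, add_cost=1, del_cost=2, change_cost=3):
    """Return (distance, maximum partial match length) for strings a and b."""
    m, n = len(a), len(b)

    # Cost array M: column 0 holds i * deletion cost, row 0 holds j * addition cost.
    M = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        M[i][0] = i * del_cost
    for j in range(1, n + 1):
        M[0][j] = j * add_cost

    # Partial match array M' of the same size, all elements 0.
    Mp = [[0] * (n + 1) for _ in range(m + 1)]
    max_len = 0

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            same = a[i - 1] == b[j - 1]
            # Cost calculation: minimum of the three candidate values.
            M[i][j] = min(
                M[i - 1][j - 1] + (0 if same else change_cost),
                M[i][j - 1] + add_cost,
                M[i - 1][j] + del_cost,
            )
            # Partial match calculation: extend the run only on equal characters.
            if same:
                Mp[i][j] = Mp[i - 1][j - 1] + 1
                max_len = max(max_len, Mp[i][j])

    return M[m][n], max_len


print(compare_strings("abcXY", "abc"))  # (4, 3)
print(compare_strings("aXbYc", "abc"))  # (4, 1)
```

The two example pairs share the distance 4 but differ in maximum match length (3 versus 1), which is the distinction the method is meant to capture.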

[0027]

[Effects of the Invention] As described above, because the present invention takes the continuity of characters into account when determining the degree of similarity between character strings, the maximum length of a substring common to the compared character string and the comparison character string can be obtained at the same time, and the degree of similarity between the character strings can be determined more accurately.

[Brief description of the drawings]

FIG. 1 is a flowchart of the character string comparison method according to the present invention.

FIG. 2 is a flowchart of the calculation step of FIG. 1.

FIG. 3 is a diagram showing the cost array M after execution of the initialization step of FIG. 1.

FIG. 4 is a diagram showing the cost array M after execution of the calculation step of FIG. 1.

FIG. 5 is a diagram showing the partial match array M' after execution of the calculation step of FIG. 1.

FIG. 6 is a flowchart of the conventional character string comparison method.

FIG. 7 is a flowchart of the cost array calculation step of FIG. 6.

FIG. 8 is a diagram showing the cost array M after execution of the initialization step of FIG. 6.

FIG. 9 is a diagram showing the cost array M after execution of the calculation step of FIG. 6.

[Explanation of symbols]

1: Cost array initialization step
2: Partial match array initialization step
3: Cost calculation step
4: Partial match calculation step

Continuation of the front page

(56) References cited:
JP-A-62-89134 (JP, A)
JP-A-1-181124 (JP, A)
JP-A-60-27938 (JP, A)
JP-A-2-108157 (JP, A)
JP-A-63-233427 (JP, A)
JP-A-2-232768 (JP, A)
JP-A-3-100865 (JP, A)
R. L. Kashyap and B. J. Oommen, "A Unified Theory for Order Preserving Properties Involving Two Strings", Proceedings of the 1980 Conference on Information Sciences and Systems, pp. 193-198 (received by JICST on August 21, 1981)
M. W. Du and S. C. Chang, "A model and a fast algorithm for multiple errors spelling correction", Acta Informatica 29, 281-302 (received by JICST on July 8, 1992)

(58) Field searched (Int. Cl.⁶, DB name): G06F 17/30

Claims

(57) [Claims]

1. A character string comparison method in which costs are set for the addition, deletion, and change of a character in order to determine the degree of similarity between a compared character string a of length m and a comparison character string b of length n, the method comprising: a cost array initialization step of preparing a cost array M with a row size of m+1 and a column size of n+1, setting each array element M[i, 0] (0 ≤ i ≤ m) in column 0 to i × the deletion cost, and setting each array element M[0, j] (0 ≤ j ≤ n) to j × the addition cost; and a cost calculation step of computing, in order, each array element M[i, j] (0 < i ≤ m, 0 < j ≤ n) as the minimum of three values, namely the value of array element M[i-1, j-1] plus 0 if the i-th character of the compared character string a equals the j-th character of the comparison character string b or plus the change cost if they differ, the value of array element M[i, j-1] plus the addition cost, and the value of array element M[i-1, j] plus the deletion cost; the cost of converting character string b into character string a obtained by the cost calculation step being taken as the degree of similarity between character string a and character string b; the method further comprising: a partial match array initialization step of preparing a partial match array M' of the same size as the cost array M and initializing all array elements of the partial match array M' to 0; and a partial match calculation step of setting array element M'[i, j] (0 < i ≤ m, 0 < j ≤ n) to the value of array element M'[i-1, j-1] plus 1 when the i-th character of the compared character string a equals the j-th character of the comparison character string b; wherein the maximum partial match length obtained by the partial match calculation step is used, together with the cost of converting character string b into character string a, as the degree of similarity between the character strings.