JP7618458B2

JP7618458B2 - Character recognition device, character recognition method, and program

Info

Publication number: JP7618458B2
Application number: JP2021018142A
Authority: JP
Inventors: 遼平田中
Original assignee: Toshiba Corp; Toshiba Digital Solutions Corp
Current assignee: Toshiba Corp; Toshiba Digital Solutions Corp
Priority date: 2021-02-08
Filing date: 2021-02-08
Publication date: 2025-01-21
Anticipated expiration: 2041-02-08
Also published as: JP2022121020A

Description

Embodiments of the present invention relate to a character recognition device, a character recognition method, and a program.

Character recognition techniques that recognize characters contained in an input image are conventionally known. In such techniques, recognizing a character string without explicitly segmenting it at character boundaries is known to improve recognition accuracy. However, when recognition is performed without segmenting at character boundaries, a single character may be recognized more than once. In addition, depending on the input image, a character may be skipped during recognition. In other words, the conventional techniques have the problem that characters contained in an input image may not be recognized correctly.

JP 2013-097590 A

The problem to be solved by the present invention is to provide a character recognition device, a character recognition method, and a program that can correctly recognize characters contained in an input image.

A character recognition device according to an embodiment includes a first score calculation unit, a character region estimation unit, a second score calculation unit, and a selection unit. The first score calculation unit calculates a first score, which indicates the likelihood of a character string, for each of a plurality of candidate character strings that are candidates for the character string contained in an input image. The character region estimation unit estimates, within the input image, the region corresponding to each character contained in a candidate character string. The second score calculation unit calculates, based on the estimated regions, a second score indicating the consistency of the characters contained in the candidate character string. The selection unit selects one or more of the candidate character strings based on the calculated first score and second score. The input image includes a plurality of character input areas, and the second score calculation unit calculates the second score based on the regions corresponding to the characters contained in the candidate character string and on the character input areas.

FIG. 1 is a schematic diagram showing an example of the functional configuration of a character recognition system according to a first embodiment.
FIG. 2 is a diagram showing an example of an input image according to the first embodiment.
FIG. 3 is a diagram for explaining a first score according to the first embodiment.
FIG. 4 is a diagram for explaining a second score according to the first embodiment.
FIG. 5 is a flowchart showing a series of operations of the character recognition system according to the first embodiment.
FIG. 6 is a diagram showing a modified example of the functional configuration of a second score calculation unit according to the first embodiment.
FIG. 7 is a diagram for explaining duplicate reading according to the first embodiment.
FIG. 8 is a diagram showing an example of the functional configuration of a skip score calculation unit according to the first embodiment.
FIG. 9 is a diagram for explaining a character-likeness map according to the first embodiment.
FIG. 10 is a diagram showing a modified example of the functional configuration of a first score calculation unit according to the first embodiment.
FIG. 11 is a diagram for explaining a beam search according to a second embodiment.
FIG. 12 is a diagram for explaining the consistency score for each partial input image in the beam search according to the second embodiment.
FIG. 13 is a flowchart showing a series of operations of a character recognition system according to the second embodiment.
FIG. 14 is a diagram showing an example of an input image according to a third embodiment.
FIG. 15 is a diagram showing an example of a second score according to the third embodiment.
FIG. 16 is a diagram for explaining an example of the operation of a character region estimation unit according to a fourth embodiment.
FIG. 17 is a diagram for explaining a modified example of the operation of the character region estimation unit according to the fourth embodiment.
FIG. 18 is a diagram for explaining a modified example of the character-likeness map according to a fifth embodiment.
FIG. 19 is a diagram showing an example of the functional configuration of a modified example of a character-likeness map generation unit according to the fifth embodiment.
FIG. 20 is a diagram showing an example of input data and teacher data according to the fifth embodiment.
FIG. 21 is a diagram for explaining duplicate reading and skipping in the prior art.

A character recognition device, a character recognition method, and a program according to embodiments are described below with reference to the drawings.

[Prior Art]
Problems that arise when a conventional character recognition method is used are described with reference to FIG. 21. FIG. 21 is a diagram for explaining duplicate reading and skipping in the prior art. Column 90 shows an example of duplicate reading, and column 95 shows an example of skipping.

In the duplicate-reading example, consider the case where an input image 91 is input. When the character string in the input image 91 is recognized correctly, regions 92 and 93 are estimated as the positions of the characters. Region 92 contains "川" and region 93 contains "崎", so the conventional character recognition method recognizes that the input image 91 contains "川崎". When the character string in the input image 91 is recognized incorrectly, on the other hand, regions 92, 93, and 94 are estimated as the positions of the characters. Region 92 contains "川", region 93 contains "崎", and region 94 contains "山", so the conventional method misrecognizes the input image 91 as containing "川山崎". In this example, the left-hand radical of "崎" is first recognized as "山" and "崎" is then recognized again, so the "山" component is read twice.

Next, in the skipping example, consider the case where an input image 96 is input. When the character string in the input image 96 is recognized correctly, regions 97, 98, and 99 are estimated as the positions of the characters. Region 97 contains "長", region 98 contains "谷", and region 99 contains "川", so the conventional character recognition method recognizes that the input image 96 contains "長谷川". When the character string in the input image 96 is recognized incorrectly, on the other hand, only regions 97 and 99 are estimated as the positions of the characters. Region 97 contains "長" and region 99 contains "川", so the conventional method misrecognizes the input image 96 as containing "長川". In this example, the character "谷" in region 98 is skipped.

[First embodiment]
The character recognition device according to this embodiment suppresses the problems of the conventional techniques described above. The device performs character string recognition on a character string written in an input image. Character string recognition is the task of taking an image containing a character string as input and recognizing the character string in that image. This embodiment is described for horizontally written character strings read from left to right. The embodiment is not limited to such strings, however, and applies equally to vertically written character strings. Images containing a character string broadly include images of handwritten characters, photographs of signboards, road signs, and the like. In this embodiment, the number of characters in a character string is zero or more.

FIG. 1 is a schematic diagram showing an example of the functional configuration of a character recognition system according to the first embodiment. The functional configuration of the character recognition system 1 is described with reference to this figure. The character recognition system 1 includes an input image acquisition unit 21, a candidate character string calculation unit 22, a character recognition device 10, and an output unit 23.

The input image acquisition unit 21 acquires an input image IM. FIG. 2 is a diagram showing an example of the input image IM according to the first embodiment. A character string S is written in the input image IM. Specifically, the input image IM contains the character string S consisting of the handwritten characters "川崎". This embodiment describes an example in which a handwritten character string S is written in the input image IM.

Returning to FIG. 1, the candidate character string calculation unit 22 calculates candidate character strings CS, which are candidates for the character string S written in the input image IM acquired by the input image acquisition unit 21. The candidate character string calculation unit 22 calculates the candidate character strings CS using a known character recognition technique (for example, pattern matching or feature detection). The candidate character string calculation unit 22 outputs a plurality of character strings as the candidate character strings CS.

Based on the input image IM acquired by the input image acquisition unit 21 and the plurality of candidate character strings CS calculated by the candidate character string calculation unit 22, the character recognition device 10 selects a plausible character string from among those candidate character strings CS as a selected character string SS. The character recognition device 10 includes a first score calculation unit (character recognition unit) 110, a character region estimation unit 120, a second score calculation unit (region consistency score calculation unit) 130, and a selection unit 140.

The output unit 23 outputs the selected character string SS selected by the character recognition device 10. For example, the output unit 23 outputs the selected character string SS by outputting information for displaying it on a display unit (not shown), by outputting information for having an audio output unit (not shown) output it as audio, or by transmitting it wirelessly to an information processing device (not shown).

The first score calculation unit 110 calculates a first score S1 for each of the plurality of candidate character strings CS calculated by the candidate character string calculation unit 22. A candidate character string CS is a candidate for the character string S contained in the input image IM. The first score S1 indicates the likelihood of a character string. That is, the first score calculation unit 110 calculates, for each of the candidate character strings CS, the first score S1 indicating how likely that character string is.

FIG. 3 is a diagram for explaining the first score according to the first embodiment. A specific example of the first score S1 calculated by the first score calculation unit 110 is described with reference to this figure. The figure shows an example in which the input image IM contains the character string "川崎". In this example, the candidate character string calculation unit 22 calculates "川山崎" as candidate character string CS-1, "川崎" as candidate character string CS-2, and "川山奇" as candidate character string CS-3. The first score calculation unit 110 calculates the first score S1 for each of candidate character strings CS-1, CS-2, and CS-3. In this example, the first score S1-1 of candidate character string CS-1 is 0.5, the first score S1-2 of candidate character string CS-2 is 0.5, and the first score S1-3 of candidate character string CS-3 is 0.1.

Returning to FIG. 1, the character region estimation unit 120 estimates character regions CA based on a candidate character string CS and the input image IM. A character region CA is the region corresponding to one of the characters C contained in the candidate character string CS. That is, the character region estimation unit 120 estimates, within the input image IM, the character region CA corresponding to each character C contained in the candidate character string CS.

The second score calculation unit 130 calculates a second score S2, which indicates the consistency of the characters contained in the candidate character string CS, based on the character regions CA estimated by the character region estimation unit 120. Here, the consistency of the characters contained in the candidate character string CS means spatial consistency. When spatial consistency is lacking, a character may have been read twice or skipped.

FIG. 4 is a diagram for explaining the second score according to the first embodiment. The estimation of the character regions CA by the character region estimation unit 120 and a specific example of the second score S2 calculated by the second score calculation unit 130 are described with reference to this figure. The figure shows an example in which the input image IM contains the character string "川崎". In this example, the character region estimation unit 120 estimates the character region CA corresponding to each character C contained in each candidate character string CS. For candidate character string CS-1, "川山崎", it estimates character regions CA1-1, CA2-1, and CA3-1. For candidate character string CS-2, "川崎", it estimates character regions CA1-2 and CA2-2. For candidate character string CS-3, "川山奇", it estimates character regions CA1-3, CA2-3, and CA3-3.

The second score calculation unit 130 calculates the second score S2 for each of the plurality of candidate character strings CS. In this example, the second score S2-1 of candidate character string CS-1 is 0.1, the second score S2-2 of candidate character string CS-2 is 1.0, and the second score S2-3 of candidate character string CS-3 is 1.0.

Returning to FIG. 1, the selection unit 140 selects one or more of the plurality of candidate character strings CS as the selected character string SS, based on the calculated first score S1 and second score S2. For example, the selection unit 140 multiplies the first score S1 by the second score S2 for each candidate character string CS and selects the candidate with the largest product as the selected character string SS.
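As a worked example of the multiplication rule, the short Python sketch below applies it to the scores given for FIGS. 3 and 4. The function and variable names are illustrative and do not come from the patent; only the score values and the "largest product" rule are taken from the description.

```python
# Scores quoted in the description of FIGS. 3 and 4.
candidates = ["川山崎", "川崎", "川山奇"]   # CS-1, CS-2, CS-3
first_scores = [0.5, 0.5, 0.1]             # S1-1, S1-2, S1-3
second_scores = [0.1, 1.0, 1.0]            # S2-1, S2-2, S2-3


def select_string(strings, s1, s2):
    """Return the string with the largest product S1 * S2, plus all products."""
    combined = [a * b for a, b in zip(s1, s2)]
    best = max(range(len(strings)), key=lambda i: combined[i])
    return strings[best], combined


selected, combined_scores = select_string(candidates, first_scores, second_scores)
print(selected)  # prints 川崎
```

The products are 0.05, 0.5, and 0.1, so "川崎" is selected even though "川山崎" ties with it on the first score alone.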

FIG. 5 is a flowchart showing a series of operations of the character recognition system according to the first embodiment. The series of operations of the character recognition device 10 is described below following this flowchart.

(Step S110) The input image acquisition unit 21 acquires the input image IM. The candidate character string calculation unit 22 calculates candidate character strings CS, which are candidates for the character string S written in the input image IM. This flowchart describes the case where the candidate character string calculation unit 22 has calculated n candidate character strings CS (n is an integer of 1 or more).

(Step S120) The first score calculation unit 110 calculates the first score S1 for each of the calculated candidate character strings CS. That is, writing the i-th candidate character string CS as y_i and its first score S1 as α_i, the first score calculation unit 110 calculates (y_1, α_1), ..., (y_n, α_n).

(Step S130) The second score calculation unit 130 sets a counter i to 1.

(Step S140) The character region estimation unit 120 estimates the regions of the input image IM that correspond to the respective characters C contained in the candidate character string CS. This flowchart describes the case where the candidate character string CS contains m characters (m is an integer of 1 or more). That is, the candidate character string y_i contains the characters y_{i,1}, ..., y_{i,m}. In this case, the character region estimation unit 120 estimates the character regions s_1, ..., s_m corresponding to the respective characters C.

(Step S150) The second score calculation unit 130 calculates the second score S2 for the candidate character string y_i. The second score S2 is also written β_i. The second score β_i is calculated based on s_1, ..., s_m.

(Step S160) The selection unit 140 calculates γ_i based on the first score α_i and the second score β_i.

(Step S170) If i < n, the second score calculation unit 130 advances the process to step S190. That is, steps S140 to S160 are repeated until the counter i reaches n, the number of candidate character strings CS calculated by the candidate character string calculation unit 22. If i < n does not hold, that is, when the counter i has reached n, the second score calculation unit 130 advances the process to step S180.

(Step S190) The second score calculation unit 130 increments the counter i and advances the process to step S140.

(Step S180) The selection unit 140 selects the candidate character string CS whose γ_i is largest as the selected character string SS. In this flowchart, the selection unit 140 selects the selected character string SS by arg max. Depending on how α_i and β_i are calculated, the selection unit 140 may instead select the selected character string SS by arg min.
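The control flow of steps S130 to S190 can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the patented implementation: `run_selection_flow`, the `Box` tuple format, and the callables `estimate_regions` and `second_score` are hypothetical stand-ins for the character region estimation unit 120 and the second score calculation unit 130, and the default `combine` (a product) is only the example combination mentioned for the selection unit 140.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # hypothetical region format: (left, top, right, bottom)


def run_selection_flow(
    image: object,
    scored: Sequence[Tuple[str, float]],                   # (y_i, alpha_i) from step S120
    estimate_regions: Callable[[object, str], List[Box]],  # stand-in for unit 120
    second_score: Callable[[List[Box]], float],            # stand-in for unit 130
    combine: Callable[[float, float], float] = lambda a, b: a * b,
    use_min: bool = False,
) -> str:
    """Follow steps S130 to S190 and return the selected string."""
    n = len(scored)
    if n < 1:
        raise ValueError("the flowchart assumes n >= 1 candidates")
    gammas: List[float] = []
    i = 1                                            # S130: counter starts at 1
    while True:
        y_i, alpha_i = scored[i - 1]
        regions = estimate_regions(image, y_i)       # S140: s_1, ..., s_m
        beta_i = second_score(regions)               # S150: beta_i
        gammas.append(combine(alpha_i, beta_i))      # S160: gamma_i
        if i < n:                                    # S170
            i += 1                                   # S190
        else:
            break
    pick = min if use_min else max                   # S180: arg max (or arg min)
    best = pick(range(n), key=lambda k: gammas[k])
    return scored[best][0]
```

The `use_min` flag corresponds to the remark that arg min may be used instead, for example if the scores were expressed as costs rather than likelihoods.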

[Modification of the first embodiment]
FIG. 6 is a diagram showing a modified example of the functional configuration of the second score calculation unit according to the first embodiment. A second score calculation unit 130A, which is a modified example of the second score calculation unit 130, is described with reference to this figure. The second score calculation unit 130A includes a duplicate-reading score calculation unit 131, a skip score calculation unit 132, and a second score integration unit 133.

The duplicate-reading score calculation unit 131 calculates a duplicate-reading score S21, which is a score indicating the overlap amount of the candidate character string CS. Specifically, the overlap amount of the candidate character string CS is the amount by which the regions corresponding to the individual characters C contained in the candidate character string CS overlap one another. The second score calculation unit 130A calculates the second score S2 based on the calculated duplicate-reading score S21. That is, in this embodiment, the second score calculation unit 130A calculates the second score S2 based on the amount by which the regions corresponding to the characters C contained in the candidate character string CS overlap one another.

FIG. 7 is a diagram for explaining duplicate reading according to the first embodiment. The overlap amount calculated by the duplicate-reading score calculation unit 131 is described with reference to this figure. In this example, the characters "川崎" are written in the input image IM, and the character region estimation unit 120 estimates character regions CA1, CA2, and CA3 as the character regions CA. Here, the overlap amount is the overlap region CA-DP, the region in which character region CA2 and character region CA3 overlap each other. Specifically, when the overlap amount is m(y), the duplicate-reading score calculation unit 131 calculates the overlap consistency score P_ovlp using formula (1).

Here, C_OP is a constant between 0 and 1; the smaller C_OP is, the smaller the overlap consistency score P_ovlp becomes. The value of C_OP may be determined experimentally.
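The overlap amount m(y) that feeds this score can be computed in several ways. The Python sketch below shows one possibility under explicit assumptions: character regions are axis-aligned rectangles, and m(y) is taken to be the total pairwise intersection area in pixels. Neither assumption is stated in the description, and the names `Box`, `intersection_area`, and `overlap_amount` are illustrative. The sketch stops at m(y) and does not attempt to reproduce formula (1).

```python
from itertools import combinations
from typing import List, Tuple

# Assumed region format: (left, top, right, bottom), right/bottom exclusive.
Box = Tuple[int, int, int, int]


def intersection_area(a: Box, b: Box) -> int:
    """Area shared by two rectangles, or 0 if they do not intersect."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0) * max(height, 0)


def overlap_amount(regions: List[Box]) -> int:
    """Sum of pairwise intersection areas over all character regions."""
    return sum(intersection_area(a, b) for a, b in combinations(regions, 2))


# Illustrative coordinates for the FIG. 7 situation: CA2 and CA3 overlap.
ca1, ca2, ca3 = (0, 0, 40, 50), (45, 0, 70, 50), (50, 0, 100, 50)
print(overlap_amount([ca1, ca2, ca3]))  # prints 1000
print(overlap_amount([ca1, ca3]))       # prints 0
```

With these coordinates, only CA2 and CA3 intersect (20 by 50 pixels), so the three-region reading has a nonzero overlap amount while the two-region reading has none.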

Returning to FIG. 6, the skip score calculation unit 132 calculates a skip score S22, which is a score indicating whether skipping has occurred, based on the characters C contained in the candidate character string CS and the regions estimated by the character region estimation unit 120. The second score calculation unit 130 calculates the second score S2 based on the calculated skip score S22. That is, the second score calculation unit 130 calculates the second score S2 based on the characters C contained in the candidate character string CS and the regions estimated by the character region estimation unit 120.

FIG. 8 is a diagram showing an example of the functional configuration of the skip score calculation unit according to the first embodiment. An example of the functional configuration of the skip score calculation unit 132 is described with reference to this figure. The skip score calculation unit 132 includes a character-likeness map generation unit 1321 and a skip score integration unit 1322. In this embodiment, the skip score calculation unit 132 calculates the second score S2 based on the likelihood that some character C exists in each region of the input image IM.

The character-likeness map generation unit 1321 generates a character-likeness map CLM. The character-likeness map CLM indicates the likelihood that some character C exists in each image area of the input image IM.

The skip score integration unit 1322 calculates the skip score S22 based on the character regions CA estimated by the character region estimation unit 120 and the character-likeness map CLM generated by the character-likeness map generation unit 1321.

FIG. 9 is a diagram for explaining the character-likeness map according to the first embodiment. The character-likeness map CLM and an overview of the processing performed by the skip score integration unit 1322 are described with reference to this figure.

FIG. 9(A) shows the character regions CA estimated by the character region estimation unit 120 for a candidate character string CS in which a character has been skipped. In the figure, of the character string S "長谷川" contained in the input image IM, the character "長" is estimated as character region CA1 and the character "川" is estimated as character region CA2. No character region CA is estimated for the character "谷". In other words, skipping has occurred.

FIG. 9(B) shows an example of the character-likeness map CLM. In this example, a character is highly likely to exist in regions AR1, AR2, and AR3. Because the character-likeness map generation unit 1321 generates the character-likeness map CLM from the likelihood that some character C exists in each image area of the input image IM, it estimates that likelihood for the whole character string S, including any character that has been skipped.

FIG. 9(C) shows an example of the mask MSK. The skip score integration unit 1322 generates the mask MSK based on the character areas CA estimated by the character area estimation unit 120. The mask MSK indicates the areas in which the characters included in the candidate character string CS exist, or the areas in which they do not. The skip score integration unit 1322 filters the character-likeness map CLM with the generated mask MSK and thereby estimates areas that are highly likely to contain a character even though no character of the candidate character string CS accounts for them.

図９（Ｄ）は、読み飛ばしスコア統合部１３２２によりフィルタリングされた後の文字らしさマップＣＬＭを示す図である。領域ＡＲ２は、文字が存在する確率が高いが、候補文字列ＣＳには含まれていない領域である。すなわち、領域ＡＲ２が大きいほど、読み飛ばしが発生している可能性が高いといえる。 Figure 9 (D) shows the character-likeness map CLM after filtering by the skip score integrating unit 1322. Area AR2 is an area where there is a high probability that a character exists, but is not included in the candidate character string CS. In other words, the larger area AR2 is, the more likely skipping has occurred.
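As a non-limiting illustration, the masking described for FIG. 9(C) and FIG. 9(D) might be sketched as follows. The grid size, the likeness values, the box coordinates, and the 0.5 threshold are all hypothetical values chosen for the example, and the function names `build_mask` and `filter_map` do not appear in the embodiment.

```python
import numpy as np

def build_mask(shape, char_areas):
    """Return a mask that is 0 inside the estimated character areas CA and 1 elsewhere."""
    mask = np.ones(shape, dtype=float)
    for x0, y0, x1, y1 in char_areas:  # (left, top, right, bottom), right/bottom exclusive
        mask[y0:y1, x0:x1] = 0.0
    return mask

def filter_map(clm, char_areas):
    """Keep only the character-likeness that no character of the candidate accounts for."""
    return clm * build_mask(clm.shape, char_areas)

# Toy 4x12 character-likeness map with three character-like regions.
clm = np.zeros((4, 12))
clm[:, 0:3] = 0.9    # AR1
clm[:, 4:8] = 0.9    # AR2
clm[:, 9:12] = 0.9   # AR3

# A candidate that skipped the middle character covers only AR1 and AR3.
skipped = filter_map(clm, [(0, 0, 4, 4), (8, 0, 12, 4)])
# A candidate that read all three characters covers every region.
complete = filter_map(clm, [(0, 0, 4, 4), (4, 0, 8, 4), (8, 0, 12, 4)])

# Cells that still look like a character after filtering (assumed threshold 0.5).
uncovered_cells = int((skipped > 0.5).sum())
```

In this toy case the skipped candidate leaves the whole of AR2 in the filtered map, whereas the complete candidate leaves nothing, which is the difference the skipping score is meant to capture.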

Here, let U_j(y) denote a region that remains in the filtered character-likeness map CLM, that is, a region of the input image IM that is highly likely to contain some character C. With the image area of the input image IM divided into a grid of width W and height H, the skip score integration unit 1322 calculates the skipping consistency score P_SKIP(y) according to formula (2) below. The image area of the input image IM may be divided in units of single pixels, or in units of a predetermined range consisting of multiple pixels.

Here, C_SP is a constant equal to or greater than 0; the larger C_SP is, the smaller the skipping consistency score P_SKIP becomes. The value of C_SP may be determined experimentally. If no skipping penalty is to be imposed, C_SP may be set to 0.
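Formula (2) itself is not reproduced in this text, so the following is only a sketch of one function that has the stated properties: it equals 1 when C_SP is 0 and decreases as C_SP or the residual character-likeness grows. The exponential form, the normalization by W and H, and the name `skip_consistency` are assumptions made for illustration and should not be read as the formula of the embodiment.

```python
import math

def skip_consistency(filtered_map, c_sp):
    """Hypothetical stand-in for formula (2): exp(-C_SP * mean residual likeness)."""
    height = len(filtered_map)        # H
    width = len(filtered_map[0])      # W
    residual = sum(sum(row) for row in filtered_map) / (width * height)
    return math.exp(-c_sp * residual)

# Filtered character-likeness map with an unexplained character-like region.
filtered = [[0.0, 0.9, 0.9, 0.0],
            [0.0, 0.9, 0.9, 0.0]]

no_penalty = skip_consistency(filtered, 0.0)  # C_SP = 0 disables the penalty
mild = skip_consistency(filtered, 1.0)
strong = skip_consistency(filtered, 5.0)      # larger C_SP gives a smaller score
```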

図６に戻り、第２スコア統合部１３３は、重複読みスコア算出部１３１により算出された重複読みスコアＳ２１と、読み飛ばしスコア算出部１３２により算出された読み飛ばしスコアＳ２２とに基づき、第２スコアＳ２を算出する。例えば、第２スコア統合部１３３は、重複読みスコアＳ２１と読み飛ばしスコアＳ２２を乗じた値を第２スコアＳ２として算出する。 Returning to FIG. 6, the second score integrating unit 133 calculates the second score S2 based on the overlapping reading score S21 calculated by the overlapping reading score calculating unit 131 and the skipping score S22 calculated by the skipping score calculating unit 132. For example, the second score integrating unit 133 calculates the second score S2 as a value obtained by multiplying the overlapping reading score S21 and the skipping score S22.

図１０は、第１の実施形態に係る第１スコア算出部の機能構成の変形例を示す図である。同図を参照しながら、第１スコア算出部１１０の変形例である第１スコア算出部１１０Aについて説明する。第１スコア算出部１１０Ａは、文字認識スコア算出部１１１と、知識処理スコア算出部１１２と、第１スコア統合部１１３とを備える。 Fig. 10 is a diagram showing a modified functional configuration of the first score calculation unit according to the first embodiment. With reference to the figure, a first score calculation unit 110A, which is a modified example of the first score calculation unit 110, will be described. The first score calculation unit 110A includes a character recognition score calculation unit 111, a knowledge processing score calculation unit 112, and a first score integration unit 113.

文字認識スコア算出部１１１は、候補文字列ＣＳごとに文字認識スコアＳ１１を算出する。文字認識スコアＳ１１は、文字列の尤もらしさを示す。 The character recognition score calculation unit 111 calculates a character recognition score S11 for each candidate character string CS. The character recognition score S11 indicates the likelihood of the character string.

The knowledge processing score calculation unit 112 calculates a knowledge processing score S12 for each candidate character string CS. The knowledge processing score calculation unit 112 is used when the character strings that can appear in the input image IM are restricted, for example when it is known in advance that the input image IM contains a postal code, an address, a name, or the like. If the input image IM is known to be a postal code and the candidate character string CS is not numeric, the knowledge processing score S12 is calculated to be low. Likewise, if the input image IM is known to be an address, the knowledge processing score S12 is calculated to be lower for "川山奇" than for "川崎".
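By way of example only, the two cases above (a postal code and an address) might be sketched as follows. The score values 0.1 and 1.0, the field type names, the place-name lexicon, and the function `knowledge_score` are hypothetical; the embodiment does not specify how the knowledge processing score S12 is computed.

```python
def knowledge_score(candidate, field_type, lexicon=frozenset()):
    """Return a hypothetical knowledge processing score S12 in (0, 1]."""
    low, high = 0.1, 1.0  # illustrative values only
    if field_type == "postal_code":
        # A postal code should consist of digits (a hyphen is tolerated here).
        digits = candidate.replace("-", "")
        return high if digits.isdigit() else low
    if field_type == "address":
        # An address should contain at least one known place name.
        return high if any(word in candidate for word in lexicon) else low
    return high  # no prior knowledge: leave the candidate unpenalized

place_names = {"川崎"}  # assumed lexicon for the example

numeric = knowledge_score("212-0013", "postal_code")
non_numeric = knowledge_score("21Z-0013", "postal_code")
known_place = knowledge_score("川崎", "address", place_names)
split_kanji = knowledge_score("川山奇", "address", place_names)
```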

第１スコア統合部１１３は、文字認識スコア算出部１１１により算出された文字認識スコアＳ１１と、知識処理スコア算出部１１２により算出された知識処理スコアＳ１２とに基づき、第１スコアＳ１を算出する。選択部１４０は、算出された第１スコアＳ１と、第２スコアＳ２とに基づき、選択文字列ＳＳを選択する。 The first score integration unit 113 calculates a first score S1 based on the character recognition score S11 calculated by the character recognition score calculation unit 111 and the knowledge processing score S12 calculated by the knowledge processing score calculation unit 112. The selection unit 140 selects a selected character string SS based on the calculated first score S1 and second score S2.

ここで、選択部１４０が、文字認識スコアＳ１１と、知識処理スコアＳ１２と、重複読みスコアＳ２１と、読み飛ばしスコアＳ２２とに基づき、選択文字列ＳＳを選択する場合の一例について説明する。この場合、選択部１４０は、下の式（３）に基づき、選択文字列ＳＳを選択する。 Here, an example will be described in which the selection unit 140 selects the selected character string SS based on the character recognition score S11, the knowledge processing score S12, the overlapping reading score S21, and the skipping score S22. In this case, the selection unit 140 selects the selected character string SS based on the following formula (3).

Specifically, the selection unit 140 selects, as the selected character string SS, the candidate character string CS that maximizes the product of P_OCR (the character recognition score S11), P_LM (the knowledge processing score S12), P_ovlp (the overlapping reading score S21), and P_skip (the skipping score S22).
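The selection rule stated above, choosing the candidate that maximizes the product of the four scores, can be sketched as follows. The score values are invented for the example, and `select_string` is a hypothetical name.

```python
from math import prod

def select_string(candidates):
    """candidates maps each string y to (P_OCR, P_LM, P_ovlp, P_skip)."""
    return max(candidates, key=lambda y: prod(candidates[y]))

# Hypothetical scores for three candidate character strings CS.
candidates = {
    "川崎":   (0.6, 0.9, 1.0, 1.0),   # product 0.54
    "川山奇": (0.7, 0.2, 1.0, 1.0),   # high OCR score, but implausible as an address
    "川":     (0.8, 0.5, 1.0, 0.1),   # a character was skipped
}

selected = select_string(candidates)
```

Here "川山奇" has the highest character recognition score on its own, but "川崎" is selected once the other three scores are multiplied in.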

[Summary of the first embodiment]
According to the embodiment described above, the character recognition device 10 uses the first score calculation unit 110 to calculate, for each candidate character string CS, a first score S1 indicating the likelihood of the character string S; uses the character area estimation unit 120 to estimate an area for each character C included in the character string S; uses the second score calculation unit 130 to calculate a second score S2 indicating the consistency of the characters C; and uses the selection unit 140 to select the selected character string SS based on the first score S1 and the second score S2. That is, the most likely character string is selected in consideration of the consistency of the areas in which the characters C exist. The character recognition device 10 can therefore correctly recognize the characters C included in the input image IM.

Furthermore, according to the embodiment described above, the second score calculation unit 130 calculates the second score S2 based on the overlapping reading score S21. The overlapping reading score S21 is a score that depends on how much the areas corresponding to the individual characters C included in the candidate character string CS overlap one another. The character recognition device 10 can therefore suppress overlapping reading and correctly recognize the characters C included in the input image IM.

また、上述した実施形態によれば、第２スコア算出部１３０は、読み飛ばしスコアＳ２２に基づいて、第２スコアＳ２を算出する。読み飛ばしスコアＳ２２とは、候補文字列ＣＳに含まれる文字Ｃと、文字領域推定部１２０により推定された文字領域ＣＡとに基づいたスコアであり、読み飛ばしが発生している場合には、与えられるペナルティが大きくなる。したがって、本実施形態によれば、文字認識装置１０は、読み飛ばしを抑止することができるため、入力画像ＩＭに含まれる文字Ｃを正しく文字認識することができる。 Furthermore, according to the embodiment described above, the second score calculation unit 130 calculates the second score S2 based on the skipping score S22. The skipping score S22 is a score based on the character C included in the candidate string CS and the character area CA estimated by the character area estimation unit 120, and if skipping occurs, the penalty given is large. Therefore, according to the embodiment, the character recognition device 10 can prevent skipping, and can therefore correctly recognize the character C included in the input image IM.

ここで、従来技術によれば、重複読みの改善と、読み飛ばしの改善とは二律背反の関係にあり、一方を改善すると他方の問題が生じやすくなってしまっていた。上述した実施形態によれば、重複読みスコアＳ２１と、読み飛ばしスコアＳ２２とを分けて算出し、総合的に選択文字列ＳＳを選択するため、重複読み及び読み飛ばしのいずれの問題についても改善することができる。 Here, according to the conventional technology, the improvement of duplicate reading and the improvement of skipping are in a trade-off relationship, and improving one makes the other problem more likely to occur. According to the above-mentioned embodiment, the duplicate reading score S21 and the skipping score S22 are calculated separately, and the selected character string SS is selected comprehensively, so that both the duplicate reading and skipping problems can be improved.

In addition, according to the embodiment described above, the second score calculation unit 130 calculates the skipping score S22 by using the character-likeness map CLM. The character-likeness map CLM indicates the likelihood that some character C exists in each area of the input image IM. According to this embodiment, skipping can be easily suppressed.

[Second embodiment]
An example of a character recognition device 10A according to the second embodiment will be described with reference to FIGS. 11 to 13. The character recognition device 10A uses a beam search algorithm to recognize the character string S included in the input image IM. If candidate characters are calculated for each of the characters C included in the character string S, and every combination of those candidates is treated as a candidate character string CS, the number of candidate character strings CS grows as the number of characters C in the character string S increases. The more candidate character strings CS there are, the more time and resources it takes to select the selected character string SS. This embodiment therefore uses a beam search algorithm so that character recognition can be performed with less time and fewer resources.

FIG. 11 is a diagram for explaining the beam search according to the second embodiment. In this embodiment, the character recognition device 10A divides the input image IM into a plurality of partial input images IMP and performs character recognition on them. In the example shown in the figure, the input image IM is divided into partial input images IMP-1, IMP-2, and IMP-3. The partial input images IMP may be delimited, for example, by a predetermined number of pixels. The predetermined number of pixels may be determined according to the width in which a character C is expected to be written.

具体的には、まず、文字認識装置１０Ａは、入力画像ＩＭのうち、部分入力画像ＩＭＰ－１について、１以上の選択文字列ＳＳを選択する。次に、文字認識装置１０Ａは、部分入力画像ＩＭＰ－１と、部分入力画像ＩＭＰ－２とについて、１以上の選択文字列ＳＳを選択する。このとき、部分入力画像ＩＭＰ－１については、すでに１以上の選択文字列ＳＳが選択されているため、部分入力画像ＩＭＰ－１と、部分入力画像ＩＭＰ－２とについての候補文字列ＣＳは少なくなる。更に、文字認識装置１０Ａは、部分入力画像ＩＭＰ－１と、部分入力画像ＩＭＰ－２と、部分入力画像ＩＭＰ－３とについて、最終的な選択文字列ＳＳを選択する。このとき、部分入力画像ＩＭＰ－１と、部分入力画像ＩＭＰ－２とについては、すでに１以上の選択文字列ＳＳが選択されているため、部分入力画像ＩＭＰ－１と、部分入力画像ＩＭＰ－２、部分入力画像ＩＭＰ－３とについての候補文字列ＣＳは少なくなる。このように、本実施形態においては、部分入力画像ＩＭＰごとに候補となる文字列Ｓを絞っていくことにより、全体の処理時間を短くする。 Specifically, first, character recognition device 10A selects one or more selected character strings SS for partial input image IMP-1 of input image IM. Next, character recognition device 10A selects one or more selected character strings SS for partial input image IMP-1 and partial input image IMP-2. At this time, one or more selected character strings SS have already been selected for partial input image IMP-1, so the number of candidate character strings CS for partial input image IMP-1 and partial input image IMP-2 is reduced. Furthermore, character recognition device 10A selects final selected character strings SS for partial input image IMP-1, partial input image IMP-2, and partial input image IMP-3. At this time, one or more selected character strings SS have already been selected for partial input image IMP-1 and partial input image IMP-2, so the number of candidate character strings CS for partial input image IMP-1, partial input image IMP-2, and partial input image IMP-3 is reduced. In this way, in this embodiment, the overall processing time is shortened by narrowing down the candidate character strings S for each partial input image IMP.

図１２は、第２の実施形態に係るビームサーチにおいて、部分入力画像毎の整合性スコアについて説明するための図である。同図を参照しながら、入力画像ＩＭに“川崎市”と記載されている場合における、部分入力画像ＩＭＰ毎の整合性スコアについて説明する。図１２（Ａ）は、文字認識装置１０Ａが、部分入力画像ＩＭＰ－１について文字認識を行った場合における候補文字列ＣＳと整合性スコアの対応関係を示し、図１２（Ｂ）は、文字認識装置１０Ａが、部分入力画像ＩＭＰ－１と、部分入力画像ＩＭＰ－２とについて文字認識を行った場合における候補文字列ＣＳと整合性スコアの対応関係を示す。ここで、整合性スコアとは、選択部１４０が選択文字列ＳＳを選択する際に用いるスコアであって、例えば、第１スコアＳ１と第２スコアＳ２とを乗じたスコアである。 Figure 12 is a diagram for explaining the consistency score for each partial input image in the beam search according to the second embodiment. With reference to the figure, the consistency score for each partial input image IMP in the case where the input image IM contains "Kawasaki City" will be explained. Figure 12 (A) shows the correspondence between the candidate character strings CS and the consistency scores when the character recognition device 10A performs character recognition on the partial input image IMP-1, and Figure 12 (B) shows the correspondence between the candidate character strings CS and the consistency scores when the character recognition device 10A performs character recognition on the partial input images IMP-1 and IMP-2. Here, the consistency score is a score used by the selection unit 140 when selecting the selection character string SS, and is, for example, a score obtained by multiplying the first score S1 and the second score S2.

In FIG. 12(A), the character recognition device 10A calculates "川" as candidate character string CS-11, "川1" as candidate character string CS-12, and "ノリ" as candidate character string CS-13. Their consistency scores are 1.0, 0.3, and 1.0, respectively. The character recognition device 10A selects the plausible candidate character strings CS-11 and CS-13 as selected character strings SS. In other words, the character recognition device 10A excludes candidate character string CS-12 from the candidates.

In FIG. 12(B), the character recognition device 10A calculates "川山崎" as candidate character string CS-21, "川崎" as CS-22, "ノリ山崎" as CS-23, "ノリ崎" as CS-24, "川山奇" as CS-25, and "ノリ山奇" as CS-26. Their consistency scores are 0.1, 1.0, 0.1, 1.0, 1.0, and 1.0, respectively. The character recognition device 10A selects the plausible candidate character strings CS-22, CS-24, CS-25, and CS-26 as selected character strings SS. In other words, it excludes candidate character strings CS-21 and CS-23 from the candidates. Because candidate character string CS-12, "川1", was already excluded when partial input image IMP-1 was examined, fewer candidate character strings need to be considered when partial input images IMP-1 and IMP-2 are examined together.
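The two pruning rounds of FIG. 12 can be replayed with the scores given above. This is a sketch under assumptions: the pruning rule used here is a fixed threshold of 0.5 (the flowchart of FIG. 13 instead keeps the best R candidates), and the list of extensions "山崎", "崎", "山奇" is inferred from the candidate strings in FIG. 12(B) rather than stated in the text.

```python
def prune(scored, threshold=0.5):
    """Keep the candidate strings whose consistency score reaches the threshold."""
    return [s for s, score in scored.items() if score >= threshold]

# FIG. 12(A): candidates for partial input image IMP-1.
step1 = {"川": 1.0, "川1": 0.3, "ノリ": 1.0}
kept1 = prune(step1)

# Each surviving prefix is extended with the readings of IMP-2 (inferred list).
extensions = ["山崎", "崎", "山奇"]
expanded = [prefix + ext for prefix in kept1 for ext in extensions]

# FIG. 12(B): consistency scores for IMP-1 and IMP-2 together.
step2 = {"川山崎": 0.1, "川崎": 1.0, "ノリ山崎": 0.1,
         "ノリ崎": 1.0, "川山奇": 1.0, "ノリ山奇": 1.0}
kept2 = prune(step2)
```

Dropping "川1" in the first round leaves six strings to score in the second round instead of nine, which is the saving the embodiment describes.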

図１３は、第２の実施形態に係る文字認識システムの一連の動作を示すフローチャートである。同図を参照しながら、第２の実施形態に係る文字認識システム１Ａの一連の動作について説明する。ステップＳ１００は、図５において説明した第１の実施形態に係る文字認識システムの動作と同様であるため、説明を省略する。 FIG. 13 is a flowchart showing a series of operations of the character recognition system according to the second embodiment. With reference to the same figure, a series of operations of the character recognition system 1A according to the second embodiment will be described. Step S100 is similar to the operation of the character recognition system according to the first embodiment described in FIG. 5, and therefore description thereof will be omitted.

（ステップＳ２１０）文字認識装置１０Ａは、ｘをδとする。δは、部分入力画像ＩＭＰの範囲を示す所定の整数である。ｘは、文字認識装置１０Ａが文字認識する範囲を示す。本フローチャートにおいて、文字認識装置１０Ａは、まず０からｘまでの範囲について候補文字列ＣＳを算出する。ここで、文字認識装置１０Ａが文字認識する範囲であるｘは、図１１を参照しながら説明した一例における部分入力画像ＩＭＰに相当する。 (Step S210) Character recognition device 10A sets x to δ. δ is a predetermined integer indicating the range of the partial input image IMP. x indicates the range of characters recognized by character recognition device 10A. In this flowchart, character recognition device 10A first calculates candidate character string CS for the range from 0 to x. Here, x, which is the range of characters recognized by character recognition device 10A, corresponds to the partial input image IMP in the example described with reference to FIG. 11.

(Step S220) The character recognition device 10A initializes the candidate set Φ to the empty set.

(Step S230) The first score calculation unit 110 of the character recognition device 10A calculates a first score S1 for each of the candidate character strings CS for the partial input image IMP. That is, denoting a candidate character string CS by y_n and its first score S1 by α_n, the first score calculation unit 110 calculates the pairs (y_1, α_1), ..., (y_n, α_n).

(Step S240) The character recognition device 10A selects the selected character strings SS for the partial input image IMP. Specifically, it selects the R pairs of y_i and γ_i with the largest γ_i and takes them as the candidate set Φ. R is the number of character strings carried forward as candidates when character recognition is performed on the next partial input image IMP. A smaller R shortens the processing time, but if R is too small the likelihood of misrecognition may increase.

（ステップＳ２５０）文字認識装置１０Ａは、入力画像ＩＭの全部について文字認識を行ったか否かを判定する。具体的には、文字認識装置１０Ａは、ｘがＷより小さい場合には、処理をステップＳ２７０に進める。文字認識装置１０Ａは、ｘがＷより小さくない場合には、処理をステップＳ２６０に進める。 (Step S250) Character recognition device 10A determines whether character recognition has been performed on the entire input image IM. Specifically, if x is smaller than W, character recognition device 10A proceeds to step S270. If x is not smaller than W, character recognition device 10A proceeds to step S260.

（ステップＳ２７０）文字認識装置１０Ａは、文字認識を行う範囲を、広げる。具体的には、文字認識装置１０Ａは、ｘにδを足した値をｘとし、処理をステップＳ２３０に進める。 (Step S270) Character recognition device 10A expands the range in which character recognition is performed. Specifically, character recognition device 10A sets the value obtained by adding δ to x as x, and proceeds to step S230.

(Step S260) The character recognition device 10A outputs the character string y_k with the largest γ_k as the selected character string SS.
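The control flow of steps S210 to S270 can be sketched as a loop. In this sketch γ is assumed to be the combined score used for selection (for example the product of the first and second scores); the callbacks `propose` and `score`, the toy alphabet, and the target string "abc" are hypothetical stand-ins for the recognizer and are not part of the embodiment.

```python
def beam_search(width, delta, beam_size, propose, score):
    """Widen the recognized range [0, x] by delta, keeping the best R candidates."""
    x = delta                                     # S210: x = delta
    beam = []                                     # S220: candidate set is empty
    while True:
        prefixes = [y for y, _ in beam]
        candidates = propose(prefixes, x)         # S230: candidates for range [0, x]
        scored = [(y, score(y, x)) for y in candidates]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        beam = scored[:beam_size]                 # S240: keep the R best (y_i, gamma_i)
        if x < width:                             # S250: part of the image remains
            x += delta                            # S270: widen the range
        else:
            break
    return max(beam, key=lambda pair: pair[1])[0]  # S260: output the best y_k

# Toy recognizer: the "image" spells "abc", one character per unit of width.
TRUTH = "abc"

def propose(prefixes, x):
    """Extend every kept prefix (or the empty string) by one character."""
    return [p + ch for p in (prefixes or [""]) for ch in "abc"]

def score(y, x):
    """Fraction of positions in the range [0, x] that match the toy truth."""
    return sum(a == b for a, b in zip(y, TRUTH)) / x

result = beam_search(width=3, delta=1, beam_size=2, propose=propose, score=score)
```

With a beam size of 2, at most six strings are scored per step here, whereas enumerating every three-character combination of this alphabet would require 27.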

[Summary of the second embodiment]
According to the embodiment described above, the first score calculation unit 110 of the character recognition device 10A calculates the first score S1 for a partial input image IMP, which is a part of the input image IM. In other words, the first score calculation unit 110 calculates the first score S1 of a candidate character string CS that is a candidate for a character string containing only some of the characters C that make up the character string S included in the input image IM. Likewise, the second score calculation unit 130 of the character recognition device 10A calculates the second score S2 for the partial input image IMP, that is, the second score S2 of such a partial candidate character string CS. Because the character recognition device 10A calculates candidate character strings CS for each part of the input image IM, it can reduce the number of candidates for the entire character string included in the input image IM. Thus, by using a beam search algorithm, this embodiment can perform character recognition with less time and fewer resources.

[Third embodiment]
An example of a character recognition device 10B according to the third embodiment will be described with reference to FIGS. 14 and 15. The third embodiment differs from the other embodiments in that a reference character spacing, or the areas in which characters are to be written, is defined in the input image IM. This embodiment aims to recognize the characters C included in the input image IM more accurately by performing character recognition based on that reference character spacing or those areas.

図１４は、第３の実施形態に係る入力画像ＩＭの一例を示す図である。同図を参照しながら、入力画像ＩＭに定められた、基準となる文字の間隔又は記載すべき文字の領域について説明する。図１４（Ａ）は、本実施形態における入力画像ＩＭの一例である。図１４（Ｂ）は、本実施形態における入力画像ＩＭに文字が記載された場合の一例である。 Figure 14 is a diagram showing an example of an input image IM according to the third embodiment. With reference to the figure, the reference character spacing or the area of the character to be written, which is determined in the input image IM, will be described. Figure 14 (A) is an example of an input image IM in this embodiment. Figure 14 (B) is an example of an input image IM in this embodiment in which characters are written.

図１４（Ａ）に示す入力画像ＩＭは、複数の文字入力領域ＩＡＲを含む。具体的には、入力画像ＩＭは、文字入力領域ＩＡＲ１と、文字入力領域ＩＡＲ２と、文字入力領域ＩＡＲ３とを含む。文字入力領域ＩＡＲは、例えば、入力画像ＩＭに文字列Ｓを記載するユーザに対し、文字を記載する際の基準として与えられる。すなわち、文字入力領域ＩＡＲにより、基準となる文字の間隔又は記載すべき文字の領域が定められる。以後の説明において、文字入力領域ＩＡＲを、“枠”と記載する場合がある。 The input image IM shown in FIG. 14(A) includes multiple character input areas IAR. Specifically, the input image IM includes character input area IAR1, character input area IAR2, and character input area IAR3. The character input area IAR is provided, for example, to a user who writes a character string S in the input image IM as a reference for writing characters. In other words, the character input area IAR determines the reference character spacing or the area of the characters to be written. In the following explanation, the character input area IAR may be referred to as a "frame."

図１４（Ｂ）に示す入力画像ＩＭには、文字Ｃが記載されている。具体的には、文字入力領域ＩＡＲ１には文字Ｃ－１が記載され、文字入力領域ＩＡＲ２には文字Ｃ－２が記載され、文字入力領域ＩＡＲ３には文字Ｃ－３が記載されている。文字Ｃ－１は“川”であり、文字Ｃ－２は“崎”であり、文字Ｃ－３は“市”である。 The character C is written in the input image IM shown in FIG. 14(B). Specifically, the character C-1 is written in the character input area IAR1, the character C-2 is written in the character input area IAR2, and the character C-3 is written in the character input area IAR3. The character C-1 is "川", the character C-2 is "崎", and the character C-3 is "市".

FIG. 15 is a diagram showing an example of the second score according to the third embodiment. The second score S2 calculated by the second score calculation unit 130 will be described with reference to the figure. In the example shown in the figure, the candidate character string calculation unit 22 calculates the candidate character strings CS "川山奇市" and "川崎市".

When the candidate character string CS is "川山奇市", character input area IAR1 contains character area CA1-1, character input area IAR2 contains character areas CA2-1 and CA3-1, and character input area IAR3 contains character area CA4-1. Because character input area IAR2 contains both CA2-1 and CA3-1, two character areas CA exist in one frame (character input area IAR). When multiple character areas CA exist in one frame, the second score calculation unit 130 takes the amount of overlap between the smaller character area CA and the frame area as m(y) and calculates the consistency score P_BOX based on formula (4) below.

Here, C_BP is a constant between 0 and 1; the smaller C_BP is, the smaller the consistency score P_BOX becomes. The value of C_BP may be determined experimentally.
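As a non-limiting illustration, the overlap amount m(y) can be computed as described, and a penalty with the stated behavior can be derived from it. Formula (4) is not reproduced in this text, so the form C_BP raised to the normalized overlap is an assumption, as are the one-dimensional (left, right) intervals, the coordinates, and the function names.

```python
def overlap(a, b):
    """Length of the overlap between two (left, right) intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))

def box_overlap_amount(boxes, char_areas):
    """m(y): for each frame holding 2+ character areas, add the smallest one's overlap."""
    total = 0
    for box in boxes:
        inside = [ca for ca in char_areas if overlap(box, ca) > 0]
        if len(inside) >= 2:
            smallest = min(inside, key=lambda ca: ca[1] - ca[0])
            total += overlap(box, smallest)
    return total

def box_consistency(boxes, char_areas, c_bp):
    """Hypothetical stand-in for formula (4): C_BP ** (m(y) / total frame width)."""
    total_width = sum(right - left for left, right in boxes)
    return c_bp ** (box_overlap_amount(boxes, char_areas) / total_width)

frames = [(0, 10), (10, 20), (20, 30)]           # IAR1, IAR2, IAR3
split = [(1, 9), (11, 14), (14, 19), (21, 29)]   # "川山奇市": two areas inside IAR2
whole = [(1, 9), (11, 19), (21, 29)]             # "川崎市": one area per frame

m_split = box_overlap_amount(frames, split)
p_split = box_consistency(frames, split, c_bp=0.5)
p_whole = box_consistency(frames, whole, c_bp=0.5)
```

With these toy coordinates the split reading "川山奇市" is penalized while "川崎市" keeps a score of 1, and a smaller C_BP makes the penalty stronger.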

Here, an example will be described in which the selection unit 140 additionally uses the consistency score P_BOX to select the selected character string SS. In this case, the selection unit 140 selects the selected character string SS based on formula (5) below.

すなわち、本実施形態において、第２スコア算出部１３０は、候補文字列ＣＳに含まれる文字Ｃそれぞれに対応する領域である文字領域ＣＡと、文字入力領域ＩＡＲとに基づいて、第２スコアＳ２を算出する。 That is, in this embodiment, the second score calculation unit 130 calculates the second score S2 based on the character areas CA, which are areas corresponding to each of the characters C included in the candidate string CS, and the character input area IAR.

[Summary of the third embodiment]
According to the embodiment described above, the character recognition device 10B uses the second score calculation unit 130 to calculate the second score S2 based on the character areas CA and the character input areas IAR. For example, the second score calculation unit 130 calculates a low second score S2 when a single frame contains multiple characters. This suppresses misrecognition in which a kanji is split into components, such as its left-hand radical (hen) and right-hand component (tsukuri), and each is recognized as a separate character. Therefore, according to this embodiment, the characters C included in the input image IM can be recognized more accurately.

[Fourth embodiment]
A fourth embodiment will be described with reference to FIGS. 16 and 17. The fourth embodiment describes a specific example of the character area estimation unit 120. An example of the character area estimation unit 120 is described with reference to FIG. 16, and a modified example with reference to FIG. 17.

図１６は、第４の実施形態に係る文字領域推定部の動作の一例を説明するための図である。同図を参照しながら文字領域推定部１２０の一例について説明する。まず、入力データＤＩがニューラルネットワークＮＮ１に入力される。この一例において、入力データＤＩが文字列の画像である場合の一例について説明する。具体的には、入力データＤＩが、“川崎市”と左から右に横方向に手書きされた文字列画像である場合の一例について説明する。 Figure 16 is a diagram for explaining an example of the operation of the character region estimation unit according to the fourth embodiment. An example of the character region estimation unit 120 will be described with reference to the figure. First, input data DI is input to the neural network NN1. In this example, an example in which the input data DI is an image of a character string will be described. Specifically, an example in which the input data DI is an image of a character string "Kawasaki City" handwritten horizontally from left to right will be described.

ニューラルネットワークＮＮ１は、入力された文字列の特徴量Ｆの系列を算出する。入力データＤＩが左から右に横方向に手書きされた文字列画像である場合、ニューラルネットワークＮＮ１は、左方向から右方向に特徴量Ｆの系列を、判定範囲の幅分だけ認識していく。この一例において、ニューラルネットワークＮＮ１は、特徴量Ｆ１から特徴量Ｆ６までの特徴量を算出する。ここで、ニューラルネットワークＮＮ１は、入力データＤＩの行の長さに応じた数の特徴量Ｆを算出する。 The neural network NN1 calculates a series of feature values F for the input character string. When the input data DI is a character string image handwritten horizontally from left to right, the neural network NN1 recognizes the series of feature values F from left to right for the width of the judgment range. In this example, the neural network NN1 calculates feature values F1 to F6. Here, the neural network NN1 calculates a number of feature values F according to the length of the line of the input data DI.

ニューラルネットワークＮＮ２は、ニューラルネットワークＮＮ１により算出された特徴量Ｆごとに確率分布Ｐを算出する。この一例において、ニューラルネットワークＮＮ１は、特徴量Ｆ１から特徴量Ｆ６までの特徴量を算出するため、ニューラルネットワークＮＮ２は、特徴量Ｆ１に対応する確率分布Ｐ１から、特徴量Ｆ６に対応する確率分布Ｐ６までを算出する。 The neural network NN2 calculates a probability distribution P for each feature value F calculated by the neural network NN1. Because the neural network NN1 calculates feature values F1 to F6 in this example, the neural network NN2 calculates the probability distributions P1 (corresponding to feature value F1) through P6 (corresponding to feature value F6).

ＣＴＣ（Connectionist Temporal Classification）８０は、算出されたそれぞれの確率分布を統合し、入力データＤＩに対応する文字列の確率分布Ｐを算出し、算出された確率分布Ｐから認識される文字列を出力データＤＯとして出力する。 CTC (Connectionist Temporal Classification) 80 integrates each of the calculated probability distributions, calculates a probability distribution P of the character string corresponding to the input data DI, and outputs the character string recognized from the calculated probability distribution P as output data DO.
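As an illustration of the CTC step, the following minimal sketch turns per-feature probability distributions P1 to P6 into a label string by best-path (greedy) decoding. The text above does not state which decoding rule CTC 80 uses, so the greedy rule, the blank symbol "-", and all probability values below are assumptions made only for illustration.

```python
BLANK = "-"  # hypothetical blank symbol; the source does not name one


def ctc_greedy_decode(frame_probs):
    """Best-path CTC decoding: take the top label of each frame,
    merge consecutive repeats, then drop blanks."""
    best_path = [max(p, key=p.get) for p in frame_probs]
    decoded = []
    previous = None
    for label in best_path:
        if label != previous and label != BLANK:
            decoded.append(label)
        previous = label
    return best_path, "".join(decoded)


# Illustrative distributions for six feature values F1..F6 (made-up numbers).
frame_probs = [
    {"川": 0.7, "崎": 0.1, "市": 0.1, BLANK: 0.1},
    {"川": 0.6, "崎": 0.1, "市": 0.1, BLANK: 0.2},
    {"川": 0.1, "崎": 0.1, "市": 0.1, BLANK: 0.7},
    {"川": 0.1, "崎": 0.7, "市": 0.1, BLANK: 0.1},
    {"川": 0.1, "崎": 0.6, "市": 0.2, BLANK: 0.1},
    {"川": 0.1, "崎": 0.1, "市": 0.7, BLANK: 0.1},
]

best_path, text = ctc_greedy_decode(frame_probs)
print(best_path, text)
```

Note that one output label can be backed by several consecutive frames, which is why a later step has to map labels back to feature values.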

推定部８５は、ニューラルネットワークＮＮ１により算出された特徴量Ｆを取得する。推定部８５は、ニューラルネットワークＮＮ３により、取得した特徴量Ｆから、所定のラベルが付与されるべき要素が存在しうる範囲を推定する。 The estimation unit 85 acquires the feature values F calculated by the neural network NN1. Using the neural network NN3, the estimation unit 85 estimates, from the acquired feature values F, the range in which an element that should be given a predetermined label may exist.

推定部８５は、ＣＴＣ８０により認識された出力データＤＯのそれぞれのラベルと、それぞれの特徴量Ｆとを対応付ける。推定部８５は、出力データＤＯのラベル列のうち一のラベルが複数の特徴量Ｆに対応づけられる場合、当該一のラベルに対応付けられた複数の特徴量Ｆから推定された範囲を統合し、出力する。推定部８５により出力された出力結果は、入力データＤＩのうち、それぞれのラベルの範囲が特定されている。同図に示す一例では、範囲Ａ１は“川”の範囲を特定し、範囲Ａ２は“崎”の範囲を特定し、範囲Ａ３は“市”の範囲を特定する。 The estimation unit 85 associates each label of the output data DO recognized by the CTC 80 with the corresponding feature values F. When one label in the label string of the output data DO is associated with multiple feature values F, the estimation unit 85 integrates the ranges estimated from those feature values F and outputs the result. The output of the estimation unit 85 specifies the range of each label within the input data DI. In the example shown in the figure, range A1 specifies the range of "川", range A2 specifies the range of "崎", and range A3 specifies the range of "市".
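The integration of ranges per label can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes that each feature value already has a best label and an estimated horizontal range, and that "integrating" means taking the union (minimum start, maximum end) of the ranges of consecutive features that share a label. The function name, the blank symbol, and the pixel coordinates are all hypothetical.

```python
def merge_ranges_per_label(best_path, frame_ranges, blank="-"):
    """Group consecutive frames that share a non-blank label and merge
    their estimated ranges into one (label, start, end) span."""
    merged = []
    previous = None
    for label, (start, end) in zip(best_path, frame_ranges):
        if label == blank:
            previous = label
            continue
        if label == previous:
            # Same label as the previous frame: widen the existing span.
            last_label, last_start, last_end = merged[-1]
            merged[-1] = (last_label, min(last_start, start), max(last_end, end))
        else:
            merged.append((label, start, end))
        previous = label
    return merged


# Hypothetical per-feature labels and ranges (x-coordinates in pixels).
best_path = ["川", "川", "-", "崎", "崎", "市"]
frame_ranges = [(0, 30), (5, 38), (0, 0), (40, 70), (45, 82), (85, 120)]

areas = merge_ranges_per_label(best_path, frame_ranges)
print(areas)
```

In this toy input, the three resulting spans play the role of ranges A1, A2, and A3.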

図１７は、第４の実施形態に係る文字領域推定部の動作の変形例を説明するための図である。同図を参照しながら、文字領域推定部１２０の動作の変形例について説明する。文字領域推定部１２０の動作の変形例では、物体検出を応用して文字領域の推定を行う。 Figure 17 is a diagram for explaining a modified example of the operation of the character region estimation unit according to the fourth embodiment. With reference to the figure, a modified example of the operation of the character region estimation unit 120 will be explained. In the modified example of the operation of the character region estimation unit 120, object detection is applied to estimate the character region.

まず、入力データＤＩがニューラルネットワークＮＮ４に入力される。この一例において、入力データＤＩが文字列の画像である場合の一例について説明する。具体的には、入力データＤＩが、“川崎”と左から右に横方向に手書きされた文字列画像である場合の一例について説明する。ニューラルネットワークＮＮ４は、検出ＤＮＮ（Deep Neural Network）である。ニューラルネットワークＮＮ４は、画像を入力として、複数の候補矩形Ｒと、それぞれの候補矩形Ｒに対応する文字のスコアとを出力する。 First, input data DI is input to the neural network NN4. In this example, the input data DI is an image of a character string. Specifically, the input data DI is an image of the character string "川崎" (Kawasaki) handwritten horizontally from left to right. The neural network NN4 is a detection DNN (Deep Neural Network). The neural network NN4 receives an image as input and outputs multiple candidate rectangles R and the character scores corresponding to each candidate rectangle R.

具体的には、ニューラルネットワークＮＮ４は、候補矩形Ｒ１と、候補矩形Ｒ２と、候補矩形Ｒ３と、候補矩形Ｒ４と、候補矩形Ｒ５と、候補矩形Ｒ６と、それぞれの候補矩形Ｒに対応する文字のスコアとを出力する。より具体的には、候補矩形Ｒ１に対応する文字“川”であるスコア“０．８”、及び文字“州”であるスコア“０．１”と、候補矩形Ｒ２に対応する文字“り”であるスコア“０．５”、及び文字“い”であるスコア“０．２”と、候補矩形Ｒ３に対応する文字“１”であるスコア“０．３”、及び文字“ノ”であるスコア“０．１”と、候補矩形Ｒ４に対応する文字“崎”であるスコア“０．８”、及び文字“埼”であるスコア“０．１”と、候補矩形Ｒ５に対応する文字“山”であるスコア“０．５”、及び文字“凸”であるスコア“０．１”と、候補矩形Ｒ６に対応する文字“奇”であるスコア“０．７”、及び文字“嵜”であるスコア“０．１”とを出力する。 Specifically, the neural network NN4 outputs candidate rectangles R1 to R6 and the character scores corresponding to each candidate rectangle R. More specifically, it outputs the following scores: for candidate rectangle R1, 0.8 for the character "川" and 0.1 for the character "州"; for candidate rectangle R2, 0.5 for "り" and 0.2 for "い"; for candidate rectangle R3, 0.3 for "１" and 0.1 for "ノ"; for candidate rectangle R4, 0.8 for "崎" and 0.1 for "埼"; for candidate rectangle R5, 0.5 for "山" and 0.1 for "凸"; and for candidate rectangle R6, 0.7 for "奇" and 0.1 for "嵜".
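The output described above can be held in a simple mapping from candidate rectangle to character scores. The sketch below uses the scores listed in the text and keeps only rectangles whose best character score reaches a threshold. The text does not say how the final rectangles are chosen from the candidates, so the thresholding rule and the value 0.75 are assumptions for illustration. Rectangle coordinates are omitted because the text does not give them.

```python
# Character scores per candidate rectangle, as listed in the description.
candidates = {
    "R1": {"川": 0.8, "州": 0.1},
    "R2": {"り": 0.5, "い": 0.2},
    "R3": {"１": 0.3, "ノ": 0.1},
    "R4": {"崎": 0.8, "埼": 0.1},
    "R5": {"山": 0.5, "凸": 0.1},
    "R6": {"奇": 0.7, "嵜": 0.1},
}


def top_label(scores):
    """Return the highest-scoring character and its score."""
    label = max(scores, key=scores.get)
    return label, scores[label]


def filter_candidates(candidates, threshold):
    """Keep rectangles whose best score is at least `threshold`
    (hypothetical selection rule, not taken from the source)."""
    kept = {}
    for name, scores in candidates.items():
        label, score = top_label(scores)
        if score >= threshold:
            kept[name] = (label, score)
    return kept


kept = filter_candidates(candidates, threshold=0.75)
print(kept)
```

With these scores, the rectangles that survive are the ones read as whole characters, while the rectangles covering character fragments (such as "山" and "奇" inside "崎") are dropped.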

［第４の実施形態のまとめ］ [Summary of the Fourth Embodiment]
上述した実施形態によれば、文字領域推定部１２０は、推定部８５を備えることにより、入力データＤＩから取得した特徴量Ｆに基づき、文字Ｃが存在しうる領域を推定し、文字Ｃを複数の特徴量Ｆのうち少なくとも１つと対応づけ、一のラベルに対応づけられた、複数の範囲を統合することにより、それぞれの文字Ｃに対応する領域を特定する。本実施形態を用いることにより、ビームサーチアルゴリズムによる効率的な探索をすることができる。また、本実施形態による文字領域の推定は、容易に実装することができる。 According to the above-described embodiment, the character region estimation unit 120 includes the estimation unit 85. It estimates the region where a character C may exist based on the feature values F acquired from the input data DI, associates the character C with at least one of the feature values F, and identifies the region corresponding to each character C by integrating the multiple ranges associated with one label. By using this embodiment, an efficient search can be performed using a beam search algorithm. Furthermore, the estimation of the character region according to this embodiment can be easily implemented.

また、上述した実施形態によれば、文字領域推定部１２０は、画像を入力として、複数の候補矩形Ｒと、それぞれの候補矩形Ｒに対応する文字のスコアとを出力する。本実施形態を用いることにより、少ないリソースで文字領域の推定をすることができる。 Furthermore, according to the above-described embodiment, the character region estimation unit 120 receives an image as input and outputs multiple candidate rectangles R and character scores corresponding to each candidate rectangle R. By using this embodiment, it is possible to estimate a character region with fewer resources.

［第５の実施形態］ [Fifth embodiment]
図１８から図２０を参照しながら、第５の実施形態について説明する。第５の実施形態では、文字らしさマップ生成部１３２１が生成する文字らしさマップＣＬＭの変形例について説明する。図１８は、第５の実施形態に係る文字らしさマップの変形例について説明するための図である。図１８（Ａ）は、第１の実施形態において説明した文字らしさマップＣＬＭである。 The fifth embodiment will be described with reference to Fig. 18 to Fig. 20. In the fifth embodiment, modified examples of the character-likeness map CLM generated by the character-likeness map generating unit 1321 will be described. Fig. 18 is a diagram for explaining modified examples of the character-likeness map according to the fifth embodiment. Fig. 18(A) shows the character-likeness map CLM described in the first embodiment.

図１８（Ｂ）は、第５の実施形態に係る文字らしさマップＣＬＭの第１の変形例である文字らしさマップＣＬＭ１である。文字らしさマップＣＬＭ１は、複数のピクセルから構成される領域ごとに文字らしさが階調表現されている点において、文字らしさマップＣＬＭとは異なる。このように、文字らしさマップＣＬＭ１は、入力画像ＩＭのうち、所定の範囲ごとに文字らしさが算出されていてもよい。 Fig. 18(B) shows a character-likeness map CLM1, which is a first modified example of the character-likeness map CLM according to the fifth embodiment. The character-likeness map CLM1 differs from the character-likeness map CLM in that the character-likeness is expressed in gradations for each region made up of multiple pixels. In this way, in the character-likeness map CLM1, the character-likeness may be calculated for each predetermined range of the input image IM.

図１８（Ｃ）は、第５の実施形態に係る文字らしさマップＣＬＭの第２の変形例である文字らしさマップＣＬＭ２である。文字らしさマップＣＬＭ２は、入力画像ＩＭのｘ座標と、各ｘ座標における黒画素数との対応関係を含む。すなわち、文字らしさマップＣＬＭ２は、輝度ヒストグラムであってもよい。本実施形態において、入力画像ＩＭは、横書きされた文字であるため、文字が記載された方向であるＸ座標を用いる。入力画像ＩＭが縦書きされた文字である場合はＹ座標を用いてもよい。文字らしさマップＣＬＭ２は、各Ｘ座標における黒画素数の情報を用いるため、容易に文字らしさマップＣＬＭ２を作成することができる。 Fig. 18(C) shows a character-likeness map CLM2, which is a second modified example of the character-likeness map CLM according to the fifth embodiment. The character-likeness map CLM2 contains the correspondence between each x-coordinate of the input image IM and the number of black pixels at that x-coordinate. That is, the character-likeness map CLM2 may be a luminance histogram. In this embodiment, the input image IM contains horizontally written characters, so the x-coordinate, which runs along the writing direction, is used. If the input image IM contains vertically written characters, the y-coordinate may be used instead. Because the character-likeness map CLM2 uses only the number of black pixels at each x-coordinate, it can be created easily.

図１８（Ｄ）は、第５の実施形態に係る文字らしさマップＣＬＭの第３の変形例である文字らしさマップＣＬＭ３である。文字らしさマップＣＬＭ３は、文字らしさマップＣＬＭ２を、０から１の値をとるよう正規化したものである。 Fig. 18(D) shows a character-likeness map CLM3, which is a third modified example of the character-likeness map CLM according to the fifth embodiment. The character-likeness map CLM3 is obtained by normalizing the character-likeness map CLM2 so that it takes values between 0 and 1.
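The second and third modified examples can be sketched in a few lines. The sketch assumes a binarized image stored as a list of rows in which 1 marks a black pixel. The text only says that CLM3 is CLM2 normalized to values between 0 and 1, so dividing by the maximum count is one possible normalization chosen here as an assumption. The function names and the toy image are hypothetical.

```python
def column_black_pixel_counts(image):
    """CLM2-style profile: number of black pixels (value 1) at each x-coordinate."""
    width = len(image[0])
    return [sum(row[x] for row in image) for x in range(width)]


def normalize_to_unit_range(counts):
    """CLM3-style profile: scale the counts into [0, 1] by dividing by the
    maximum (assumed normalization; the source does not specify one)."""
    peak = max(counts)
    if peak == 0:
        return [0.0 for _ in counts]
    return [c / peak for c in counts]


# Toy 4x6 binary image: 1 = black pixel, 0 = background.
image = [
    [0, 1, 1, 0, 0, 1],
    [0, 1, 0, 0, 0, 1],
    [0, 1, 1, 0, 0, 1],
    [0, 1, 1, 0, 0, 1],
]

clm2 = column_black_pixel_counts(image)
clm3 = normalize_to_unit_range(clm2)
print(clm2)
print(clm3)
```

For vertically written text, the same idea applies with rows in place of columns.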

その他、文字らしさマップＣＬＭは、入力画像ＩＭをグリッド状の小領域に分割したものであって、各小領域ごとの黒画素の総数に基づいていてもよい。 Alternatively, the character-likeness map CLM may be obtained by dividing the input image IM into small grid-like regions, and may be based on the total number of black pixels in each small region.
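The grid-based variant can be sketched in the same style. The cell size, the function name, and the toy image are assumptions for illustration. The sketch simply totals the black pixels (value 1) in each grid cell, which is the quantity the text says the map may be based on.

```python
def grid_black_pixel_counts(image, cell_height, cell_width):
    """Split the image into grid cells and total the black pixels in each cell."""
    rows = len(image)
    cols = len(image[0])
    grid = []
    for top in range(0, rows, cell_height):
        grid_row = []
        for left in range(0, cols, cell_width):
            total = sum(
                image[y][x]
                for y in range(top, min(top + cell_height, rows))
                for x in range(left, min(left + cell_width, cols))
            )
            grid_row.append(total)
        grid.append(grid_row)
    return grid


# Toy 4x6 binary image: 1 = black pixel, 0 = background.
image = [
    [0, 1, 1, 0, 0, 1],
    [0, 1, 0, 0, 0, 1],
    [0, 1, 1, 0, 0, 1],
    [0, 1, 1, 0, 0, 1],
]

grid = grid_black_pixel_counts(image, cell_height=2, cell_width=3)
print(grid)
```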

図１９は、第５の実施形態に係る文字らしさマップ生成部の変形例の機能構成の一例を示す図である。同図を参照しながら、文字らしさマップ生成部１３２１の変形例である文字らしさマップ生成部１３２１Ａについて説明する。文字らしさマップ生成部１３２１Ａは、文字らしさ算出ニューラルネットワークＤＮＮを備える点において、文字らしさマップ生成部１３２１とは異なる。 Fig. 19 is a diagram showing an example of the functional configuration of a modified example of the character-likeness map generating unit according to the fifth embodiment. With reference to the figure, the character-likeness map generating unit 1321A, which is a modified example of the character-likeness map generating unit 1321, will be described. The character-likeness map generating unit 1321A differs from the character-likeness map generating unit 1321 in that it includes a character-likeness calculation neural network DNN.

文字らしさ算出ニューラルネットワークＤＮＮは、予め文字らしさを予測できるよう学習されたニューラルネットワークである。図２０は、第５の実施形態に係る入力データＤＩと教師データＤＴの一例を示す図である。入力データＤＩの一例と、教師データＤＴの一例について、図を参照しながら説明する。 The character-likeness calculation neural network DNN is a neural network that has been trained in advance to predict character-likeness. FIG. 20 is a diagram showing an example of input data DI and teacher data DT according to the fifth embodiment. An example of input data DI and an example of teacher data DT will be described with reference to the diagram.

図２０に示す一例において、入力データＤＩ１は教師データＤＴ１に対応し、入力データＤＩ２は教師データＤＴ２に対応する。入力データＤＩ１には、文字Ｃ－１１と、文字Ｃ－１２とが含まれ、教師データＤＴ１には、文字Ｃ－１１に対応する領域ＡＲ１１と、文字Ｃ－１２に対応する領域ＡＲ１２とが含まれる。入力データＤＩ２には、文字Ｃ－２１と、文字Ｃ－２２と、文字Ｃ－２３とが含まれ、教師データＤＴ２には、文字Ｃ－２１に対応する領域ＡＲ２１と、文字Ｃ－２２に対応する領域ＡＲ２２と、文字Ｃ－２３に対応する領域ＡＲ２３とが含まれる。 In the example shown in FIG. 20, input data DI1 corresponds to teacher data DT1, and input data DI2 corresponds to teacher data DT2. Input data DI1 includes characters C-11 and C-12, and teacher data DT1 includes area AR11 corresponding to character C-11 and area AR12 corresponding to character C-12. Input data DI2 includes characters C-21, C-22, and C-23, and teacher data DT2 includes area AR21 corresponding to character C-21, area AR22 corresponding to character C-22, and area AR23 corresponding to character C-23.
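One way to picture a pair of input data and teacher data is as an image plus a map that marks the character regions. The sketch below builds such a map from rectangular regions. The text only states that the teacher data contains the regions corresponding to the characters, so the binary-mask encoding, the (left, top, right, bottom) convention, and the coordinates are all assumptions for illustration.

```python
def make_teacher_map(height, width, regions):
    """Build a binary teacher map: 1 inside any character region, 0 elsewhere.
    Each region is (left, top, right, bottom) with right and bottom exclusive."""
    teacher = [[0] * width for _ in range(height)]
    for left, top, right, bottom in regions:
        for y in range(top, bottom):
            for x in range(left, right):
                teacher[y][x] = 1
    return teacher


# Hypothetical coordinates standing in for regions AR11 and AR12 of DT1.
regions_dt1 = [(0, 0, 2, 3), (3, 0, 6, 3)]
dt1 = make_teacher_map(height=3, width=6, regions=regions_dt1)
for row in dt1:
    print(row)
```

A network trained on such pairs would learn to output high values where characters are present and low values elsewhere.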

［第５の実施形態のまとめ］ [Summary of the Fifth Embodiment]
上述した実施形態によれば、文字らしさマップＣＬＭ１、文字らしさマップＣＬＭ２、又は文字らしさマップＣＬＭ３を用いることにより、文字らしさマップ生成部１３２１は、容易に文字らしさマップＣＬＭを生成することができる。 According to the embodiment described above, the character-likeness map generating unit 1321 can easily generate the character-likeness map CLM by using the character-likeness map CLM1, the character-likeness map CLM2, or the character-likeness map CLM3.

また、上述した実施形態によれば、文字らしさマップ生成部１３２１は、文字らしさ算出ニューラルネットワークＤＮＮを備えることにより、機械学習により文字らしさマップＣＬＭを生成することができる。上述した実施形態によれば、機械学習を用いるため、ノイズに強く、誤認識することを抑止することができる。また、上述した実施形態によれば、機械学習を用いるため、異なる背景の入力画像ＩＭについても、正しく認識することができる。 Furthermore, according to the above-described embodiment, the character-likeness map generating unit 1321 is equipped with a character-likeness calculation neural network DNN, and is therefore able to generate a character-likeness map CLM by machine learning. According to the above-described embodiment, the use of machine learning makes it possible to be resistant to noise and to prevent erroneous recognition. Furthermore, according to the above-described embodiment, the use of machine learning makes it possible to correctly recognize input images IM with different backgrounds.

以上説明してきたように、実施形態では、複数の変形例を記載した。ここで、組み合わせることが可能な限りにおいて、複数の実施形態及び複数の変形例を組み合わせて実施するようにしてもよい。 As explained above, multiple modified examples have been described in the embodiments. The multiple embodiments and modified examples may be implemented in combination to the extent that they can be combined.

なお、上述した実施形態における情報処理装置の機能をコンピューターで実現するようにしても良い。その場合、この機能を実現するためのプログラムをコンピューター読み取り可能な記録媒体に記録して、この記録媒体に記録されたプログラムをコンピューターシステムに読み込ませ、実行することによって実現しても良い。なお、ここでいう「コンピューターシステム」とは、ＯＳや周辺機器等のハードウェアを含むものとする。また、「コンピューター読み取り可能な記録媒体」とは、フレキシブルディスク、光磁気ディスク、ＲＯＭ、ＣＤ－ＲＯＭ、ＤＶＤ－ＲＯＭ、ＵＳＢメモリー等の可搬媒体、コンピューターシステムに内蔵されるハードディスク等の記憶装置のことをいう。さらに「コンピューター読み取り可能な記録媒体」とは、インターネット等のネットワークや電話回線等の通信回線を介してプログラムを送信する場合の通信線のように、短時間の間、動的にプログラムを保持するもの、その場合のサーバーやクライアントとなるコンピューターシステム内部の揮発性メモリーのように、一定時間プログラムを保持しているものも含んでも良い。また上記プログラムは、前述した機能の一部を実現するためのものであっても良く、さらに前述した機能をコンピューターシステムにすでに記録されているプログラムとの組み合わせで実現できるものであっても良い。 The functions of the information processing device in the above-described embodiments may be realized by a computer. In that case, a program for realizing the functions may be recorded on a computer-readable recording medium, and the functions may be realized by having a computer system read and execute the program recorded on the recording medium. The term "computer system" here includes an OS and hardware such as peripheral devices. The term "computer-readable recording medium" refers to portable media such as flexible disks, magneto-optical disks, ROMs, CD-ROMs, DVD-ROMs, and USB memories, and to storage devices such as hard disks built into computer systems. The term "computer-readable recording medium" may also include media that hold a program dynamically for a short period of time, such as a communication line used when a program is transmitted via a network such as the Internet or a communication line such as a telephone line, and media that hold a program for a certain period of time, such as volatile memory inside a computer system that serves as a server or client in that case. The program may realize only part of the functions described above, or may realize the functions described above in combination with a program already recorded in the computer system.

以上説明した少なくともひとつの実施形態によれば、第１スコア算出部と、文字領域推定部と、第２スコア算出部と、選択部とを持つことにより、入力画像に含まれる文字を正しく文字認識することができる。 According to at least one of the embodiments described above, by having a first score calculation unit, a character area estimation unit, a second score calculation unit, and a selection unit, it is possible to correctly recognize characters contained in an input image.
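The interplay of the two scores can be illustrated with a small sketch. It is not the patented implementation: the overlap-based second score, the weighted sum, the weight 0.01, and all candidate values are assumptions made for illustration. The only points taken from the text are that each candidate character string has a first score and a second score, and that the selection unit selects among the candidates based on both.

```python
def overlap_length(a, b):
    """Length of the overlap between two 1-D ranges given as (start, end)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def second_score(regions):
    """Hypothetical consistency score: the negative total pairwise overlap
    between character regions, so duplicated readings are penalized."""
    total = 0
    for i in range(len(regions)):
        for j in range(i + 1, len(regions)):
            total += overlap_length(regions[i], regions[j])
    return -total


def select_best(candidates, weight=0.01):
    """Select the candidate that maximizes s1 + weight * s2 (assumed rule)."""
    best = max(
        candidates,
        key=lambda c: c["s1"] + weight * second_score(c["regions"]),
    )
    return best["text"]


# Made-up candidates: the second one reads "川" twice over the same region.
candidates = [
    {"text": "川崎市", "s1": 0.80, "regions": [(0, 38), (40, 82), (85, 120)]},
    {"text": "川川崎市", "s1": 0.82, "regions": [(0, 38), (5, 38), (40, 82), (85, 120)]},
]

selected = select_best(candidates)
print(selected)
```

In this toy case the duplicated reading has the higher first score, but its overlapping regions lower the combined score enough that the correct string is selected.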

本発明のいくつかの実施形態を説明したが、これらの実施形態は、例として提示したものであり、発明の範囲を限定することは意図していない。これら実施形態は、その他の様々な形態で実施されることが可能であり、発明の要旨を逸脱しない範囲で、種々の省略、置き換え、変更を行うことができる。これら実施形態やその変形は、発明の範囲や要旨に含まれると同様に、特許請求の範囲に記載された発明とその均等の範囲に含まれるものである。 Although several embodiments of the present invention have been described, these embodiments are presented as examples and are not intended to limit the scope of the invention. These embodiments can be implemented in various other forms, and various omissions, substitutions, and modifications can be made without departing from the gist of the invention. These embodiments and their modifications are within the scope of the invention and its equivalents as set forth in the claims, as well as the scope and gist of the invention.

１…文字認識システム、１０…文字認識装置、２１…入力画像取得部、２２…候補文字列算出部、２３…出力部、２４…入力規則情報記憶部、１１０…第１スコア算出部、１２０…文字領域推定部、１３０…第２スコア算出部、１４０…選択部、１１１…文字認識スコア算出部、１１２…知識処理スコア算出部、１１３…第１スコア統合部、１３１…重複読みスコア算出部、１３２…読み飛ばしスコア算出部、１３３…第２スコア統合部、１３２１…文字らしさマップ生成部、１３２２…読み飛ばしスコア統合部、ＩＭ…入力画像、Ｓ…文字列、Ｃ…文字、ＣＳ…候補文字列、ＣＡ…文字領域、Ｓ１…第１スコア、Ｓ２…第２スコア、Ｓ１１…文字認識スコア、Ｓ１２…知識処理スコア、Ｓ２１…重複読みスコア、Ｓ２２…読み飛ばしスコア、ＳＳ…選択文字列、ＩＲ…入力規則、ＩＡＲ…文字入力領域、ＣＬＭ…文字らしさマップ、ＭＳＫ…マスク、ＩＭＰ…部分入力画像 1...character recognition system, 10...character recognition device, 21...input image acquisition unit, 22...candidate character string calculation unit, 23...output unit, 24...input rule information storage unit, 110...first score calculation unit, 120...character region estimation unit, 130...second score calculation unit, 140...selection unit, 111...character recognition score calculation unit, 112...knowledge processing score calculation unit, 113...first score integration unit, 131...overlapping reading score calculation unit, 132...skipping score calculation unit, 133...second score integration unit, 1321...character-likeness map generating unit, 1322...skipping score integration unit, IM...input image, S...character string, C...character, CS...candidate character string, CA...character region, S1...first score, S2...second score, S11...character recognition score, S12...knowledge processing score, S21...overlapping reading score, S22...skipping score, SS...selected character string, IR...input rule, IAR...character input area, CLM...character-likeness map, MSK...mask, IMP...partial input image

Claims

A character recognition device comprising:
a first score calculation unit that calculates, for each of a plurality of candidate character strings that are candidates for a character string included in an input image, a first score indicating a likelihood of the character string;
a character region estimation unit that estimates regions of the input image corresponding to each of the characters included in the candidate character string;
a second score calculation unit that calculates a second score indicating consistency of the characters included in the candidate character string based on the regions estimated by the character region estimation unit; and
a selection unit that selects one or more character strings from among the plurality of candidate character strings based on the calculated first score and the calculated second score,
wherein the input image includes a plurality of character input areas, and
the second score calculation unit calculates the second score based on the region corresponding to each character included in the candidate character string and the character input areas.

The character recognition device according to claim 1, wherein the first score calculation unit calculates the first score of the candidate character string included in a partial input image that is a part of the input image.

The character recognition device according to claim 1 or 2, wherein the second score calculation unit calculates the second score based on an amount of overlap between the regions corresponding to the characters included in the candidate character string.

The character recognition device according to claim 1, wherein the second score calculation unit calculates the second score based on the characters included in the candidate character string and the regions estimated by the character region estimation unit.

The character recognition device according to claim 4, wherein the second score calculation unit calculates the second score based on the likelihood that a character exists in the region of the input image.

A character recognition method comprising:
a first score calculation step in which a computer calculates, for each of a plurality of candidate character strings that are candidates for a character string included in an input image, a first score indicating a likelihood of the character string;
a character region estimation step in which the computer estimates regions of the input image corresponding to each of the characters included in the candidate character string;
a second score calculation step in which the computer calculates a second score indicating consistency of the characters included in the candidate character string based on the estimated regions; and
a selection step of selecting one or more character strings from among the plurality of candidate character strings based on the calculated first score and the calculated second score,
wherein the input image includes a plurality of character input areas, and
in the second score calculation step, the second score is calculated based on the region corresponding to each character included in the candidate character string and the character input areas.

A program for causing a computer to execute:
a first score calculation step of calculating, for each of a plurality of candidate character strings that are candidates for a character string included in an input image, a first score indicating a likelihood of the character string;
a character region estimation step of estimating regions of the input image corresponding to each of the characters included in the candidate character string;
a second score calculation step of calculating a second score indicating consistency of the characters included in the candidate character string based on the estimated regions; and
a selection step of selecting one or more character strings from among the plurality of candidate character strings based on the calculated first score and the calculated second score,
wherein the input image includes a plurality of character input areas, and
in the second score calculation step, the second score is calculated based on the region corresponding to each character included in the candidate character string and the character input areas.