JP4093738B2

JP4093738B2 - Word recognition device

Info

Publication number: JP4093738B2
Application number: JP2001296014A
Authority: JP
Inventors: 悦伸堀田
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2000-10-04
Filing date: 2001-09-27
Publication date: 2008-06-04
Anticipated expiration: 2021-09-27
Also published as: JP2002183668A

Description

【０００１】
【発明の属する技術分野】
近年、文書入力機器として文字認識装置ＯＣＲやソフトウェアＯＣＲの需要が増加している。本発明は、この文字認識装置における単語認識に関する。
単語認識とは、「東京」のような手書きの単語を認識する際に、個々の文字に分離したうえで個別文字認識を行なうのではなく、単語そのものを一括して認識する方式である。この方式によれば、文字と文字が接触している場合でも精度の高い認識を実現することが可能であり、フリーピッチ領域の手書き文字列認識において有効な方式の１つとなっている。本発明の単語認識装置は、手書き用文字認識装置だけでなく、印刷文字認識装置、携帯情報端末における文字認識装置等、広い意味での文字認識装置に適用することができる。
【０００２】
【従来の技術】
手書き単語を認識する場合に、単語を構成する各文字の特徴同士を組み合わせて照合用の単語特徴辞書を生成し、入力単語の特徴と照合して認識する方式としては、これまでに、例えば特願平１１−１１３７３３号、特願平１１−３３０２８８号等が提案されている。
上記特願平１１−１１３７３３号に開示されるものは、入力された単語画像を個々に文字認識せずに、個別文字の特徴をもとに単語特徴辞書を生成した上で入力単語画像を一括して認識するものであり、小容量の個別文字イメージ辞書を使用して高精度に単語認識をすることができる。
また、上記特願平１１−３３０２８８号に開示されるものは、１単語に対して複数の単語特徴を合成して単語辞書を生成することにより、入力単語画像の字形変動に対応できるようにしたものである。
【０００３】
【発明が解決しようとする課題】
単語特徴合成用の文字特徴辞書を持つ場合、上記特願平１１−３３０２８８号に開示される従来方式では各文字ごとにその位置や幅を変えた文字イメージから特徴を抽出し、それら全てを保持していた。
例えば、図１１に示すように、横幅の１／６、２／６、…、６／６の特徴を抽出し（これらを以下ではｎ／ｍ特徴という）、それらの特徴を全て保持していた。この場合、１文字あたりの特徴数は２１となる。
個別文字の特徴としては、例えば、加重方向指数ヒストグラム特徴（鶴岡ほか「加重方向指数ヒストグラム法による手書き漢字・ひらがな認識」電子情報通信学会論文誌 D Vol.J70-D No.7,pp.1390-1397 1987年7月参照）を用いている。加重方向指数ヒストグラム特徴は文字画像を小領域に分割し、各小領域の方向指数ヒストグラムを特徴ベクトルとするものであり、例えば、図１２に示すように縦７×横７のメッシュ内で、３６０°を８等分した８方向について特徴量を抽出する。各メッシュは８方向次元の特徴量をもち、例えば「東」という字の３／７特徴は、上記図１２に示すようになる。
そして、単語特徴を合成する場合には、個別文字特徴同士のｎ／ｍ分数の和が１になるように単語特徴を合成する。例えば２文字単語の場合には、３／７特徴＋４／７特徴とか、２／７特徴＋５／７特徴等を合成して単語特徴を合成する。
例えば「東京」という単語特徴を合成する場合には、図１３に示すように、「東」の３／７特徴と「京」の４／７特徴とを足し合わせ、「東京」を合成するといった処理を行なっている。
しかし、約４０００ある文字カテゴリの全てについて、その位置、幅を変えた文字特徴を持つ必要があるため、容量的には数百Ｍｂｙｔｅを必要とし、実用化の点で大きな問題となっていた。
本発明は、上記事情に鑑みなされたものであって、本発明の目的は、文字特徴辞書の容量を大幅に削減することができ、また認識処理を高速化することができる単語認識装置を提供することである。
【０００４】
【課題を解決するための手段】
上記の課題を解決するために本発明においては、単語特徴合成用の特徴辞書の小容量化を図り、辞書容量を実用化レベルの容量とした。また、合成した単語特徴と入力単語特徴の照合方法を改良し、字形の変動吸収を照合部分で行うようにし、辞書容量のさらなる小容量化を図った。
図１に本発明の概要を示す。図１において、１は文字特徴辞書であり、入力文字画像から抽出した特徴ベクトルが格納される。２は文字特徴辞書の小容量化手段、３は特徴辞書であり、特徴辞書３には、学習時、小容量化手段２により小容量化された列特徴（横書きの場合）、もしくは、行特徴（縦書きの場合）が格納される。
小容量化手段２による文字特徴の小容量化は次のように行われる。
(1) 文字特徴辞書１に格納された文字特徴について、各列単位でまたは各行単位で特徴ベクトルのクラスタリングを行い、類似した特徴同士をまとめてｍ個の列または行ベクトルで代表させ、代表ベクトルのそれぞれに１〜ｍの識別番号を付与する（以下、ｍまでの識別番号を付与することをコード化という）。
また、１列または１行だけでなく複数列単位でクラスタリングしてコード化してもよい。さらに、列特徴もしくは行特徴のコード化を列特徴または行特徴単位ではなく、メッシュ単位で行なえば、より正確な特徴近似に基づくコード化が可能となる。
(2) クラスタリングされたｍ個の特徴ベクトルについて、ある列特徴を他の列特徴の加算和で表すことが可能な組み合わせ、もしくは、行特徴を他の行特徴の加算和で表すことが可能な組み合わせがあるかを調べ、可能な組み合わせがあれば、列特徴もしくは行特徴の識別番号と合成係数を辞書に格納する。
また、ある列特徴を他の列特徴と差分特徴の和で表すことが可能な組み合わせ、もしくは、ある行特徴を他の行特徴と差分特徴の和で表すことが可能な組み合わせがあるかを調べ、可能な組み合わせがあれば、列特徴もしくは行特徴の識別番号と合成係数を辞書に格納する。
(3) 予め文字特徴に特徴変換を施して次元圧縮を行い、特徴変換した特徴に対し、クラスタリング処理を行ってコード化する。
なお、上記特徴辞書３内の特徴ベクトルに対してインデックス情報を持つことにより、高速な辞書アクセスが可能となる。また、使用頻度の高い順に列特徴もしくは行特徴を並べておくことにより、インデックス情報に対するアクセスも高速に行なうことが可能となる。
上記のようにして小容量化した特徴辞書３を用いて次のようにして単語認識を行う。
入力単語を正規化手段４により正規化し、特徴抽出手段５により特徴抽出を行う。一方、予め定められた認識対象となる単語リストを基に、単語特徴合成手段６により上記特徴辞書３に格納された列または行特徴から照合用の単語特徴を合成する。
ついで、照合手段７により、入力単語から抽出された特徴と、上記合成された単語特徴を照合し、単語認識を行う。
上記照合手段７における照合に際し、次元数が異なる単語特徴と入力単語特徴を、非線形伸縮マッチングを用いて照合することにより、字形の変動吸収を辞書内の特徴ベクトルではなく照合部分で行なうことができ、さらなる辞書容量を削減することが可能となる。
すなわち、従来の単語認識では、照合方式自体が入力文字の字形変動に弱いため、その分、特徴辞書のほうで、一つの文字カテゴリあたり複数の特徴（例えば前記した３／６特徴、４／６特徴等）を持つ必要があった。これに対し、上記非線形伸縮マッチングは、この方式自体に字形変動の吸収を行う効果があるため、特徴辞書に持つ特徴の数を減らすことが期待でき、辞書容量を削減することが可能となる。
【０００５】
【発明の実施の形態】
以下、本発明の実施の形態について説明する。
本発明は、処理装置、主記憶装置、外部記憶装置、キーボード、画像読み取りを行うためのスキャナ等の入力装置、ディスプレイ等、プリンタ等の出力装置、通信インタフェース等を備えた通常の計算機システムで実現することができ、外部記憶装置等に本発明の処理を行うためのプログラム、データ等が格納され、実行時、上記プログラム等が主記憶装置に読み込まれ、本発明による処理が実行される。
【０００６】
本発明は縦書き／横書きに関わらず有効であるが、以下では横書きの単語を対象に説明する。縦書きに適用する場合には、以下に説明する「列ベクトル」を「行ベクトル」とすればよい。
また、文字特徴はいくつかの種類に分別されるが、本発明では特徴が列単位に分割され得る特徴を対象とする。さらに、メッシュ単位に分割され得る特徴の場合は、メッシュ内の特徴を縦に並べて列特徴として扱う。
具体的には、列単位の特徴としてはｎ次ペリフェラル特徴、投影特徴などがあり、メッシュ型の特徴としては前記した加重方向指数ヒストグラム特徴、あるいは方向線素特徴、メッシュ特徴などが対象となる。以下、前記した加重方向指数ヒストグラム特徴を例に説明する。なお、以下の説明で用いられるメッシュの分割数、方向数などは必ずしもこの数値に限定されない。
【０００７】
（１）実施例１
加重方向指数ヒストグラム特徴は、前記したように特徴抽出処理の最終段階では文字正規化画像に対して区切られた例えば縦７×横７のメッシュ内で、それぞれ８方向分の特徴を有する。すなわち７×７×８次元の特徴となる。ここで８方向とは前記図１２、図１３に示したように３６０°を４５°単位で８等分した方向を指す。
本実施例では、文字特徴辞書を小容量化するため、列単位で特徴ベクトルのクラスタリングを行なう。
図２に本実施例の機能構成を示す。
図２において、１１は文字特徴辞書であり、学習時に、入力文字画像から抽出した特徴ベクトルが格納される。
１２は本実施例に係わるクラスタリング手段であり、文字特徴辞書１１を小容量化するため、学習時、文字特徴辞書１１に格納された文字特徴について、図３（ａ）に示すように加重方向ヒストグラム特徴の各列単位で特徴ベクトルのクラスタリングを行う。
すなわち、縦７×横１のメッシュ内の特徴ベクトル（７×８＝５６次元）を１つの単位とし、類似した特徴同士をまとめてｍ個の列ベクトルで代表させる。そして、代表ベクトルのそれぞれに１からｍまでの識別番号を付与する。
図３（ｂ）に加重方向ヒストグラム特徴の列ベクトル番号表記例を示す。同図に示すように、各列単位でクラスタリングを行ってコード化したｍ個の識別番号を文字特徴ベクトルの各列に付与する。この例では、各列に識別番号（３２２３０，１３１１８，…，６４５１）が付与されている。
【０００８】
このようにすると従来法では、〔（文字カテゴリ数）×（１文字あたりの特徴数）×（横の列特徴数）〕分の列特徴（例えば、文字カテゴリ数＝４０００、１文字あたりの特徴数＝２１、横の列特徴数＝７とすると４０００×２１×７の列特徴）を必要としていたが、それを上記列特徴より大幅に少ないｍ個の列特徴で済ますことが可能となる。
クラスタリング手法としては、階層的クラスタリング、ｋ−ｍｅａｎｓ、ＬＶＱなどの一般的クラスタリング手法を用いることができる。
上記のようにしてクラスタリングされ、識別番号を付与されたｍ個の列ベクトルは、特徴辞書１３に格納される。
【０００９】
図４に上記特徴辞書１３の構成例を示す。小容量化された特徴辞書１３は、同図に示すように、インデックス情報として各識別番号と、辞書内の位置情報を持ち、位置情報は各識別番号に対応したｍ個の特徴ベクトルの格納位置を示す。
なお、上記識別番号を並べる際は予め認識対象とするカテゴリ（ｅｘ．住所、氏名）について単語特徴を生成する際に必要となる文字特徴の出現頻度を調べておき、その頻度の高い順に並べておくことにより、インデックス情報に対するアクセスを高速化することができる。
上記のようにして小容量化された特徴辞書１３を用いて次のようにして単語認識を行う。
まず、入力単語を正規化手段１４により正規化し、特徴抽出手段１５により特徴抽出を行う。一方、予め定められた認識対象となる単語リスト（例えば都道府県名の単語認識を行う場合には都道府県名のリスト等）を基に、単語特徴合成手段１６により上記特徴辞書１３に格納された列特徴から照合用の単語特徴を合成する。
ついで、照合手段１７により、入力単語から抽出された特徴と、上記合成された単語特徴を照合し、単語認識を行う。入力単語特徴と合成された単語特徴の照合は、ユークリッド距離等を用いて行う。
以上のように、本実施例においては、文字特徴辞書１１の特徴ベクトルについて列単位でクラスタリングを行ない、類似した特徴同士をまとめてｍ個の列ベクトルで代表させ、コード化したので、特徴辞書１３の容量を大幅に削減することができ、辞書容量を実用レベルの容量とすることができる。
また、特徴辞書内の特徴ベクトルに対してインデックス情報を持つようにしたので、高速に辞書アクセスが可能となる。
【００１０】
上記実施例では、図５（ａ）に示すように列特徴をクラスタリングする際、横幅１の列単位で特徴ベクトルのクラスタリングを行っているが、横幅１の列特徴だけでなく横幅２や横幅３などの列特徴をまとめてクラスタリングすることも可能である。
すなわち、図５（ｂ）に示すように、横幅をｎとすると縦７×横ｎ×８次元の特徴を１つの単位としてクラスタリングを行なう。列特徴の単位が横１列の場合より大きいので、単語合成を高速化することができる。
【００１１】
また、上記実施例では、文字特徴から単語特徴を合成したとき、合成された単語特徴の次元数と、入力単語特徴の次元数を同一としていた。すなわち、両者とも縦７×横７×８方向次元の特徴としたうえで、ユークリッド距離などを用いて照合を行なっていたが、図６に示すように、次元数が異なる列特徴を照合するようにすることもできる。
すなわち、非線形伸縮マッチングを用いた照合を行なうことにより、両者の次元数が異なる場合でも照合可能となる。非線形伸縮マッチングの一例として、ＤＰマッチングを用いることができる（ＤＰマッチングについては、例えば共立出版社発行、舟久保登著「パターン認識」P62-P67 を参照されたい）。
これにより図６に示すように、合成された単語の特徴が縦７×横８×８方向次元、入力単語の特徴が縦７×横７×８方向次元等となっていても、両者を照合することができる。
上記のような照合を行うことにより、字形の変動吸収を辞書内の特徴ベクトルではなく、照合部分で行なうことができるため辞書容量をさらに削減することが可能となる。
【００１２】
さらに上記説明では、列ベクトルを単位としたクラスタリングについて説明してきたが、列ベクトルをさらに細かくメッシュ単位に見て、メッシュ単位でクラスタリング処理を行うこともできる。
すなわち、図７に示すように、縦１×横１のメッシュ内特徴（８次元）単位でクラスタリング処理を行ない、１メッシュ内特徴のコード化を行なう。すると列ベクトルは縦７×横１のメッシュで表されるので、７つの識別番号で表現されることになる。図７の例では、各メッシュに識別番号（４３２，１２３，…，３５１）が付与されている（ｔは転置を表す）。
上記のようにメッシュ単位でクラスタリングを行い、コード化することにより、より正確な特徴近似に基づく識別番号の付与が可能となる。
また、上記のようにメッシュ内特徴についてクラスタリングを行って各メッシュをコード化し、コード化された各メッシュについて列単位でクラスタリング処理を行うこともできる。
すなわち、メッシュ内特徴についてクラスタリング処理を行って図７に示したようにコード化し、コード化された各列について前記したようにクラスタリング処理を行い、各列に識別番号を付与することもできる。
【００１３】
（２）実施例２
次に、列特徴をクラスタリングしたのち、合成係数を用いて特徴辞書の容量を削減する本発明の第２の実施例について説明する。
コード化された列ベクトルの個数をｍ、ｐ番めの列ベクトルをｆ_p、合成係数をｋ_iとしたとき、次の（１）式と表せる合成係数ｋと列ベクトルの組合せがあるかどうかを調べておき、可能な組合せがあれば列ベクトルの識別番号と合成係数を記録する。
【００１４】
【数１】
$$f_p = \sum_{i=1,\, i \neq p}^{m} k_i f_i \qquad (1)$$
【００１５】
これにより、特徴辞書内に列ベクトルの代わりに上記合成係数ｋを持てば済むので辞書容量を削減することができる。
図８に本実施例の機能構成を示す。
図８において、１１は文字特徴辞書であり、前記したように学習時に、入力文字画像から抽出した特徴ベクトルが格納される。
２１は本実施例に係わる小容量化手段であり、前記したクラスタリング手段１２と、合成係数算出手段２２から構成される。
クラスタリング手段１２は前記したように、加重方向ヒストグラム特徴の各列単位で特徴ベクトルのクラスタリングを行い、代表ベクトルのそれぞれに１からｍまでの識別番号を付与する。
合成係数算出手段２２は、上記式（１）で表せる合成係数ｋと列ベクトルの組合せがあるかどうかを調べ、可能な組合せがあれば列ベクトルの識別番号と合成係数を記録する。
上記のようにして合成係数算出手段２２で求めた合成係数もしくは列ベクトルは特徴辞書１３に格納される。
【００１６】
本実施例における単語認識処理は、前記第１の実施例と同様に行うことができる。すなわち、入力単語を正規化手段１４により正規化し、特徴抽出手段１５により特徴抽出を行う。一方、予め定められた認識対象となる単語リストを基に、単語特徴合成手段１６により上記特徴辞書１３に格納された列特徴から照合用の単語特徴を合成する。
ついで、照合手段１７により、入力単語から抽出された特徴と、上記合成された単語特徴を照合し単語認識を行う。
上記照合は、合成された単語特徴の次元数と入力単語特徴の次元数を同一とした照合だけでなく、第１の実施例で説明したように次元数が異なる列特徴を照合するようにしてもよい。
また、クラスタリング処理は、第１の実施例で説明したように、横幅１の列特徴だけでなく横幅２や横幅３などの列特徴をまとめてクラスタリングしたり、メッシュ単位でクラスタリングを行うようにしてもよい。
【００１７】
上記説明では、ある列ベクトルを他の列ベクトルの和で表す場合について説明したが、ある列ベクトルを他の列ベクトルの和で表すだけではなく、他の列ベクトルと差分ベクトルとの和で表すようにしてもよい。すなわち、差分ベクトルをｇ_j、係数をｌ_jとしたとき、次の式（２）と表せる合成係数ｋ、ｌと列ベクトルおよび差分ベクトルの組合せがあるかどうかを調べておき、可能な組み合わせがあればそれらを記録する。なお、差分ベクトルｇ_jは、特徴ベクトルの内の任意の特徴ベクトルの差分である。
【００１８】
【数２】
$$f_p = \sum_{i=1,\, i \neq p}^{m} k_i f_i + \sum_{j} l_j g_j \qquad (2)$$
【００１９】
以上のようにすれば、列特徴を、他の列特徴の加算和、あるいは、他の列特徴の加算和と差分特徴との和で表すことができ、列特徴を他の列特徴の加算和で表す場合より表せる頻度が増加し、特徴辞書のさらなる小容量化を図ることができる。
図９に本実施例により生成された特徴辞書１３の構成例を示す。小容量化された特徴辞書２３は、同図に示すように、インデックス情報として各識別番号と、辞書内の位置情報を持ち、位置情報は、各識別番号に対応したｍ個の特徴ベクトルの格納位置、もしくは、合成係数ｋ，ｌの格納位置を示す。なお、差分特徴を用いず、前記したように列特徴を他の列特徴の加算和で表す場合には、上記合成係数ｌを０とすればよい。
上記特徴辞書から特徴ベクトルを読みだすには、識別番号に対応した位置情報から特徴ベクトル、もしくは、合成係数ｋ，ｌの格納位置を求め、該格納位置に特徴ベクトルが格納されている場合には、特徴ベクトルをそのまま読み出し、また、該格納位置に合成係数ｋ，ｌが格納されている場合には、該合成係数ｋ，ｌから前記した（１）式または（２）式により特徴ベクトルを算出する。
なお、本実施例においても、第１の実施例と同様、上記識別番号を並べる際に、予め認識対象とするカテゴリについて単語特徴を生成する際に必要となる文字特徴の出現頻度を調べておき、その頻度の高い順に並べておくことにより、インデックス情報に対するアクセスを高速化することができる。
【００２０】
（３）実施例３
加重方向指数ヒストグラム特徴では、特徴に含まれる情報の冗長性をなくすために、抽出された７×７×８次元の原特徴に対して、正準判別分析などの特徴変換を施し、次元圧縮を行なっている。これにより特徴次元数が、例えば３９２次元から１００次元程度にまで落ちることになる。このように、予め主成分分析、正準判別分析などの特徴変換された特徴を持ち、さらにそれらの特徴に対してクラスタリングし、コード化を行うことにより、辞書容量の削減を図ることができる。
図１０に本実施例の機能構成を示す。
図１０において、１１は文字特徴辞書であり、前記したように学習時に、入力文字画像から抽出した特徴ベクトルが格納される。
３１は本実施例に係わる小容量化手段であり、上記次元圧縮を行う次元圧縮手段３２と前記したクラスタリング手段１２から構成される。
次元圧縮手段３２は、上記したように正準判別分析などの特徴変換を施し、次元圧縮を行う。原特徴の特徴変換を行う場合には、列ベクトルではなく、原特徴そのものに対して特徴変換を行う。
ここで、原特徴をｆ、特徴変換された特徴をｗ、正準判別分析等により求めた特徴変換行列をＡとすると、特徴変換行列Ａは次の式（３）により求められる。
Ａ＊ｆ_i＝ｗ_i…（３）
クラスタリング手段１２は上記のように特徴変換し、次元圧縮したｗ_i（ｉ＝０，…，Ｍ，Ｍ：原特徴数）に対してクラスタリング処理を行い、前記したようにｍ個（ｍ≦Ｍ）の特徴ベクトルで代表させ、これらの代表ベクトルのそれぞれに１からｍまでの識別番号を付与する。
上記のようにして求めたコード化された各列ベクトルは特徴辞書１３に格納される。
【００２１】
本実施例における単語認識処理は、次のように行われる。
入力単語を正規化手段１４により正規化し、特徴抽出手段１５により特徴抽出を行い、抽出された特徴ベクトルについて上記（３）により特徴変換（次元圧縮）を行う。
一方、予め定められた認識対象となる単語リストを基に、単語特徴合成手段１６により上記特徴辞書１３に格納された列特徴から照合用の単語特徴を合成する。
ついで、照合手段１７により、入力単語から抽出され特徴変換（次元圧縮）された特徴と、上記合成された単語特徴を照合し単語認識を行う。
本実施例においては、次元圧縮した特徴量をクラスタリング処理しているので、さらなる辞書容量の削減を図ることができる。
なお、上記図１０に示した実施例において、クラスタリング処理した後に、前記第２の実施例で説明したように、合成係数を求めて辞書に該合成係数を格納するようにしてもよい。これにより、辞書のさらなる小容量化を実現することができる。
【００２２】
（付記１）単語イメージを認識する単語認識装置であって、
単語特徴合成に用いる文字特徴辞書を小容量化する小容量化手段と、
上記小容量化手段により小容量化された特徴辞書の列もしくは行特徴から、認識対象とする単語リストをもとに照合用の単語特徴を合成する合成手段と、
入力単語の特徴を抽出する特徴抽出手段と、
上記特徴抽出手段により抽出された入力単語の特徴と、上記合成された単語特徴とを照合する照合手段とを備えた
ことを特徴とする単語認識装置。
（付記２）上記小容量化手段は、メッシュの特徴毎に、または、メッシュに区切られた列もしくは行の特徴毎に、特徴の類似した列特徴または行特徴をクラスタリングし、クラスタリングされた列特徴もしくは行特徴に識別のための識別番号を付与する手段と、
識別番号が付与された特徴を辞書に保持する手段とを備えた
ことを特徴とする付記１の単語認識装置。
（付記３）クラスタリングする際に、１列もしくは１行だけではなく複数列もしくは複数行まとめてクラスタリングする
ことを特徴とする付記２の単語認識装置。
（付記４）クラスタリングされた列特徴もしくは行特徴について、ある列特徴もしくは行特徴を、他の複数の列特徴もしくは行特徴の係数和で記述する
ことを特徴とする付記２の単語認識装置。
（付記５）クラスタリングされた列特徴もしくは行特徴について、ある列特徴もしくは行特徴を、他の列特徴もしくは行特徴と差分特徴との係数和で記述する
ことを特徴とする付記２の単語認識装置。
（付記６）列もしくは行の特徴をクラスタリングする前に、列もしくは行内の各メッシュごとの特徴をそれぞれコード化する
ことを特徴とする付記２の単語認識装置。
（付記７）上記小容量化手段は、文字特徴辞書の小容量化に際し、予め特徴変換して次元圧縮を行った文字特徴を用いてクラスタリングを行なう
ことを特徴とする付記２の単語認識装置。
（付記８）小容量化された特徴辞書を構成する際、各列特徴もしくは各行特徴の識別番号と、辞書内の位置をインデックス情報として保持し、
上記インデックス情報の後に、各部分特徴を並べて特徴辞書を構成する
ことを特徴とする請求項１の単語認識装置。
（付記９）列特徴もしくは行特徴を並べる際に、予め使用頻度の高い列特徴もしくは行特徴を調べておき、使用頻度の高い順に列特徴もしくは行特徴を並べる
ことを特徴とする付記８の単語認識装置。
（付記１０）上記照合手段は、入力単語の特徴と合成した単語特徴とを非線形伸縮マッチングにより照合する
ことを特徴とする付記１の単語認識装置。
（付記１１）単語イメージを認識する単語認識プログラムを記録した記録媒体であって、
上記プログラムは、単語特徴合成に用いる文字特徴辞書を小容量化し、
上記小容量化された特徴辞書の列もしくは行特徴から、認識対象とする単語リストをもとに照合用の単語特徴を合成し、
入力単語の特徴を抽出し、該抽出された入力単語の特徴と、上記合成された単語特徴とを照合することにより単語認識を行う
ことを特徴とする単語認識プログラムを記録した記録媒体。
（付記１２）単語イメージを認識する単語認識プログラムであって、
上記プログラムは、単語特徴合成に用いる文字特徴辞書を小容量化する処理と、
上記小容量化された特徴辞書の列もしくは行特徴から、認識対象とする単語リストをもとに照合用の単語特徴を合成する処理と、
入力単語の特徴を抽出し、該抽出された入力単語の特徴と、上記合成された単語特徴とを照合することにより単語認識を行う処理をコンピュータに実行させることを特徴とする単語認識プログラム。
【００２３】
【発明の効果】
以上説明したように本発明においては以下の効果を得ることができる。
（１）列特徴もしくは行特徴単位で各文字特徴をクラスタリングし、コード化しているので、文字特徴辞書の容量を大幅に削減することが可能となり、辞書容量を実用レベルにすることができる。
また、１列だけでなく複数列単位でクラスタリングしてコード化すれば、単語特徴の合成を高速化することができる。
さらに、列特徴もしくは行特徴のコード化を列特徴単位ではなく、メッシュ単位で行なえば、より正確な特徴近似に基づくコード化が可能となる。
（２）次元数が異なる単語特徴と入力単語特徴を、非線形伸縮マッチングを用いて照合することにより、字形の変動吸収を辞書内の特徴ベクトルではなく照合部分で行なうことができ、辞書に多くの特徴ベクトルを登録して変動吸収を行なう必要がなく、さらなる辞書容量を削減することが可能となる。
（３）列特徴もしくは行特徴単位でクラスタリングしたのち、ある列特徴を他の列特徴の加算和で表すことが可能な組み合わせ、もしくは、行特徴を他の行特徴の加算和で表すことが可能な組み合わせがあるかを調べ、可能な組み合わせがあれば、列特徴もしくは行特徴の識別番号と合成係数を辞書に格納することにより、辞書容量のさらなる小容量化が可能となる。
また、ある列特徴を他の列特徴と差分特徴の和で表すことが可能な組み合わせ、もしくは、ある行特徴を他の行特徴と差分特徴の和で表すことが可能な組み合わせがあるかを調べ、可能な組み合わせがあれば、列特徴もしくは行特徴の識別番号と合成係数を辞書に格納することにより、単に他の列特徴の和で表す場合よりも表せる頻度が増加するので、さらなる辞書の小容量化が可能となる。
（４）予め文字特徴に特徴変換を施して次元圧縮を行い、特徴変換した特徴に対し、クラスタリング処理を行ってコード化すれば、単語特徴合成後に特徴変換を行なう必要がなくなり、文字認識全体の処理の高速化が可能となる。また、同時に辞書容量の小容量化も可能となる。
（５）辞書内の特徴ベクトルに対してインデックス情報を持つことにより、高速な辞書アクセスが可能となる。
さらに、使用頻度の高い順に列特徴もしくは行特徴を並べておくことにより、インデックス情報に対するアクセスも高速に行なうことが可能となる。
【図面の簡単な説明】
【図１】本発明の概要を示す図である。
【図２】本発明の第１の実施例の機能構成を示す図である。
【図３】各列単位の特徴ベクトルのクラスタリングを説明する図である。
【図４】第１の実施例の特徴辞書の構成例を示す図である。
【図５】複数列の列特徴のクラスタリングを説明する図である。
【図６】合成された単語特徴の次元数と、入力単語特徴の次元数が異なる場合の照合を説明する図である。
【図７】メッシュ内特徴単位でクラスタリングを行う場合を説明する図である。
【図８】本発明の第２の実施例の機能構成を示す図である。
【図９】第２の実施例により生成された特徴辞書の構成例を示す図である。
【図１０】本発明の第３の実施例を示す図である。
【図１１】文字特徴のための縮小文字イメージ（大の字）の例を示す図である。
【図１２】加重方向指数ヒストグラム特徴の例を示す図である。
【図１３】単語特徴の合成の例を示す図である。
【符号の説明】
１文字特徴辞書
２小容量化手段
３特徴辞書
４正規化手段
５特徴抽出手段
６単語特徴合成手段
７照合手段
１１文字特徴辞書
１２クラスタリング手段
１３特徴辞書
１４正規化手段
１５特徴抽出手段
１６単語特徴合成手段
１７照合手段
２２合成係数算出手段
３２次元圧縮手段

[0001]
BACKGROUND OF THE INVENTION
In recent years, demand for OCR devices and OCR software as document input equipment has been increasing. The present invention relates to word recognition in such character recognition devices.
Word recognition is a method that, when recognizing a handwritten word such as “東京” (Tokyo), recognizes the word as a whole rather than separating it into individual characters and recognizing each character. This method achieves highly accurate recognition even when adjacent characters touch each other, and is one of the effective methods for recognizing handwritten character strings in free-pitch fields. The word recognition device of the present invention can be applied not only to handwritten character recognition devices but also to character recognition devices in a broad sense, such as printed character recognition devices and character recognition devices in portable information terminals.
[0002]
[Prior art]
As methods that recognize a handwritten word by combining the features of the characters making up the word to generate a word feature dictionary for collation, and then matching it against the features of the input word, Japanese Patent Application Nos. 11-113733 and 11-330288, for example, have been proposed.
The method disclosed in Japanese Patent Application No. 11-113733 does not recognize the characters of an input word image individually; instead, it generates a word feature dictionary from the features of individual characters and recognizes the input word image as a whole. It can recognize words with high accuracy using a small-capacity individual character image dictionary.
The method disclosed in Japanese Patent Application No. 11-330288 generates a word dictionary by synthesizing a plurality of word features for each word, so that it can cope with variations in the character shapes of the input word image.
[0003]
[Problems to be solved by the invention]
When a character feature dictionary for word feature synthesis is held, the conventional method disclosed in Japanese Patent Application No. 11-330288 extracts features from character images whose position and width are varied for each character, and retains all of them.
For example, as shown in FIG. 11, features for 1/6, 2/6, ..., 6/6 of the width are extracted (hereinafter referred to as n/m features), and all of these features are retained. In this case, the number of features per character is 21.
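The figure of 21 features per character is consistent with taking a window of width n/6 at every possible horizontal offset, for n = 1 to 6. The text only states that position and width are varied, so this reading is an assumption; the function name below is hypothetical.

```python
def count_partial_features(m: int) -> int:
    """Count windows of width n/m at every horizontal offset, for n = 1..m.

    Assumption: one feature is kept per (width, offset) pair.
    """
    total = 0
    for n in range(1, m + 1):
        positions = m - n + 1  # a window n slots wide fits at m - n + 1 offsets
        total += positions
    return total


print(count_partial_features(6))  # 21
```

Under the same assumption, a 7-way split would give 28 features per character.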
As the features of individual characters, for example, the weighted direction index histogram feature is used (see Tsuruoka et al., "Handwritten Kanji and Hiragana Recognition by Weighted Direction Index Histogram Method," IEICE Transactions D, Vol. J70-D, No. 7, pp. 1390-1397, July 1987). The weighted direction index histogram feature divides a character image into small regions and uses the direction index histogram of each small region as a feature vector. For example, as shown in FIG. 12, feature values are extracted within a 7 (vertical) × 7 (horizontal) mesh for eight directions obtained by dividing 360° into eight equal parts. Each mesh cell has an eight-dimensional feature value, one per direction; for example, the 3/7 feature of the character “東” is as shown in FIG. 12.
When synthesizing word features, the individual character features are combined so that the sum of their n/m fractions is 1. For example, for a two-character word, a word feature is synthesized from a 3/7 feature + 4/7 feature, a 2/7 feature + 5/7 feature, and so on.
For example, to synthesize the word feature for “東京” (Tokyo), as shown in FIG. 13, the 3/7 feature of “東” and the 4/7 feature of “京” are added together to synthesize “東京”.
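The synthesis step can be sketched as follows. This is a minimal illustration that assumes "adding together" means placing the column features of each character side by side so that the column counts sum to the full width; the feature values are random placeholders, and `make_feature` and `synthesize_word` are hypothetical names.

```python
import random

ROWS, DIRS = 7, 8        # mesh rows and directions, as in the text
COL_DIM = ROWS * DIRS    # 56 values per column


def make_feature(width, seed):
    """Placeholder n/7 feature: `width` columns of 56 pseudo-random values."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(COL_DIM)] for _ in range(width)]


def synthesize_word(char_features, total_columns=7):
    """Concatenate per-character column features from left to right."""
    word = [col for feat in char_features for col in feat]
    if len(word) != total_columns:
        raise ValueError("the n/m fractions must sum to 1")
    return word


east_3_7 = make_feature(3, seed=1)  # stands in for the 3/7 feature of "東"
kyo_4_7 = make_feature(4, seed=2)   # stands in for the 4/7 feature of "京"
tokyo = synthesize_word([east_3_7, kyo_4_7])
print(len(tokyo), len(tokyo[0]))    # 7 56
```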
However, because character features with varied positions and widths must be held for all of the approximately 4000 character categories, a capacity of several hundred Mbytes is required, which has been a major obstacle to practical use.
The present invention has been made in view of the above circumstances, and its object is to provide a word recognition device that can greatly reduce the capacity of the character feature dictionary and speed up recognition processing.
[0004]
[Means for Solving the Problems]
In order to solve the above problem, in the present invention, the capacity of the feature dictionary for word feature synthesis is reduced, and the dictionary capacity is set to a practical level. In addition, the collation method of synthesized word features and input word features has been improved so that the variation of the character shape is absorbed in the collation part, thereby further reducing the dictionary capacity.
FIG. 1 shows an outline of the present invention. In FIG. 1, reference numeral 1 denotes a character feature dictionary, which stores feature vectors extracted from input character images. Reference numeral 2 denotes capacity reduction means for the character feature dictionary, and 3 denotes a feature dictionary. During learning, the feature dictionary 3 stores the column features (for horizontal writing) or row features (for vertical writing) whose capacity has been reduced by the capacity reduction means 2.
The capacity reduction of the character feature by the capacity reduction means 2 is performed as follows.
(1) For the character features stored in the character feature dictionary 1, feature vectors are clustered column by column or row by row, similar features are grouped and represented by m column or row vectors, and each representative vector is assigned an identification number from 1 to m (hereinafter, assigning identification numbers up to m is referred to as coding).
Further, coding may be performed by clustering not only in one column or one row but also in units of a plurality of columns. Furthermore, if the coding of the column feature or the row feature is performed not in the column feature or the row feature unit but in the mesh unit, the coding based on the more accurate feature approximation becomes possible.
(2) For the m clustered feature vectors, it is checked whether there is a combination in which a column feature can be expressed as a sum of other column features, or a row feature as a sum of other row features. If there is such a combination, the identification numbers of the column or row features and the synthesis coefficients are stored in the dictionary.
Also, investigate whether there is a combination that can represent a column feature as the sum of other column features and difference features, or a combination that can represent a row feature as the sum of other row features and difference features. If there is a possible combination, the identification number of the column feature or row feature and the synthesis coefficient are stored in the dictionary.
(3) Feature transformation is applied to the character features in advance to compress their dimensions, and the transformed features are clustered and coded.
Note that holding index information for the feature vectors in the feature dictionary 3 enables high-speed dictionary access. Further, arranging the column features or row features in descending order of frequency of use also speeds up access to the index information.
Using the feature dictionary 3 having a small capacity as described above, word recognition is performed as follows.
The input word is normalized by the normalizing unit 4 and the feature extraction unit 5 performs feature extraction. On the other hand, based on a predetermined word list to be recognized, word feature synthesizing means 6 synthesizes a word feature for collation from the column or row feature stored in the feature dictionary 3.
Next, the collation means 7 collates the feature extracted from the input word with the synthesized word feature and performs word recognition.
In the collation by the collation means 7, word features and input word features with different numbers of dimensions are collated using non-linear expansion/contraction matching. Variations in character shape can thus be absorbed in the collation step rather than by the feature vectors in the dictionary, which makes it possible to reduce the dictionary capacity further.
That is, in conventional word recognition the collation method itself is weak against variations in the shape of input characters, so the feature dictionary had to hold a plurality of features per character category (for example, the 3/6 feature, the 4/6 feature, and so on mentioned above). In contrast, non-linear expansion/contraction matching itself absorbs character shape variation, so the number of features held in the feature dictionary can be expected to decrease, and the dictionary capacity can be reduced.
[0005]
DETAILED DESCRIPTION OF THE INVENTION
Embodiments of the present invention will be described below.
The present invention can be realized by an ordinary computer system including a processing device, a main storage device, an external storage device, input devices such as a keyboard and a scanner for reading images, output devices such as a display and a printer, a communication interface, and the like. Programs, data, and the like for performing the processing of the present invention are stored in the external storage device or the like; at run time, the programs are read into the main storage device and the processing according to the present invention is executed.
[0006]
Although the present invention is effective regardless of vertical writing / horizontal writing, the following description will be made on horizontal writing. When applied to vertical writing, a “column vector” described below may be a “row vector”.
In addition, character features are classified into several types, but the present invention targets features that can be divided into columns. Furthermore, in the case of features that can be divided into mesh units, the features in the mesh are arranged vertically and treated as column features.
Specifically, there are n-order peripheral features, projection features, etc. as column-wise features, and the above-described weighted direction index histogram features, directional line element features, mesh features, etc. are targeted as mesh-type features. Hereinafter, the above-described weighted direction index histogram feature will be described as an example. Note that the number of mesh divisions and the number of directions used in the following description are not necessarily limited to these values.
[0007]
(1) Embodiment 1
As described above, in the final stage of the feature extraction process the weighted direction index histogram feature has features for eight directions in each cell of, for example, a 7 (vertical) × 7 (horizontal) mesh laid over the normalized character image. That is, it is a 7 × 7 × 8-dimensional feature. Here, the eight directions are the directions obtained by dividing 360° into eight equal parts of 45° each, as shown in FIGS. 12 and 13.
In this embodiment, in order to reduce the capacity of the character feature dictionary, clustering of feature vectors is performed in units of columns.
FIG. 2 shows a functional configuration of the present embodiment.
In FIG. 2, 11 is a character feature dictionary, which stores a feature vector extracted from an input character image during learning.
Reference numeral 12 denotes clustering means according to the present embodiment. To reduce the capacity of the character feature dictionary 11, during learning it clusters the feature vectors of the character features stored in the character feature dictionary 11, column by column of the weighted direction histogram feature, as shown in FIG. 3(a).
That is, the feature vector in a 7 (vertical) × 1 (horizontal) strip of the mesh (7 × 8 = 56 dimensions) is used as one unit, and similar features are grouped and represented by m column vectors. An identification number from 1 to m is assigned to each representative vector.
FIG. 3(b) shows an example of the column vector number notation for the weighted direction histogram feature. As shown in the figure, the identification numbers obtained by clustering and coding each column are assigned to the columns of the character feature vector. In this example, identification numbers (32230, 13118, ..., 6451) are assigned to the respective columns.
[0008]
The conventional method required [(number of character categories) × (number of features per character) × (number of horizontal column features)] column features (for example, 4000 × 21 × 7 column features when the number of character categories is 4000, the number of features per character is 21, and the number of horizontal column features is 7). With the above approach, m column features suffice, where m is far smaller than that number.
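The reduction can be put in numbers with a short calculation. The category, feature, and column counts come from the text; the value m = 4096 is purely a hypothetical choice for illustration, since the text does not state m.

```python
def column_feature_counts(categories=4000, features_per_char=21,
                          columns=7, m=4096):
    """Return (conventional count, clustered count, reduction ratio).

    m is a hypothetical codebook size, not a value given in the text.
    """
    conventional = categories * features_per_char * columns
    return conventional, m, conventional / m


conventional, reduced, ratio = column_feature_counts()
print(conventional)  # 588000
```

Each character feature then only needs to store identification numbers for its columns, while the column vectors themselves are stored once in the codebook.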
As a clustering method, a general clustering method such as hierarchical clustering, k-means, or LVQ can be used.
The m column vectors clustered as described above and given identification numbers are stored in the feature dictionary 13.
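As a rough sketch of the coding step, the following uses a plain k-means loop, one of the general clustering methods named above. The naive initialization, the fixed iteration count, and the two-dimensional toy vectors (real column vectors would have 56 dimensions) are all assumptions made for illustration.

```python
def sqdist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, m, iters=20):
    """Return m representative vectors (naive init: the first m vectors)."""
    centroids = [list(v) for v in vectors[:m]]
    for _ in range(iters):
        groups = [[] for _ in range(m)]
        for v in vectors:
            j = min(range(m), key=lambda c: sqdist(v, centroids[c]))
            groups[j].append(v)
        for j, g in enumerate(groups):
            if g:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(xs) / len(g) for xs in zip(*g)]
    return centroids


def encode(vector, centroids):
    """Identification number (1..m) of the nearest representative vector."""
    return 1 + min(range(len(centroids)),
                   key=lambda c: sqdist(vector, centroids[c]))


columns = [[0.0, 0.1], [5.0, 5.1], [0.2, 0.0], [5.2, 4.9]]  # toy column vectors
codebook = kmeans(columns, m=2)
codes = [encode(c, codebook) for c in columns]
print(codes)  # [1, 2, 1, 2]
```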
[0009]
FIG. 4 shows a configuration example of the feature dictionary 13. As shown in the figure, the reduced-capacity feature dictionary 13 holds, as index information, each identification number and position information within the dictionary; the position information indicates the storage positions of the m feature vectors corresponding to the identification numbers.
When arranging the identification numbers, the appearance frequency of the character features needed to generate word features for the categories to be recognized (e.g., addresses, names) is checked in advance, and the identification numbers are arranged in descending order of that frequency. This speeds up access to the index information.
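One possible layout for such a dictionary is sketched below: an index of (identification number, offset) pairs sorted by usage frequency, followed by a flat body of feature values. The exact on-disk format is not given in the text, so the structure, the linear index scan, and the function names are assumptions.

```python
from collections import Counter


def build_dictionary(codebook, usage):
    """codebook: {id: vector}; usage: ids needed when synthesizing the word list."""
    freq = Counter(usage)
    order = sorted(codebook, key=lambda i: (-freq[i], i))  # most frequent first
    index, body = [], []
    for ident in order:
        index.append((ident, len(body)))  # (identification number, offset into body)
        body.extend(codebook[ident])
    return index, body


def lookup(index, body, ident, dim):
    """Scan the index in order; frequently used ids are found after few steps."""
    for entry_id, offset in index:
        if entry_id == ident:
            return body[offset:offset + dim]
    raise KeyError(ident)


codebook = {1: [1.0, 1.0], 2: [2.0, 2.0], 3: [3.0, 3.0]}
index, body = build_dictionary(codebook, usage=[3, 3, 2])
print(index)  # [(3, 0), (2, 2), (1, 4)]
```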
Using the feature dictionary 13 having a small capacity as described above, word recognition is performed as follows.
First, the input word is normalized by the normalizing means 14, and features are extracted by the feature extraction means 15. Meanwhile, based on a predetermined list of words to be recognized (for example, a list of prefecture names when recognizing prefecture names), the word feature synthesizing means 16 synthesizes word features for collation from the column features stored in the feature dictionary 13.
Next, the collation means 17 collates the feature extracted from the input word with the synthesized word feature and performs word recognition. The input word feature and the synthesized word feature are collated using the Euclidean distance or the like.
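The collation step can be sketched as a nearest-neighbor search over the synthesized word features. The three-dimensional vectors below are toy placeholders (a real word feature would have 7 × 7 × 8 values), and `recognize` is a hypothetical name.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recognize(input_feature, candidates):
    """candidates: {word: synthesized feature}; return the closest word."""
    return min(candidates, key=lambda w: euclidean(input_feature, candidates[w]))


candidates = {
    "東京都": [0.9, 0.1, 0.0],
    "京都府": [0.1, 0.8, 0.1],
    "大阪府": [0.0, 0.2, 0.9],
}
print(recognize([0.8, 0.2, 0.1], candidates))  # 東京都
```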
As described above, in this embodiment the feature vectors of the character feature dictionary 11 are clustered column by column, and similar features are grouped, represented by m column vectors, and coded. The capacity of the feature dictionary 13 can therefore be greatly reduced to a practical level.
In addition, since index information is stored for feature vectors in the feature dictionary, dictionary access can be performed at high speed.
[0010]
In the above embodiment, as shown in FIG. 5(a), feature vectors are clustered in units of columns of width 1. However, column features of width 2, width 3, and so on can also be clustered together, not only column features of width 1.
That is, as shown in FIG. 5(b), when the width is n, clustering is performed with a feature of 7 (vertical) × n (horizontal) × 8 dimensions as one unit. Because the unit of the column feature is larger than a single column, word synthesis can be sped up.
[0011]
Further, in the above embodiment, when a word feature is synthesized from character features, the number of dimensions of the synthesized word feature is the same as that of the input word feature. That is, both are features of 7 (vertical) × 7 (horizontal) × 8 directions, and they are collated using the Euclidean distance or the like. However, as shown in FIG. 6, column features with different numbers of dimensions can also be collated.
That is, using non-linear expansion/contraction matching makes collation possible even when the two have different numbers of dimensions. DP matching can be used as one example of non-linear expansion/contraction matching (for DP matching see, for example, Noboru Funakubo, "Pattern Recognition," Kyoritsu Shuppan, pp. 62-67).
As a result, as shown in FIG. 6, the two can be collated even if, for example, the synthesized word feature has 7 (vertical) × 8 (horizontal) × 8 directions and the input word feature has 7 (vertical) × 7 (horizontal) × 8 directions.
By performing the collation as described above, it is possible to absorb the variation of the character shape not in the feature vector in the dictionary but in the collation part, so that the dictionary capacity can be further reduced.
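A minimal sketch of DP matching between two column sequences of different lengths is given below. It uses a common dynamic-time-warping style recurrence with squared Euclidean distance as the local cost; the text only cites a textbook for DP matching, so the specific recurrence and cost are assumptions.

```python
def col_cost(a, b):
    """Local cost: squared Euclidean distance between two column vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dp_match(seq_a, seq_b):
    """Accumulated cost of the best monotonic alignment of two column sequences."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # A column may be stretched, shrunk, or matched one to one.
            best = min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
            d[i][j] = col_cost(seq_a[i - 1], seq_b[j - 1]) + best
    return d[n][m]


synthesized = [[0.0], [1.0], [1.0], [2.0]]  # 4 toy columns (one column repeated)
observed = [[0.0], [1.0], [2.0]]            # 3 toy columns
print(dp_match(synthesized, observed))      # 0.0
```

Because the repeated column is absorbed by the alignment, the two sequences match at zero cost even though their lengths differ, which is the property the text relies on to avoid storing many width variants.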
[0012]
Further, in the above description, clustering in units of column vectors has been described, but it is also possible to perform clustering processing in units of meshes by looking at the column vectors in more detail in units of meshes.
That is, as shown in FIG. 7, clustering is performed in units of the features within a 1 (vertical) × 1 (horizontal) mesh cell (8 dimensions), and the features within each cell are coded. Since a column vector consists of 7 (vertical) × 1 (horizontal) mesh cells, it is then represented by seven identification numbers. In the example of FIG. 7, identification numbers (432, 123, ..., 351)^t are assigned to the cells (t denotes transposition).
As described above, clustering is performed in units of meshes, and coding is performed, so that identification numbers based on more accurate feature approximation can be assigned.
Further, as described above, clustering may be performed on the in-mesh features to code each mesh, and clustering processing may be performed on each coded mesh in units of columns.
That is, it is possible to perform clustering processing on the in-mesh features and encode them as shown in FIG. 7, perform clustering processing on each encoded column as described above, and assign an identification number to each column.
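Mesh-level coding can be sketched as assigning each cell of a column the identification number of its nearest codebook entry, so that a column becomes a tuple of codes. The toy data uses 3 cells of 2 dimensions instead of 7 cells of 8 dimensions, and the codebook is assumed to have been obtained by clustering beforehand.

```python
def nearest_code(cell, codebook):
    """Identification number of the codebook entry closest to one mesh cell."""
    return min(
        codebook,
        key=lambda ident: sum((x - y) ** 2 for x, y in zip(cell, codebook[ident])),
    )


def encode_column(column, codebook):
    """Represent a column as one identification number per mesh cell."""
    return tuple(nearest_code(cell, codebook) for cell in column)


cell_codebook = {1: [0.0, 0.0], 2: [1.0, 0.0], 3: [0.0, 1.0]}  # hypothetical codes
column = [[0.1, 0.0], [0.9, 0.1], [0.0, 0.8]]
print(encode_column(column, cell_codebook))  # (1, 2, 3)
```

The resulting tuples can themselves be clustered column by column, as the text describes for the two-stage variant.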
[0013]
(2) Embodiment 2
Next, a description will be given of a second embodiment of the present invention in which the feature dictionary capacity is reduced using the synthesis coefficient after clustering the column features.
Let m be the number of coded column vectors, f_p the p-th column vector, and k_i the synthesis coefficients. It is checked in advance whether there is a combination of synthesis coefficients k and column vectors that satisfies the following equation (1); if there is such a combination, the identification numbers of the column vectors and the synthesis coefficients are recorded.
[0014]
[Expression 1]
$$f_p = \sum_{i=1,\, i \neq p}^{m} k_i f_i \qquad (1)$$
[0015]
As a result, it is only necessary to have the synthesis coefficient k instead of the column vector in the feature dictionary, so that the dictionary capacity can be reduced.
FIG. 8 shows a functional configuration of the present embodiment.
In FIG. 8, reference numeral 11 denotes a character feature dictionary, which stores a feature vector extracted from an input character image during learning as described above.
Reference numeral 21 denotes a capacity reduction means according to the present embodiment, which is composed of the clustering means 12 and the synthesis coefficient calculation means 22 described above.
As described above, the clustering means 12 clusters the feature vectors for each column of the weighted direction index histogram features and assigns an identification number from 1 to m to each representative vector.
The synthesis coefficient calculation means 22 checks whether there is a combination of the synthesis coefficient k and the column vector that can be expressed by the above equation (1), and if there is a possible combination, records the column vector identification number and the synthesis coefficient.
The synthesis coefficient or the column vector obtained by the synthesis coefficient calculation means 22 as described above is stored in the feature dictionary 13.
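How the synthesis coefficient calculation means searches for a combination is not detailed here. One possible sketch, assuming an ordinary least-squares fit accepted only when the residual is negligible, is shown below. The function name, the tolerance `tol`, and the toy data are hypothetical, and in practice the candidate set would presumably be kept small so that the coefficients take less space than the vector they replace.

```python
import numpy as np

def find_synthesis_coefficients(vectors, p, tol=1e-8):
    """Try to express vectors[p] as a weighted sum of the other vectors.

    Returns (identification numbers, coefficients), with 1-based numbers,
    or None when no sufficiently exact combination exists.
    """
    others = [i for i in range(len(vectors)) if i != p]
    basis = vectors[others].T                      # shape (dim, m - 1)
    k, *_ = np.linalg.lstsq(basis, vectors[p], rcond=None)
    residual = np.linalg.norm(basis @ k - vectors[p])
    if residual > tol * max(1.0, np.linalg.norm(vectors[p])):
        return None                                # store the vector itself instead
    used = np.flatnonzero(np.abs(k) > 1e-9)        # drop (near-)zero coefficients
    return [others[i] + 1 for i in used], k[used]

# Toy data: the 4th vector is built as 2.0 * (1st) + 0.5 * (3rd).
rng = np.random.default_rng(0)
base = rng.random((3, 8))
vectors = np.vstack([base, 2.0 * base[0] + 0.5 * base[2]])
print(find_synthesis_coefficients(vectors, p=3))   # numbers 1 and 3 with their coefficients
print(find_synthesis_coefficients(vectors, p=1))   # no exact combination for the 2nd vector
```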
[0016]
The word recognition process in the present embodiment can be performed in the same manner as in the first embodiment. That is, the input word is normalized by the normalizing means 14 and the feature extraction means 15 performs feature extraction. On the other hand, based on a predetermined word list to be recognized, word feature synthesizing means 16 synthesizes word features for collation from the column features stored in the feature dictionary 13.
Next, the collation means 17 collates the feature extracted from the input word with the synthesized word feature and performs word recognition.
In the above collation, the synthesized word feature and the input word feature need not have the same number of dimensions; column features with different numbers of dimensions may also be collated, as described in the first embodiment.
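Collating feature sequences of different lengths can be illustrated with a standard dynamic-programming alignment (dynamic time warping). This is an assumed stand-in for the nonlinear expansion/contraction matching of the first embodiment, whose exact cost function and path constraints are not given in this passage; the function name and the test data are hypothetical.

```python
import numpy as np

def elastic_match_distance(word_cols, input_cols):
    """Dynamic-programming alignment cost between two column-feature sequences.

    word_cols: (n, d) synthesized word feature, one row per column.
    input_cols: (m, d) input word feature; n and m may differ.
    """
    n, m = len(word_cols), len(input_cols)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(word_cols[i - 1] - input_cols[j - 1])
            # a column may be stretched, shrunk, or matched one-to-one
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

# Hypothetical data: the input is the synthesized word with one column repeated.
rng = np.random.default_rng(0)
word = rng.random((5, 56))
stretched = np.vstack([word[:2], word[1:]])     # 6 columns, the second one doubled
print(elastic_match_distance(word, stretched))  # the repeated column costs nothing
print(elastic_match_distance(word, rng.random((6, 56))))
```

Because the alignment absorbs local stretching, width variations of this kind do not need their own entries in the dictionary.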
As described in the first embodiment, the clustering process may cluster not only column features of width 1 but also column features of width 2, width 3, and so on, or may be performed in units of meshes.
[0017]
In the above description, a certain column vector is represented by the sum of other column vectors; however, a column vector may also be represented by the sum of other column vectors and difference vectors. That is, letting the difference vector be g_j and its coefficient be l_j, it is checked whether there is a combination of the synthesis coefficients k and l, the column vectors, and the difference vectors that satisfies the following equation (2), and any such combination is recorded. The difference vector g_j is the difference between any two of the feature vectors.
[0018]
[Expression 2]
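The expression itself is not reproduced in this text. Assuming, as the surrounding description states, that equation (2) extends the sum of other column vectors with difference vectors g_j weighted by coefficients l_j, one consistent way to write it is the following; the summation ranges are assumptions.

```latex
f_p = \sum_{i = 1,\; i \neq p}^{m} k_i \, f_i \;+\; \sum_{j} l_j \, g_j \tag{2}
```

Setting every l_j to zero reduces this to the sum of other column vectors alone.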

[0019]
In this way, by allowing a column feature to be represented either by the sum of other column features or by the sum of other column features and difference features, a column feature can be expressed in this form more often than when only the sum of other column features is used, and the capacity of the feature dictionary can be further reduced.
FIG. 9 shows a configuration example of the feature dictionary 13 generated by the present embodiment. As shown in the figure, the reduced feature dictionary 23 holds, as index information, each identification number and position information in the dictionary. For each of the m identification numbers, the position information indicates the storage position of the corresponding feature vector or the storage position of the synthesis coefficients k and l. Note that when difference features are not used and a column feature is expressed by the sum of other column features as described above, the synthesis coefficient l may be set to zero.
To read a feature vector from the feature dictionary, the storage position of the feature vector or of the synthesis coefficients k and l is obtained from the position information corresponding to the identification number. When a feature vector is stored at that position, it is read as it is; when the synthesis coefficients k and l are stored there, the feature vector is calculated from them by equation (1) or (2) above.
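The read-out procedure can be sketched as below. The in-memory layout is hypothetical: a Python dict stands in for the index information, each entry holding either a stored vector or the coefficients k and l, and entries are assumed not to reference each other cyclically.

```python
import numpy as np

def read_feature(index, ident):
    """Return the feature vector for an identification number.

    index maps an identification number to either
      ("vector", array)  - the feature vector is stored directly, or
      ("coeffs", k, l)   - k: {id: coefficient} over other column vectors,
                           l: {(id_a, id_b): coefficient} over difference
                              vectors (feature id_a minus feature id_b).
    """
    entry = index[ident]
    if entry[0] == "vector":
        return entry[1]
    _, k, l = entry
    result = sum(coef * read_feature(index, i) for i, coef in k.items())
    for (a, b), coef in l.items():
        result = result + coef * (read_feature(index, a) - read_feature(index, b))
    return result

# Hypothetical dictionary with two stored vectors and two synthesized ones.
index = {
    1: ("vector", np.array([1.0, 0.0])),
    2: ("vector", np.array([0.0, 1.0])),
    3: ("coeffs", {1: 2.0, 2: 0.5}, {}),          # sum of other vectors only (l = 0)
    4: ("coeffs", {1: 1.0}, {(2, 1): 3.0}),       # sum plus one difference vector
}
print(read_feature(index, 3))
print(read_feature(index, 4))
```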
Also in this embodiment, as in the first embodiment, when the identification numbers are arranged, the appearance frequency of the character features needed to generate word features for the categories to be recognized may be checked in advance and the identification numbers arranged in order of frequency, which speeds up access to the index information.
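A minimal sketch of the frequency ordering follows. The mapping from each character to the identification numbers of its column features, the word list, and the function name are all hypothetical.

```python
from collections import Counter

def order_ids_by_frequency(word_list, columns_of_char):
    """Return identification numbers sorted by how often the word list uses them.

    word_list: words to be recognized.
    columns_of_char: maps each character to the identification numbers of
                     the column features used to synthesize it.
    """
    counts = Counter()
    for word in word_list:
        for ch in word:
            counts.update(columns_of_char[ch])
    return [ident for ident, _ in counts.most_common()]

# Hypothetical example with three characters and three identification numbers.
columns_of_char = {"東": [3, 1], "京": [1, 2], "都": [1, 3]}
words = ["東京", "京都", "東京都"]
print(order_ids_by_frequency(words, columns_of_char))  # most frequently used first
```

Placing the most frequently used entries at the head of the index means that a sequential search finds them after fewer comparisons.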
[0020]
(3) Example 3
For the weighted direction index histogram feature, in order to eliminate redundancy in the information contained in the feature, the extracted 7 × 7 × 8-dimensional original feature is subjected to a feature transformation such as canonical discriminant analysis to compress its dimensions. As a result, the number of feature dimensions falls from, for example, 392 to about 100. The dictionary capacity can thus be reduced by holding features that have been transformed in advance by principal component analysis, canonical discriminant analysis, or the like, and then clustering and coding those features.
FIG. 10 shows a functional configuration of the present embodiment.
In FIG. 10, reference numeral 11 denotes a character feature dictionary, which stores a feature vector extracted from an input character image during learning as described above.
Reference numeral 31 denotes a capacity reduction means according to the present embodiment, which includes a dimension compression means 32 that performs the above-described dimension compression and the clustering means 12 described above.
As described above, the dimension compression means 32 performs feature conversion such as canonical discriminant analysis and performs dimension compression. When performing the feature conversion of the original feature, the feature conversion is performed not on the column vector but on the original feature itself.
Here, letting the original feature be f, the feature-transformed feature be w, and the feature transformation matrix obtained by canonical discriminant analysis or the like be A, the transformed feature w is obtained by the following equation (3).
A f_i = w_i ... (3)
The clustering means 12 performs clustering processing on the features w_i (i = 0, ..., M; M: number of original features) that have undergone feature conversion and dimension compression as described above, obtains m (m ≦ M) representative feature vectors as described above, and assigns an identification number from 1 to m to each of these representative vectors.
Each coded column vector obtained as described above is stored in the feature dictionary 13.
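The compress-then-cluster flow can be sketched as follows. Principal component analysis, which the description names as one possible transformation, is used here in place of canonical discriminant analysis because it needs no class labels; k-means is an assumed clustering method, and the function names, sizes, and random data are hypothetical.

```python
import numpy as np

def pca_matrix(features, n_dims):
    """Transformation matrix A (n_dims x D): rows are the leading principal axes."""
    centered = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:n_dims]

def cluster(points, m, n_iter=10, seed=0):
    """Plain k-means (assumed); returns representatives and 1-based identification numbers."""
    rng = np.random.default_rng(seed)
    reps = points[rng.choice(len(points), m, replace=False)].copy()
    for _ in range(n_iter):
        labels = ((points[:, None] - reps[None]) ** 2).sum(axis=-1).argmin(axis=1)
        for c in range(m):
            if np.any(labels == c):
                reps[c] = points[labels == c].mean(axis=0)
    labels = ((points[:, None] - reps[None]) ** 2).sum(axis=-1).argmin(axis=1)
    return reps, labels + 1

# Hypothetical original features: 200 vectors of 7 x 7 x 8 = 392 dimensions.
rng = np.random.default_rng(0)
F = rng.random((200, 392))
A = pca_matrix(F, 100)        # 392 dimensions -> 100 dimensions
W = F @ A.T                   # row i is w_i = A f_i, as in equation (3)
reps, ids = cluster(W, m=10)  # m representative vectors, identification numbers 1..m
print(W.shape, reps.shape, ids[:5])
```

At recognition time the same matrix A would be applied to the feature extracted from the input word, so that both sides of the collation live in the compressed space.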
[0021]
The word recognition process in the present embodiment is performed as follows.
The input word is normalized by the normalization means 14, the feature extraction means 15 performs feature extraction, and the extracted feature vector is subjected to feature conversion (dimension compression) according to equation (3) above.
On the other hand, based on a predetermined word list to be recognized, word feature synthesizing means 16 synthesizes word features for collation from the column features stored in the feature dictionary 13.
Next, the collation means 17 collates the feature extracted (dimension-compressed) from the input word and the synthesized word feature to perform word recognition.
In the present embodiment, since the dimension-compressed feature amount is subjected to clustering processing, the dictionary capacity can be further reduced.
In the embodiment shown in FIG. 10, after the clustering processing, synthesis coefficients may be obtained and stored in the dictionary as described in the second embodiment. This further reduces the capacity of the dictionary.
[0022]
(Supplementary note 1) A word recognition device for recognizing a word image, comprising:
capacity reducing means for reducing the capacity of a character feature dictionary used for word feature synthesis;
synthesizing means for synthesizing word features for collation, based on a word list to be recognized, from the column or row features of the feature dictionary reduced in capacity by the capacity reducing means;
feature extraction means for extracting features of an input word; and
collating means for collating the features of the input word extracted by the feature extraction means with the synthesized word features.
(Supplementary note 2) The word recognition device according to supplementary note 1, wherein the capacity reducing means comprises:
means for clustering column features or row features having similar features, either for each mesh feature or for each feature of a column or row divided into meshes, and assigning an identification number to each clustered column feature or row feature; and
means for holding the features with identification numbers in a dictionary.
(Supplementary note 3) The word recognition device according to supplementary note 2, wherein, when clustering, not only a single column or row but also multiple columns or multiple rows are clustered together.
(Supplementary note 4) The word recognition device according to supplementary note 2, wherein, for the clustered column features or row features, a given column feature or row feature is described as a coefficient-weighted sum of other column features or row features.
(Supplementary note 5) The word recognition device according to supplementary note 2, wherein, for the clustered column features or row features, a given column feature or row feature is described as a coefficient-weighted sum of other column features or row features and difference features.
(Supplementary note 6) The word recognition device according to supplementary note 2, wherein, before the column or row features are clustered, the features of each mesh in the column or row are individually coded.
(Supplementary note 7) The word recognition device according to supplementary note 2, wherein, when reducing the capacity of the character feature dictionary, the capacity reducing means performs clustering using character features that have undergone feature conversion and dimension compression in advance.
(Supplementary note 8) The word recognition device according to supplementary note 1, wherein, when the reduced-capacity feature dictionary is constructed, the identification number of each column feature or row feature and its position in the dictionary are held as index information, and the partial features are arranged after the index information to form the feature dictionary.
(Supplementary note 9) The word recognition device according to supplementary note 8, wherein, when the column features or row features are arranged, those with a high frequency of use are identified in advance, and the column features or row features are arranged in descending order of frequency of use.
(Supplementary note 10) The word recognition device according to supplementary note 1, wherein the collating means collates the features of the input word with the synthesized word features by nonlinear expansion/contraction matching.
(Supplementary note 11) A recording medium on which is recorded a word recognition program for recognizing a word image, wherein the program
reduces the capacity of a character feature dictionary used for word feature synthesis,
synthesizes word features for collation, based on a word list to be recognized, from the column or row features of the reduced-capacity feature dictionary, and
extracts features of an input word and performs word recognition by collating the extracted features of the input word with the synthesized word features.
(Supplementary note 12) A word recognition program for recognizing a word image, the program causing a computer to execute:
a process of reducing the capacity of a character feature dictionary used for word feature synthesis;
a process of synthesizing word features for collation, based on a word list to be recognized, from the column or row features of the reduced-capacity feature dictionary; and
a word recognition process of extracting features of an input word and collating the extracted features of the input word with the synthesized word features.
[0023]
[Effects of the invention]
As described above, the following effects can be obtained in the present invention.
(1) Since each character feature is clustered and coded in units of column features or row features, the capacity of the character feature dictionary can be greatly reduced, and the dictionary capacity can be brought to a practical level.
In addition, if clustering and coding is performed not only for one column but for a plurality of columns, it is possible to speed up the synthesis of word features.
Furthermore, if the coding of the column feature or the row feature is performed not in the column feature unit but in the mesh unit, the coding based on the more accurate feature approximation becomes possible.
(2) By collating synthesized word features and input word features using nonlinear expansion/contraction matching, even when their numbers of dimensions differ, variation in character shape can be absorbed in the collation step rather than by the feature vectors in the dictionary. It is therefore unnecessary to register feature vectors for absorbing such variation, and the dictionary capacity can be further reduced.
(3) After clustering in units of column features or row features, it is investigated whether there is a combination that can represent a column feature as the sum of other column features, or a row feature as the sum of other row features. If such a combination exists, storing the identification numbers of the column or row features and the synthesis coefficients in the dictionary further reduces the dictionary capacity.
Likewise, it may be investigated whether there is a combination that can represent a column feature as the sum of other column features and difference features, or a row feature as the sum of other row features and difference features. If such a combination exists, storing the identification numbers of the column or row features and the synthesis coefficients in the dictionary allows features to be expressed in this form more often than with sums of other column features alone, and a further reduction in capacity can be achieved.
(4) If feature conversion and dimension compression are performed on the character features in advance, and the converted features are coded by clustering, feature conversion need not be performed after word feature synthesis, so processing speed can be increased. At the same time, the dictionary capacity can be reduced.
(5) Having index information for feature vectors in the dictionary enables high-speed dictionary access.
Furthermore, by arranging column features or row features in order of frequency of use, it is possible to access index information at high speed.
[Brief description of the drawings]
FIG. 1 is a diagram showing an outline of the present invention.
FIG. 2 is a diagram showing a functional configuration of a first embodiment of the present invention.
FIG. 3 is a diagram illustrating clustering of feature vectors in units of columns.
FIG. 4 is a diagram illustrating a configuration example of a feature dictionary according to the first embodiment.
FIG. 5 is a diagram illustrating clustering of column features of a plurality of columns.
FIG. 6 is a diagram illustrating collation when the number of dimensions of a synthesized word feature is different from the number of dimensions of an input word feature.
FIG. 7 is a diagram illustrating a case where clustering is performed in units of features in a mesh.
FIG. 8 is a diagram showing a functional configuration of a second embodiment of the present invention.
FIG. 9 is a diagram illustrating a configuration example of a feature dictionary generated according to the second embodiment.
FIG. 10 is a diagram showing a third embodiment of the present invention.
FIG. 11 is a diagram illustrating an example of a reduced character image (large character) for character features.
FIG. 12 is a diagram illustrating an example of a weighted direction index histogram feature.
FIG. 13 is a diagram illustrating an example of synthesis of word features.
[Explanation of symbols]
1 Character feature dictionary
2 Capacity reduction means
3 Feature dictionary
4 Normalization means
5 Feature extraction means
6 Word feature synthesis means
7 Collation means
11 Character feature dictionary
12 Clustering means
13 Feature dictionary
14 Normalization means
15 Feature extraction means
16 Word feature synthesis means
17 Collation means
22 Synthesis coefficient calculation means
32 Dimension compression means

Claims

A word recognition device for recognizing a character image, comprising:
capacity reducing means comprising: means for dividing an input character image into meshes in predetermined units, clustering column feature vectors or row feature vectors having similar features in units of the columns or rows divided into meshes, and assigning an identification number that identifies each clustered column feature vector or row feature vector; and means for holding, in a feature dictionary, the column feature vectors or row feature vectors to which the identification numbers are assigned;
synthesizing means for synthesizing word feature vectors for collation, based on a word list to be recognized, from the column feature vectors or row feature vectors of the feature dictionary;
feature extraction means for extracting a feature vector of an input word; and
collating means for collating the feature vector of the input word extracted by the feature extraction means with the synthesized word feature vectors.

2. The word recognition apparatus according to claim 1, wherein the capacity reducing means performs clustering using character feature vectors obtained by performing feature conversion and dimension compression in advance when the capacity of the character feature dictionary is reduced.

3. The word recognition apparatus according to claim 1, wherein the collating means collates the feature vector of the input word with the synthesized word feature vector by nonlinear expansion/contraction matching.

A computer-readable recording medium on which is recorded a program for causing a computer to execute word recognition processing, the program causing the computer to execute:
a capacity reduction process of dividing an input character image into meshes in predetermined units, clustering column feature vectors or row feature vectors having similar features for each column or row feature divided into meshes, assigning an identification number that identifies each clustered column feature vector or row feature vector, and holding, in a feature dictionary, the column feature vectors or row feature vectors to which the identification numbers are assigned;
a synthesizing process of synthesizing word feature vectors for collation, based on a word list to be recognized, from the column feature vectors or row feature vectors of the feature dictionary; and
a word recognition process of extracting a feature vector of an input word and collating the extracted feature vector of the input word with the synthesized word feature vectors.

A program for causing a computer to execute word recognition processing, the program causing the computer to execute:
a capacity reduction process of dividing an input character image into meshes in predetermined units, clustering column feature vectors or row feature vectors having similar features for each column or row feature divided into meshes, assigning an identification number that identifies each clustered column feature vector or row feature vector, and holding, in a feature dictionary, the column feature vectors or row feature vectors to which the identification numbers are assigned;
a synthesizing process of synthesizing word feature vectors for collation, based on a word list to be recognized, from the column feature vectors or row feature vectors of the feature dictionary; and
a word recognition process of extracting a feature vector of an input word and collating the extracted feature vector of the input word with the synthesized word feature vectors.