JP7664897B2 - Recognition processing device, recognition processing program, recognition processing method, and recognition processing system

Info

Publication number: JP7664897B2
Application number: JP2022161861A
Authority: JP
Inventors: 淳平臼井; 希柿▲崎▼; 晃幸掛; 毓珮洪; 築石丸; 直樹渡辺
Original assignee: Wacom Co Ltd
Current assignee: Wacom Co Ltd
Priority date: 2021-02-15
Filing date: 2022-10-06
Publication date: 2025-04-18
Anticipated expiration: 2041-02-15
Also published as: JP2022124208A; JP7162278B2; JP2022176360A

Description

この発明は、認識処理装置、認識処理プログラム、認識処理方法、及び認識処理システムに関し、例えば、オンライン手書き文字認識処理に適用し得る。 This invention relates to a recognition processing device, a recognition processing program, a recognition processing method, and a recognition processing system, and can be applied, for example, to online handwritten character recognition processing.

従来、オンライン手書き文字認識処理では、文字入力の際のストローク（筆跡）から特徴量を取得し、取得した特徴量に基づいて機械学習を行って学習モデルを取得し、取得した学習モデルを用いて入力文字の認識を行う手法が提案されている。 Conventionally, a method has been proposed for online handwritten character recognition processing in which features are acquired from the strokes (handwriting) made when characters are input, machine learning is performed based on the acquired features to acquire a learning model, and the acquired learning model is used to recognize the input characters.

上記のようなストロークの特徴量について機械学習を行って、文字認識処理を行う手法としては、特許文献１のような記載技術が存在する。 One technique for performing character recognition processing by performing machine learning on the features of strokes like those described above is described in Patent Document 1.

特許文献１の記載技術では、各文字について、時系列ごとのストロークの位置（以下、「入力パターン」と呼ぶ）をサンプルとして取得し、サンプルとして取得した入力パターンと標準パターン（標準的な筆跡で入力した場合の入力パターン）との間で特徴点（ストロークを構成する各位置）を対応付け、対応付けられた特徴点間の差分を、文字認識処理に用いる特徴値として取得している。 In the technology described in Patent Document 1, the positions of strokes over time (hereinafter referred to as "input patterns") are obtained as samples for each character, feature points (positions that make up the strokes) are associated between the sampled input patterns and a standard pattern (an input pattern when input in standard handwriting), and the difference between the associated feature points is obtained as a feature value to be used in character recognition processing.

そして、特許文献１の記載技術では、学習用に取得された特徴値を教師データとして機械学習を行い、学習モデルを取得する。そして、特許文献１の記載技術では、文字認識処理の際には、サンプルの入力パターンについて、全ての標準パターンと特徴点間の差異を演算して特徴量として取得し、取得した全ての標準パターンとの特徴量を学習モデルに入力して文字認識処理を行う。 The technology described in Patent Document 1 then performs machine learning using the feature values acquired for learning as training data to obtain a learning model. During character recognition processing, it calculates the differences between the feature points of the sample input pattern and those of every standard pattern as feature amounts, and inputs the feature amounts for all standard patterns into the learning model to perform character recognition.
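
As a rough illustration of why this prior-art approach scales with the number of standard patterns, the following sketch computes difference features between one input pattern and every standard pattern. It assumes that feature points have already been matched one-to-one by index, which is a simplification. The function names and the dictionary layout are hypothetical and are not taken from Patent Document 1.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def difference_features(input_pattern: List[Point],
                        standard_pattern: List[Point]) -> List[float]:
    """Return per-point (dx, dy) differences, flattened into one list.

    Assumes both patterns have the same length and that the i-th points
    correspond to each other (the matching step itself is not shown).
    """
    if len(input_pattern) != len(standard_pattern):
        raise ValueError("patterns must have the same number of feature points")
    features: List[float] = []
    for (xi, yi), (xs, ys) in zip(input_pattern, standard_pattern):
        features.extend([xi - xs, yi - ys])
    return features


def features_against_all(input_pattern: List[Point],
                         standard_patterns: Dict[str, List[Point]]
                         ) -> Dict[str, List[float]]:
    """Compute difference features against every standard pattern.

    The work grows linearly with the number of standard patterns,
    because each recognition visits all of them.
    """
    return {label: difference_features(input_pattern, pattern)
            for label, pattern in standard_patterns.items()}
```

In this sketch every recognition request touches all entries of `standard_patterns`, which is the processing load the description goes on to criticize.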

特許文献１：特開２０１８－１１２５２１号公報 Patent Document 1: JP 2018-112521 A

しかしながら、特許文献１の記載技術では、学習モデルを得るために好適な標準パターンを文字毎に用意しなければならないので学習モデルの作成コストが高い。また、特許文献１の記載技術では、文字認識処理の際にサンプルの入力パターンと全ての標準パターンとの間の特徴量を取得して認識処理しなければならないので、非常に処理負荷が高い。 However, with the technology described in Patent Document 1, a suitable standard pattern must be prepared for each character in order to obtain a learning model, which means the cost of creating the learning model is high. In addition, with the technology described in Patent Document 1, the feature amounts between the sample input pattern and all standard patterns must be acquired and then recognized during character recognition processing, which means an extremely high processing load.

特許文献１の記載技術では、文字入力の特徴量について、より多くの情報量を確保する観点から、上記のように標準パターンとの差分を特徴値として用いたが、上記の通り、特許文献１の記載技術では、学習モデルの作成や認識処理に多大なリソース（例えば、作業コストやハードウェア資源等）を必要とする。 In the technology described in Patent Document 1, the difference from a standard pattern is used as the feature value for the character input in order to secure a larger amount of information, but as described above, the technology described in Patent Document 1 requires a large amount of resources (e.g., work costs, hardware resources, etc.) for creating a learning model and for recognition processing.

そのため、より効率的にオンライン文字入力の特徴量を得て機械学習することができる認識処理装置、認識処理プログラム、認識処理方法、及び認識処理システムが望まれている。 Therefore, there is a demand for a recognition processing device, a recognition processing program, a recognition processing method, and a recognition processing system that can obtain features of online character input more efficiently and perform machine learning.

第１の本発明の認識処理装置は、入力文字ごとに電子ペンによるストロークの時系列順の入力パターンを示すものであって前記入力パターンの時系列順の特徴点の位置情報を含む入力ストロークデータを取得し、取得した入力ストロークデータを、固定サンプル数の入力パターンに正規化する正規化処理を行って正規化ストロークデータを取得する正規化手段と、前記正規化手段が正規化した正規化ストロークデータを、前記固定サンプル数の特徴量で表現した入力ベクトルデータに変換する入力ベクトルデータ取得手段と、前記入力ベクトルデータ取得手段が取得した入力ベクトルデータを用いて機械学習した学習モデルを用いて、前記入力ベクトルデータ取得手段が取得した入力ベクトルデータについて文字認識処理を行う文字認識処理手段とを有し、前記正規化手段が行う正規化処理には、前記入力ストロークデータについて、時系列順に隣接する全ての特徴点間の間隔が所定の閾値以上となるように特徴点を間引いて中間ストロークデータを取得する第１の正規化処理工程と、前記中間ストロークデータについて、時系列順に隣接する全ての特徴点間の間隔が前記閾値以下となるように特徴点を補間したデータを取得する第２の正規化処理工程とが含まれることを特徴とする。 A first recognition processing device of the present invention has a normalization means for acquiring input stroke data indicating an input pattern of strokes made by an electronic pen in chronological order for each input character, the input stroke data including position information of feature points in chronological order of the input pattern, and performing a normalization process for normalizing the acquired input stroke data to an input pattern with a fixed number of samples to acquire normalized stroke data, an input vector data acquisition means for converting the normalized stroke data normalized by the normalization means into input vector data expressed by feature amounts of the fixed number of samples, and a character recognition processing means for performing a character recognition process on the input vector data acquired by the input vector data acquisition means using a learning model trained by machine learning using the input vector data acquired by the input vector data acquisition means, and the normalization process performed by the normalization means includes a first normalization processing step for acquiring intermediate stroke data by thinning out feature points for the input stroke data such that an interval between all adjacent feature points in chronological order is equal to or greater than a predetermined threshold, and a second normalization processing step for acquiring data obtained by interpolating 
feature points for the intermediate stroke data such that an interval between all adjacent feature points in chronological order is equal to or less than the threshold.
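
The two normalization steps named above (thinning, then interpolation against the same threshold) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: it treats a character as a single list of (x, y) feature points, uses Euclidean distance as the "interval", and interpolates linearly. The names `thin_points` and `interpolate_points` are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def thin_points(points: List[Point], threshold: float) -> List[Point]:
    """First step: drop points so adjacent kept points are >= threshold apart."""
    if not points:
        return []
    kept = [points[0]]
    for p in points[1:]:
        # Keep a point only if it is far enough from the last kept point.
        if math.dist(kept[-1], p) >= threshold:
            kept.append(p)
    return kept


def interpolate_points(points: List[Point], threshold: float) -> List[Point]:
    """Second step: insert points so adjacent points are <= threshold apart."""
    if not points:
        return []
    out = [points[0]]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        gap = math.dist((x0, y0), (x1, y1))
        # Split the gap into equal segments no longer than the threshold.
        segments = max(1, math.ceil(gap / threshold))
        for k in range(1, segments + 1):
            t = k / segments
            out.append((x0 + (x1 - x0) * t, y0 + (y1 - y0) * t))
    return out
```

Applying `thin_points` and then `interpolate_points` with the same threshold yields points whose spacing is bounded above by the threshold, which makes a later reduction to a fixed number of samples less sensitive to writing speed.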

第２の本発明の認識処理プログラムは、コンピュータを、入力文字ごとに電子ペンによるストロークの時系列順の入力パターンを示すものであって前記入力パターンの時系列順の特徴点の位置情報を含む入力ストロークデータを取得し、取得した入力ストロークデータを、固定サンプル数の入力パターンに正規化する正規化処理を行って正規化ストロークデータを取得する正規化手段と、前記正規化手段が正規化した正規化ストロークデータを、前記固定サンプル数の特徴量で表現した入力ベクトルデータに変換する入力ベクトルデータ取得手段と、前記入力ベクトルデータ取得手段が取得した入力ベクトルデータを用いて機械学習した学習モデルを用いて、前記入力ベクトルデータ取得手段が取得した入力ベクトルデータについて文字認識処理を行う文字認識処理手段として機能させ、前記正規化手段が行う正規化処理には、前記入力ストロークデータについて、時系列順に隣接する全ての特徴点間の間隔が所定の閾値以上となるように特徴点を間引いて中間ストロークデータを取得する第１の正規化処理工程と、前記中間ストロークデータについて、時系列順に隣接する全ての特徴点間の間隔が前記閾値以下となるように特徴点を補間したデータを取得する第２の正規化処理工程とが含まれることを特徴とする。 A second recognition processing program of the present invention causes a computer to function as: a normalization means for acquiring input stroke data indicating an input pattern of strokes made by an electronic pen in chronological order for each input character, the input pattern including position information of feature points in chronological order, and performing a normalization process for normalizing the acquired input stroke data into an input pattern with a fixed number of samples to acquire normalized stroke data; an input vector data acquisition means for converting the normalized stroke data normalized by the normalization means into input vector data expressed by feature amounts of the fixed number of samples; and a character recognition processing means for performing character recognition processing on the input vector data acquired by the input vector data acquisition means, using a learning model trained by machine learning using the input vector data acquired by the input vector data acquisition means, wherein the normalization process performed by the normalization means includes a first normalization processing step for acquiring intermediate stroke data by thinning out feature points for the input stroke data such that an interval between all adjacent feature points in chronological order is equal to or greater than a predetermined threshold; and a second normalization processing step for acquiring data 
for the intermediate stroke data by interpolating feature points such that an interval between all adjacent feature points in chronological order is equal to or less than the threshold.

第３の本発明は、認識処理装置が行う認識処理方法において、前記認識処理装置は、正規化手段、入力ベクトルデータ取得手段、文字認識処理手段、及び文字認識結果出力手段を有し、前記正規化手段は、入力文字ごとに電子ペンによるストロークの時系列順の入力パターンを示すものであって前記入力パターンの時系列順の特徴点の位置情報を含む入力ストロークデータを取得し、取得した入力ストロークデータを、固定サンプル数の入力パターンに正規化する正規化処理を行って正規化ストロークデータを取得し、前記入力ベクトルデータ取得手段は、前記正規化手段が正規化した正規化ストロークデータを、前記固定サンプル数の特徴量で表現した入力ベクトルデータに変換し、前記文字認識処理手段は、前記入力ベクトルデータ取得手段が取得した入力ベクトルデータを用いて機械学習した学習モデルを用いて、前記入力ベクトルデータ取得手段が取得した入力ベクトルデータについて文字認識処理を行い、前記正規化手段が行う正規化処理には、前記入力ストロークデータについて、時系列順に隣接する全ての特徴点間の間隔が所定の閾値以上となるように特徴点を間引いて中間ストロークデータを取得する第１の正規化処理工程と、前記中間ストロークデータについて、時系列順に隣接する全ての特徴点間の間隔が前記閾値以下となるように特徴点を補間したデータを取得する第２の正規化処理工程とが含まれることを特徴とする。 A third aspect of the present invention is a recognition processing method performed by a recognition processing device, the recognition processing device having a normalization means, an input vector data acquisition means, a character recognition processing means, and a character recognition result output means, the normalization means acquires input stroke data indicating an input pattern of strokes made by an electronic pen in chronological order for each input character and including position information of feature points in chronological order of the input pattern, and acquires normalized stroke data by performing a normalization process for normalizing the acquired input stroke data into an input pattern with a fixed number of samples, and the input vector data acquisition means acquires input vector data expressed by feature amounts of the fixed number of samples from the normalized stroke data normalized by the normalization means. 
The character recognition processing means performs character recognition processing on the input vector data acquired by the input vector data acquisition means using a learning model that has been machine-learned using the input vector data acquired by the input vector data acquisition means, and the normalization processing performed by the normalization means includes a first normalization processing step of acquiring intermediate stroke data by thinning out feature points from the input stroke data such that intervals between all feature points adjacent in chronological order are equal to or greater than a predetermined threshold, and a second normalization processing step of acquiring data by interpolating feature points from the intermediate stroke data such that intervals between all feature points adjacent in chronological order are equal to or less than the threshold.
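
The reduction to a fixed number of samples and the conversion to input vector data can be pictured with the sketch below. How the samples are chosen and how the vector is laid out are not specified in this passage, so both are assumptions here: the sketch picks N+1 points at evenly spaced indices and simply flattens their coordinates. The names `pick_fixed_samples` and `to_input_vector` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def pick_fixed_samples(points: List[Point], n: int) -> List[Point]:
    """Pick n + 1 points at evenly spaced indices (endpoints included)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if not points:
        raise ValueError("points must not be empty")
    last = len(points) - 1
    return [points[round(i * last / n)] for i in range(n + 1)]


def to_input_vector(samples: List[Point]) -> List[float]:
    """Flatten (x, y) samples into one fixed-length list of numbers."""
    return [coord for point in samples for coord in point]
```

Because the output length depends only on N, every character produces a vector of the same size regardless of how many points the pen originally reported.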

第４の本発明は、電子ペンと前記電子ペンを用いた入力を受けることができるペンタブレットと、ユーザにより前記電子ペンで前記ペンタブレットに書きこまれた文字を認識する認識処理装置とを有する認識システムにおいて、前記認識処理装置として第１の本発明の認識処理装置を適用したことを特徴とする認識システム。 A fourth aspect of the present invention is a recognition system having an electronic pen, a pen tablet capable of receiving input using the electronic pen, and a recognition processing device that recognizes characters written on the pen tablet by a user with the electronic pen, characterized in that the recognition processing device of the first aspect of the present invention is applied as the recognition processing device.

本発明によれば、より効率的にオンライン文字入力の特徴量を得て機械学習することができる。 The present invention makes it possible to obtain features of online character input more efficiently and perform machine learning.

第１の実施形態に係る全体構成について示したブロック図である。 A block diagram showing the overall configuration according to the first embodiment.
第１の実施形態に係る入力ベクトルデータの構成例について示した図である。 A diagram showing an example of the configuration of input vector data according to the first embodiment.
第１の実施形態に係る入力ベクトルデータの具体例（その１）について示した図である。 A diagram showing a specific example (part 1) of input vector data according to the first embodiment.
第１の実施形態に係る入力ベクトルデータの具体例（その２）について示した図である。 A diagram showing a specific example (part 2) of input vector data according to the first embodiment.
第１の実施形態に係る文字入力画面の構成例について示した図である。 A diagram showing an example of the configuration of a character input screen according to the first embodiment.
第１の実施形態に係るストロークデータ処理部で取得される入力ストロークデータの構成例について示した図である。 A diagram showing an example of the configuration of input stroke data acquired by the stroke data processing unit according to the first embodiment.
第１の実施形態に係るストロークデータ処理部が、入力ストロークデータからオンライン文字認識処理用の入力ベクトルデータを生成する処理について示したフローチャートである。 A flowchart showing the process in which the stroke data processing unit according to the first embodiment generates input vector data for online character recognition processing from input stroke data.
第１の実施形態に係るストロークデータ処理部が、オンライン文字認識処理用の入力ベクトルデータを正規化して第１の正規化ストロークデータを取得する処理について示した図である。 A diagram showing the process in which the stroke data processing unit according to the first embodiment normalizes input vector data for online character recognition processing to obtain first normalized stroke data.
第１の実施形態に係るストロークデータ処理部が、第１の正規化ストロークデータの特徴点を間引く処理について示した図である。 A diagram showing the process in which the stroke data processing unit according to the first embodiment thins out feature points of the first normalized stroke data.
第１の実施形態に係るストロークデータ処理部が、第１の正規化ストロークデータの特徴点間を所定間隔以下に埋める特徴点補間処理を行って第２の正規化ストロークデータを取得する例について示した図である。 A diagram showing an example in which the stroke data processing unit according to the first embodiment obtains second normalized stroke data by performing feature point interpolation that fills the gaps between feature points of the first normalized stroke data so that they are no larger than a predetermined interval.
第１の実施形態に係るストロークデータ処理部が、第２の正規化ストロークデータからＮ＋１個の特徴点を抽出して第３の正規化ストロークデータの例について示した図である。 A diagram showing an example of third normalized stroke data obtained when the stroke data processing unit according to the first embodiment extracts N+1 feature points from the second normalized stroke data.
第１の実施形態に係るストロークデータ処理部が取得した第３の正規化ストロークデータに基づく画像について示した図である。 A diagram showing an image based on the third normalized stroke data acquired by the stroke data processing unit according to the first embodiment.
第１の実施形態に係るストロークデータ処理部が、第３の正規化ストロークデータに基づいて取得した入力ベクトルデータの例について示している。 A diagram showing an example of input vector data that the stroke data processing unit according to the first embodiment acquires based on the third normalized stroke data.
第１の実施形態に係るストロークデータ処理部が、入力ストロークデータからオフライン文字認識処理用の入力画像データを生成する処理について示したフローチャートである。 A flowchart showing the process in which the stroke data processing unit according to the first embodiment generates input image data for offline character recognition processing from input stroke data.
第１の実施形態に係るストロークデータ処理部が、オフラインＡＩ処理用の入力画像データを生成する過程の正規化処理について示した図である。 A diagram showing the normalization processing performed while the stroke data processing unit according to the first embodiment generates input image data for offline AI processing.
第１の実施形態に係るストロークデータ処理部が、第５の正規化ストロークデータの各特徴点を６４画素×６４画素の正規化領域で描画して取得した入力画像データについて示した図である。 A diagram showing input image data that the stroke data processing unit according to the first embodiment acquires by drawing each feature point of the fifth normalized stroke data in a normalized area of 64 pixels by 64 pixels.
第１の実施形態に係る文字認識処理部が学習モードで動作する場合の学習処理について示したフローチャートである。 A flowchart showing the learning process when the character recognition processing unit according to the first embodiment operates in the learning mode.
第１の実施形態に係る文字認識処理部が認識処理モードで動作する場合の認識処理について示したフローチャートである。 A flowchart showing the recognition process when the character recognition processing unit according to the first embodiment operates in the recognition processing mode.
第２の実施形態で、ユーザが電子ペンを用いてペンタブレットに文字入力した場合における時系列ごとのペン先状態を示したタイミングチャート（その１）である。 A timing chart (part 1) showing the pen tip state over time when a user inputs characters to the pen tablet using the electronic pen in the second embodiment.
第２の実施形態で、ユーザが電子ペンを用いてペンタブレットに文字入力した場合における時系列ごとのペン先状態を示したタイミングチャート（その２）である。 A timing chart (part 2) showing the pen tip state over time when a user inputs characters to the pen tablet using the electronic pen in the second embodiment.
図１９のタイミングチャートに示す各特徴点におけるペン先状態の集計結果について示している。 A diagram showing the results of tallying the pen tip states at each feature point shown in the timing chart of FIG. 19.
図１９のタイミングチャートに示す各サンプル（特徴点）を示した図である。 A diagram showing each sample (feature point) shown in the timing chart of FIG. 19.
図１９のタイミングチャートに示す各特徴点について、第２の正規化方法を適用した場合における入力ストロークデータの例について示した図である。 A diagram showing an example of input stroke data when the second normalization method is applied to each feature point shown in the timing chart of FIG. 19.
図１９のタイミングチャートに示す各特徴点について、第３の正規化方法を適用した場合における入力ストロークデータの例について示した図である。 A diagram showing an example of input stroke data when the third normalization method is applied to each feature point shown in the timing chart of FIG. 19.
図１９のタイミングチャートに示す各特徴点について、第４の正規化方法を適用した場合における入力ストロークデータの例について示した図である。 A diagram showing an example of input stroke data when the fourth normalization method is applied to each feature point shown in the timing chart of FIG. 19.
図１９のタイミングチャートに示す各特徴点について、第５の正規化方法を適用した場合における入力ストロークデータの例について示した図である。 A diagram showing an example of input stroke data when the fifth normalization method is applied to each feature point shown in the timing chart of FIG. 19.

（Ａ）第１の実施形態 (A) First Embodiment
以下、本発明による認識処理装置、認識処理プログラム、認識処理方法、及び認識処理システムの第１の実施形態を、図面を参照しながら詳述する。この実施形態では、情報処理端末を本発明の認識処理装置として構成した例について説明する。 Hereinafter, a first embodiment of a recognition processing device, a recognition processing program, a recognition processing method, and a recognition processing system according to the present invention will be described in detail with reference to the drawings. In this embodiment, an example in which an information processing terminal is configured as the recognition processing device of the present invention will be described.

（Ａ－１）第１の実施形態の構成 (A-1) Configuration of the First Embodiment
図１は、第１の実施形態に係る認識処理システム１の全体構成について示したブロック図である。なお、図１において括弧内の符号は、後述する第２の実施形態でのみ用いられる符号である。 FIG. 1 is a block diagram showing the overall configuration of a recognition processing system 1 according to the first embodiment. Note that the reference characters in parentheses in FIG. 1 are used only in the second embodiment described later.

認識処理システム１は、情報処理端末１０、ペンタブレット２０及び電子ペン３０を有している。ペンタブレット２０は、電子ペン３０を用いて入力受付が可能な装置である。 The recognition processing system 1 has an information processing terminal 10, a pen tablet 20, and an electronic pen 30. The pen tablet 20 is a device capable of receiving input using the electronic pen 30.

ペンタブレット２０は、ディスプレイパネル２１の表面に電子ペン３０のペン先３１を検知するデバイス（いわゆる、「ポインティングデバイス」）として機能するデバイス）である。また、ディスプレイパネル２１には、情報処理端末１０から供給される映像信号に基づく映像を出力することも可能である。認識処理システム１では、ディスプレイパネル２１にペン先３１の軌跡等を表示することで、ユーザからの文字入力の操作を受け付けることが可能となっている。 The pen tablet 20 is a device that detects the pen tip 31 of the electronic pen 30 on the surface of the display panel 21 (a device that functions as a so-called "pointing device"). The display panel 21 can also output an image based on a video signal supplied from the information processing terminal 10. The recognition processing system 1 is capable of accepting character input operations from the user by displaying the trajectory of the pen tip 31 on the display panel 21.

情報処理端末１０は、制御部１１、映像ＩＦ１２及びＵＳＢポート１３を有している。 The information processing terminal 10 has a control unit 11, a video IF 12, and a USB port 13.

情報処理端末１０は、種々のコンピュータ（例えば、ＰＣ等）に、プログラム（実施形態に係る認識処理プログラムを含む）をインストールすることにより構成できる。 The information processing terminal 10 can be configured by installing a program (including the recognition processing program according to the embodiment) on various computers (e.g., a PC, etc.).

制御部１１は、コンテンツ処理部１１１、ディスプレイドライバ１１２、ペンタブレットドライバ１１３、及び文字認識処理部１１４を有している。 The control unit 11 has a content processing unit 111, a display driver 112, a pen tablet driver 113, and a character recognition processing unit 114.

コンテンツ処理部１１１は、ディスプレイドライバ１１２及びペンタブレットドライバ１１３を介して、ペンタブレット２０にアクセスし、ペンタブレット２０及び電子ペン３０を用いた各種のコンテンツ（例えば、ペンタブレット２０及び電子ペン３０を用いた文字入力を伴う各種操作画面を含むコンテンツ）の処理を行うアプリケーションプログラムである。 The content processing unit 111 is an application program that accesses the pen tablet 20 via the display driver 112 and the pen tablet driver 113, and processes various contents using the pen tablet 20 and the electronic pen 30 (for example, contents including various operation screens involving character input using the pen tablet 20 and the electronic pen 30).

コンテンツ処理部１１１は、ディスプレイドライバ１１２を介して文字入力を伴う操作画面（ＧＵＩ）を表示し、ペンタブレットドライバ１１３を介して液晶タブレットで電子ペン３０を用いて入力された内容（例えば、電子ペン３０がタッチされた部分の座標の情報等）を取得する。コンテンツ処理部１１１は、ペンタブレットドライバ１１３を介して、文字入力の際のストロークのデータ（時系列ごとの電子ペン３０の座標を含むデータ；以下、「入力ストロークデータ」と呼ぶ）を取得する。この実施形態において、コンテンツ処理部１１１は、文字認識処理部１１４に対して、入力ストロークデータを供給する。 The content processing unit 111 displays an operation screen (GUI) involving character input via the display driver 112, and acquires the content input using the electronic pen 30 on the liquid crystal tablet via the pen tablet driver 113 (e.g., information on the coordinates of the part touched by the electronic pen 30). The content processing unit 111 acquires, via the pen tablet driver 113, stroke data produced during character input (data including the coordinates of the electronic pen 30 at each point in time; hereinafter referred to as "input stroke data"). In this embodiment, the content processing unit 111 supplies the input stroke data to the character recognition processing unit 114.

文字認識処理部１１４は、入力ストロークデータに基づく文字認識処理を行うものであり、ストロークデータ処理部１１４１、オンラインＡＩ処理部１１４２、オフラインＡＩ処理部１１４３、及び文字認識結果出力部１１４４を有している。文字認識処理部１１４は、この実施形態に係る認識処理プログラムに対応する機能を担っている。 The character recognition processing unit 114 performs character recognition processing based on input stroke data, and includes a stroke data processing unit 1141, an online AI processing unit 1142, an offline AI processing unit 1143, and a character recognition result output unit 1144. The character recognition processing unit 114 is responsible for the functions corresponding to the recognition processing program according to this embodiment.

ストロークデータ処理部１１４１は、入力ストロークデータから、オンラインＡＩ処理部１１４２の処理に適用可能なベクトルデータ（入力ベクトルデータ）と、オフラインＡＩ処理部１１４３の処理に適用可能な画像データ（以下、「入力画像データ」と呼ぶ）を生成する処理を行うものである。 The stroke data processing unit 1141 performs processing to generate, from the input stroke data, vector data (input vector data) that can be applied to processing by the online AI processing unit 1142, and image data (hereinafter referred to as "input image data") that can be applied to processing by the offline AI processing unit 1143.

この実施形態では、コンテンツ処理部１１１から文字認識処理部１１４に入力ストロークデータ及び入力画像データが供給されるものとして説明するが、文字認識処理部１１４に入力ストロークデータ及び入力画像データを供給する供給源はこれに限定されないものである。例えば、外部で作成された入力ストロークデータ及び入力画像データを文字認識処理部１１４に供給して処理するようにしてもよい。 In this embodiment, the input stroke data and input image data are described as being supplied from the content processing unit 111 to the character recognition processing unit 114, but the source that supplies the input stroke data and input image data to the character recognition processing unit 114 is not limited to this. For example, input stroke data and input image data created externally may be supplied to the character recognition processing unit 114 for processing.

オンラインＡＩ処理部１１４２は、入力ベクトルデータが供給されると、当該入力ベクトルデータに基づく文字認識処理又は学習処理を行う。文字認識処理部１１４が学習モードで動作する場合、オンラインＡＩ処理部１１４２は、入力ベクトルデータと正解ラベル（教師ラベル）のセットを用いて機械学習処理を行って学習モデルを更新する。文字認識処理部１１４が認識処理モードで動作する場合、オンラインＡＩ処理部１１４２は、入力ベクトルデータに基づいて学習済の学習モデルを用いた文字認識処理を行い、文字認識結果（以下、「オンライン文字認識結果」とも呼ぶ）を出力する。なお、この実施形態では、オンラインＡＩ処理部１１４２は、オンライン文字認識結果に信頼度のデータを付加するものとする。 When input vector data is supplied, the online AI processing unit 1142 performs character recognition processing or learning processing based on the input vector data. When the character recognition processing unit 114 operates in the learning mode, the online AI processing unit 1142 performs machine learning processing using a set of input vector data and correct answer labels (teacher labels) to update the learning model. When the character recognition processing unit 114 operates in the recognition processing mode, the online AI processing unit 1142 performs character recognition processing on the input vector data using the trained learning model, and outputs a character recognition result (hereinafter also referred to as an "online character recognition result"). In this embodiment, the online AI processing unit 1142 attaches reliability data to the online character recognition result.

オフラインＡＩ処理部１１４３は、入力画像データが供給されると、当該入力画像データに基づく文字認識処理又は学習処理を行う。文字認識処理部１１４が学習モードで動作する場合、オフラインＡＩ処理部１１４３は、入力画像データと正解ラベルのセットを用いて機械学習処理を行って学習モデルを更新する。文字認識処理部１１４が認識処理モードで動作する場合、オフラインＡＩ処理部１１４３は、入力画像データに基づいて学習済の学習モデルを用いた文字認識処理を行い、文字認識結果（以下、「オフライン文字認識結果」とも呼ぶ）を出力する。なお、この実施形態では、オフラインＡＩ処理部１１４３は、オフライン文字認識結果に信頼度のデータを付加するものとする。 When input image data is supplied, the offline AI processing unit 1143 performs character recognition processing or learning processing based on the input image data. When the character recognition processing unit 114 operates in the learning mode, the offline AI processing unit 1143 performs machine learning processing using a set of input image data and correct answer labels to update the learning model. When the character recognition processing unit 114 operates in the recognition processing mode, the offline AI processing unit 1143 performs character recognition processing on the input image data using the trained learning model, and outputs a character recognition result (hereinafter also referred to as an "offline character recognition result"). In this embodiment, the offline AI processing unit 1143 attaches reliability data to the offline character recognition result.

この実施形態において、オンラインＡＩ処理部１１４２及びオフラインＡＩ処理部１１４３については、種々の機械学習用のエンジン（ＡＩのプラットフォーム）を用いて構成することができる。したがって、この実施形態では、オンラインＡＩ処理部１１４２及びオフラインＡＩ処理部１１４３における機械学習の方法（学習モデルの作成方法）や、作成した学習モデルを用いた認識処理（判定処理）の詳細について説明を省略する。 In this embodiment, the online AI processing unit 1142 and the offline AI processing unit 1143 can be configured using various machine learning engines (AI platforms). Therefore, in this embodiment, details of the machine learning method (method of creating a learning model) in the online AI processing unit 1142 and the offline AI processing unit 1143 and the recognition processing (determination processing) using the created learning model will not be described.

文字認識結果出力部１１４４は、文字認識処理部１１４が認識処理モードで動作する場合、オンライン文字認識結果と、オフライン文字認識結果とに基づいて最終的な文字認識結果（以下、「出力文字認識結果」とも呼ぶ）を出力する。文字認識結果出力部１１４４は、オンライン文字認識結果と、オフライン文字認識結果を評価し、その評価結果に基づいていずれかの文字認識結果を出力するようにしてもよい。例えば、文字認識結果出力部１１４４は、オンライン文字認識結果と、オフライン文字認識結果で、付加された評価値の高い方を採用して出力文字認識結果として出力するようにしてもよい。 When the character recognition processing unit 114 operates in the recognition processing mode, the character recognition result output unit 1144 outputs a final character recognition result (hereinafter also referred to as an "output character recognition result") based on the online character recognition result and the offline character recognition result. The character recognition result output unit 1144 may evaluate the online character recognition result and the offline character recognition result, and output one of them based on the evaluation. For example, the character recognition result output unit 1144 may adopt whichever of the online and offline character recognition results has the higher attached evaluation value and output it as the output character recognition result.
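
The example selection rule above can be sketched as follows. This is only an illustration: it assumes that the attached evaluation value is a single number where larger means more reliable, and it lets the online result win ties, neither of which is stated in the description. The names `RecognitionResult` and `select_output` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecognitionResult:
    character: str      # recognized character
    confidence: float   # evaluation value attached by the recognizer (assumed numeric)


def select_output(online: RecognitionResult,
                  offline: RecognitionResult) -> RecognitionResult:
    """Return the result with the higher confidence (online wins ties)."""
    return online if online.confidence >= offline.confidence else offline
```

Other combination rules, such as weighting the two values, would fit the same interface; the description only requires that the output be based on both results.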

この実施形態において、コンテンツ処理部１１１は、文字認識処理部１１４を学習モードで動作させる際に、文字認識処理部１１４に対して、入力ストロークデータ共に、当該入力ストロークデータに対応する正解ラベル（当該入力ストロークデータに対応する正解文字の識別子）を供給するものとする。また、この実施形態において、コンテンツ処理部１１１は、文字認識処理部１１４を認識処理モードで動作させる場合、文字認識処理部１１４に入力ストロークデータを供給して出力文字認識結果を取得するものとする。 In this embodiment, when the content processing unit 111 operates the character recognition processing unit 114 in the learning mode, it supplies the character recognition processing unit 114 with the input stroke data as well as a correct label (an identifier of the correct character corresponding to the input stroke data) corresponding to the input stroke data. Also, in this embodiment, when the content processing unit 111 operates the character recognition processing unit 114 in the recognition processing mode, it supplies the input stroke data to the character recognition processing unit 114 and obtains the output character recognition result.

次に、ストロークデータ処理部１１４１が取得する入力ストロークデータの構成例について説明する。 Next, we will explain an example of the configuration of input stroke data acquired by the stroke data processing unit 1141.

入力ストロークデータには、ペンタブレット２０（ディスプレイパネル２１）で認識された時系列ごとの電子ペン３０のペン先３１の位置に関する情報が含まれている。 The input stroke data includes information on the position of the pen tip 31 of the electronic pen 30 at each point in time, as recognized by the pen tablet 20 (display panel 21).

以下では、ディスプレイパネル２１で電子ペン３０のペン先３１と接触する面（以下、「ディスプレイ接触面」と呼ぶ）と並行する方向を「横方向」と呼び、ディスプレイ接触面と直交する方向を「高さ方向」と呼ぶものとする。また、以下では、ペン先３１とディスプレイ接触面との高さ方向に関する状態（ステータス）を「ペン先状態」と呼ぶものとする。第１の実施形態においては、「ペン先状態」という用語は、ペン先３１がディスプレイ接触面に接触した状態（以下、「コンタクト状態」と呼ぶ）又はペン先３１がディスプレイ接触面に接触していない状態（以下、「非コンタクト状態」と呼ぶ）のいずれかを示すステータスであるものとして説明する。 Hereinafter, the direction parallel to the surface of the display panel 21 that comes into contact with the pen tip 31 of the electronic pen 30 (hereinafter referred to as the "display contact surface") will be referred to as the "lateral direction", and the direction perpendicular to the display contact surface will be referred to as the "height direction". The state (status) of the pen tip 31 relative to the display contact surface in the height direction will be referred to as the "pen tip state". In the first embodiment, the "pen tip state" is described as a status indicating either that the pen tip 31 is in contact with the display contact surface (hereinafter referred to as the "contact state") or that the pen tip 31 is not in contact with the display contact surface (hereinafter referred to as the "non-contact state").

この実施形態の例では、入力ストロークデータには、ペンタブレット２０（ディスプレイパネル２１）が認識したペン先３１の横方向の位置（以下、「サンプル」又は「サンプル位置」と呼ぶ）の情報と、ペンタブレット２０（ディスプレイパネル２１）が認識したペン先状態（ペン先３１の高さ方向の位置）の情報が含まれるものとして説明する。 In this embodiment, the input stroke data is described as including information on the position of the pen tip 31 in the lateral direction as recognized by the pen tablet 20 (display panel 21) (hereinafter referred to as the "sample" or "sample position") and information on the pen tip state (the position of the pen tip 31 in the height direction) as recognized by the pen tablet 20 (display panel 21).

そして、以下では、図１に示すように、ペンタブレット２０（ディスプレイパネル２１）の「画面／タッチパネル／ポインティングデバイス」としての水平方向（左右方向）をＸ軸とし、ペンタブレット２０（ディスプレイパネル２１）の「画面／タッチパネル／ポインティングデバイス」としての垂直方向（上下方向）をＹ軸として、入力ストロークデータにおけるサンプル位置の座標系を表すものとする。この実施形態の例では、サンプル位置の座標系は、ペンタブレット２０（ディスプレイパネル２１）の左上の点を原点（ｘ＝０，ｙ＝０）とし、下方向を「Ｙ座標が増加する方向（＋Ｙ方向）」とし、右方向を「Ｘ座標が増加する方向（＋Ｘ方向）」とする。したがって、以下では、上方向が「Ｙ座標が減少する方向（－Ｙ方向）」となり、左方向が「Ｘ座標が減少する方向（－Ｘ方向）」となる。なお、以下では、時系列ごとのサンプル位置（サンプル）の軌跡に沿った位置（後述する正規化された領域上の位置を含む）を総称して「特徴点」とも呼ぶものとする。 In the following, as shown in FIG. 1, the horizontal direction (left-right direction) of the pen tablet 20 (display panel 21) as the "screen/touch panel/pointing device" is taken as the X axis, and the vertical direction (up-down direction) of the pen tablet 20 (display panel 21) as the "screen/touch panel/pointing device" is taken as the Y axis, and the coordinate system of the sample position in the input stroke data is expressed. In this embodiment, the coordinate system of the sample position has the upper left point of the pen tablet 20 (display panel 21) as the origin (x=0, y=0), the downward direction is the "direction in which the Y coordinate increases (+Y direction)", and the right direction is the "direction in which the X coordinate increases (+X direction)". Therefore, in the following, the upward direction is the "direction in which the Y coordinate decreases (-Y direction)", and the leftward direction is the "direction in which the X coordinate decreases (-X direction)". In addition, in the following, the positions along the trajectory of the sample positions (samples) for each time series (including positions on the normalized area described later) are collectively referred to as "feature points".

次に、オンラインＡＩ処理部１１４２による文字認識処理の概要について説明する。 Next, we will provide an overview of the character recognition process performed by the online AI processing unit 1142.

まず、オンラインＡＩ処理部１１４２で処理される入力ベクトルデータの構成例について説明する。 First, we will explain an example of the configuration of input vector data processed by the online AI processing unit 1142.

この実施形態において、入力ベクトルデータは、１文字あたり、時系列ごとのＮ個の特徴点のそれぞれに対応するＭ次元のベクトルにより構成されるデータ（すなわち、Ｍ列×Ｎ行の行列式により表されるデータ）であるものとする。 In this embodiment, the input vector data is data composed of M-dimensional vectors corresponding to each of the N feature points for each time series per character (i.e., data represented by a matrix of M columns by N rows).

この実施形態において、入力ベクトルデータは、各特徴点について、現在の特徴点の座標と、次の時系列の特徴点への移動量（ベクトル）と、ペン先状態を示す情報が含まれているものとする。 In this embodiment, the input vector data includes, for each feature point, the coordinates of the current feature point, the amount of movement (vector) to the next feature point in the time series, and information indicating the pen tip state.

この実施形態の例では、入力ベクトルデータは、図２に示す７次元のパラメータ（Ｍ＝７）により表現されるベクトルデータであるものとする。 In this embodiment, the input vector data is assumed to be vector data represented by seven-dimensional parameters (M=7) as shown in FIG. 2.

In this embodiment, the vector corresponding to the feature point of each time series in the input vector data is described as including: the X coordinate of the feature point (hereinafter "VX"); the Y coordinate of the feature point (hereinafter "VY"); the rightward movement amount from the feature point to the feature point of the next time series (hereinafter "VR"); the upward movement amount from the feature point to the feature point of the next time series (hereinafter "VU"); the leftward movement amount from the feature point to the feature point of the next time series (hereinafter "VL"); the downward movement amount from the feature point to the feature point of the next time series (hereinafter "VD"); and a value indicating the pen tip state in the section between the feature point and the feature point of the next time series (hereinafter "VT"). VT is set to either "1", which indicates that the pen tip 31 of the electronic pen 30 is in contact with the display contact surface (the contact state), or "0", which indicates that it is not in contact (the non-contact state). Therefore, if the feature amount of time series t (t is any integer from 1 to N) is V(t), then V(t) can be expressed as in the following equation (1). If the input vector data for one character is Z, then Z can be expressed as a matrix whose t-th row is V(t), as in the following equation (2).

$$V(t) = \{\mathrm{VX}(t),\ \mathrm{VY}(t),\ \mathrm{VR}(t),\ \mathrm{VU}(t),\ \mathrm{VL}(t),\ \mathrm{VD}(t),\ \mathrm{VT}(t)\} \tag{1}$$

$$Z = \begin{pmatrix} V(1) \\ V(2) \\ \vdots \\ V(N) \end{pmatrix} \tag{2}$$

上記の通り、特徴量Ｖ（ｔ）のうち、ＶＲ（ｔ）、ＶＵ（ｔ）、ＶＬ（ｔ）、及びＶＤ（ｔ）は、次の時系列ｔ＋１の特徴点への移動量を表している。この実施形態では、入力ベクトルデータを表す座標系において、左上を原点（Ｘ＝０、Ｙ＝０）とし、右方向にＸの値が増加し、下方向にＹの値が増加するものとしている。そうすると、Ｙ軸上でＹが増加する方向（＋Ｙ方向）が「下方向」となりＹが減少する方向（－Ｙ方向）が「上方向」となる。また、Ｘ軸上でＸが増加する方向（＋Ｘ方向）が「右方向」となりＸが減少する方向（－Ｘ方向）が「左方向」となる。この場合、ＶＲ（ｔ）は、＋Ｘ方向への移動量を表すため、ＶＸ（ｔ＋１）＞ＶＸ（ｔ）の場合ＶＲ（ｔ）＝ＶＸ（ｔ＋１）－ＶＸ（ｔ）となり、ＶＸ（ｔ＋１）≦ＶＸ（ｔ）の場合にＶＲ（ｔ）＝０となる。また、ＶＬ（ｔ）は、－Ｘ方向への移動量を表すため、ＶＸ（ｔ＋１）＜ＶＸ（ｔ）の場合ＶＬ（ｔ）＝ＶＸ（ｔ）－ＶＸ（ｔ＋１）となり、ＶＸ（ｔ＋１）≧ＶＸ（ｔ）の場合にＶＬ（ｔ）＝０となる。さらに、ＶＵ（ｔ）は、－Ｙ方向への移動量を表すため、ＶＹ（ｔ＋１）＜ＶＹ（ｔ）の場合ＶＵ（ｔ）＝ＶＹ（ｔ）－ＶＹ（ｔ＋１）となり、ＶＹ（ｔ＋１）≧ＶＹ（ｔ）の場合にＶＵ（ｔ）＝０となる。さらにまた、ＶＤ（ｔ）は、＋Ｙ方向への移動量を表すため、ＶＹ（ｔ＋１）＞ＶＹ（ｔ）の場合ＶＤ（ｔ）＝ＶＹ（ｔ＋１）－ＶＹ（ｔ）となり、ＶＹ（ｔ＋１）≦ＶＹ（ｔ）の場合にＶＤ（ｔ）＝０となる。 As described above, among the feature quantity V(t), VR(t), VU(t), VL(t), and VD(t) represent the amount of movement to the feature point of the next time series t+1. In this embodiment, in the coordinate system representing the input vector data, the upper left corner is the origin (X=0, Y=0), and the X value increases to the right and the Y value increases to the bottom. In this case, the direction in which Y increases on the Y axis (+Y direction) is the "downward direction" and the direction in which Y decreases (-Y direction) is the "upward direction". Also, the direction in which X increases on the X axis (+X direction) is the "rightward direction" and the direction in which X decreases (-X direction) is the "leftward direction". In this case, VR(t) represents the amount of movement in the +X direction, so if VX(t+1)>VX(t), then VR(t)=VX(t+1)-VX(t), and if VX(t+1)≦VX(t), then VR(t)=0. Furthermore, since VL(t) represents the amount of movement in the -X direction, if VX(t+1)<VX(t), then VL(t)=VX(t)-VX(t+1), and if VX(t+1)≧VX(t), then VL(t)=0. Furthermore, since VU(t) represents the amount of movement in the -Y direction, if VY(t+1)<VY(t), then VU(t)=VY(t)-VY(t+1), and if VY(t+1)≧VY(t), then VU(t)=0. 
Furthermore, since VD(t) represents the amount of movement in the +Y direction, if VY(t+1)>VY(t), then VD(t)=VY(t+1)-VY(t), and if VY(t+1)≦VY(t), then VD(t)=0.
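The definitions above can be sketched in a few lines of Python. This is only an illustration: the function name `build_input_vectors` and the list-of-tuples representation are assumptions introduced here and do not appear in the source. The sketch takes N+1 feature points and N pen tip states (one per section between consecutive points) and returns N seven-element rows in the order VX, VY, VR, VU, VL, VD, VT.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def build_input_vectors(points: Sequence[Point],
                        pen_states: Sequence[int]) -> List[List[float]]:
    """Return one row [VX, VY, VR, VU, VL, VD, VT] per section between points.

    points     -- feature points in time order (origin at upper left, +Y is down)
    pen_states -- pen tip state of each section: 1 = contact, 0 = non-contact
    """
    if len(pen_states) != len(points) - 1:
        raise ValueError("need exactly one pen state per section between points")

    rows = []
    for (x0, y0), (x1, y1), vt in zip(points, points[1:], pen_states):
        dx = x1 - x0  # positive when moving right (+X)
        dy = y1 - y0  # positive when moving down (+Y)
        vr = max(dx, 0)   # rightward movement, 0 when moving left
        vl = max(-dx, 0)  # leftward movement, 0 when moving right
        vd = max(dy, 0)   # downward movement, 0 when moving up
        vu = max(-dy, 0)  # upward movement, 0 when moving down
        rows.append([x0, y0, vr, vu, vl, vd, vt])
    return rows
```

For instance, `build_input_vectors([(1, 2), (2, 1)], [1])` describes a single section that moves one unit to the right and one unit up with the pen in contact, and returns `[[1, 2, 1, 1, 0, 0, 1]]`.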

In this embodiment, the movement amount in the input vector data is expressed in four dimensions for up, down, left, and right (VR(t), VU(t), VL(t), VD(t)), but it may instead be expressed in two dimensions (horizontal and vertical) if negative values are acceptable. Expressing the movement amount in four dimensions allows even changes along the same axis to be represented as separate feature items, which can affect the accuracy of machine learning (that is, it allows the judgment processing by the AI to be adjusted). For example, in this embodiment, the movement amount of the input vector data may be made expressible in either two dimensions (X(t), Y(t)) or four dimensions (VR(t), VU(t), VL(t), VD(t)), and processing may be performed using whichever gives the better recognition accuracy, selected by an operator's operation or the like.
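The relationship between the two-dimensional and four-dimensional representations can be written out explicitly. The symbols Δx(t) and Δy(t) are introduced only for this illustration (they do not appear in the source) and denote the signed horizontal and vertical movement.

$$\Delta x(t) = \mathrm{VX}(t+1) - \mathrm{VX}(t), \qquad \Delta y(t) = \mathrm{VY}(t+1) - \mathrm{VY}(t)$$

$$\mathrm{VR}(t) = \max(\Delta x(t), 0), \quad \mathrm{VL}(t) = \max(-\Delta x(t), 0), \quad \mathrm{VD}(t) = \max(\Delta y(t), 0), \quad \mathrm{VU}(t) = \max(-\Delta y(t), 0)$$

$$\Delta x(t) = \mathrm{VR}(t) - \mathrm{VL}(t), \qquad \Delta y(t) = \mathrm{VD}(t) - \mathrm{VU}(t)$$

Under these definitions at most one of VR(t) and VL(t), and at most one of VU(t) and VD(t), is nonzero, so the four non-negative values carry the same information as the signed pair.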

次に、入力ベクトルデータを構成する時刻ｔの特徴量Ｖ（ｔ）の具体例について図３、図４を用いて説明する。 Next, a specific example of the feature quantity V(t) at time t that constitutes the input vector data will be described with reference to Figures 3 and 4.

図３は、ｔ＝１、ｔ＝２の時点の特徴点をそれぞれＰ１、Ｐ２とした場合における特徴量Ｖ（１）について表した図である。また、図３では、Ｐ１の座標を（Ｘ，Ｙ）＝（１，２）、Ｐ２の座標を（Ｘ，Ｙ）＝（２，１）としている。 Figure 3 shows the feature quantity V(1) when the feature points at time t = 1 and t = 2 are P1 and P2, respectively. In Figure 3, the coordinates of P1 are (X, Y) = (1, 2), and the coordinates of P2 are (X, Y) = (2, 1).

Then, the feature amount V(1) can be expressed as the following equation (3).

$$V(1) = \{\mathrm{VX}(1),\ \mathrm{VY}(1),\ \mathrm{VR}(1),\ \mathrm{VU}(1),\ \mathrm{VL}(1),\ \mathrm{VD}(1),\ \mathrm{VT}(1)\} = \{1, 2, 1, 1, 0, 0, 1\} \tag{3}$$

図４は、ｔ＝１、ｔ＝２、ｔ＝３、ｔ＝４の時点の特徴点をそれぞれＰ１、Ｐ２、Ｐ３、Ｐ４とした場合における特徴量Ｖ（１）、Ｖ（２）、Ｖ（３）について表した図である。また、図４では、Ｐ１の座標を（Ｘ，Ｙ）＝（１，３）、Ｐ２の座標を（Ｘ，Ｙ）＝（３，１）、Ｐ３の座標を（Ｘ，Ｙ）＝（１，１）、Ｐ４の座標を（Ｘ，Ｙ）＝（３，３）としている。なお、図４では、Ｐ２とＰ３の区間が非コンタクト状態の区間であるものとしている。 Figure 4 shows feature quantities V(1), V(2), and V(3) when feature points at times t=1, t=2, t=3, and t=4 are P1, P2, P3, and P4, respectively. In addition, in Figure 4, the coordinates of P1 are (X,Y)=(1,3), the coordinates of P2 are (X,Y)=(3,1), the coordinates of P3 are (X,Y)=(1,1), and the coordinates of P4 are (X,Y)=(3,3). In Figure 4, the section between P2 and P3 is considered to be a section in a non-contact state.

Then, the feature amounts V(1), V(2), and V(3) can be expressed by the following equations (4) to (6), respectively.

$$V(1) = \{\mathrm{VX}(1),\ \mathrm{VY}(1),\ \mathrm{VR}(1),\ \mathrm{VU}(1),\ \mathrm{VL}(1),\ \mathrm{VD}(1),\ \mathrm{VT}(1)\} = \{1, 3, 2, 2, 0, 0, 1\} \tag{4}$$

$$V(2) = \{\mathrm{VX}(2),\ \mathrm{VY}(2),\ \mathrm{VR}(2),\ \mathrm{VU}(2),\ \mathrm{VL}(2),\ \mathrm{VD}(2),\ \mathrm{VT}(2)\} = \{3, 1, 0, 0, 2, 0, 0\} \tag{5}$$

$$V(3) = \{\mathrm{VX}(3),\ \mathrm{VY}(3),\ \mathrm{VR}(3),\ \mathrm{VU}(3),\ \mathrm{VL}(3),\ \mathrm{VD}(3),\ \mathrm{VT}(3)\} = \{1, 1, 2, 0, 0, 2, 1\} \tag{6}$$

(A-2) Operation of the First Embodiment
Next, the operation of the recognition processing system 1 of this embodiment having the above-described configuration (each procedure of the recognition processing method according to the embodiment) will be described.

まず、認識処理システム１において、ユーザからペンタブレット２０（ディスプレイパネル２１）と電子ペン３０を用いて文字入力を受け付ける処理の例について説明する。 First, we will explain an example of a process in which the recognition processing system 1 accepts character input from a user using the pen tablet 20 (display panel 21) and electronic pen 30.

上記の通り、この実施形態の認識処理システム１では、学習モード及び認識処理モードのいずれの動作モードで動作する場合でも、コンテンツ処理部１１１は、ペンタブレット２０（ディスプレイパネル２１）にユーザから文字入力（電子ペン３０を用いた文字入力）を受け付けるための操作画面（以下、「文字入力画面」と呼ぶ）を表示するものとする。 As described above, in the recognition processing system 1 of this embodiment, regardless of whether the system is operating in the learning mode or the recognition processing mode, the content processing unit 111 displays an operation screen (hereinafter referred to as the "character input screen") on the pen tablet 20 (display panel 21) for accepting character input from the user (character input using the electronic pen 30).

文字入力画面としては種々の構成の操作画面を適用することができるが、例えば、図５に示すような操作画面を適用するようにしてもよい。 Operation screens of various configurations can be used as the character input screen, but for example, an operation screen such as that shown in FIG. 5 may be used.

In the character input screen shown in FIG. 5, a rectangular area that can accept character input (hereinafter referred to as a "character input field") is arranged. In FIG. 5, one character input field F101 is arranged. In FIG. 5, the character that the user is requested to input (the character that functions as the correct label in the learning processing and judgment processing) is annotated on the character input field. For example, the character "十" (the kanji for "ten") is annotated on the character input field F101. This allows the recognition processing system 1 to accept, from the user, input of the annotated character (input using the electronic pen 30) within the frame of each character input field. FIG. 5 shows a state in which the character "十" has been handwritten with the electronic pen 30 within the frame of the character input field F101. Note that, to simplify the explanation, FIG. 5 shows an example in which one character input field is arranged on one screen, but the layout of character input fields is not limited to this, and multiple character input fields may of course be arranged.

この実施形態の文字入力画面では、説明を簡易とするため、１文字入力に対して１つの領域（文字入力フィールド）を設定する例を用いて説明するが、認識処理システム１において１つの領域に対して複数の文字入力を受け付けて文字単位の切り出しを行うようにしてもよい。 For ease of explanation, the character input screen of this embodiment will be described using an example in which one area (character input field) is set for one character input, but the recognition processing system 1 may also accept multiple character inputs in one area and perform character-by-character extraction.

この実施形態において、コンテンツ処理部１１１は、例えば、図５に示すような文字入力画面をユーザに提示し、ユーザから電子ペン３０を用いた書き込み入力を受け付け、その入力にもとづいて入力文字に対応する入力ストロークデータを取得することができるものとする。 In this embodiment, the content processing unit 111 can present a character input screen such as that shown in FIG. 5 to the user, accept writing input from the user using the electronic pen 30, and obtain input stroke data corresponding to the input character based on the input.

この実施形態では、コンテンツ処理部１１１は、ペン先状態が非コンタクト状態からコンタクト状態となったときの座標と、ペン先状態がコンタクト状態となっている間の所定期間（例えば、０．１秒程度）ごとの座標と、ペン先状態がコンタクト状態から非コンタクト状態となったときの座標をサンプル位置として取得するものとして説明する。 In this embodiment, the content processing unit 111 is described as acquiring, as sample positions, the coordinates when the pen tip state changes from a non-contact state to a contact state, the coordinates for each predetermined period (e.g., about 0.1 seconds) while the pen tip state is in the contact state, and the coordinates when the pen tip state changes from a contact state to a non-contact state.

Figure 6 shows an example of the configuration of the input stroke data acquired by the stroke data processing unit 1141 when the character "十" is handwritten with the electronic pen 30 within the frame of the character input field F101 as shown in Figure 5.

Figure 6(a) is a diagram in which the sample points obtained when the character "十" is handwritten with the electronic pen 30 are plotted. In the coordinate system shown in Figure 6(a), the X range is 0 to 1000 and the Y range is 0 to 1000. In other words, the image of the input stroke data shown in Figure 6 is an image of 1000 pixels x 1000 pixels.

Figure 6(b) is a diagram showing the values for each time series number of the input stroke data shown in Figure 6(a). The time series number is a value that indicates the order of the time series, with smaller values representing earlier time series (times). As shown in Figure 6(b), the input stroke data records the X coordinate value, the Y coordinate value, and the pen tip state value for each time series number.

At this time, the stroke data processing unit 1141 manages the data of each sample (feature point) of the input stroke data separately for each stroke. For example, by treating data whose pen tip state is "0" as a boundary in the input stroke data, a list of sample positions for each stroke can be obtained. For example, in FIG. 6(b), a sample position whose pen tip state is "0" appears as the 14th entry from the top, so the sample positions of time series 1 to 13 form the first stroke, and the sample positions of time series 14 to 22 form the second stroke.
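The stroke splitting described above might be sketched as follows. The function name `split_into_strokes` and the `(x, y, pen_state)` tuple layout are assumptions made for illustration. The text only says that data with pen tip state "0" serves as the boundary; the sketch further assumes, based on the example in which the 14th sample opens the second stroke, that such a sample is the first sample of the new stroke.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[int, int, int]  # (x, y, pen_state)


def split_into_strokes(samples: Sequence[Sample]) -> List[List[Tuple[int, int]]]:
    """Group time-ordered samples into strokes.

    Assumption: a sample whose pen_state is 0 is the first sample of a new stroke.
    """
    strokes: List[List[Tuple[int, int]]] = []
    for x, y, pen_state in samples:
        if pen_state == 0 or not strokes:
            strokes.append([])  # start a new stroke
        strokes[-1].append((x, y))
    return strokes
```

With 22 samples of which only the 14th has pen tip state "0", this returns two lists holding 13 and 9 sample positions.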

次に、ストロークデータ処理部１１４１が、入力ストロークデータからオンライン文字認識処理用の入力ベクトルデータを生成する処理について図７のフローチャートを用いて説明する。 Next, the process in which the stroke data processing unit 1141 generates input vector data for online character recognition processing from input stroke data will be explained using the flowchart in FIG. 7.

まず、コンテンツ処理部１１１からストロークデータ処理部１１４１に１文字分の入力ストロークデータが供給され保持されたものとする（Ｓ１０１）。 First, input stroke data for one character is supplied from the content processing unit 111 to the stroke data processing unit 1141 and stored (S101).

次に、ストロークデータ処理部１１４１は、入力ストロークデータについて所定の解像度の領域（以下、「正規化領域」と呼ぶ）に丁度おさまるように正規化したデータ（以下、「第１の正規化ストロークデータ」と呼ぶ）を取得する（Ｓ１０２）。 Next, the stroke data processing unit 1141 acquires data (hereinafter referred to as "first normalized stroke data") that has been normalized so that the input stroke data fits exactly within an area of a predetermined resolution (hereinafter referred to as "normalized area") (S102).

Figure 8 shows the process in which the stroke data processing unit 1141 normalizes the input stroke data for online character recognition processing to obtain the first normalized stroke data.

図８（ａ）は、図６に示す入力ストロークデータのうち、サンプル位置（特徴点）が描画された領域のみを切り出した画像となっている。 Figure 8 (a) is an image that has been cut out from the input stroke data shown in Figure 6, showing only the area where the sample positions (feature points) are drawn.

図６に示す入力ストロークデータにおいて、Ｘの最大値が６３５で、Ｘの最小値が４２７である。また、図６に示す入力ストロークデータにおいて、Ｙの最大値が６５８で、Ｙの最小値が３８８である。したがって、図８（ａ）の画像（切り出された画像）は、２０８画素×２７０画素（Ｘ方向の画素数が２０８で、Ｙ方向の画素数が２７０）の画像となる。 In the input stroke data shown in FIG. 6, the maximum X value is 635 and the minimum X value is 427. Also, in the input stroke data shown in FIG. 6, the maximum Y value is 658 and the minimum Y value is 388. Therefore, the image in FIG. 8(a) (the cropped image) is an image of 208 pixels x 270 pixels (the number of pixels in the X direction is 208, and the number of pixels in the Y direction is 270).

図８（ｂ）は、図８（ａ）の画像を１００画素×１００画素の正規化領域（縦横比が１：１の領域）に変換した画像を示している。 Figure 8 (b) shows the image of Figure 8 (a) converted into a normalized region of 100 pixels x 100 pixels (a region with an aspect ratio of 1:1).

そして、図８（ｃ）は、図８（ｂ）の正規化領域の画像の各特徴点（各画素）に対応するデータ（第１の正規化ストロークデータ）を示す図となっている。 Figure 8 (c) is a diagram showing data (first normalized stroke data) corresponding to each feature point (each pixel) of the image in the normalized area of Figure 8 (b).

図８（ｂ）、図８（ｃ）に示すように、ストロークデータ処理部１１４１は、入力ストロークデータを、１００画素×１００画素の正規化領域に正規化する際に、上下左右の端に２画素の余白を設けるものとする。すなわち、ストロークデータ処理部１１４１は、実質的に入力ストロークデータを、９６画素×９６画素の領域に正規化する処理を行うことになる。図８の例では、ストロークデータ処理部１１４１は、入力ストロークデータの画像（２０８画素×２７０画素の画像）を９６画素×９６画素の画像（縦横比が１：１の画像）に変換する解像度変換処理を行った後における各特徴点の座標を取得することで、図８（ｃ）に示す第１の正規化ストロークデータを得ることができる。このとき、ストロークデータ処理部１１４１が行う解像度変換処理の具体的な手法については、種々の画像処理手法を適用することができるので、具体的な処理の過程については説明を省略する。 As shown in FIG. 8B and FIG. 8C, when the stroke data processing unit 1141 normalizes the input stroke data to a normalized area of 100 pixels x 100 pixels, it provides a margin of two pixels at the top, bottom, left, and right edges. That is, the stroke data processing unit 1141 essentially performs a process of normalizing the input stroke data to an area of 96 pixels x 96 pixels. In the example of FIG. 8, the stroke data processing unit 1141 obtains the coordinates of each feature point after a resolution conversion process that converts the image of the input stroke data (an image of 208 pixels x 270 pixels) into an image of 96 pixels x 96 pixels (an image with an aspect ratio of 1:1), thereby obtaining the first normalized stroke data shown in FIG. 8C. At this time, various image processing methods can be applied to the specific method of the resolution conversion process performed by the stroke data processing unit 1141, so a description of the specific process will be omitted.
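Since the source leaves the resolution conversion method open, the following is only one possible sketch, operating directly on coordinates rather than on an image. The function name `normalize_points`, the independent linear scaling of each axis, the rounding to integer pixels, and the use of pixel indices 2 to 97 as the usable 96-pixel range are all assumptions made here for illustration.

```python
from typing import List, Sequence, Tuple


def normalize_points(points: Sequence[Tuple[float, float]],
                     size: int = 100,
                     margin: int = 2) -> List[Tuple[int, int]]:
    """Scale points so their bounding box fills the area inside the margins."""
    if not points:
        return []
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    low = margin              # first usable pixel index (2)
    high = size - 1 - margin  # last usable pixel index (97)

    def scale(value: float, v_min: float, v_max: float) -> int:
        if v_max == v_min:
            # Degenerate extent (e.g. a perfectly vertical line): centre it.
            return (low + high) // 2
        return round(low + (value - v_min) * (high - low) / (v_max - v_min))

    return [(scale(x, x_min, x_max), scale(y, y_min, y_max)) for x, y in points]
```

Because each axis is scaled separately, a 208 x 270 bounding box is stretched to a square, which matches the 1:1 aspect ratio of the normalized area described above.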

次に、ストロークデータ処理部１１４１は、第１の正規化ストロークデータから、各特徴点で、隣接する特徴点との間が所定以上となるように特徴点を間引く処理を行う（Ｓ１０３）。 Next, the stroke data processing unit 1141 performs a process of thinning out feature points from the first normalized stroke data so that the distance between each feature point and adjacent feature points is equal to or greater than a predetermined distance (S103).

例えば、時系列ｔの特徴点とその次の時系列ｔ＋１の特徴点に基づいて以下の（７）式を計算し、成立する場合には、その２つの特徴点の間の距離は所定以下であると判断するようにしてもよい。 For example, the following formula (7) may be calculated based on the feature points of time series t and the feature points of the next time series t+1, and if the formula holds, it may be determined that the distance between the two feature points is equal to or smaller than a predetermined value.

In equation (7), the x-coordinate and y-coordinate of the feature point of time series t are x(t) and y(t), and the x-coordinate and y-coordinate of the feature point of the next time series t+1 are x(t+1) and y(t+1).
In addition, in equation (7), SIZE is the horizontal and/or vertical resolution of the entire image (here, 100). Here, SIZE/10 = 100/10 = 10 is applied as the threshold (the right side of equation (7)) for determining whether or not to thin out a feature point, but this threshold may be set to an arbitrarily designed value (for example, a suitable value found through experiments or the like).

Here, the stroke data processing unit 1141 applies the following equation (7) to the feature point of each time series and, if the equation holds, thins out the feature point of the next time series. When the feature point of time series t+1 has been thinned out, the stroke data processing unit 1141 may treat the feature point that follows it as the new time series t+1, apply the following equation (7) again, and repeat the thinning as long as the equation holds.

The stroke data processing unit 1141 also performs the thinning process stroke by stroke. That is, for each stroke, the stroke data processing unit 1141 may repeat the thinning process until the following equation (7) no longer holds for any feature point of the time series (that is, until the distance between every pair of adjacent feature points is equal to or greater than the predetermined value). For example, the stroke data processing unit 1141 may extract the data of the first stroke (data of time series numbers 1 to 13) from the first normalized stroke data shown in FIG. 8(c) and perform the above thinning process, and then extract the data of the second stroke (data of time series numbers 14 to 22) and perform the above thinning process.

$$\{x(t+1)-x(t)\}^2 + \{y(t+1)-y(t)\}^2 < \mathrm{SIZE}/10 \tag{7}$$
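A minimal sketch of the per-stroke thinning follows. The function name `thin_stroke` is hypothetical, and the default threshold of 10 corresponds to SIZE/10 with SIZE = 100. A point is dropped when its squared distance to the most recently kept point is below the threshold; the single pass shown here is an assumed simplification that leaves every adjacent pair at or above the threshold, which is the end state the repeated procedure in the text aims for.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def thin_stroke(stroke: Sequence[Point], threshold: float = 10.0) -> List[Point]:
    """Drop points whose squared distance to the last kept point is below threshold."""
    kept: List[Point] = []
    for x, y in stroke:
        if kept:
            px, py = kept[-1]
            if (x - px) ** 2 + (y - py) ** 2 < threshold:
                continue  # too close to the previous kept point: thin it out
        kept.append((x, y))
    return kept
```

Note that in this sketch the last point of a stroke is also dropped if it lies too close to the previously kept point; whether the end point should always be retained is not stated in the text.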

図９は、図８（ｂ）、図８（ｃ）に示す第１の正規化ストロークデータから、上記の処理により特徴点を間引いた状態について示した図である。 Figure 9 shows the state after feature points have been thinned out from the first normalized stroke data shown in Figures 8(b) and 8(c) using the above process.

電子ペン３０を用いた文字入力の場合、非コンタクト状態からコンタクト状態となったときに、電子ペン３０のペン先３１がディスプレイパネル２１上ですべる等して密集した特徴点が発生するが、この密集した特徴点は文字の形態を構成するものでないため、文字認識用のデータとしてはノイズとなる。そのため、ストロークデータ処理部１１４１では、入力ストロークデータについて上記のような間引き処理を行うことにより、ノイズを除去して学習精度及び認識精度を向上させている。 When inputting characters using the electronic pen 30, when the non-contact state changes to a contact state, the pen tip 31 of the electronic pen 30 slides on the display panel 21, generating dense feature points. However, since these dense feature points do not form the shape of a character, they become noise in the data for character recognition. Therefore, the stroke data processing unit 1141 performs the above-mentioned thinning process on the input stroke data to remove noise and improve learning accuracy and recognition accuracy.

図９（ａ）は、間引き処理後の第１の正規化ストロークデータを示した画像である。図９（ｂ）は、間引き処理後の第１の正規化ストロークデータを示している。 Figure 9(a) is an image showing the first normalized stroke data after the thinning process. Figure 9(b) shows the first normalized stroke data after the thinning process.

図９に示す第１の正規化ストロークデータでは、間引き処理前の２２個から１６個まで特徴点が間引かれている。なお、図９に示すように、ストロークデータ処理部１１４１は、第１の正規化ストロークデータから特徴点を間引く際に、時系列番号に抜けがないように降りなおすものとする。 In the first normalized stroke data shown in FIG. 9, the feature points have been thinned from 22 before the thinning process to 16. As shown in FIG. 9, when the stroke data processing unit 1141 thins out feature points from the first normalized stroke data, it renumbers the time series numbers so that there are no gaps.

そして、図９に示す正規化ストロークデータでは、時系列番号１～９の特徴点が１画目の特徴点であり、時系列番号１０～１６の特徴点が２画目の特徴点となっている。ストロークデータ処理部１１４１は、間引きの前後において、各画の特徴点のリストを管理しているものとする。 In the normalized stroke data shown in FIG. 9, the feature points with time series numbers 1 to 9 are the feature points of the first stroke, and the feature points with time series numbers 10 to 16 are the feature points of the second stroke. The stroke data processing unit 1141 manages a list of the feature points of each stroke before and after thinning.

次に、ストロークデータ処理部１１４１は、間引きした第１の正規化ストロークデータに基づき、非コンタクト状態の区間（各画の間の区間）も含めて、各特徴点間の距離が所定以下となるように特徴点を補間する処理（以下、「特徴点補間処理」とも呼ぶ）を行ったデータ（以下、「第２の正規化ストロークデータ」と呼ぶ）を生成する（Ｓ１０４）。 Next, the stroke data processing unit 1141 generates data (hereinafter referred to as "second normalized stroke data") by performing a process of interpolating feature points (hereinafter referred to as "feature point interpolation process") based on the thinned-out first normalized stroke data so that the distance between each feature point is equal to or less than a predetermined value, including non-contact sections (sections between each stroke) (S104).

For example, the following equation (8) may be calculated based on the feature point of time series t and the feature point of the next time series t+1. If the equation holds, a new feature point (a feature point whose time series lies between those of the two feature points) may be interpolated (added) at a position between the two feature points (for example, the midpoint).

$$\{x(t+1)-x(t)\}^2 + \{y(t+1)-y(t)\}^2 > \mathrm{SIZE}/10 \tag{8}$$

Here, SIZE/10 = 100/10 = 10 is applied as the threshold (the right side of equation (8)) for determining whether or not to interpolate a feature point, but this threshold may be set to an arbitrarily designed value (for example, a suitable value found through experiments or the like).

Here, the stroke data processing unit 1141 applies equation (8) to the sample position of each time series and, if the equation holds, interpolates a new feature point between that sample position and the sample position of the next time series.

この場合新たに追加する特徴点のｘ座標を「｛ｘ（ｔ＋１）＋ｘ（ｔ）｝／２」（つまりｘ（ｔ＋１）とｘ（ｔ）の平均値）とし、ｙ座標を「｛ｙ（ｔ＋１）＋ｙ（ｔ）｝／２」（つまりｙ（ｔ＋１）とｙ（ｔ）の平均値）とするようにしてもよい。 In this case, the x coordinate of the newly added feature point may be set to "{x(t+1)+x(t)}/2" (i.e., the average value of x(t+1) and x(t)), and the y coordinate may be set to "{y(t+1)+y(t)}/2" (i.e., the average value of y(t+1) and y(t)).

ストロークデータ処理部１１４１は、画ごとに全ての時系列のサンプル位置について（８）式が成立しない状態となるまで、特徴点補間処理を再帰的に繰返し行う。例えば、ストロークデータ処理部１１４１は、時系列ｔの特徴点と時系列ｔ＋１の特徴点との間に新たな特徴点を補間した場合、追加した特徴点の時系列をｔ＋１として再度（８）式を当てはめて計算して、成立する場合新たな特徴点を補間する処理を繰り返すようにしてもよい。 The stroke data processing unit 1141 recursively performs feature point interpolation processing for each stroke until equation (8) is no longer satisfied for all sample positions in the time series. For example, when the stroke data processing unit 1141 interpolates a new feature point between a feature point in time series t and a feature point in time series t+1, it may perform calculations by applying equation (8) to the time series of the added feature point as t+1 again, and if equation (8) is satisfied, repeat the process of interpolating the new feature point.
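The repeated midpoint insertion can be sketched as below. The function name `interpolate_points` is hypothetical, the default threshold of 10 again corresponds to SIZE/10 with SIZE = 100, and the sketch uses an explicit stack in place of recursion, which is an implementation choice made here rather than something stated in the source.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def interpolate_points(points: Sequence[Point], threshold: float = 10.0) -> List[Point]:
    """Insert midpoints until no adjacent pair has squared distance above threshold."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    if not points:
        return []

    result: List[Point] = [points[0]]
    # Points still to be appended, with the nearest one on top of the stack.
    pending: List[Point] = list(reversed(points[1:]))
    while pending:
        x0, y0 = result[-1]
        x1, y1 = pending[-1]
        if (x1 - x0) ** 2 + (y1 - y0) ** 2 > threshold:
            # Too far apart: put the midpoint in front of the next point.
            pending.append(((x0 + x1) / 2, (y0 + y1) / 2))
        else:
            result.append(pending.pop())
    return result
```

For example, `interpolate_points([(0, 0), (8, 0)])` first adds the midpoint at x = 4 and then the midpoints at x = 2 and x = 6, giving five points spaced two units apart.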

For example, the stroke data processing unit 1141 extracts the first stroke data (data with time series numbers 1 to 9) from the thinned first normalized stroke data shown in FIG. 9 and performs the above-mentioned feature point interpolation process, and further extracts the second stroke data (data with time series numbers 10 to 16) and performs the above-mentioned feature point interpolation process.

そして、ストロークデータ処理部１１４１は、画と画の間についても上記の特徴点補間処理を行って、所定間隔ごとの特徴点で埋める処理を行う。例えば、ストロークデータ処理部１１４１は、１画目の最後の時系列の特徴点と２画目の最初の時系列の特徴点との間に、上記の特徴点補間処理を行うことにより、１画目の末尾と２画目の先頭との間を所定間隔の特徴点で埋める。つまり、ストロークデータ処理部１１４１は、１画目の末尾と２画目の先頭との間を一つの画として特徴点の追加処理を行うことになる。 The stroke data processing unit 1141 also performs the above-mentioned feature point interpolation process between strokes, filling in the spaces between strokes with feature points at a predetermined interval. For example, the stroke data processing unit 1141 performs the above-mentioned feature point interpolation process between the last time-series feature point of the first stroke and the first time-series feature point of the second stroke, thereby filling in the space between the end of the first stroke and the beginning of the second stroke with feature points at a predetermined interval. In other words, the stroke data processing unit 1141 performs feature point addition process on the space between the end of the first stroke and the beginning of the second stroke as one stroke.

さらに、ストロークデータ処理部１１４１は、それぞれの特徴点に対してペン先状態の項目の情報を付与する。具体的には、ストロークデータ処理部１１４１は、コンタクト状態の特徴点（各画に属する特徴点）のペン先情報にコンタクト状態を表す「１」を付与し、非コンタクト状態（画の間の区間の特徴点）のペン先情報に非コンタクト状態を表す「０」を付与する。 The stroke data processing unit 1141 further assigns information on the pen tip state to each feature point. Specifically, the stroke data processing unit 1141 assigns "1" representing a contact state to the pen tip information of feature points in a contact state (feature points belonging to each stroke), and assigns "0" representing a non-contact state to pen tip information of feature points in a non-contact state (feature points in the section between strokes).
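The assignment of pen tip states might be sketched as follows, assuming the strokes and the sections between them have already been filled with interpolated feature points. The function name `label_pen_states` and the separate `gaps` argument are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Labeled = Tuple[float, float, int]  # (x, y, pen_state)


def label_pen_states(strokes: Sequence[Sequence[Point]],
                     gaps: Sequence[Sequence[Point]]) -> List[Labeled]:
    """Merge strokes and the interpolated points between them into one sequence.

    strokes -- interpolated points of each stroke, in writing order
    gaps    -- gaps[i] holds the points interpolated between stroke i and stroke i+1
    """
    if strokes and len(gaps) != len(strokes) - 1:
        raise ValueError("need exactly one gap between each pair of strokes")

    labeled: List[Labeled] = []
    for index, stroke in enumerate(strokes):
        labeled.extend((x, y, 1) for x, y in stroke)           # contact state
        if index < len(strokes) - 1:
            labeled.extend((x, y, 0) for x, y in gaps[index])  # non-contact state
    return labeled
```

The result is a single time-ordered list in which feature points belonging to a stroke carry "1" and feature points in the section between strokes carry "0".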

図１０は、図９に示す間引きされた第１の正規化ストロークデータに対して、上記の特徴点補間処理を行い、第２の正規化ストロークデータを取得する処理について示した図である。 Figure 10 shows the process of performing the above-mentioned feature point interpolation process on the thinned first normalized stroke data shown in Figure 9 to obtain second normalized stroke data.

図１０では、１画目として時系列番号１～４０の特徴点が設定され、２画目として時系列番号６３～１０２の特徴点が設定されている。そして、図１０では、１画目と２画目の間の時系列番号４１～６２の特徴点が非コンタクト状態の区間として設定されている。 In FIG. 10, feature points with time series numbers 1 to 40 are set as the first stroke, and feature points with time series numbers 63 to 102 are set as the second stroke. In FIG. 10, feature points with time series numbers 41 to 62 between the first and second strokes are set as the non-contact state section.

次に、ストロークデータ処理部１１４１は、第２の正規化ストロークデータから、Ｎ＋１個の特徴点を抽出したデータ（以下、「第３の正規化ストロークデータ」と呼ぶ）を生成する（Ｓ１０５）。 Next, the stroke data processing unit 1141 generates data (hereinafter referred to as "third normalized stroke data") by extracting N+1 feature points from the second normalized stroke data (S105).

ここでは、第２の正規化ストロークデータの特徴点の数を「Ｃ」と表すものとする。 Here, the number of feature points in the second normalized stroke data is represented as "C".

ストロークデータ処理部１１４１は、Ｃ＞Ｎ＋１の場合、第２の正規化ストロークデータからＮ＋１個の特徴点を抽出（選択）して第３の正規化ストロークデータを生成する。 If C>N+1, the stroke data processing unit 1141 extracts (selects) N+1 feature points from the second normalized stroke data to generate the third normalized stroke data.
また、ストロークデータ処理部１１４１は、Ｎ＋１＞Ｃの場合、第２の正規化ストロークデータの一部又は全部の特徴点について複数回選択することで、Ｎ＋１個の特徴点のデータを抽出し、第３の正規化ストロークデータを生成する。第３の正規化ストロークデータでは、可能な限り各特徴点の間の距離の偏りが少ないことが望ましい。ストロークデータ処理部１１４１において、第２の正規化ストロークデータからＮ＋１個の特徴点を抽出する方式については限定されないものであるが、例えば以下のような処理を行うことで、第３の正規化ストロークデータにおける各特徴点間の距離の偏りを低減することができる。 If N+1>C, the stroke data processing unit 1141 selects some or all of the feature points of the second normalized stroke data more than once, thereby extracting data for N+1 feature points and generating the third normalized stroke data. In the third normalized stroke data, the distances between feature points should be as uniform as possible. The method by which the stroke data processing unit 1141 extracts N+1 feature points from the second normalized stroke data is not limited, but the following process, for example, can reduce the unevenness of the distances between feature points in the third normalized stroke data.

ここでは、ストロークデータ処理部１１４１は、第３の正規化ストロークデータのｉ番目の特徴点（ｉは１～Ｎ＋１のいずれかの整数）として、第２の正規化ストロークデータのＤ（ｉ）番目の時系列の特徴点を選択するものとする。Ｄ（ｉ）としては、例えば以下の（９）式を適用することができる。つまり、Ｄ（ｉ）は、Ｃ／（Ｎ＋１）に（ｉ―１）をかけたものから小数点以下を切り捨て、１を加えた整数となる。 Here, the stroke data processing unit 1141 selects the D(i)-th time-series feature point of the second normalized stroke data as the i-th feature point of the third normalized stroke data (i is an integer from 1 to N+1). For example, the following formula (9) can be applied as D(i). In other words, D(i) is the integer obtained by multiplying C/(N+1) by (i-1), discarding the fractional part, and adding 1.
Ｄ（ｉ）＝［｛Ｃ／（Ｎ＋１）｝＊（ｉ―１）］＋１ …（９） D(i) = [{C/(N+1)} * (i-1)] + 1 ... (9)

図１１は、図１０に示す第２の正規化ストロークデータからＮ＋１個の特徴点を抽出した結果得られる第３の正規化ストロークデータの例について示した図である。 Figure 11 shows an example of third normalized stroke data obtained by extracting N+1 feature points from the second normalized stroke data shown in Figure 10.

図１２は、図１１に示す第３の正規化ストロークデータを画像の形式で表した図である。 Figure 12 shows the third normalized stroke data shown in Figure 11 in the form of an image.

例えば、図１０に示す第２の正規化ストロークデータは１０２個の特徴点から構成されているので、ここから（９）式を用いて１０１個を抽出することになる。例えば、Ｄ（１）＝１、Ｄ（２）＝２、・・・、Ｄ（９９）＝９９、Ｄ（１００）＝１００、Ｄ（１０１）＝１０２となるので、第２の正規化ストロークデータのうち１０１番目の特徴点のみ選択（抽出）されないこと（スキップされること）になる。 For example, the second normalized stroke data shown in FIG. 10 is composed of 102 feature points, so 101 are extracted from it using formula (9). For example, D(1) = 1, D(2) = 2, ..., D(99) = 99, D(100) = 100, D(101) = 102, so only the 101st feature point of the second normalized stroke data is not selected (extracted) (skipped).

次に、Ｎ＋１＞Ｃの場合の例について説明する。仮にＣ＝３０とすると、Ｄ（１）＝１、Ｄ（２）＝１、Ｄ（３）＝１、Ｄ（４）＝２、・・・、Ｄ（９９）＝２９、Ｄ（１００）＝２９、Ｄ（１０１）＝３０のようになる。 Next, an example for the case N+1>C is described. If C=30, then D(1)=1, D(2)=1, D(3)=1, D(4)=2, ..., D(99)=29, D(100)=29, D(101)=30.

以上のように（９）式を用いることで、効率的に第３の正規化ストロークデータのｉ番目の特徴点を、第２の正規化ストロークデータからピックアップすることができる。 By using equation (9) as described above, it is possible to efficiently pick up the i-th feature point of the third normalized stroke data from the second normalized stroke data.
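As a rough illustration, formula (9) can be evaluated with integer arithmetic as below. The function name `select_indices` is hypothetical, and the square brackets in (9) are read as truncation of the fractional part. Note that a literal evaluation of (9) may differ by one position from some of the values quoted in the examples above (for C = 102 and N = 100 it yields D(101) = 101), so the rounding convention behind those examples may differ slightly from this sketch.

```python
def select_indices(c, n):
    """1-based indices D(1)..D(N+1) into the C source feature points, per formula (9).

    floor(C / (N + 1) * (i - 1)) is computed exactly as (C * (i - 1)) // (N + 1).
    """
    return [(c * (i - 1)) // (n + 1) + 1 for i in range(1, n + 2)]

# Downsampling case (C > N+1): 102 source points, 101 selected.
down = select_indices(102, 100)

# Upsampling case (N+1 > C): 30 source points, some selected several times.
up = select_indices(30, 100)
```

Because each index depends only on i, C and N, no distance computation is needed at this stage; the even spacing comes from the interpolation performed beforehand.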

次に、ストロークデータ処理部１１４１は、第３の正規化ストロークデータについて、入力ベクトルデータに変換して取得する（Ｓ１０５）。 Next, the stroke data processing unit 1141 converts the third normalized stroke data into input vector data (S105).

ストロークデータ処理部１１４１は、第３の正規化ストロークデータを構成する各時系列のＸ座標、Ｙ座標、及びペン先状態を、それぞれ各時系列のＶＸ、ＶＹ、ＶＴに設定する。そして、ストロークデータ処理部１１４１は、上記の通り、ＶＸ（ｔ）、ＶＸ（ｔ＋１）、ＶＹ（ｔ）、及びＶＹ（ｔ＋１）に基づいて、ＶＲ（ｔ）、ＶＵ（ｔ）、ＶＬ（ｔ）、ＶＤ（ｔ）を得ることができる。これにより、ストロークデータ処理部１１４１は、Ｖ（１）～Ｖ（１００）を得ることができる。 The stroke data processing unit 1141 sets the X coordinate, Y coordinate, and pen tip state of each time series in the third normalized stroke data as VX, VY, and VT of that time series, respectively. Then, as described above, the stroke data processing unit 1141 obtains VR(t), VU(t), VL(t), and VD(t) from VX(t), VX(t+1), VY(t), and VY(t+1). In this way, the stroke data processing unit 1141 obtains V(1) to V(100).

図１３は、図１１に示す第３の正規化ストロークデータに基づいて取得された入力ベクトルデータの例について示している。 Figure 13 shows an example of input vector data obtained based on the third normalized stroke data shown in Figure 11.
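The conversion from N+1 resampled feature points to N input vectors can be sketched as follows. The exact definitions of VR, VU, VL, and VD are given earlier in the description and are not reproduced here; this example assumes they are the rightward, upward, leftward, and downward components of the displacement from time t to t+1, and that the y axis grows downward. Both are assumptions made for illustration, as is the function name.

```python
def to_input_vectors(points):
    """points: N+1 tuples (x, y, pen). Returns N dicts V(1)..V(N)."""
    vectors = []
    for (x0, y0, pen0), (x1, y1, _) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        vectors.append({
            "VX": x0, "VY": y0, "VT": pen0,
            "VR": max(dx, 0), "VL": max(-dx, 0),  # assumed: rightward / leftward motion
            "VD": max(dy, 0), "VU": max(-dy, 0),  # assumed: y grows downward
        })
    return vectors
```

With 101 feature points this yields 100 vectors, which matches the range V(1) to V(100) mentioned above.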

次に、ストロークデータ処理部１１４１が、入力ストロークデータからオフライン文字認識処理用の入力画像データを生成する処理について図１４のフローチャートを用いて説明する。 Next, the process in which the stroke data processing unit 1141 generates input image data for offline character recognition processing from input stroke data will be explained using the flowchart in FIG. 14.

まず、ストロークデータ処理部１１４１が、１文字分の入力ストロークデータを保持したものとする（Ｓ２０１）。 First, the stroke data processing unit 1141 is assumed to hold input stroke data for one character (S201).

次に、ストロークデータ処理部１１４１は、入力ストロークデータについて所定の解像度の正規化領域に丁度おさまるように正規化したデータ（以下、「第４の正規化ストロークデータ」と呼ぶ）を取得する（Ｓ２０２）。 Next, the stroke data processing unit 1141 acquires data (hereinafter referred to as "fourth normalized stroke data") that is normalized so that the input stroke data fits exactly within the normalized area of a specified resolution (S202).

図１５は、ストロークデータ処理部１１４１が、オフライン文字認識処理用の入力画像データを生成する過程の正規化処理について示した図である。 Figure 15 shows the normalization process performed by the stroke data processing unit 1141 when generating input image data for offline character recognition processing.

図１５（ａ）は、図６に示す入力ストロークデータのうち、特徴点（サンプル位置）が描画される領域のみを切り出した画像となっている。 Figure 15 (a) is an image that has been cut out from the input stroke data shown in Figure 6, showing only the area where the feature points (sample positions) are drawn.

図１５（ｂ）は、図１５（ａ）の画像を６４画素×６４画素の正規化領域（縦横比が１：１の領域）に変換した画像を示している。 Figure 15(b) shows the image of Figure 15(a) converted into a normalized region of 64 pixels x 64 pixels (region with an aspect ratio of 1:1).

そして、図１５（ｃ）は、図１５（ｂ）の正規化領域の画像の各特徴点（各画素）に対応する正規化ストロークデータ（第４の正規化ストロークデータ）を示す図となっている。 Figure 15(c) is a diagram showing normalized stroke data (fourth normalized stroke data) corresponding to each feature point (each pixel) of the image in the normalized area of Figure 15(b).

図１５（ｂ）、図１５（ｃ）に示すように、ストロークデータ処理部１１４１は、入力ストロークデータを、６４画素×６４画素の正規化領域に正規化する際に、上下左右の端に２画素の余白を設けるものとする。すなわち、ストロークデータ処理部１１４１は、実質的に入力ストロークデータを、６０画素×６０画素の領域に正規化する処理を行うことになる。図１５の例では、ストロークデータ処理部１１４１は、入力ストロークデータの画像（２０８画素×２７０画素の画像）を６０画素×６０画素の画像（縦横比が１：１の画像）に変換する解像度変換処理を行った後における各特徴点の座標を取得することで、図１５（ｃ）に示す第４の正規化ストロークデータを得ることができる。このとき、ストロークデータ処理部１１４１が行う解像度変換処理の具体的な手法については、種々の画像処理手法を適用することができるので、具体的な処理の過程については説明を省略する。 As shown in FIG. 15(b) and FIG. 15(c), when the stroke data processing unit 1141 normalizes the input stroke data to a normalized area of 64 pixels x 64 pixels, it provides a margin of two pixels at the top, bottom, left, and right edges. That is, the stroke data processing unit 1141 essentially performs a process of normalizing the input stroke data to an area of 60 pixels x 60 pixels. In the example of FIG. 15, the stroke data processing unit 1141 obtains the coordinates of each feature point after a resolution conversion process that converts the image of the input stroke data (an image of 208 pixels x 270 pixels) into an image of 60 pixels x 60 pixels (an image with an aspect ratio of 1:1), thereby obtaining the fourth normalized stroke data shown in FIG. 15(c). At this time, various image processing methods can be applied to the specific method of the resolution conversion process performed by the stroke data processing unit 1141, so a description of the specific process will be omitted.
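Since the resolution conversion itself is left open, the following is only one possible sketch of the coordinate mapping. It assumes that the bounding box of the input points is stretched independently in x and y to fill the 60 x 60 interior (so the aspect ratio becomes 1:1), that pixel indices start at 0, and that coordinates are rounded to the nearest pixel. The function name and parameters are hypothetical.

```python
def normalize_to_grid(points, size=64, margin=2):
    """Map (x, y) points into a size x size grid, leaving `margin` pixels on every edge."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    min_x, min_y = min(xs), min(ys)
    span_x = (max(xs) - min_x) or 1  # avoid division by zero for a vertical line
    span_y = (max(ys) - min_y) or 1  # avoid division by zero for a horizontal line
    inner = size - 2 * margin - 1    # largest offset inside the interior (59 for 64 and 2)
    return [(margin + round((x - min_x) * inner / span_x),
             margin + round((y - min_y) * inner / span_y))
            for x, y in points]
```

With the defaults, every output coordinate lies between 2 and 61 inclusive, which is the 60 x 60 region inside the two-pixel margin.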

次に、ストロークデータ処理部１１４１は、第４の正規化ストロークデータから、各特徴点で、隣接する特徴点との間が所定以上となるように特徴点を間引く処理を行う（Ｓ２０３）。 Next, the stroke data processing unit 1141 performs a process of thinning out feature points from the fourth normalized stroke data so that the distance between each feature point and adjacent feature points is equal to or greater than a predetermined distance (S203).

ストロークデータ処理部１１４１が、第４の正規化ストロークデータから特徴点を間引く処理については、上述の第１の正規化ストロークデータから特徴点を間引く処理とほぼ同様の処理を適用するようにしてもよい。例えば、ストロークデータ処理部１１４１は、画ごとに、全ての時系列の特徴点について（７）式が成立しない状態となるまで（全ての特徴点の間の距離が所定以上となるまで）、間引きの処理を繰返し行うようにしてもよい。このとき、ストロークデータ処理部１１４１は、（７）式を適用する際のＳＩＺＥを第４の正規化ストロークデータの解像度と同じく６４に設定することが望ましい。 To thin out feature points from the fourth normalized stroke data, the stroke data processing unit 1141 may apply substantially the same process as that used to thin out feature points from the first normalized stroke data described above. For example, for each stroke, the stroke data processing unit 1141 may repeat the thinning process until equation (7) no longer holds for any time-series feature point (that is, until the distance between every pair of adjacent feature points is equal to or greater than the predetermined value). In this case, it is desirable for the stroke data processing unit 1141 to set SIZE in equation (7) to 64, the same as the resolution of the fourth normalized stroke data.

次に、ストロークデータ処理部１１４１は、特徴点の間引きを行った後の第４の正規化ストロークデータに基づいて、入力画像データを取得する（Ｓ２０４）。 Next, the stroke data processing unit 1141 acquires input image data based on the fourth normalized stroke data after thinning out the feature points (S204).

例えば、ストロークデータ処理部１１４１は、６４画素×６４画素の画像領域に、間引き処理を行った後の第４の正規化ストロークデータから各画の特徴点のデータを取得し、上記の画像領域で各画について特徴点間を結ぶ線を描画することで入力画像データを取得するようにしてもよい。 For example, the stroke data processing unit 1141 may take the feature points of each stroke from the thinned fourth normalized stroke data and obtain the input image data by drawing, in a 64 x 64 pixel image area, lines connecting the feature points of each stroke.
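One simple way to draw such an image is sketched below. It is an assumption-laden illustration: the image is a binary grid held as nested lists, lines are one pixel wide, and each segment is drawn by stepping along its longer axis. Line width, anti-aliasing, and pixel depth are not specified in the text. Segments are drawn only within a stroke, so pen-up gaps between strokes stay blank.

```python
def rasterize(strokes, size=64):
    """strokes: list of strokes, each a list of integer (x, y) pixel coordinates.
    Returns a size x size grid of 0/1 with each stroke drawn as connected lines."""
    image = [[0] * size for _ in range(size)]
    for stroke in strokes:
        if len(stroke) == 1:           # a single dot still leaves one pixel
            x, y = stroke[0]
            image[y][x] = 1
        for (x0, y0), (x1, y1) in zip(stroke, stroke[1:]):
            steps = max(abs(x1 - x0), abs(y1 - y0), 1)
            for k in range(steps + 1):  # walk from one end point to the other
                x = round(x0 + (x1 - x0) * k / steps)
                y = round(y0 + (y1 - y0) * k / steps)
                image[y][x] = 1
    return image
```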

図１６は、図１５に示す第４の正規化ストロークデータに基づいて得られる入力画像データの画像について示した図である。 Figure 16 shows an image of the input image data obtained based on the fourth normalized stroke data shown in Figure 15.

次に、文字認識処理部１１４が学習モードで動作する場合の処理について、図１７を用いて説明する。 Next, the processing performed when the character recognition processing unit 114 operates in learning mode will be explained using FIG. 17.

ここでは、文字認識処理部１１４が学習モードで動作しているときに、コンテンツ処理部１１１から文字認識処理部１１４に、学習用の入力ストロークデータ（１文字分の入力ストロークデータ）と、当該入力ストロークデータの文字に対応する正解ラベルのセットが供給されたものとする。 Here, it is assumed that, while the character recognition processing unit 114 is operating in learning mode, the content processing unit 111 supplies it with a set consisting of input stroke data for learning (input stroke data for one character) and the correct label corresponding to the character of that input stroke data.

まず、ストロークデータ処理部１１４１は、供給された入力ストロークデータに基づいて入力ベクトルデータと入力画像データを生成し、それぞれオンラインＡＩ処理部１１４２とオフラインＡＩ処理部１１４３に供給する（Ｓ３０１）。 First, the stroke data processing unit 1141 generates input vector data and input image data based on the supplied input stroke data, and supplies them to the online AI processing unit 1142 and the offline AI processing unit 1143, respectively (S301).

学習モードで動作している文字認識処理部１１４のオンラインＡＩ処理部１１４２では、供給された入力ベクトルデータと正解ラベルに基づいて学習処理が行われる（Ｓ３０２）。 In the online AI processing unit 1142 of the character recognition processing unit 114 operating in learning mode, learning processing is performed based on the supplied input vector data and correct answer labels (S302).

また、学習モードで動作している文字認識処理部１１４のオフラインＡＩ処理部１１４３では、供給された入力画像データと正解ラベルに基づいて学習処理が行われる（Ｓ３０３）。 In addition, in the offline AI processing unit 1143 of the character recognition processing unit 114 operating in the learning mode, learning processing is performed based on the supplied input image data and correct answer labels (S303).

以上のように、文字認識処理部１１４では、コンテンツ処理部１１１から学習用のデータが供給される度に、当該学習用データセットを用いた学習処理が行われる。 As described above, in the character recognition processing unit 114, each time learning data is supplied from the content processing unit 111, a learning process is performed using the learning data set.

次に、文字認識処理部１１４が認識処理モードで動作する場合の処理について、図１８を用いて説明する。 Next, the processing performed when the character recognition processing unit 114 operates in the recognition processing mode will be explained using FIG. 18.

ここでは、文字認識処理部１１４が文字認識モードで動作しているときに、コンテンツ処理部１１１から文字認識処理部１１４に、認識対象の入力ストロークデータ（１文字分の入力ストロークデータ）が供給されたものとする。 Here, it is assumed that, while the character recognition processing unit 114 is operating in character recognition mode, input stroke data to be recognized (input stroke data for one character) is supplied from the content processing unit 111 to the character recognition processing unit 114.

まず、ストロークデータ処理部１１４１は、供給された入力ストロークデータに基づいて入力ベクトルデータと入力画像データを生成し、それぞれオンラインＡＩ処理部１１４２とオフラインＡＩ処理部１１４３に供給する（Ｓ４０１）。 First, the stroke data processing unit 1141 generates input vector data and input image data based on the supplied input stroke data, and supplies them to the online AI processing unit 1142 and the offline AI processing unit 1143, respectively (S401).

文字認識モードで動作している文字認識処理部１１４のオンラインＡＩ処理部１１４２は、供給された入力ベクトルデータに基づいて、保持した学習モデルを用いた文字判定処理を行い、その判定結果（オンライン判定結果）について信頼度と共に文字認識結果出力部１１４４に供給する（Ｓ４０２）。 The online AI processing unit 1142 of the character recognition processing unit 114 operating in character recognition mode performs character determination processing using the stored learning model based on the supplied input vector data, and supplies the determination result (online determination result) together with the reliability to the character recognition result output unit 1144 (S402).

文字認識モードで動作している文字認識処理部１１４のオフラインＡＩ処理部１１４３は、供給された入力画像データに基づいて、保持した学習モデルを用いた文字判定処理を行い、その判定結果（オフライン判定結果）について信頼度と共に文字認識結果出力部１１４４に供給する（Ｓ４０３）。 The offline AI processing unit 1143 of the character recognition processing unit 114 operating in character recognition mode performs character determination processing using the stored learning model based on the supplied input image data, and supplies the determination result (offline determination result) together with the reliability to the character recognition result output unit 1144 (S403).

次に、文字認識結果出力部１１４４は、オンライン判定結果とオフライン判定結果の信頼度を比較して、信頼度の大きい方の判定結果を選択し（Ｓ４０４）、出力する（Ｓ４０５）。 Next, the character recognition result output unit 1144 compares the reliability of the online judgment result and the offline judgment result, selects the judgment result with the greater reliability (S404), and outputs it (S405).
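The selection in S404 amounts to a comparison of two reliability values, as in the sketch below. The (label, reliability) pair representation and the function name are assumptions made for the example, and the text does not say what happens on a tie; here the online result is kept in that case.

```python
def select_result(online, offline):
    """Each argument is a (label, reliability) pair; return the more reliable one.

    Ties go to the online result (an assumption; the tie case is not specified).
    """
    return online if online[1] >= offline[1] else offline

result = select_result(("八", 0.62), ("人", 0.91))
```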

以上のように、文字認識処理部１１４では、コンテンツ処理部１１１から供給される入力ストロークデータが供給される度に、判定結果を出力する。 As described above, the character recognition processing unit 114 outputs a determination result each time input stroke data is supplied from the content processing unit 111.

（Ａ－３）第１の実施形態の効果 (A-3) Advantages of the First Embodiment
この実施形態によれば、以下のような効果を奏することができる。 According to this embodiment, the following advantages can be achieved.

（Ａ－３－１）まず、発明者が、認識処理システム１の文字認識処理部１１４を実際に構築して、学習処理及び文字認識処理を行った場合における文字認識精度（判定結果の正解率）について実験（以下、「本実験」と呼ぶ）を行ったので、本実験の内容及び結果について以下に記す。 (A-3-1) First, the inventor actually constructed the character recognition processing unit 114 of the recognition processing system 1 and conducted an experiment (hereinafter referred to as "this experiment") on the character recognition accuracy (correctness rate of the judgment result) when learning processing and character recognition processing were performed. The contents and results of this experiment are described below.

本実験では、「カタカナ」、「ひらがな」、及び「ＪＩＳ第１水準の漢字」の文字（計３１０７種類の文字）をサンプルの書体（以下、「サンプル書体」と呼ぶ）として学習処理及び認識処理を行った。本実験では、サンプル書体１文字あたり１６０サンプルの入力ストロークデータ（人間が電子ペン３０を用いてペンタブレット２０に入力した際の入力ストロークデータ）と正解ラベルを用意して、学習モードで動作する文字認識処理部１１４に供給した。これにより、各サンプル書体の各サンプルについて上記の図１７のフローチャートの処理が行われ、オンラインＡＩ処理部１１４２及びオフラインＡＩ処理部１１４３でそれぞれ学習モデルが取得される。 In this experiment, learning and recognition processes were performed using "katakana," "hiragana," and "JIS level 1 kanji" characters (a total of 3,107 types of characters) as sample typefaces (hereinafter referred to as "sample typefaces"). In this experiment, 160 samples of input stroke data (input stroke data input by a human into the pen tablet 20 using the electronic pen 30) and correct answer labels were prepared for each character in the sample typeface, and supplied to the character recognition processing unit 114 operating in learning mode. As a result, the processing of the flowchart in Figure 17 above is performed for each sample of each sample typeface, and a learning model is obtained in the online AI processing unit 1142 and the offline AI processing unit 1143, respectively.

そして、本実験では、上記の文字認識処理部１１４で上記の学習処理が完了した後の認識精度を確認するために、サンプル書体１文字あたり１６０サンプルの入力ストロークデータと正解ラベルを用意して、認識処理モードで動作する文字認識処理部１１４に供給した。これにより、各サンプル書体の各サンプルについて上記の図１８のフローチャートの認識処理が行われた。このとき、発明者は、オンライン判定結果とオフライン判定結果のそれぞれ単独の正解率と、文字認識結果出力部１１４４から出力される判定結果（オンライン判定結果とオフライン判定結果を総合的に判断した結果）の正解率を確認した。本実験の結果、オンライン判定結果単独の正解率は約９４％であり、オフライン判定結果単独の正解率は約９０％であった。そして、文字認識結果出力部１１４４から出力される判定結果の正解率は約９８％であった。つまり、オンライン判定結果とオフライン判定結果の両方を考慮して最終的な判定結果を出力する方が認識精度は高いことがわかった。 In this experiment, to confirm the recognition accuracy after the learning process had been completed in the character recognition processing unit 114, 160 samples of input stroke data and correct labels were prepared for each character of the sample typeface and supplied to the character recognition processing unit 114 operating in recognition processing mode. The recognition process of the flowchart in FIG. 18 was thus performed for each sample of each sample typeface. The inventor then checked the accuracy rate of the online judgment result alone, the accuracy rate of the offline judgment result alone, and the accuracy rate of the judgment result output from the character recognition result output unit 1144 (the result of judging the online and offline judgment results together). In this experiment, the accuracy rate of the online judgment result alone was about 94%, and that of the offline judgment result alone was about 90%. The accuracy rate of the judgment result output from the character recognition result output unit 1144 was about 98%. In other words, recognition accuracy was found to be higher when the final judgment result takes both the online and offline judgment results into account.

（Ａ－３－２）第１の実施形態の情報処理端末１０（文字認識処理部１１４）では、オンライン文字認識の学習処理及び文字認識処理で用いられる入力ベクトルデータについて、全てＮ個の特徴点となるように正規化して処理している。これにより、第１の実施形態の情報処理端末１０（文字認識処理部１１４）では、画数等に拘わらず、全ての文字について固定長の入力ベクトルデータを生成して処理できる。一般的に、ニューラルネットワークを用いた機械学習処理では、固定長のデータ入力を行うことが望ましいためである。 (A-3-2) In the information processing terminal 10 (character recognition processing unit 114) of the first embodiment, input vector data used in the online character recognition learning process and character recognition process is normalized and processed so that all of the input vector data has N feature points. This allows the information processing terminal 10 (character recognition processing unit 114) of the first embodiment to generate and process fixed-length input vector data for all characters, regardless of the number of strokes, etc. This is because, in general, in machine learning processes using neural networks, it is desirable to input fixed-length data.
可変長の入力層に対応したＡＩエンジンを使用することや、最も長いデータ長に合わせた固定長の入力層を備えるニューラルネットワークで構成（固定長の入力層を実質的に可変長で使用）することも考えられるが、固定長の入出力で完結させる場合と比較して処理効率や認識精度が不安定となるおそれがある。 It is possible to use an AI engine that supports variable-length input layers, or to configure a neural network with a fixed-length input layer that matches the longest data length (effectively using a fixed-length input layer with variable length), but there is a risk that processing efficiency and recognition accuracy will be unstable compared to when the process is completed with fixed-length input and output.

（Ａ－３－３）第１の実施形態の情報処理端末１０（文字認識処理部１１４）では、入力ベクトルデータを構成する特徴量として、座標（ＶＸ、ＶＹ）だけでなく、動きベクトル（ＶＲ、ＶＵ、ＶＬ、ＶＤ）と電子ペン３０のペン先の状態（コンタクト状態又は非コンタクト状態）に関するパラメータについても導入している。これにより、第１の実施形態の情報処理端末１０（文字認識処理部１１４）では、電子ペン３０のペン先が非コンタクト状態の間のストロークの情報も含めて特徴量として取得している。また、第１の実施形態の情報処理端末１０（文字認識処理部１１４）では、特許文献１の記載技術のように予め文字ごとに標準パターンを用意しておくことや、文字認識の際に全ての標準パターンとの特徴点の対応付けの処理等が不要である。以上のように、第１の実施形態の情報処理端末１０（文字認識処理部１１４）では、文字入力の際のストロークについて取得する情報量を増やしつつ効率的な文字認識処理を行うことができる。 (A-3-3) In the information processing terminal 10 (character recognition processing unit 114) of the first embodiment, not only coordinates (VX, VY) but also parameters related to the motion vector (VR, VU, VL, VD) and the state of the tip of the electronic pen 30 (contact state or non-contact state) are introduced as feature quantities constituting the input vector data. As a result, in the information processing terminal 10 (character recognition processing unit 114) of the first embodiment, information on strokes while the tip of the electronic pen 30 is in a non-contact state is also acquired as feature quantities. In addition, in the information processing terminal 10 (character recognition processing unit 114) of the first embodiment, it is not necessary to prepare a standard pattern for each character in advance as in the technology described in Patent Document 1, or to perform processing such as matching feature points with all standard patterns during character recognition. As described above, in the information processing terminal 10 (character recognition processing unit 114) of the first embodiment, efficient character recognition processing can be performed while increasing the amount of information acquired about strokes during character input.

（Ｂ－１）第２の実施形態 (B-1) Second Embodiment
以下、本発明による認識処理装置、認識処理プログラム、認識処理方法、及び認識処理システムの第２の実施形態を、図面を参照しながら詳述する。この実施形態では、情報処理端末を本発明の認識処理装置として構成した例について説明する。 Hereinafter, a second embodiment of the recognition processing device, the recognition processing program, the recognition processing method, and the recognition processing system according to the present invention will be described in detail with reference to the drawings. In this embodiment, an example in which an information processing terminal is configured as the recognition processing device of the present invention will be described.

第２の実施形態に係る認識処理システムも、図１を用いて示すことができる。なお、図１において括弧内の符号は、第２の実施形態でのみ用いられる符号である。 The recognition processing system according to the second embodiment can also be shown using FIG. 1. Note that the reference signs in parentheses in FIG. 1 are used only in the second embodiment.

以下、第２の実施形態の認識処理システムについて第１の実施形態との差異を説明する。 The following describes how the recognition processing system of the second embodiment differs from that of the first embodiment.

第２の実施形態の認識処理システム１Ａでは、情報処理端末１０が情報処理端末１０Ａに置き換わっている点で、第１の実施形態と異なっている。また、第２の実施形態の情報処理端末１０Ａでは、制御部１１が制御部１１Ａに置き換わっている。さらに、第２の実施形態の制御部１１Ａでは、コンテンツ処理部１１１と文字認識処理部１１４が、それぞれコンテンツ処理部１１１Ａと文字認識処理部１１４Ａに置き換わっている点で第１の実施形態と異なっている。さらにまた、第２の実施形態の文字認識処理部１１４Ａでは、ストロークデータ処理部１１４１がストロークデータ処理部１１４１Ａに置き換わっている点で第１の実施形態と異なっている。 The recognition processing system 1A of the second embodiment differs from the first embodiment in that the information processing terminal 10 is replaced with an information processing terminal 10A. Also, in the information processing terminal 10A of the second embodiment, the control unit 11 is replaced with a control unit 11A. Furthermore, in the control unit 11A of the second embodiment, the content processing unit 111 and the character recognition processing unit 114 are replaced with a content processing unit 111A and a character recognition processing unit 114A, respectively, which is different from the first embodiment. Furthermore, in the character recognition processing unit 114A of the second embodiment, the stroke data processing unit 1141 is replaced with a stroke data processing unit 1141A, which is different from the first embodiment.

ところで、第１の実施形態では、電子ペン３０のペン先状態は、電子ペン３０のペン先３１がペンタブレット２０のディスプレイパネル２１に接触しているコンタクト状態と、電子ペン３０のペン先３１がペンタブレット２０のディスプレイパネル２１に接触していない非コンタクト状態のいずれかであると説明したが、ペンタブレット２０と電子ペン３０に適用するデバイスの組合せによっては、非コンタクト状態でも電子ペン３０のペン先３１の横方向の位置を追跡可能なものが存在する。例えば、ワコム（商標登録）社製のペンタブレットとスタイラスペンの組合せを適用する場合、スタイラスペンが非コンタクト状態であっても、ペン先の高さが所定以下であればペンタブレットにおいてペン先の位置（横方向の位置）を追跡することができる。 In the first embodiment, the pen tip state of the electronic pen 30 is described as being either a contact state in which the pen tip 31 of the electronic pen 30 is in contact with the display panel 21 of the pen tablet 20, or a non-contact state in which the pen tip 31 of the electronic pen 30 is not in contact with the display panel 21 of the pen tablet 20. However, depending on the combination of devices applied to the pen tablet 20 and the electronic pen 30, there are some that are capable of tracking the lateral position of the pen tip 31 of the electronic pen 30 even in a non-contact state. For example, when applying a combination of a pen tablet and a stylus pen manufactured by Wacom (registered trademark), even if the stylus pen is in a non-contact state, the position (lateral position) of the pen tip can be tracked on the pen tablet as long as the height of the pen tip is equal to or less than a predetermined value.

そこで、この実施形態においては、ペンタブレット２０において、電子ペン３０のペン先３１が非コンタクト状態であっても、ペン先３１の高さが所定以下であればペン先３１の横方向の位置（座標）を検出可能な構成であるものとして説明する。そして第２の実施形態では、電子ペン３０のペン先状態が非コンタクト状態であり、かつ、ペンタブレット２０でペン先３１の横方向の位置を追跡可能である場合、その状態（ペン先状態）を「ホバー状態」と呼ぶものとする。また、第２の実施形態では、電子ペン３０のペン先状態が非コンタクト状態であり、かつ、ペンタブレット２０でペン先３１の横方向の位置が追跡できない場合、その状態（ペン先状態）を「ロス状態」と呼ぶものとする。 Therefore, in this embodiment, the pen tablet 20 is described as being configured to be able to detect the lateral position (coordinates) of the pen tip 31 even when the pen tip 31 of the electronic pen 30 is in a non-contact state, as long as the height of the pen tip 31 is equal to or less than a predetermined value. In the second embodiment, when the pen tip state of the electronic pen 30 is in a non-contact state and the pen tablet 20 is able to track the lateral position of the pen tip 31, this state (pen tip state) is referred to as a "hover state." Also, in the second embodiment, when the pen tip state of the electronic pen 30 is in a non-contact state and the pen tablet 20 is unable to track the lateral position of the pen tip 31, this state (pen tip state) is referred to as a "loss state."

図１９、図２０は、ユーザが電子ペン３０を用いてペンタブレット２０に、画数として２画である漢字（例えば、「八」等）を描いた場合における時系列ごとのペン先３１の高さ及びペン先状態を示したタイミングチャートである。 Figures 19 and 20 are timing charts showing the height of the pen tip 31 and the state of the pen tip over time when a user uses the electronic pen 30 to draw a Chinese character with two strokes (such as "八") on the pen tablet 20.

図１９では横軸を時刻ｔとし、縦軸を電子ペン３０のペン先３１の高さ（時系列ごとの高さ）を示している。図１９では、時刻ｔ０～ｔ２１の各時刻のペン先３１の位置を楔形（下側に先端を向けた楔型）のシンボルの先端の位置で表している。ここでは、時刻ｔ０～ｔ２１は、それぞれペンタブレット２０において電子ペン３０（ペン先３１）に対する座標等の検知（サンプリング）を行うタイミングを示しているものとして説明する。 In Figure 19, the horizontal axis represents time t, and the vertical axis represents the height of the pen tip 31 of the electronic pen 30 (height over time). In Figure 19, the position of the pen tip 31 at each of times t0 to t21 is represented by the position of the tip of a wedge-shaped symbol (a wedge-shaped symbol with the tip facing downward). Here, times t0 to t21 are described as indicating the timing at which the pen tablet 20 detects (samples) the coordinates, etc., of the electronic pen 30 (pen tip 31).

また、図１９では、ペン先３１がコンタクト状態となっている時刻のシンボルを黒色としており、ペン先３１がホバー状態となっている時刻のシンボルにハッチ（斜線）を付しており、ペン先３１がロス状態となっている時刻のシンボルの輪郭を破線としている。 In addition, in FIG. 19, the symbols of the times when the pen tip 31 is in contact are colored black, the symbols of the times when the pen tip 31 is in hover are hatched (diagonal lines), and the outlines of the symbols of the times when the pen tip 31 is in loss are dashed.

図１９において、時刻ｔ０～ｔ４は、１画目を描くことを示しており、電子ペン３０のペン先３１がコンタクト状態となっている。コンタクト状態の間は、ペンタブレット２０においてセンサにより、電子ペン３０（ペン先３１）の座標及び筆圧が取得される。 In FIG. 19, times t0 to t4 indicate drawing the first stroke, and the pen tip 31 of the electronic pen 30 is in a contact state. During the contact state, the coordinates and writing pressure of the electronic pen 30 (pen tip 31) are acquired by a sensor in the pen tablet 20.

図１９において、時刻ｔ５～ｔ７は、１画目を描き終わって電子ペン３０のペン先３１がホバー状態となっている。上述の通り、ワコム社製のペンタブレット等ではスタイラスペンがペンタブレットのパネルから一定距離浮いた状態でも座標を取得すること、及びスタイラスペンの存在を検出することができる。ホバー状態の場合、ワコム社製のペンタブレットでは、筆圧値として「０」（つまりホバーである値が示される）が取得されることになる。 In FIG. 19, from time t5 to t7, the first stroke is drawn and the pen tip 31 of the electronic pen 30 is in a hover state. As described above, with a Wacom pen tablet or the like, it is possible to acquire coordinates and detect the presence of a stylus pen even when the stylus pen is floating a certain distance above the panel of the pen tablet. In the case of a hover state, a pen pressure value of "0" (in other words, a value indicating a hover) is acquired with a Wacom pen tablet.

図１９において、時刻ｔ８～ｔ１１では、ユーザが電子ペン３０のペン先３１をさらに、ペンタブレット２０から離し、電子ペン３０（ペン先３１）がロス状態となっている。 In FIG. 19, from time t8 to time t11, the user further removes the pen tip 31 of the electronic pen 30 from the pen tablet 20, and the electronic pen 30 (pen tip 31) is in a loss state.
ロス状態の間は、ペンタブレット２０において、電子ペン３０（ペン先３１）の座標を検知することはできない。 During the loss state, the pen tablet 20 cannot detect the coordinates of the electronic pen 30 (pen tip 31).

図１９において、時刻ｔ１２～ｔ１３では、ユーザが２画目を描くために、再び電子ペン３０（ペン先３１）をペンタブレット２０に近づけたためホバー状態となっている。そして、続く時刻ｔ１４～ｔ１９では、ユーザが２画目を書き始めるため、電子ペン３０（ペン先３１）がペンタブレット２０に接触し、コンタクト状態となっている。 In FIG. 19, from time t12 to t13, the user again brings the electronic pen 30 (pen tip 31) close to the pen tablet 20 in order to draw the second stroke, resulting in a hover state. Then, from the following time t14 to t19, the user starts drawing the second stroke, so the electronic pen 30 (pen tip 31) comes into contact with the pen tablet 20, resulting in a contact state.

図２０の例では、タイミングｔ８～ｔ１２がロス状態ではなくホバー状態になっていること以外は図１９の例と同様である。 The example in Figure 20 is the same as the example in Figure 19, except that timings t8 to t12 are in a hover state rather than a loss state.

図２１は、図１９のタイミングチャートに示す各サンプル（特徴点）におけるペン先状態の集計結果について示している。図２１に示すように、図１９の例では、２０サンプル分の時間が経過する間に、ロス状態の期間（時刻ｔ８～ｔ１１）を除いて１６個のサンプル（電子ペン３０の座標）が得られている。また、図２１に示すように、図１９の例では、得られた１６個のサンプルのうち、コンタクト状態のサンプルが１１個で、ホバー状態のサンプルが５個となっている。 Figure 21 shows the tally of the pen tip state at each sample (characteristic point) shown in the timing chart of Figure 19. As shown in Figure 21, in the example of Figure 19, 16 samples (coordinates of the electronic pen 30) are obtained over the course of 20 samples, excluding the period of the loss state (times t8 to t11). Also, as shown in Figure 21, in the example of Figure 19, of the 16 samples obtained, 11 are samples in the contact state and 5 are samples in the hover state.
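The tally can be reproduced from the time ranges given for FIG. 19 with a few lines of code. The state names and the list representation below are assumptions made for the example; the ranges themselves are the ones stated in the text (t0 to t4 and t14 to t19 contact, t5 to t7 and t12 to t13 hover, t8 to t11 loss).

```python
from collections import Counter

CONTACT, HOVER, LOSS = "contact", "hover", "loss"

# Pen tip state at each sampling time t0..t19, as described for FIG. 19.
states = ([CONTACT] * 5     # t0-t4: first stroke
          + [HOVER] * 3     # t5-t7: pen lifted but still tracked
          + [LOSS] * 4      # t8-t11: pen out of range, no coordinates
          + [HOVER] * 2     # t12-t13: pen approaching again
          + [CONTACT] * 6)  # t14-t19: second stroke

counts = Counter(states)
obtained = len(states) - counts[LOSS]  # samples for which coordinates exist
print(counts[CONTACT], counts[HOVER], obtained)  # prints "11 5 16"
```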

図２２は、図１９のタイミングチャートに示す各サンプル（特徴点）を示した図である。図２２では、ユーザが漢字の「八」を描いた場合の図となっている。 Figure 22 shows each sample (feature point) shown in the timing chart of Figure 19. Figure 22 shows the case where the user draws the kanji character "八".

図２２では、コンタクト状態の特徴点を円形（○）のシンボルで示し、ホバー状態の特徴点を三角形（△）のシンボルで示し、ロス状態の位置を四角形（□）のシンボルで示している。なお、ロス状態の場合、ペンタブレット２０で特徴点の座標を取得することはできないが、図２２では、仮に座標（ペン先３１の横方向の位置）が取得できたとした場合の位置を四角形のシンボルで図示している。また、以下では、ｔ０～ｔ２０の各特徴点のｘ座標をｘ０～ｘ２０、ｙ座標をｙ０～ｙ２０と表す。 In Figure 22, feature points in the contact state are indicated by circular (○) symbols, feature points in the hover state are indicated by triangular (△) symbols, and positions in the loss state are indicated by rectangular (□) symbols. Note that in the case of a loss state, the coordinates of feature points cannot be acquired by the pen tablet 20, but in Figure 22, the positions in the case where the coordinates (horizontal position of the pen tip 31) could be acquired are shown by rectangular symbols. In the following, the x coordinates of each feature point from t0 to t20 are represented as x0 to x20, and the y coordinates are represented as y0 to y20.

以上のように、第２の実施形態のペンタブレット２０では、電子ペン３０について、コンタクト状態、ホバー状態、ロス状態のいずれかを検知することが可能となっているものとする。 As described above, the pen tablet 20 of the second embodiment is capable of detecting the contact state, hover state, or loss state of the electronic pen 30.

In the information processing terminal 10 (control unit 11) of the first embodiment, input stroke data consisting only of contact-state samples (coordinates) is normalized to generate the input vector data for the online AI processing unit 1142. The information processing terminal 10A (control unit 11A) of the second embodiment differs from the first embodiment in that the input stroke data for the online AI processing unit 1142 can reflect three statuses: the contact state, the hover state, and the loss state. In the second embodiment, the configuration and normalization of the input stroke data for the offline AI processing unit 1143 can follow the same processing as in the first embodiment, so their description is omitted here.

Next, examples of how the input stroke data for the online AI processing unit 1142 is configured and normalized in the second embodiment are described. In the second embodiment, the following five methods can be given as examples.

[First normalization method]
The first normalization method constructs the input stroke data from contact-state feature points (coordinates) only and normalizes it to N+1 feature points. In the example of Figures 19 and 21, the input stroke data is therefore represented by the 11 feature points sampled at t0 to t4 and t14 to t19, which matches the 11 contact-state samples tallied in Figure 21. Assuming normalization with N=100, the feature points of the input stroke data are upsampled by a factor of about 9. As in the first embodiment, the input vector data is obtained by processing the input stroke data stroke by stroke (a stroke being a section in which the pen tip is in the contact state), thinning out and interpolating feature points to normalize it.
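As a rough illustration of the stroke-by-stroke handling in the first normalization method, the following Python sketch groups consecutive contact-state samples into strokes. The timeline follows the Figure 19 example. The string labels and the helper name `split_into_strokes` are assumptions made for this sketch and do not come from the embodiment.

```python
def split_into_strokes(states):
    """Group the indices of consecutive contact-state samples into strokes."""
    strokes, current = [], []
    for t, state in enumerate(states):
        if state == "contact":
            current.append(t)
        elif current:
            # A non-contact sample ends the stroke in progress.
            strokes.append(current)
            current = []
    if current:
        strokes.append(current)
    return strokes


# Figure 19 example: contact t0-t4, hover t5-t7, loss t8-t11,
# hover t12-t13, contact t14-t19.
states = (["contact"] * 5 + ["hover"] * 3 + ["loss"] * 4
          + ["hover"] * 2 + ["contact"] * 6)

strokes = split_into_strokes(states)
print(strokes)
```

Each inner list would then be thinned and interpolated on its own before the strokes are concatenated into the fixed-length input.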

[Second normalization method]
The second normalization method constructs the input stroke data by adding a subset of the hover-state samples (for example, a single sample) to the contact-state samples (feature points), and then normalizes it. For example, a hover-state sample taken either just before or just after the loss-state period (the sample at time t7 or t12 in the Figure 19 example) may be extracted and added to the input stroke data, or the samples on both sides of the loss-state period (the samples at times t7 and t12 in the Figure 19 example) may both be added.

Figure 23 shows an example of the input stroke data obtained when the second normalization method is applied to the samples (feature points) shown in the timing chart of Figure 19.
Figure 23 shows a case in which the input stroke data is constructed by adding the samples on both sides of the loss-state period (the samples at times t7 and t12) to the contact-state samples. In Figure 23, the hover state is represented by "0" in the pen tip state field.
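The sample selection of the second normalization method can be sketched in Python as follows. This is a minimal illustration, not the embodiment's implementation: the string labels, the function name `select_second_method`, and its `before` and `after` switches are all hypothetical.

```python
def select_second_method(states, before=True, after=True):
    """Indices of contact samples plus hover samples bordering a loss period."""
    n = len(states)
    keep = {t for t, s in enumerate(states) if s == "contact"}
    for t, s in enumerate(states):
        if s != "hover":
            continue
        if before and t + 1 < n and states[t + 1] == "loss":
            keep.add(t)  # hover sample just before the loss period
        if after and t > 0 and states[t - 1] == "loss":
            keep.add(t)  # hover sample just after the loss period
    return sorted(keep)


# Figure 19 example: contact t0-t4, hover t5-t7, loss t8-t11,
# hover t12-t13, contact t14-t19.
states = (["contact"] * 5 + ["hover"] * 3 + ["loss"] * 4
          + ["hover"] * 2 + ["contact"] * 6)

selected = select_second_method(states)
print(selected)
```

With both switches enabled, the result contains the contact samples plus t7 and t12, which corresponds to the selection illustrated for Figure 23.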

When the second normalization method is applied, the stroke data processing unit 1141A may also normalize the section between strokes (that is, between contact-state sections; a single section made up of the hover state and the loss state) in the same way as a stroke (a contact-state section) to obtain the input vector data. For example, given input stroke data such as that in Figure 23, the stroke data processing unit 1141A may apply the same normalization used for a single stroke to the section between the last feature point of the first stroke and the first feature point of the second stroke.

When the second normalization method is applied and feature points are interpolated within a non-contact period (the hover-state and loss-state periods), or immediately before or after that period, the stroke data processing unit 1141A sets the pen tip state of those feature points to "0" in the input vector data.

[Third normalization method]
The third normalization method constructs the input stroke data by adding all hover-state samples to the contact-state samples, and then normalizes it. In the example of Figures 19 and 21, for instance, the 16 samples in either the contact state or the hover state may be included in the input stroke data.

When the third normalization method is applied, the stroke data processing unit 1141A may also normalize the section between strokes (that is, between contact-state sections; a single section made up of the hover state and the loss state) in the same way as a stroke (a contact-state section) to obtain the input vector data.

Figure 24 shows an example of the input stroke data obtained when the third normalization method is applied to the feature points shown in the timing chart of Figure 19. In Figure 24, the pen tip state in the hover state is set to "0".

In the third normalization method, when feature points are interpolated within a hover-state period or just before or after it, their pen tip state is set to "0" in the input vector data. Likewise, when feature points are interpolated within a non-contact period (the hover-state and loss-state periods), or immediately before or after that period, the stroke data processing unit 1141A sets the pen tip state of those feature points to "0" in the input vector data.

[Fourth normalization method]
The fourth normalization method constructs the input stroke data from all samples (contact state, hover state, and loss state) and then normalizes it. In the example of Figures 19 and 21, for instance, all 20 samples may be included in the input stroke data.

Figure 25 shows an example of the input stroke data obtained when the fourth normalization method is applied to the feature points shown in the timing chart of Figure 19.

In Figure 25, the pen tip state is "0" in the hover state and "2" in the loss state. The x coordinates at the loss-state times t8 to t11 are shown as c_x8 to c_x11, and the y coordinates as c_y8 to c_y11. Each of these coordinates (X and Y) may be set to a position obtained by linear interpolation between the coordinates of the feature points in the preceding and following hover periods. In the Figure 25 example, the coordinates for t8 to t11 may be placed at equal intervals on the line connecting the coordinates (x7, y7) at t7, immediately before the loss-state period, and the coordinates (x12, y12) at t12, immediately after it.
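The equal-interval placement of the loss-state coordinates can be sketched in Python as below. The function name `fill_loss_coordinates` and the numeric coordinates chosen for t7 and t12 are hypothetical; real values would come from the pen tablet.

```python
def fill_loss_coordinates(before, after, count):
    """Place `count` points at equal intervals on the segment before -> after."""
    (x0, y0), (x1, y1) = before, after
    steps = count + 1  # `count` interior points split the segment into count + 1 parts
    return [
        (x0 + (x1 - x0) * k / steps, y0 + (y1 - y0) * k / steps)
        for k in range(1, count + 1)
    ]


# Hypothetical coordinates standing in for (x7, y7) and (x12, y12).
p7, p12 = (10.0, 20.0), (20.0, 10.0)

# Four interpolated points stand in for (c_x8, c_y8) .. (c_x11, c_y11).
filled = fill_loss_coordinates(p7, p12, count=4)
print(filled)
```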

When the fourth normalization method is applied, the stroke data processing unit 1141A also normalizes the hover-state sections and the loss-state section in the same way as a stroke (a contact-state section), thinning out and interpolating feature points, to obtain the input vector data. For example, given input stroke data such as that in Figure 25, the stroke data processing unit 1141A may apply the same normalization (thinning and interpolation of feature points) used for stroke sections (contact-state sections) to the hover-state section from time t5 to t7, the loss-state section from time t8 to t11, and the hover-state section from time t12 to t14, and then connect the results.

In the fourth normalization method, when feature points are interpolated in a hover-state or loss-state period, the pen tip state value in the input vector data may be set according to the following rules. When feature points are interpolated within the loss-state period, or immediately before or after it, their pen tip state may be set to "2" (loss state). For the hover-state period immediately preceding the loss-state period, feature points interpolated within that hover-state period or immediately before it may be given the pen tip state "0" (hover state). Likewise, for the hover-state period immediately following the loss-state period, feature points interpolated within that hover-state period or immediately after it may be given the pen tip state "0" (hover state).
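One way to read these rules is that a point interpolated between two neighbouring samples takes the loss label if either neighbour is a loss sample, otherwise the hover label if either neighbour is a hover sample. The Python sketch below encodes that reading. It is an interpretation rather than the embodiment's stated implementation: the codes 0 and 2 follow Figure 25, while the code 1 for the contact state and the function name are assumptions.

```python
# 0 (hover) and 2 (loss) follow Figure 25; 1 for contact is an assumption.
CONTACT, HOVER, LOSS = 1, 0, 2


def interpolated_pen_state(prev_state, next_state):
    """Pen tip state for a point interpolated between two neighbouring samples."""
    if LOSS in (prev_state, next_state):
        return LOSS     # inside, or directly adjacent to, the loss period
    if HOVER in (prev_state, next_state):
        return HOVER    # inside, or directly adjacent to, a hover period
    return CONTACT      # both neighbours are contact samples


# Boundaries in the Figure 19 example: t4|t5, t7|t8, t11|t12, t13|t14.
print(interpolated_pen_state(CONTACT, HOVER))  # between t4 and t5
print(interpolated_pen_state(HOVER, LOSS))     # between t7 and t8
print(interpolated_pen_state(LOSS, HOVER))     # between t11 and t12
print(interpolated_pen_state(HOVER, CONTACT))  # between t13 and t14
```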

As described above, when the fourth normalization method is applied to the timing chart example of Figure 19, the input stroke data is generated from all 20 samples, including those in the loss state. Normalization then needs only about a fivefold interpolation of feature points, so more accurate information (a larger amount of information) can be carried in the input vector data. In particular, the input vector data can carry information that distinguishes the hover state from the loss state. Because the fourth normalization method increases the amount of information in the input vector data in this way, it can improve recognition accuracy, depending on the training and recognition environment.
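The "about fivefold" figure follows from simple arithmetic, sketched here in Python. The function name `upsampling_factor` is hypothetical, and the ratio N / (number of samples) is used as an approximation of the interpolation factor with N=100 as in the earlier example.

```python
def upsampling_factor(num_samples, n=100):
    """Approximate ratio between the N-point target and the available samples."""
    return n / num_samples


factor_all = upsampling_factor(20)      # fourth method: all 20 samples
factor_no_loss = upsampling_factor(16)  # contact and hover samples only

print(factor_all, factor_no_loss)
```

The fewer original samples a method keeps, the larger the share of interpolated points in the normalized data.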

[Fifth normalization method]
Like the fourth normalization method, the fifth normalization method constructs the input stroke data from all samples (contact state, hover state, and loss state) and then normalizes it. It differs from the fourth normalization method in that the loss state and the hover state are treated uniformly. In the description here, the fifth normalization method treats both the loss state and the hover state as the hover state.

Figure 26 shows an example of the input stroke data obtained when the fifth normalization method is applied to the feature points shown in the timing chart of Figure 19.

Figure 26 differs from Figure 25 (the fourth normalization method) in that the pen tip state during the loss-state period is also set to "0", the same value as the hover state.

In the fifth normalization method, when feature points are interpolated within the period made up of the hover state and the loss state (the period from time t5 to t13 in Figure 19), or immediately before or after that period, their pen tip state is set to "0" in the input vector data.
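The effect of merging the two non-contact states can be sketched in Python as follows. The code 0 follows Figure 26, while the code 1 for the contact state, the string labels, and the function name are assumptions. The second timeline is a hypothetical recording from a sensor that detects no hover samples at all; it is not taken from any figure.

```python
# 0 follows Figure 26; 1 for the contact state is an assumption.
CONTACT, HOVER = 1, 0


def pen_states_fifth_method(states):
    """Encode pen tip states, treating loss samples the same as hover samples."""
    return [CONTACT if s == "contact" else HOVER for s in states]


# Timeline of the Figure 19 example.
with_hover = (["contact"] * 5 + ["hover"] * 3 + ["loss"] * 4
              + ["hover"] * 2 + ["contact"] * 6)

# Hypothetical recording of the same pen movement with no hover detection.
without_hover = ["contact"] * 5 + ["loss"] * 9 + ["contact"] * 6

encoded_a = pen_states_fifth_method(with_hover)
encoded_b = pen_states_fifth_method(without_hover)
print(encoded_a == encoded_b)
```

Under these assumptions both recordings yield the same pen tip state sequence, so the encoding does not depend on how far above the surface the sensor can still track the pen.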

Consequently, with the fifth normalization method, the content of the input vector data stays almost the same even when the height (detection range) at which coordinates can be obtained in the hover state varies, for example because of differences in the capability of the sensor of the pen tablet 20 (the sensor that detects the electronic pen 30), so that the situation of Figure 19 and the situation of Figure 20 are mixed. This allows stable learning and recognition processing.

(C) Other Embodiments
The present invention is not limited to the embodiments described above; modified embodiments such as the following are also possible.

(C-1) In each of the above embodiments, the information processing terminal 10 and the pen tablet 20 (display panel 21) are separate devices, but they may be integrated into one. For example, the information processing terminal 10 may be implemented as a computer equipped with a touch panel display (for example, a tablet terminal or a smartphone).

(C-2) In each of the above embodiments, the character recognition processing unit 114 includes both the online AI processing unit 1142 and the offline AI processing unit 1143, but it may instead include only the online AI processing unit 1142.

1...recognition processing system, 10...information processing terminal, 11...control unit, 111...content processing unit, 112...display driver, 113...pen tablet driver, 114...character recognition processing unit, 1141...stroke data processing unit, 1142...online AI processing unit, 1143...offline AI processing unit, 1144...character recognition result output unit, 12...video IF, 13...USB port, 20...pen tablet, 21...display panel, 30...electronic pen, 31...pen tip.

Claims

a normalization means for acquiring input stroke data indicating an input pattern of strokes made by an electronic pen in a time series order for each input character, the input stroke data including position information of feature points in the time series order of the input pattern, and performing a normalization process for normalizing the acquired input stroke data into an input pattern having a fixed number of samples to acquire normalized stroke data;
an input vector data acquisition means for converting the normalized stroke data normalized by the normalization means into input vector data expressed by the feature amount of the fixed number of samples;
a character recognition processing means for performing character recognition processing on the input vector data acquired by the input vector data acquisition means, using a learning model that has been machine-learned using the input vector data acquired by the input vector data acquisition means,
The normalization process performed by the normalization means includes the following steps:
a first normalization process for thinning out feature points from the input stroke data so that intervals between all feature points adjacent in time series are equal to or greater than a predetermined threshold value, thereby acquiring intermediate stroke data;
and a second normalization processing step of acquiring data obtained by interpolating feature points of the intermediate stroke data such that intervals between all feature points adjacent in chronological order are equal to or smaller than the threshold value.

The recognition processing device according to claim 1, characterized in that each feature constituting the input vector data includes a coordinate parameter indicating a coordinate corresponding to the time series of the feature, a motion vector parameter indicating a motion vector from the time series immediately preceding the feature, and a pen tip state parameter indicating the state of the pen tip of the electronic pen corresponding to the time series of the feature.

The recognition processing device according to claim 2, characterized in that each feature constituting the input vector data includes a pen tip state parameter indicating the state of the pen tip of the electronic pen corresponding to the time series of the feature.

The input vector data acquisition means further generates input image data drawn based on the input stroke data,
the character recognition processing means performs character recognition processing on the input vector data acquired by the input vector data acquisition means using a first learning model previously trained by machine learning using input vector data to acquire a first character recognition result, and further performs character recognition processing on the input image data acquired by the input vector data acquisition means using a second learning model previously trained by machine learning using an input image to acquire a second character recognition result;
4. The recognition processing device according to claim 1, further comprising a character recognition result output means for selecting either the first character recognition result or the second character recognition result by the character recognition processing means and outputting the selected result as a final character recognition processing result.

the character recognition processing means acquires a reliability of the first character recognition result or the second character recognition result when acquiring the first character recognition result or the second character recognition result;
5. The recognition processing device according to claim 4, wherein the character recognition result output means selects one of the first character recognition result and the second character recognition result by the character recognition processing means, which has a higher reliability, and outputs the selected result as a final character recognition processing result.

Computer,
a normalization means for acquiring input stroke data indicating an input pattern of strokes made by an electronic pen in a time series order for each input character, the input stroke data including position information of feature points in the time series order of the input pattern, and performing a normalization process for normalizing the acquired input stroke data into an input pattern having a fixed number of samples to acquire normalized stroke data;
an input vector data acquisition means for converting the normalized stroke data normalized by the normalization means into input vector data expressed by the feature amount of the fixed number of samples;
a learning model that has been machine-learned using the input vector data acquired by the input vector data acquisition means, the learning model functioning as a character recognition processing means that performs character recognition processing on the input vector data acquired by the input vector data acquisition means;
The normalization process performed by the normalization means includes the following steps:
a first normalization process for thinning out feature points from the input stroke data so that intervals between all feature points adjacent in time series are equal to or greater than a predetermined threshold value, thereby acquiring intermediate stroke data;
and a second normalization process step of acquiring data obtained by interpolating feature points of the intermediate stroke data so that intervals between all feature points adjacent in chronological order are equal to or less than the threshold value.

A recognition processing method performed by a recognition processing device,
The recognition processing device includes a normalization means, an input vector data acquisition means, a character recognition processing means, and a character recognition result output means,
the normalization means acquires input stroke data indicating an input pattern of strokes made by an electronic pen in a time series order for each input character and including position information of feature points in the time series order of the input pattern, and acquires normalized stroke data by performing a normalization process for normalizing the acquired input stroke data into an input pattern with a fixed number of samples;
the input vector data acquisition means converts the normalized stroke data normalized by the normalization means into input vector data expressed by the feature amount of the fixed number of samples;
the character recognition processing means performs character recognition processing on the input vector data acquired by the input vector data acquisition means, using a learning model that has been machine-learned using the input vector data acquired by the input vector data acquisition means;
The normalization process performed by the normalization means includes the following steps:
a first normalization process for thinning out feature points from the input stroke data so that intervals between all feature points adjacent in time series are equal to or greater than a predetermined threshold value, thereby acquiring intermediate stroke data;
and a second normalization process step of acquiring data by interpolating feature points of the intermediate stroke data so that intervals between all feature points adjacent in chronological order are equal to or less than the threshold value.

A recognition processing system comprising a pen tablet capable of receiving input from an electronic pen, the electronic pen, and a recognition processing device that recognizes characters written on the pen tablet by a user with the electronic pen, characterized in that the recognition processing device is the recognition processing device according to any one of claims 1 to 5.