JP5047209B2

JP5047209B2 - Error conversion pointing device and method for indicating error conversion based on conversion break position

Info

Publication number: JP5047209B2
Application number: JP2009058168A
Authority: JP
Inventors: 圭吾町永
Original assignee: Yahoo Japan Corp
Current assignee: Yahoo Japan Corp
Priority date: 2009-03-11
Filing date: 2009-03-11
Publication date: 2012-10-10
Anticipated expiration: 2029-03-11
Also published as: JP2010211609A

Description

本発明は、変換後の文字列により構成された文章の中に含まれる同音異義語の誤変換を指摘する誤変換指摘装置及びその方法に関する。 The present invention relates to an erroneous conversion indication device and method for indicating an erroneous conversion of a homonym included in a sentence composed of converted character strings.

従来、仮名漢字変換に起因する同音異義語の誤り検出・訂正の方法として、確率的ＬＳＡを用いた日本語同音異義語誤りの検出・訂正の方法（非特許文献１）がある。 Conventionally, there is a Japanese homonym error detection / correction method using probabilistic LSA (Non-Patent Document 1) as a method for detecting and correcting homonym error due to kana-kanji conversion.

この方法では、同音異義語のｎｇｒａｍでモデル化される局所的出現確率及びＰＬＳＡによってモデル化される大域的出現確率に基づいて定義される尤度を用いて誤変換の有無を判定する。 In this method, the presence / absence of erroneous conversion is determined using the likelihood defined based on the local appearance probability modeled by ngram of the homonym and the global appearance probability modeled by PLSA.

Takuya Mishina (三品拓也), Kugatsu Sadamitsu (貞光九月), Mikio Yamamoto (山本幹雄), "Detection and correction of Japanese homonym errors using probabilistic LSA", IPSJ Journal (Information Processing Society of Japan), September 2004, Vol. 45, No. 9, pp. 1-9

However, this method does not take into account differences in the conversion delimiter positions used at input time for the kanji-converted text. As a result, errors could not be detected for homonyms (inspection target words) whose conversion positions differ, such as 「練習成果」 ("practice results") and 「練習生可」 ("trainees allowed").

そこで、本発明は、文章中の同音異義語（検査対象語）について変換区切り位置が異なる場合であっても、変換誤りを指摘することを目的とする。 Therefore, an object of the present invention is to point out a conversion error even when the conversion delimiter positions differ for homonyms (inspection words) in a sentence.

本発明では、以下のような解決手段を提供する。 The present invention provides the following solutions.

(1) An erroneous conversion indication device that points out an erroneous conversion contained in a sentence composed of converted character strings, the device comprising: conversion delimiter position storage means for storing the conversion delimiter positions of kana character strings after kanji conversion; inspection target word extraction means for extracting, as inspection target words, the kanji-converted character strings that correspond to the longest common part of the kana character string obtained when the entire kanji-converted sentence is converted into a kana character string; conversion delimiter position extraction means for extracting, by referring to the conversion delimiter position storage means, the conversion delimiter position of each inspection target word extracted by the inspection target word extraction means; and erroneous conversion determination means for determining, when the conversion delimiter positions extracted by the conversion delimiter position extraction means differ between the inspection target words, whether the inspection target words extracted by the inspection target word extraction means contain an erroneous conversion, wherein, when the erroneous conversion determination means determines that there is an erroneous conversion, the device points out that the inspection target words extracted by the inspection target word extraction means contain an erroneous conversion.

According to configuration (1), the conversion delimiter position storage means stores the conversion delimiter positions of kana character strings after kanji conversion; the inspection target word extraction means extracts the inspection target words contained in the kanji-converted sentence; the conversion delimiter position extraction means extracts the conversion delimiter position of each inspection target word; and the erroneous conversion determination means determines, when the conversion delimiter positions differ between the inspection target words, whether the extracted inspection target words contain an erroneous conversion. When an erroneous conversion is determined to exist, the device points out that the inspection target words contain an erroneous conversion.

これにより、誤変換指摘装置は、文章中の同音異義語（検査対象語）について変換区切り位置が異なる場合であっても、変換誤りを指摘することができる。 Accordingly, the erroneous conversion indication device can indicate a conversion error even when the conversion delimiter positions are different for the homonyms (test target words) in the sentence.

(2) The erroneous conversion indication device according to (1), wherein the inspection target word extraction means comprises: morpheme division means for dividing the kanji-converted sentence into morphemes; kana conversion means for converting the morphemes divided by the morpheme division means into kana character strings; kana morpheme connection means for connecting the morphemes converted into kana character strings by the kana conversion means; and identical connected morpheme extraction means for extracting identical connected morphemes from among the connected morphemes, which are the morphemes connected by the kana morpheme connection means, and wherein the character strings corresponding to the extracted identical connected morphemes are extracted from the kanji-converted sentence as inspection target words.

（２）の構成によれば、検査対象語抽出手段は、漢字変換後の文章を形態素に分割し、分割された形態素を仮名文字列に変換し、仮名文字列に変換された形態素を連結し、連結された形態素である連結形態素の中から同一の連結形態素を抽出し、漢字変換後の文章の中から、抽出した同一の連結形態素に対応する文字列を検査対象語として抽出する。 According to the configuration of (2), the inspection target word extraction unit divides the sentence after the kanji conversion into morphemes, converts the divided morphemes into kana character strings, and connects the morphemes converted into kana character strings. Then, the same connected morpheme is extracted from the connected morpheme that is the connected morpheme, and the character string corresponding to the extracted same connected morpheme is extracted from the sentence after the kanji conversion as the inspection target word.

ここで、（１）の発明では、誤変換指摘装置は、漢字変換後の文章全体を仮名文字に変換した際に、最長の共通部分を検査対象語として抽出するので、検査対象語が長すぎて的確に誤変換指摘ができないおそれがあるが、（２）の構成により、検査対象語を的確な長さで抽出することができるので、最適な方法で誤変換指摘ができる。 Here, in the invention of (1), the misconversion indication device extracts the longest common part as the inspection target word when the entire sentence after the kanji conversion is converted into the kana character, so the inspection target word is too long. Although there is a possibility that an erroneous conversion indication cannot be made accurately, the configuration (2) enables extraction of an inspection target word with an accurate length, so that an erroneous conversion indication can be made by an optimum method.
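The morpheme-based extraction of configuration (2) can be sketched as follows. This is only a minimal illustration under stated assumptions: the morpheme list is written by hand rather than produced by a morphological analyzer, the connection step joins exactly two adjacent morphemes, and the function name `identical_connected_morphemes` is hypothetical.

```python
from collections import defaultdict


def identical_connected_morphemes(morphemes):
    """morphemes: list of (surface, kana) pairs in sentence order.

    Joins each pair of adjacent morphemes (an assumption of this sketch) and
    returns the kana readings that occur more than once, together with the
    kanji-converted strings they came from.
    """
    surfaces_by_kana = defaultdict(list)
    for (surface1, kana1), (surface2, kana2) in zip(morphemes, morphemes[1:]):
        surfaces_by_kana[kana1 + kana2].append(surface1 + surface2)
    return {kana: surfaces
            for kana, surfaces in surfaces_by_kana.items()
            if len(surfaces) > 1}


# Hand-written segmentation and readings (illustrative data, not analyzer output).
morphemes = [
    ("やっと", "やっと"), ("練習", "れんしゅう"), ("成果", "せいか"),
    ("が", "が"), ("出", "で"), ("た", "た"),
    ("これ", "これ"), ("も", "も"), ("練習生", "れんしゅうせい"), ("可", "か"),
    ("だ", "だ"), ("と", "と"), ("思う", "おもう"),
]
targets = identical_connected_morphemes(morphemes)
```

With this hand-written input, the only reading that occurs twice is 「れんしゅうせいか」, and its two surface strings become the inspection target words.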

(3) The erroneous conversion indication device according to (1) or (2), wherein the conversion delimiter position storage means stores a kana character string, the character string after kanji conversion, and the conversion delimiter position in association with one another, the device further comprising erroneous conversion flag activation means for activating, when the conversion delimiter positions extracted by the conversion delimiter position extraction means differ between the inspection target words, an erroneous conversion flag indicating that the inspection target words may contain an erroneous conversion, wherein the erroneous conversion determination means determines, when the erroneous conversion flag has been activated, whether the extracted inspection target words contain an erroneous conversion.

According to configuration (3), the conversion delimiter position storage means stores the kana character string, the character string after kanji conversion, and the conversion delimiter position in association with one another, and the erroneous conversion flag activation means activates, when the conversion delimiter positions extracted by the conversion delimiter position extraction means differ between the inspection target words, an erroneous conversion flag indicating that the inspection target words may contain an erroneous conversion. Furthermore, the erroneous conversion determination means determines, when the erroneous conversion flag has been activated, whether the extracted inspection target words contain an erroneous conversion.

As a result, the erroneous conversion indication device determines whether the inspection target words contain an erroneous conversion only when the conversion delimiter positions extracted by the conversion delimiter position extraction means indicate that an erroneous conversion is possible, which narrows down the targets of the erroneous conversion determination. When the determination process is unnecessary, the device can skip it, which improves the processing efficiency of the erroneous conversion determination process.

(4) An erroneous conversion indication method in which a computer points out an erroneous conversion contained in a sentence composed of converted character strings, the method comprising: a storing step of storing the conversion delimiter positions of kana character strings after kanji conversion; an inspection target word extraction step of extracting, as inspection target words, the kanji-converted character strings that correspond to the longest common part of the kana character string obtained when the entire kanji-converted sentence is converted into a kana character string; a conversion delimiter position extraction step of extracting, based on the stored conversion delimiter positions, the conversion delimiter position of each inspection target word extracted by the inspection target word extraction means; an erroneous conversion determination step of determining, when the extracted conversion delimiter positions differ between the inspection target words, whether the extracted inspection target words contain an erroneous conversion; and an indication step of pointing out, when an erroneous conversion is determined to exist, that the extracted inspection target words contain an erroneous conversion.

Configuration (4) provides the same operation and effects as the erroneous conversion indication device of (1).

(5) The erroneous conversion indication method according to (4), wherein, in the step of extracting the inspection target words, the kanji-converted sentence is divided into morphemes, the divided morphemes are converted into kana character strings, the morphemes converted into kana character strings are connected, identical connected morphemes are extracted from among the connected morphemes, which are the morphemes thus connected, and the character strings corresponding to the extracted identical connected morphemes are extracted from the kanji-converted sentence as inspection target words.

Configuration (5) provides the same operation and effects as the erroneous conversion indication device of (2).

(6) The erroneous conversion indication method according to (4) or (5), wherein, in the storing step, the same kana character string, the character string after kanji conversion, and the conversion delimiter position are stored in association with one another, the method further comprising an activation step of activating, when the extracted conversion delimiter positions differ between the inspection target words, an erroneous conversion flag indicating that the inspection target words may contain an erroneous conversion, wherein, in the erroneous conversion determination step, whether the extracted inspection target words contain an erroneous conversion is determined when the erroneous conversion flag has been activated.

Configuration (6) provides the same operation and effects as the erroneous conversion indication device of (3).

本発明によれば、誤変換指摘装置は、文章中の同音異義語（検査対象語）について変換区切り位置が異なる場合であっても、変換誤りを指摘することができる。 According to the present invention, the erroneous conversion indicating device can indicate a conversion error even when the conversion delimiter positions are different for the homonyms (test target words) in the sentence.

Functional block diagram showing the functional configuration of the erroneous conversion indication device of the present embodiment.
Diagram showing the hardware configuration of the erroneous conversion indication device of the present embodiment.
Flowchart of the character input process of the erroneous conversion indication device of the present embodiment.
Flowchart of the erroneous conversion indication process of the erroneous conversion indication device of the present embodiment.
Diagram showing an example of the input character string at input time.
Diagram showing an example of the sentence after conversion.
Diagram showing the conversion unit character string table.
Diagram showing the conversion position table.
Diagram showing the inspection target word extraction result table.
Diagram showing the conversion position extraction result table.
Diagram showing the erroneous conversion flag activation result table.
Diagram showing an example of an erroneous conversion indication.
Diagram showing that a server includes the conversion position storage means.
Flowchart of the inspection target word extraction process.
Diagram showing the process of morphological analysis, kana conversion, and connection of two adjacent morphemes.
Diagram showing the identical kana search table.

以下、本発明の実施形態について図を参照しながら説明する。 Hereinafter, embodiments of the present invention will be described with reference to the drawings.

[Functional configuration of erroneous conversion indication device 1]
FIG. 1 is a functional block diagram showing the functional configuration of the erroneous conversion indication device 1 according to an embodiment of the present invention.

誤変換指摘装置１は、同一仮名文字列抽出手段１１０と、変換位置記憶手段１２０と、文章受付手段１３０と、検査対象語抽出手段１４０と、誤変換指摘手段１５０と、から構成される。 The erroneous conversion indication device 1 includes an identical kana character string extraction unit 110, a conversion position storage unit 120, a sentence reception unit 130, an inspection target word extraction unit 140, and an erroneous conversion indication unit 150.

さらに、誤変換指摘手段１５０は、変換位置抽出手段１５１と、誤変換フラグ起動手段１５２と、誤変換判定手段１５３と、から構成される。変換位置抽出手段１５１は、変換位置記憶手段１２０を参照して、変換位置を抽出する。 Further, the erroneous conversion indication unit 150 includes a conversion position extraction unit 151, an erroneous conversion flag activation unit 152, and an erroneous conversion determination unit 153. The conversion position extraction unit 151 refers to the conversion position storage unit 120 and extracts the conversion position.

同一仮名文字列抽出手段１１０は、文章を構成する仮名文字列から同一の仮名文字列を抽出し、抽出した同一の仮名文字列の漢字変換後の変換位置を変換位置記憶手段１２０（図８で後述する変換位置テーブル）に記憶する。 The same kana character string extraction unit 110 extracts the same kana character string from the kana character string constituting the sentence, and converts the extracted conversion position of the same kana character string after the kanji conversion into the conversion position storage unit 120 (in FIG. 8). It stores in a conversion position table (to be described later).

The sentence reception unit 130 accepts input of a kanji-converted sentence (the converted sentence described later with reference to FIG. 6), the inspection target word extraction unit 140 extracts the inspection target words, and the erroneous conversion indication unit 150 points out that there is an erroneous conversion when the extracted inspection target words contain one.

At that time, only when the erroneous conversion flag activation unit 152 of the erroneous conversion indication unit 150 has activated the erroneous conversion flag (that is, when the conversion positions suggest a possible erroneous conversion), the erroneous conversion determination unit 153 determines the presence or absence of an erroneous conversion using the n-gram + PLSA technique (Takuya Mishina (三品拓也), Kugatsu Sadamitsu (貞光九月), Mikio Yamamoto (山本幹雄), "Detection and correction of Japanese homonym errors using probabilistic LSA", IPSJ Journal, September 2004, Vol. 45, No. 9, pp. 1-9).
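The gating described above, in which the costly determination runs only after the flag is activated, can be sketched roughly as follows. The names `flag_is_activated`, `point_out_misconversions`, and `stub_judge` are hypothetical, and `stub_judge` is an arbitrary placeholder; the n-gram + PLSA determination itself is not implemented here.

```python
def flag_is_activated(break_positions):
    """Return True when the conversion positions are not all the same."""
    return len(set(break_positions)) > 1


def point_out_misconversions(candidates, judge):
    """candidates: list of (inspection_target_word, conversion_position) pairs.

    judge: callable(word) -> bool, a hypothetical stand-in for the
    n-gram + PLSA determination.
    """
    positions = [position for _word, position in candidates]
    if not flag_is_activated(positions):
        return []  # flag not activated: the determination is skipped entirely
    return [word for word, _position in candidates if judge(word)]


judge_calls = []


def stub_judge(word):
    # Arbitrary placeholder verdict, for illustration only.
    judge_calls.append(word)
    return word == "練習生可"


flagged = point_out_misconversions([("練習成果", 5), ("練習生可", 7)], stub_judge)
skipped = point_out_misconversions([("練習成果", 5), ("練習成果", 5)], stub_judge)
```

In the second call the positions are identical, so the stub is never invoked, which is the efficiency gain the flag is meant to provide.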

[Hardware configuration of erroneous conversion indication device 1]
FIG. 2 is a diagram showing the hardware configuration of the erroneous conversion indication device 1 of the present embodiment. As shown in FIG. 2, the erroneous conversion indication device 1 includes a CPU (Central Processing Unit) 210 constituting the control unit 200 (in a multiprocessor configuration, additional CPUs such as a CPU 220 may be added), a bus line 100, a communication I/F (interface) 230, a main memory 240, a BIOS (Basic Input Output System) 250, an I/O controller 260, a hard disk 270, an optical disk drive 280, and a semiconductor memory 290. The hard disk 270, the optical disk drive 280, and the semiconductor memory 290 are collectively referred to as the storage device 310.

制御部２００は、誤変換指摘装置１を統括的に制御する部分であり、ハードディスク２７０に記憶された各種プログラムを適宜読み出して実行することにより、上述したハードウェアと協働し、本発明に係る各種機能を実現している。 The control unit 200 is a part that controls the misconversion indication device 1 in an integrated manner, and by appropriately reading and executing various programs stored in the hard disk 270, the control unit 200 cooperates with the hardware described above, and relates to the present invention. Various functions are realized.

通信Ｉ／Ｆ２３０は、誤変換指摘装置１がネットワークを介して他の装置と情報を送受信する場合のネットワーク・アダプタである。 The communication I / F 230 is a network adapter when the erroneous conversion indication device 1 transmits / receives information to / from another device via a network.

ＢＩＯＳ２５０は、誤変換指摘装置１の起動時にＣＰＵ２１０が実行するブートプログラムや、誤変換指摘装置１のハードウェアに依存するプログラム等を記録する。 The BIOS 250 records a boot program executed by the CPU 210 when the erroneous conversion indication device 1 is started, a program depending on the hardware of the erroneous conversion indication device 1, and the like.

The storage device 310, such as the hard disk 270, the optical disk drive 280, and the semiconductor memory 290, can be connected to the I/O controller 260.

ハードディスク２７０は、本ハードウェアを誤変換指摘装置１として機能させるための各種プログラム、本発明の機能を実行するプログラム及び後述するデータテーブル等を記憶する。なお、誤変換指摘装置１は、外部に別途設けたハードディスク（図示せず）を外部記憶装置として利用することもできる。 The hard disk 270 stores various programs for causing the hardware to function as the erroneous conversion indication device 1, a program for executing the functions of the present invention, a data table to be described later, and the like. Note that the erroneous conversion indication device 1 can also use a hard disk (not shown) separately provided as an external storage device.

光ディスクドライブ２８０としては、例えば、ＤＶＤ−ＲＯＭドライブ、ＣＤ−ＲＯＭドライブ、ＤＶＤ−ＲＡＭドライブ及びＣＤ−ＲＡＭドライブを使用することができる。この場合は各ドライブに対応した光ディスク３００を使用する。光ディスク３００から光ディスクドライブ２８０によりプログラム又はデータを読み取り、Ｉ／Ｏコントローラ２６０を介してメインメモリ２４０又はハードディスク２７０に提供することもできる。 As the optical disk drive 280, for example, a DVD-ROM drive, a CD-ROM drive, a DVD-RAM drive, and a CD-RAM drive can be used. In this case, the optical disc 300 corresponding to each drive is used. A program or data can be read from the optical disk 300 by the optical disk drive 280 and provided to the main memory 240 or the hard disk 270 via the I / O controller 260.

なお、本発明でいうコンピュータとは、記憶装置、制御部等を備えた情報処理装置をいい、誤変換指摘装置１は、記憶装置３１０、制御部２００等を備えた情報処理装置により構成される。 The computer in the present invention refers to an information processing apparatus including a storage device, a control unit, and the like, and the erroneous conversion indication device 1 includes an information processing device including the storage device 310, the control unit 200, and the like. .

以上の例は、誤変換指摘装置１について主に説明したが、コンピュータに、プログラムをインストールして、そのコンピュータをサーバ装置として動作させることにより上記で説明した機能を実現することもできる。したがって、本発明において一実施形態として説明した誤変換指摘装置１により実現される機能は、上述の方法を当該コンピュータにより実行することにより、或いは、上述のプログラムを当該コンピュータに導入して実行することによっても実現可能である。 In the above example, the erroneous conversion indication device 1 has been mainly described. However, the functions described above can also be realized by installing a program in a computer and operating the computer as a server device. Therefore, the function realized by the erroneous conversion indication device 1 described as one embodiment in the present invention is executed by executing the above-described method by the computer or by introducing the above-described program into the computer. This is also possible.

[Flowchart of the character input process]
FIG. 3 is a flowchart of the character input process.

First, in step S1, the control unit 200 accepts input of a kana character string. Specifically, the control unit 200 accepts the kana character string from an input device (not shown), such as a keyboard, provided in the erroneous conversion indication device 1.

In step S2, the control unit 200 performs kanji conversion. Specifically, the control unit 200 converts the kana character string accepted in step S1 into a character string that includes kanji.

ここで、図５及び図６を参照して、図３のステップＳ１及びステップＳ２における変換処理の具体例を説明する。 Here, a specific example of the conversion process in step S1 and step S2 of FIG. 3 will be described with reference to FIGS.

FIG. 5 shows an example of the input character string at input time; kanji conversion is performed each time one of the boxed portions (「やっと」, 「れんしゅう」, and so on in FIG. 5) is entered. Kanji conversion is performed at the moment the user entering the text presses the key (not shown) on the input device that corresponds to kanji conversion.

FIG. 6 shows an example of the converted sentence, in which kanji conversion has been applied to each boxed portion of FIG. 5. Specifically, when 「やっと」 is entered and the key on the input device corresponding to kanji conversion (hereinafter, the "kanji conversion key") is pressed, it is converted to 「やっと」; when 「れんしゅう」 is entered and the kanji conversion key is pressed, it is converted to 「練習」; and the same process repeats thereafter.

図３に戻って、ステップＳ３では、制御部２００は、変換単位文字列テーブル（図７）の作成を行う。この変換単位文字列テーブルは、メインメモリ２４０の所定領域に作成される。 Returning to FIG. 3, in step S3, the control unit 200 creates a conversion unit character string table (FIG. 7). This conversion unit character string table is created in a predetermined area of the main memory 240.

ここで、図７を参照して、変換単位文字列テーブルについて説明する。この変換単位文字列テーブルには、漢字変換の単位毎に変換前の文字列と変換後の文字列とが対応付けられており、漢字変換キーが押下されたタイミングで、最下段に変換前の文字列と変換後の文字列とが追加して格納される。 Here, the conversion unit character string table will be described with reference to FIG. In this conversion unit character string table, the character string before conversion and the character string after conversion are associated with each Kanji conversion unit, and at the timing when the Kanji conversion key is pressed, the character string before conversion is displayed at the bottom. The character string and the converted character string are added and stored.

For example, when the boxed portion 「せいか」 in FIG. 5 is entered and the kanji conversion key is pressed, 「せいか」 is appended to the "before conversion" column and 「成果」 to the "after conversion" column.
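As a rough illustration of how such a table might be maintained, the sketch below appends one row per press of the kanji conversion key. The list `conversion_unit_table` and the function `on_kanji_conversion_key` are hypothetical names, and a plain Python list stands in for the table held in the main memory 240.

```python
# Rows of (before_conversion, after_conversion), oldest first.
conversion_unit_table = []


def on_kanji_conversion_key(before_kana, after_text):
    """Append one row at the bottom each time the kanji conversion key is pressed."""
    conversion_unit_table.append((before_kana, after_text))


# The three conversions described in the text, in input order.
on_kanji_conversion_key("やっと", "やっと")
on_kanji_conversion_key("れんしゅう", "練習")
on_kanji_conversion_key("せいか", "成果")
```

Keeping the rows in input order matters, because the later steps rely on which entries are adjacent.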

Returning to FIG. 3, in step S4 the control unit 200 determines whether sentence creation has finished. If the determination is YES, the control unit 200 proceeds to step S5; if NO, it returns to step S1. One specific way to determine that sentence creation has finished is for the control unit 200 to detect that the file in which the sentence was created has been closed. When that file is closed, a file containing the converted sentence shown in FIG. 6 is stored on the hard disk 270. In addition, a file composed only of the pre-conversion kana character strings (corresponding to the example in FIG. 5 without the boxes and without kanji conversion) is stored in the main memory 240.

One way to create the file composed only of the pre-conversion kana character strings is to extract the kana character strings one by one from the "before conversion" column of the conversion unit character string table (FIG. 7) described above, starting from the top row and proceeding down to the bottom row, and to write them out to the file in that order.
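A minimal sketch of this top-to-bottom extraction is shown below. The function name `kana_only_text` is hypothetical, and the rows after 「せいか」 are assumed for illustration, since the segmentation in FIG. 5 is not reproduced in the text.

```python
def kana_only_text(rows):
    """Join the "before conversion" column from the top row to the bottom row."""
    return "".join(before for before, _after in rows)


# (before_conversion, after_conversion) rows in input order.
# The last two rows are an assumed segmentation, for illustration only.
rows = [
    ("やっと", "やっと"),
    ("れんしゅう", "練習"),
    ("せいか", "成果"),
    ("が", "が"),
    ("でた", "出た"),
]
kana_text = kana_only_text(rows)
```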

図３に戻って、ステップＳ５では、制御部２００は、同一仮名文字列抽出を行う。具体的には、制御部２００は、上述した変換前の仮名文字列のみで構成されるファイルから、同一仮名文字列を検索して抽出する。 Returning to FIG. 3, in step S5, the control unit 200 extracts the same kana character string. Specifically, the control unit 200 searches for and extracts the same kana character string from the file composed only of the kana character string before conversion described above.

Here, multiple character strings, such as 「れんしゅうせいか」, are extracted.

ステップＳ６では、制御部２００は、変換位置テーブル（図８）を作成する。 In step S6, the control unit 200 creates a conversion position table (FIG. 8).

ここで、図８を参照して、変換位置テーブルについて説明する。この変換位置テーブルは、ハードディスク２７０の所定の領域に割り当てられている。 Here, the conversion position table will be described with reference to FIG. This conversion position table is assigned to a predetermined area of the hard disk 270.

In FIG. 8, for convenience of explanation, only 「れんしゅうせいか」 is shown, but in practice all of the character strings extracted in step S5 of FIG. 3 are written to the conversion position table.

Specifically, the control unit 200 searches the "before conversion" column of the conversion unit character string table (FIG. 7), joins each pair of two consecutive kana character string entries, and compares the result with all of the character strings extracted in step S5. When a matching character string is found, the control unit 200 determines its conversion position (after which character the kanji conversion key was pressed) and stores data for this character string in the "input word", "after conversion", and "conversion position" columns of the conversion position table (FIG. 8).

Referring to the "before conversion" column of the conversion unit character string table (FIG. 7), 「れんしゅう」 and 「せいか」 appear as two consecutive entries, so 「れんしゅうせいか」 is stored as data in the "input word" column of the conversion position table (FIG. 8). In addition, 「練習成果」, obtained by joining 「練習」 and 「成果」 (which correspond to 「れんしゅう」 and 「せいか」, respectively), is stored as data in the "after conversion" column. As for the conversion position, 「れんしゅう」 occupies the first five characters of 「れんしゅうせいか」, so "after the 5th character" is stored as data in the "conversion position" column.

Here, "stored as data" means that data such as character codes are stored; for example, the character code of each character making up the strings 「れんしゅうせいか」 and 「練習成果」 is stored. For "after the 5th character", numerical data such as "5" is stored.

Likewise, for "れんしゅうせい" (renshuusei) and "か" (ka), the values "れんしゅうせいか", "練習生可" (trainees allowed), and "after the 7th character" are stored as data in the "input word", "after conversion", and "conversion position" columns of the conversion position table, respectively.
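The table-building step described above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the names `ConversionPosition` and `build_conversion_position_table` are hypothetical, and the list of conversion units is an assumed stand-in for FIG. 7, which is not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConversionPosition:
    input_word: str  # concatenated kana, e.g. "れんしゅうせいか"
    converted: str   # concatenated converted text, e.g. "練習成果"
    position: int    # number of kana characters before the conversion break


def build_conversion_position_table(units, shared_kana):
    """Join each pair of consecutive conversion units and keep the pairs
    whose joined kana matches a string extracted in step S5.

    units: list of (kana, converted) tuples in typing order (assumed data).
    shared_kana: set of kana strings extracted in step S5 (assumed data).
    """
    table = []
    for (kana_a, conv_a), (kana_b, conv_b) in zip(units, units[1:]):
        joined = kana_a + kana_b
        if joined in shared_kana:
            # The break falls right after the first unit's kana.
            table.append(ConversionPosition(joined, conv_a + conv_b, len(kana_a)))
    return table


# Assumed conversion units for the example sentence.
units = [
    ("やっと", "やっと"), ("れんしゅう", "練習"), ("せいか", "成果"),
    ("が", "が"), ("でた", "出た"), ("これも", "これも"),
    ("れんしゅうせい", "練習生"), ("か", "可"), ("だと", "だと"), ("おもう", "思う"),
]
table = build_conversion_position_table(units, {"れんしゅうせいか"})
```

With this assumed input, the table holds one row for "練習成果" with position 5 and one row for "練習生可" with position 7, matching the values given in the text.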

When step S6 of FIG. 3 is complete, the control unit 200 ends the character-input processing.

[Flowchart of the erroneous conversion indication process]
FIG. 4 is a flowchart of the erroneous conversion indication process.

In step S11, the control unit 200 accepts the input text. Specifically, the control unit 200 reads from the hard disk 270 the file containing the converted text shown in FIG. 6 and loads the text data of that file into the main memory 240.

Returning to FIG. 4, in step S12 the control unit 200 extracts the inspection target words. Specifically, the control unit 200 converts the entire kanji-converted text shown in FIG. 6 into kana and extracts, as inspection target words, the kanji-converted character strings that correspond to the longest common part of the resulting kana.

In more detail, converting the text shown in FIG. 6, "やっと練習成果が出た・・・これも練習生可だと思う・・・", into kana gives "やっとれんしゅうせいかがでた・・・これもれんしゅうせいかだとおもう・・・". Extracting the longest common character string from this kana text yields "れんしゅうせいか". The kanji-converted character strings corresponding to "れんしゅうせいか" are "練習成果" (practice results) and "練習生可" (trainees allowed), so these two strings become the inspection target words. After extracting the inspection target words, the control unit 200 associates them with the kana character string and creates the inspection target word extraction result table (FIG. 9). This table is created in a predetermined area of the main memory 240.
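One way to picture the "longest common part" search is as a longest-repeated-substring search over the kana text. The sketch below is an assumption about how this could be done, not the method the patent specifies: `longest_repeated_substring` is a hypothetical name, the kana string is taken as already converted, and the brute-force search counts overlapping occurrences as repeats.

```python
def longest_repeated_substring(text: str) -> str:
    """Return the longest substring that occurs at least twice in text.

    Brute force: try lengths from long to short and return the first
    substring seen twice. Overlapping occurrences count as repeats.
    """
    n = len(text)
    for length in range(n - 1, 0, -1):
        seen = set()
        for start in range(n - length + 1):
            piece = text[start:start + length]
            if piece in seen:
                return piece
            seen.add(piece)
    return ""


# Kana form of the example text, as given in the description.
kana = "やっとれんしゅうせいかがでた・・・これもれんしゅうせいかだとおもう・・・"
common = longest_repeated_substring(kana)
```

Here `common` is "れんしゅうせいか". Mapping that kana span back to the kanji strings "練習成果" and "練習生可" needs an alignment between kana and converted text, which this sketch leaves out.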

The inspection target word extraction result table is described with reference to FIG. 9. This table represents the correspondence between a shared kana character string and its inspection target words.

According to this table, the inspection target words corresponding to "れんしゅうせいか" are "練習成果" and "練習生可".

Returning to FIG. 4, in step S13 the control unit 200 extracts the conversion positions. Specifically, the control unit 200 extracts the conversion position of each inspection target word by referring to the conversion position table shown in FIG. 8.

The conversion position is extracted as follows: using the data stored in the "inspection target word" column of the inspection target word extraction result table shown in FIG. 9 as a key, the control unit 200 searches the "after conversion" column of the conversion position table shown in FIG. 8 and extracts the conversion position.

For example, when "練習成果", stored in the "inspection target word" column of the inspection target word extraction result table shown in FIG. 9, is used as the key, "after the 5th character" is extracted as the conversion position. Likewise, when "練習生可" is used as the key, "after the 7th character" is extracted.

The extracted data is stored in the "conversion position" column of the conversion position extraction result table (FIG. 10).
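The lookup in step S13 amounts to a keyed search of the "after conversion" column. A minimal sketch follows; the dictionary layout and the name `extract_positions` are illustrative assumptions, and the rows simply restate the example values from the description.

```python
# Assumed in-memory form of the conversion position table (FIG. 8).
conversion_position_table = [
    {"input_word": "れんしゅうせいか", "converted": "練習成果", "position": 5},
    {"input_word": "れんしゅうせいか", "converted": "練習生可", "position": 7},
]


def extract_positions(target_words, table):
    """Look up each inspection target word in the 'converted' column and
    return its conversion position (None if the word is not in the table)."""
    by_converted = {row["converted"]: row["position"] for row in table}
    return {word: by_converted.get(word) for word in target_words}


# Analogue of the conversion position extraction result table (FIG. 10).
positions = extract_positions(["練習成果", "練習生可"], conversion_position_table)
```

The result maps "練習成果" to 5 and "練習生可" to 7, that is, "after the 5th character" and "after the 7th character".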

The conversion position extraction result table is described with reference to FIG. 10. This table is the inspection target word extraction result table shown in FIG. 9 with a "conversion position" column added. It is created by joining the "conversion position" column to the inspection target word extraction result table stored in the main memory 240.

Returning to FIG. 4, in step S14 the control unit 200 sets the erroneous conversion flag. Specifically, the control unit 200 compares the conversion positions of the inspection target words stored in the "inspection target word" column of the conversion position extraction result table (FIG. 10). If the positions differ, it activates (turns on) the erroneous conversion flag, which indicates that an erroneous conversion caused by a difference in conversion position is possible. If the positions are the same, the flag is not activated (it stays off).

In this example, the conversion positions in the conversion position extraction result table (FIG. 10) differ between "練習成果" and "練習生可", so the control unit 200 sets the erroneous conversion flag to "on".
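The flag decision in step S14 reduces to checking whether the inspection target words share a single conversion position. The following sketch assumes positions are held as a word-to-integer mapping; `misconversion_flag` and the placeholder words "語A" and "語B" are hypothetical.

```python
def misconversion_flag(positions):
    """Return True (flag on) when the inspection target words do not all
    share the same conversion position, False (flag off) otherwise."""
    return len(set(positions.values())) > 1


# Example from the description: breaks after the 5th and 7th characters.
flag_on = misconversion_flag({"練習成果": 5, "練習生可": 7})

# Hypothetical pair with identical break positions: the flag stays off.
flag_off = misconversion_flag({"語A": 3, "語B": 3})
```

Only when the flag is on do the later determination and indication steps need to run.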

Whether the erroneous conversion flag was activated is stored in the erroneous conversion flag activation result table shown in FIG. 11.

The erroneous conversion flag activation result table is described with reference to FIG. 11. This table is the conversion position extraction result table shown in FIG. 10 with an "erroneous conversion flag" column added. It is created by joining the "erroneous conversion flag" column to the conversion position extraction result table stored in the main memory 240.

When the erroneous conversion flag is activated, "on" is stored in the "erroneous conversion flag" column; otherwise "off" is stored. "On" may instead be stored as the numeric value 1 and "off" as 0.

Referring to the erroneous conversion flag activation result table of FIG. 11 shows whether an inspection target word may have been erroneously converted because of a difference in conversion position. In the example of FIG. 11 the flag is on, which shows that either "練習成果" or "練習生可" may be an erroneous conversion.

Returning to FIG. 4, in step S15 the control unit 200 performs the erroneous conversion determination. This determination (and the erroneous conversion indication of step S16) is executed only when the erroneous conversion flag was activated in step S14.

In the erroneous conversion determination, the control unit 200 determines which of "練習成果" and "練習生可" is the erroneous conversion.

First, the method for determining whether "練習生可" is an erroneous conversion is described.

In detail, the control unit 200 treats "練習成果" and "練習生可" as a homonym list and applies the n-gram + PLSA method (Takuya Mishina, Kugatsu Sadamitsu, and Mikio Yamamoto, "Detection and correction of Japanese homonym errors using probabilistic LSA", Journal of Information Processing Society of Japan, September 2004, Vol. 45, No. 9, pp. 1-9). For the text read in step S11, it calculates the likelihood of "練習成果", defined from the global occurrence probability modeled by PLSA and the local occurrence probability modeled by an n-gram (hereinafter simply "likelihood"), and the likelihood of "練習生可". Next, as the error determination calculation, the control unit 200 takes the logarithm of the ratio of the two likelihoods and, when that value crosses a fixed threshold, determines that "練習生可" is an erroneous conversion.

The calculation is d = log{(likelihood of "練習生可") / (likelihood of "練習成果")}; when d < 0, "練習生可" can be judged to be an erroneous conversion. With the condition d < 0, however, an erroneous conversion is reported even when the likelihood of "練習成果" is only marginally higher than that of "練習生可" (for example, 50 versus 49). The threshold 0 in the condition d < 0 may therefore be adjusted as appropriate, provided it remains a negative number (for example, d < -0.5).

Whether "練習成果" is an erroneous conversion is determined in the same way, by calculating d = log{(likelihood of "練習成果") / (likelihood of "練習生可")}.
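The log-likelihood-ratio test can be illustrated with a few lines of Python. This is a sketch under stated assumptions: the likelihood values are taken as given positive numbers (the n-gram + PLSA model that would produce them is not implemented here), and the function name `is_misconverted` is hypothetical.

```python
import math


def is_misconverted(likelihood_candidate, likelihood_alternative, threshold=0.0):
    """Judge the candidate word an erroneous conversion when
    d = log(likelihood_candidate / likelihood_alternative) < threshold.

    Both likelihoods must be positive. A negative threshold such as -0.5
    avoids flagging near-ties.
    """
    d = math.log(likelihood_candidate / likelihood_alternative)
    return d < threshold


# Near-tie from the description: "練習生可" = 49, "練習成果" = 50.
flagged_at_zero = is_misconverted(49, 50)                  # d is about -0.02
flagged_at_half = is_misconverted(49, 50, threshold=-0.5)
```

With the threshold at 0 the near-tie is flagged; with the threshold at -0.5 it is not, which is the adjustment the description motivates. Swapping the two arguments gives the test for "練習成果".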

In step S16, the control unit 200 indicates the erroneous conversion. This indication is executed only when step S15 determined that an erroneous conversion exists.

Specifically, when one inspection target word is determined to be an erroneous conversion, the control unit 200 indicates that the other inspection target word is the correct one. As a concrete example, as shown in FIG. 12, when "練習生可" is determined to be an erroneous conversion, "練習成果" is indicated as the correction for "練習生可". Displaying the content shown in FIG. 12 on a display device (not shown) of the erroneous conversion indication device 1 lets the user recognize that an erroneous conversion exists.

Having recognized the erroneous conversion, the user decides whether to change "練習生可" to "練習成果". To make the change, the user presses a change confirmation button (not shown) on the erroneous conversion indication device 1, which confirms the change.

When step S16 of FIG. 4 is complete, the control unit 200 ends the erroneous conversion indication process.

With the above processing, conversion errors can be indicated even when homonyms (inspection target words) in the text have different conversion positions.

Furthermore, because the determination of whether an inspection target word is an erroneous conversion is made only when the erroneous conversion flag is activated, the candidates for determination are narrowed down and the erroneous conversion determination runs more efficiently.

In the embodiment above, the conversion position table (FIG. 8) is held in the erroneous conversion indication device 1, but the invention is not limited to this. For example, as shown in FIG. 13, a server 400 may hold the conversion position table. In that case the table in the server 400 is created through communication with the erroneous conversion indication device 1 (the processing of step S6 in FIG. 3) and is then referred to (the processing of steps S12 and S13 in FIG. 4).

Also, in the inspection target word extraction of step S12 in FIG. 4, the control unit 200 converts the entire kanji-converted text shown in FIG. 6 into kana and extracts, as inspection target words, the kanji-converted character strings corresponding to the longest common part of the kana, but the invention is not limited to this. The processing of steps S31 to S36 shown in FIG. 14, described below, may be performed in place of step S12 of FIG. 4.

This inspection target word extraction process is described below with reference to FIG. 14.

In step S31, the control unit 200 performs morphological analysis. Specifically, it performs morphological analysis on the text data of the file read in step S11 of FIG. 4. In step S32, the control unit 200 performs kana conversion, and in step S33 it concatenates adjacent pairs of morphemes.

Morphological analysis is one of the basic techniques of natural language processing by computer. Using knowledge of the target language's grammar (a collection of grammar rules) and a dictionary (a word list annotated with information such as parts of speech) as information sources, it divides a sentence written in natural language into a sequence of morphemes (roughly, the smallest units that carry meaning in the language).

A concrete example of morphological analysis, including the kana conversion and the concatenation of adjacent pairs of morphemes, is described with reference to FIG. 15, which shows the process of morphological analysis and the concatenation of adjacent morpheme pairs. For example, morphological analysis of "練習成果" divides it into "練習" and "成果" (circled number 1 in FIG. 15).

Next, the resulting morphemes "練習" and "成果" are converted to kana, giving "れんしゅう" and "せいか" respectively (circled number 2 in FIG. 15).

Next comes the concatenation of adjacent morpheme pairs (two morphemes that are next to each other among the divided morphemes). "れんしゅう" and "せいか" are adjacent, so they are concatenated into "れんしゅうせいか" (circled number 3 in FIG. 15).

For "練習生可だ", morphological analysis (circled number 1 in FIG. 15) divides it into "練習生", "可", and "だ"; kana conversion (circled number 2 in FIG. 15) turns these into "れんしゅうせい", "か", and "だ"; and concatenating adjacent morpheme pairs yields "れんしゅうせいか" and "かだ".
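The pairing step can be sketched as follows. A real system would obtain the morphemes and their readings from a morphological analyzer; here they are supplied by hand as an assumption, and `link_adjacent` is a hypothetical helper name.

```python
def link_adjacent(morphemes):
    """Concatenate every pair of neighbouring morphemes.

    morphemes: list of (surface, kana) tuples in sentence order, as a
    morphological analyzer with readings might return (assumed input).
    Returns a list of (joined_surface, joined_kana) tuples.
    """
    return [
        (surface_a + surface_b, kana_a + kana_b)
        for (surface_a, kana_a), (surface_b, kana_b) in zip(morphemes, morphemes[1:])
    ]


# Hand-written analysis of "練習生可だ" following the description.
linked = link_adjacent([("練習生", "れんしゅうせい"), ("可", "か"), ("だ", "だ")])
```

For this input the linked pairs are "練習生可" (れんしゅうせいか) and "可だ" (かだ), as in the description. Applying the same helper to "練習" and "成果" gives "練習成果" (れんしゅうせいか).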

The control unit 200 temporarily stores the intermediate results of this morphological analysis, kana conversion, and adjacent-pair concatenation as data in the main memory 240.

Returning to FIG. 14, in step S34 the control unit 200 creates the same-kana search table (see FIG. 16). Specifically, for each morpheme, including the concatenated adjacent morpheme pairs described with FIG. 15 (hereinafter "connected morphemes"), it stores two correspondences in the same-kana search table allocated in a predetermined area of the hard disk 270: the correspondence with the character string in the text of the file read in step S11, and the correspondence with the character string before division into morphemes.

The same-kana search table is described with reference to FIG. 16. As stated above, this table stores the correspondence among three items: the morphemes, including connected morphemes, in kana only ("morphemes including connected morphemes (kana only)"); the character strings in the text ("morphemes including connected morphemes (in text)"); and the character strings before division into morphemes ("before division into morphemes").

The same-kana search table is created from the data stored in the main memory 240 for the morphological analysis, kana conversion, and adjacent-pair concatenation.

For example, in the same-kana search table of FIG. 16, the connected morpheme "れんしゅうせいか" of FIG. 15 corresponds to the character strings "練習成果" and "練習生可" in the text, and to "練習成果" and "練習生可だ" before division into morphemes.

Returning to FIG. 14, in step S35 the control unit 200 searches for identical kana. Specifically, it refers to the "morphemes including connected morphemes (kana only)" column of the same-kana search table (FIG. 16) and searches for identical kana strings (identical kana exist when the same character string appears two or more times).

For example, in the same-kana search table of FIG. 16, the search extracts "れんしゅうせいか" as identical kana.

Returning to FIG. 14, in step S36 the control unit 200 extracts the inspection target words. Specifically, from the same-kana search table of FIG. 16 it extracts the entries in the "morphemes including connected morphemes (in text)" column that correspond to the kana-only entries found to be identical.

For example, the identical kana found is "れんしゅうせいか", so the data in the rows corresponding to "れんしゅうせいか" ("練習成果" and "練習生可") is extracted and the inspection target word extraction result table described above (FIG. 9) is created.
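Steps S35 and S36 can be pictured as grouping table rows by their kana reading and keeping the readings that occur more than once. The sketch below is illustrative only: `find_same_kana` is a hypothetical name, and the rows are an assumed stand-in for the first two columns of FIG. 16.

```python
from collections import defaultdict


def find_same_kana(rows):
    """Group in-text strings by kana reading and keep readings that
    appear two or more times.

    rows: list of (kana, in_text) tuples (assumed form of FIG. 16).
    Returns {kana: [in_text, ...]} for the repeated readings only.
    """
    groups = defaultdict(list)
    for kana, in_text in rows:
        groups[kana].append(in_text)
    return {kana: words for kana, words in groups.items() if len(words) >= 2}


# Assumed rows: single morphemes plus connected morphemes for both phrases.
rows = [
    ("れんしゅう", "練習"), ("せいか", "成果"), ("れんしゅうせいか", "練習成果"),
    ("れんしゅうせい", "練習生"), ("か", "可"), ("だ", "だ"),
    ("れんしゅうせいか", "練習生可"), ("かだ", "可だ"),
]
targets = find_same_kana(rows)
```

With these rows the only repeated reading is "れんしゅうせいか", mapped to "練習成果" and "練習生可", which corresponds to the content of the inspection target word extraction result table.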

When step S36 is complete, the control unit 200 proceeds to step S13 of FIG. 4.

An embodiment of the present invention has been described above, but the present invention is not limited to that embodiment. The effects described for the embodiment are merely the most preferable effects arising from the present invention, and the effects of the present invention are not limited to those described.

Description of Symbols
1 Erroneous conversion indication device
110 Same-kana character string extraction means
120 Conversion position storage means
130 Sentence reception means
140 Inspection target word extraction means
150 Erroneous conversion indication means

Claims

An erroneous conversion indication device for indicating an erroneous conversion contained in a sentence composed of converted character strings, comprising:
conversion break position storage means for storing the conversion break position of a kana character string after kanji conversion;
inspection target word extraction means for extracting, as inspection target words, the kanji-converted character strings corresponding to the longest common part of the kana character string obtained when the entire kanji-converted sentence is converted into a kana character string;
conversion break position extraction means for extracting the conversion break position of each inspection target word extracted by the inspection target word extraction means by referring to the conversion break position storage means;
erroneous conversion determination means for determining, when the conversion break positions extracted by the conversion break position extraction means differ between the inspection target words, whether there is an erroneous conversion in the inspection target words extracted by the inspection target word extraction means; and
erroneous conversion indication means for indicating, when the erroneous conversion determination means determines that there is an erroneous conversion, that there is an erroneous conversion in the inspection target words extracted by the inspection target word extraction means.

The erroneous conversion indication device according to claim 1, wherein the inspection target word extraction means includes:
morpheme dividing means for dividing the kanji-converted sentence into morphemes;
kana conversion means for converting the morphemes divided by the morpheme dividing means into kana character strings;
kana morpheme linking means for linking the morphemes converted into kana character strings by the kana conversion means; and
identical connected morpheme extraction means for extracting identical connected morphemes from among the connected morphemes linked by the kana morpheme linking means,
and extracts, from the kanji-converted sentence, the character strings corresponding to the extracted connected morphemes as inspection target words.

The erroneous conversion indication device according to claim 1 or 2, wherein:
the conversion break position storage means stores the kana character string, the character string after kanji conversion, and the conversion break position in association with one another;
the device further comprises erroneous conversion flag activation means for activating, when the conversion break positions extracted by the conversion break position extraction means differ between the inspection target words, an erroneous conversion flag indicating that the inspection target words may contain an erroneous conversion; and
the erroneous conversion determination means determines, when the erroneous conversion flag is activated, whether there is an erroneous conversion in the extracted inspection target words.

An erroneous conversion indication method in which a computer indicates an erroneous conversion contained in a sentence composed of converted character strings, comprising:
a storage step of storing the conversion break position of a kana character string after kanji conversion;
an inspection target word extraction step of extracting, as inspection target words, the kanji-converted character strings corresponding to the longest common part of the kana character string obtained when the entire kanji-converted sentence is converted into a kana character string;
a conversion break position extraction step of extracting the conversion break position of each extracted inspection target word on the basis of the stored conversion break positions;
an erroneous conversion determination step of determining, when the extracted conversion break positions differ between the inspection target words, whether there is an erroneous conversion in the extracted inspection target words; and
an indication step of indicating, when it is determined that there is an erroneous conversion, that there is an erroneous conversion in the extracted inspection target words.

The erroneous conversion indication method according to claim 4, wherein the inspection target word extraction step comprises:
dividing the kanji-converted sentence into morphemes;
converting the divided morphemes into kana character strings;
linking the morphemes converted into kana character strings;
extracting identical connected morphemes from among the linked connected morphemes; and
extracting, from the kanji-converted sentence, the character strings corresponding to the extracted connected morphemes as inspection target words.

The erroneous conversion indication method according to claim 4 or 5, wherein:
in the storage step, the kana character string, the character string after kanji conversion, and the conversion break position are stored in association with one another;
the method further comprises an activation step of activating, when the extracted conversion break positions differ between the inspection target words, an erroneous conversion flag indicating that the inspection target words may contain an erroneous conversion; and
in the erroneous conversion determination step, whether there is an erroneous conversion in the extracted inspection target words is determined when the erroneous conversion flag is activated.