JP2847985B2 - Phrase-break learning information retrieval method in a kana-kanji conversion device

Info

Publication number: JP2847985B2
Application number: JP3042020A
Authority: JP
Inventors: 浩一郎高橋
Original assignee: Fuji Xerox Co Ltd
Current assignee: Fujifilm Business Innovation Corp
Priority date: 1991-03-07
Filing date: 1991-03-07
Publication date: 1999-01-20
Anticipated expiration: 2014-01-20
Also published as: JPH04279966A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to a phrase-break learning information retrieval method for a kana-kanji conversion device that converts a reading string into a mixed kana-kanji string, learns the characteristics of the conversions performed by the user, and thereby improves the efficiency of subsequent kana-kanji conversions.

[0002]

[Prior Art] A kana-kanji conversion device is provided with phrase-break moving means for moving a phrase break in a conversion result to the position the user wants. When a phrase break in the conversion result is not where the user intended, the user changes its position with the phrase-break moving means and reconverts, thereby obtaining a conversion result with the desired phrase breaks. In addition, the kana-kanji conversion device internally includes phrase-break learning means that stores the phrase-break position previously changed by the user as position information and reflects that information in subsequent kana-kanji conversions, thereby improving conversion efficiency.

[0003] As one example of such phrase-break learning means, Japanese Unexamined Patent Publication No. H1-214967 discloses a character processing device that stores the reading string and the phrase-break position of the two phrases that were the subject of a phrase-break correction and, during subsequent kana-kanji conversion, looks up the reading string in the phrase-break learning information and reflects the result in the conversion.

[0004]

[Problems to Be Solved by the Invention] However, the phrase-break learning information retrieval method used in the conventional character processing device has the following problems.

[0005] FIG. 4 is a flowchart showing the processing procedure of the conventional phrase-break learning information retrieval method during kana-kanji conversion. First, the terms used in the flowchart of FIG. 4 are explained. "YOMI" is the reading string about to undergo kana-kanji conversion; "YLEN" is the number of characters in YOMI; "HPOSI" is the position within YOMI of the first character of the reading string used to search the phrase-break learning information file; "TPOSI" is the position within YOMI of the last character of that reading string; and "DPOSI" is a parameter representing the phrase-break position specified by the phrase-break learning information. FIG. 5 is an explanatory diagram showing the contents of the phrase-break learning file used by the retrieval method. The procedure for searching the phrase-break learning information in the state shown in FIG. 5 is described below.

[0006] Suppose that the reading string 「おくにはにわがあります」 has been input. Here, YOMI = 「おくにはにわがあります」 and YLEN = 11. From YOMI, the strings used to search the phrase-break learning file are generated. First, HPOSI is set to 1 (step 201), TPOSI is set equal to HPOSI (step 202), and the search reading string becomes 「お」 (step 203). Next, the phrase-break learning file is searched to determine whether a matching entry exists (step 204). Since no matching entry exists here, TPOSI is set to 2 (step 206), and it is determined whether TPOSI has reached YLEN (step 207). Since it has not, processing returns to step 203 and the file is searched with the reading string 「おく」. No matching entry exists at step 204, so processing moves to step 206. By repeating steps 203 to 206 in this way, the search reading string changes to 「おくに」, 「おくには」, 「おくにはにわ」, 「おくにはにわが」, and so on. When the reading string is 「おくにはにわが」, a matching entry exists at step 204, so processing moves to step 205. At step 205, the phrase-break position information to be used in the later kana-kanji conversion is taken from the matching learning information, and DPOSI is set to 3. Because the phrase-break position in this example is 3, the reading string 「おくにはにわが」 is divided as 「おくに／はにわが」. The search then continues in the same way for further matching phrase-break learning information, and ends when HPOSI and TPOSI have both reached YLEN (steps 207 to 209).
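The conventional procedure amounts to trying every contiguous substring of YOMI as a lookup key. A minimal Python sketch follows, assuming the learning file can be modeled as a dictionary from reading string to break position and that the scan continues after a match. The function name `conventional_search` and the dictionary model are illustrative assumptions, not details taken from the flowchart of FIG. 4.

```python
def conventional_search(yomi, learned):
    """Try every substring of yomi as a key (HPOSI/TPOSI are 1-based, as in the text).

    learned: hypothetical stand-in for the learning file, {reading: break position}.
    Returns the total number of lookups and a list of (lookup number, key, DPOSI).
    """
    ylen = len(yomi)
    lookups = 0
    hits = []
    for hposi in range(1, ylen + 1):            # first character of the key
        for tposi in range(hposi, ylen + 1):    # last character of the key
            key = yomi[hposi - 1:tposi]
            lookups += 1
            if key in learned:
                hits.append((lookups, key, learned[key]))
    return lookups, hits


learned = {"おくにはにわが": 3}
total, hits = conventional_search("おくにはにわがあります", learned)
# total == 66; hits == [(7, "おくにはにわが", 3)]
```

For the example input, the entry is found on the 7th lookup, and an exhaustive scan performs 66 lookups.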

[0007] Thus, with the conventional retrieval method, obtaining the learning information 「おくにはにわが」 requires at least as many file searches as it has characters (7), and at most 66 searches for the 11-character input. Moreover, if none of the search strings generated from the input reading string exists in the learning information file, all 66 searches are wasted. Furthermore, if the contents of the phrase-break learning information file are as shown in FIG. 6, two of the search strings generated from 「おくにはにわがあります」 match (「おくにはにわが」 and 「くにはにわ」), but there is no particular criterion for deciding which to select. Conversion may therefore be performed with incorrect phrase breaks, which lowers conversion efficiency.
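The figure of 66 is consistent with counting every contiguous substring of the reading string. As a worked check (the symbol $n$ for the number of characters is introduced here only for illustration):

```latex
\sum_{k=1}^{n} k = \frac{n(n+1)}{2},
\qquad
n = 11 \;\Rightarrow\; \frac{11 \cdot 12}{2} = 66 .
```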

[0008] An object of the present invention is to provide a phrase-break learning information retrieval method that can search the registered phrase-break learning information efficiently and improve the processing speed of kana-kanji conversion that uses it.

[0009]

[Means for Solving the Problems] In the phrase-break learning information retrieval method according to the present invention, when kana-kanji conversion is performed using phrase-break learning information, multi-phrase conversion is first performed internally without using the learning information stored by the phrase-break learning means. The learning information stored by the phrase-break learning means is then searched using reading strings taken two phrases at a time from that conversion result, and the phrase-break positions of the conversion result are corrected internally in accordance with the phrase-break position information of the matching learning information.

[0010]

[Operation] The input reading string is internally subjected to multi-phrase conversion, and two phrases are taken from the result. The learning information of the phrase-break learning means is searched using these two phrases, and if matching learning information exists, the corresponding phrase-break position information is retrieved. Next, the number of characters in the first of the two phrases used for the search, counted from its beginning, is compared with the phrase-break position information. If the two values do not match, the phrase-break position is corrected internally in accordance with the phrase-break position information: the part before the position is given single-phrase conversion, and the part after it is given multi-phrase conversion. If the two values match, the same search is performed for the next two phrases. Performing this processing for all phrases yields a kana-kanji conversion result that reflects the phrase-break learning information. This method requires fewer search reading strings and fewer searches than the conventional method. In addition, because the search is performed in units of phrases, learning information whose reading string only partially matches is never retrieved even if it is registered, so conversion with incorrect phrase breaks does not occur.
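To illustrate why fewer search strings are needed, the sketch below builds lookup keys only from adjacent phrase pairs. The function name `pair_keys` and the three-phrase segmentation are illustrative assumptions. The sketch shows only the initial keys: in the full procedure the phrases after a corrected break are re-segmented, so later keys can change.

```python
def pair_keys(phrases):
    """Join each pair of adjacent phrase readings into one lookup key."""
    return [phrases[i] + phrases[i + 1] for i in range(len(phrases) - 1)]


phrases = ["おくには", "にわが", "あります"]  # illustrative default segmentation
keys = pair_keys(phrases)
# keys == ["おくにはにわが", "にわがあります"]
```

Three phrases give two candidate keys, compared with the 66 substrings tried by the conventional scan of the same 11-character input.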

[0011]

[Embodiment] An embodiment of the phrase-break learning information retrieval method in a kana-kanji conversion device according to the present invention is described below.

[0012] FIG. 1 is a block diagram showing the configuration of a kana-kanji conversion device to which the phrase-break learning information retrieval method according to the present invention is applied. The device comprises: an input unit 1 for inputting a kana string (or a romaji string; the same applies hereinafter); a kana-kanji conversion unit 2 that converts the input kana string into a mixed kana-kanji string; a display unit 3 that displays the mixed kana-kanji string output from the kana-kanji conversion unit 2; a phrase-break moving unit 4 for moving the phrase-break positions of the mixed kana-kanji string output by the kana-kanji conversion unit 2; a word dictionary unit 5 that stores a plurality of entries associating kana strings, mixed kana-kanji strings, and grammatical information; and a phrase-break learning unit 6 that stores, in a phrase-break learning file as phrase-break learning information, the readings of the two phrases involved in a movement of a phrase-break position by the phrase-break moving unit 4 together with the phrase-break position after the movement.

[0013] In addition to basic kana-kanji conversion processing, when performing kana-kanji conversion using phrase-break learning information, the kana-kanji conversion unit 2 internally performs multi-phrase conversion without using the learning information stored by the phrase-break learning unit 6, searches the learning information stored by the phrase-break learning unit 6 using reading strings taken two phrases at a time from that conversion result, and internally corrects the phrase-break positions of the conversion result in accordance with the phrase-break position information of the matching learning information.

[0014] The phrase-break learning function of the kana-kanji conversion device described above is now briefly explained. Suppose that the user, wanting to obtain the string 「奥に埴輪があります」 ("There is a haniwa in the back"), inputs the string 「おくにはにわがあります」 and performs kana-kanji conversion. The kana-kanji conversion unit 2 searches the word dictionary unit 5 for words corresponding to the input reading string and converts it into a mixed kana-kanji string. If no data having a two-phrase reading string contained in the input reading string is registered in the phrase-break learning unit 6, the data of the phrase-break learning unit 6 is not used. Suppose that, as a result, the input is converted into the string 「奥には庭があります」 ("There is a garden in the back"). Since this differs from the string the user wants, the phrase break must be corrected. The user therefore operates the phrase-break moving unit 4 to move the phrase break 「／」 between the two phrases 「おくには／にわが」 one character to the left and instructs reconversion. The kana-kanji conversion unit 2 converts each of the two phrases and displays the result, yielding the string 「奥に埴輪があります」. Having obtained the desired string, the user performs the operation that confirms the displayed string. When this operation is performed, the kana-kanji conversion unit 2 registers the two-phrase reading string 「おくにはにわが」 and the phrase-break position information "3", as shown in FIG. 5.
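The registration step can be sketched as follows, assuming the learning file is modeled as a dictionary from the joined two-phrase reading to the break position after the move. The function name `learn_boundary`, its `shift` parameter, and the dictionary model are hypothetical and serve only to illustrate the example in the text.

```python
def learn_boundary(learned, first, second, shift):
    """Record a user correction of the break between two phrases.

    first, second: readings of the two phrases around the moved break.
    shift: number of characters the break was moved (negative = to the left).
    """
    reading = first + second
    new_pos = len(first) + shift
    if not 0 < new_pos < len(reading):
        raise ValueError("break must stay strictly inside the two phrases")
    learned[reading] = new_pos                 # entry as in FIG. 5: reading -> position
    return reading[:new_pos], reading[new_pos:]


learned = {}
parts = learn_boundary(learned, "おくには", "にわが", -1)
# parts == ("おくに", "はにわが"); learned == {"おくにはにわが": 3}
```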

[0015] Thereafter, when the user inputs the same string as before, the kana-kanji conversion unit 2 searches the word dictionary unit 5 for words corresponding to the input reading string and converts it into a mixed kana-kanji string. Because the data shown in FIG. 5 is now registered in the phrase-break learning unit 6, the kana-kanji conversion unit 2 converts the 「おくにはにわが」 portion of the input string using the phrase breaks 「おくに」 and 「はにわが」. As a result, the user obtains the string 「奥に埴輪があります」 without correcting the phrase break or reconverting.

[0016] Next, the processing procedure of the phrase-break learning information retrieval method executed by the kana-kanji conversion device of FIG. 1 is described with reference to the flowcharts of FIGS. 2 and 3.

[0017] First, the terms used in the flowcharts of FIGS. 2 and 3 are explained. "BUN" is the set of phrases making up the result obtained by multi-phrase conversion; "BNUM" is the number of phrases in BUN; "BPOSI" is the position within BUN used to obtain, phrase by phrase, the reading string for searching the phrase-break learning information file; and "DPOSI" is a parameter representing the phrase-break position specified by the phrase-break learning information. The following description uses the phrase-break learning information in the state shown in FIG. 5.

[0018] In FIG. 2, suppose that the string 「おくにはにわがあります」 has been input. First, multi-phrase conversion is performed without using the phrase-break learning information. Assuming that this yields the three phrases 「おくには」, 「にわが」, and 「あります」, these phrases are stored in BUN (step 101). Next, BPOSI, the parameter used to obtain reading strings from BUN phrase by phrase, is set to 1 (step 102). The phrase-break learning file of the phrase-break learning unit 6 is then searched using the reading string 「おくにはにわが」 formed by joining BUN(BPOSI) and BUN(BPOSI+1) (step 103), and it is determined whether a matching entry exists (step 104). As shown in FIG. 5, the phrase-break learning file contains the learning information 「おくにはにわが」, so the value "3" indicating its phrase-break position is retrieved and DPOSI is set to 3 (step 105). It is then determined whether the number of characters in BUN(BPOSI) differs from the value of DPOSI (step 106). At step 106, the number of characters in BUN(BPOSI) (4) does not match the value of DPOSI (3), so processing moves to step 107. In this step, the reading string 「おくにはにわが」 is split into front and rear parts at the position DPOSI. Single-phrase conversion is applied to 「おくに」, and multi-phrase conversion is applied to 「はにわがあります」, the entire string in BUN after DPOSI (step 107). Step 107 thus corrects the phrase-break position internally. Next, it is determined whether BPOSI has reached BNUM (step 108). At this stage BPOSI has not reached BNUM, so the loop of steps 103 to 107 is repeated. The phrase-break learning file is searched again using the reading string 「おくにはにわが」 formed by joining BUN(BPOSI) and BUN(BPOSI+1). As before, the file contains the learning information 「おくにはにわが」, so the value "3" indicating its phrase-break position is retrieved and DPOSI is set to 3. It is then determined whether the number of characters in BUN(BPOSI) matches the value of DPOSI. At step 106, the number of characters in BUN(BPOSI) (3) matches the value of DPOSI (3), so BPOSI is incremented to change the combination of phrases used for the search (step 109), and processing moves to step 108. At this stage BPOSI still has not reached BNUM, so the loop of steps 103 to 107 is repeated.

[0019] Next, the phrase-break learning file is searched using the reading string 「はにわがあります」 formed by joining BUN(BPOSI) and BUN(BPOSI+1). This time no matching entry exists, so processing moves from step 104 to step 109, BPOSI is incremented again to change the combination of phrases used for the search, and processing moves to step 108. BPOSI has now reached BNUM, so the loop ends.
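The loop of steps 101 to 109 can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the device's implementation: `TOY_SEGMENTATION` and `segment` are hypothetical stand-ins for multi-phrase conversion (a real converter would consult the word dictionary), the learning file is modeled as a dictionary, learned break positions are assumed to lie strictly inside their readings, and BPOSI is 0-based here whereas the text counts from 1.

```python
# Toy stand-in for multi-phrase conversion; the table is purely illustrative.
TOY_SEGMENTATION = {
    "おくにはにわがあります": ["おくには", "にわが", "あります"],
    "はにわがあります": ["はにわが", "あります"],
}


def segment(reading):
    """Return a default phrase segmentation (hypothetical helper)."""
    if not reading:
        return []
    return list(TOY_SEGMENTATION.get(reading, [reading]))


def apply_learning(reading, learned):
    """Correct phrase breaks using two-phrase lookups; returns (phrases, lookups)."""
    bun = segment(reading)                 # step 101: convert without learning info
    bposi = 0                              # step 102 (0-based in this sketch)
    lookups = 0
    while bposi < len(bun) - 1:            # step 108: stop at the last phrase
        key = bun[bposi] + bun[bposi + 1]  # step 103: join two adjacent phrases
        lookups += 1
        dposi = learned.get(key)           # steps 104-105
        if dposi is not None and len(bun[bposi]) != dposi:   # step 106
            # step 107: split at DPOSI; the head is one phrase,
            # everything after it is segmented again
            rest = "".join(bun[bposi:])
            bun = bun[:bposi] + [rest[:dposi]] + segment(rest[dposi:])
        else:
            bposi += 1                     # step 109: move to the next pair
    return bun, lookups


result, lookups = apply_learning("おくにはにわがあります", {"おくにはにわが": 3})
# result == ["おくに", "はにわが", "あります"]; lookups == 3
```

With the FIG. 5 entry, the sketch performs the same three pair lookups as the walkthrough: 「おくにはにわが」 twice, then 「はにわがあります」.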

[0020] Next, in FIG. 3, the phrase-break learning file is searched with the reading string of the single phrase BUN(BPOSI) (step 110), and it is determined whether a matching entry exists (step 111). In this example, the file is searched using the reading of the final phrase, 「あります」. No matching learning information exists in FIG. 5, so processing ends. If matching learning information does exist at step 111, the value indicating its phrase-break position is retrieved and placed in DPOSI (step 112). The reading string is then split into front and rear parts at the position DPOSI; single-phrase conversion is applied to the part before DPOSI, and multi-phrase conversion is applied to the string after DPOSI (step 113).
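The final step of FIG. 3 can be sketched in the same spirit. The function name `final_phrase_step`, the dictionary model of the learning file, and the placeholder segmenter passed in as `segment` are assumptions made for illustration only.

```python
def final_phrase_step(bun, learned, segment):
    """Steps 110-113: look up the last phrase alone and re-split it on a hit.

    segment: callable standing in for multi-phrase conversion (hypothetical).
    """
    if not bun:
        return bun
    last = bun[-1]
    dposi = learned.get(last)              # steps 110-112
    if dposi is None:
        return bun                         # no entry: nothing to correct
    # step 113: the head becomes one phrase, the tail is segmented again
    return bun[:-1] + [last[:dposi]] + segment(last[dposi:])


def identity_segment(reading):
    """Placeholder segmenter that keeps the tail as a single phrase."""
    return [reading]


result = final_phrase_step(["おくに", "はにわが", "あります"],
                           {"おくにはにわが": 3}, identity_segment)
# result == ["おくに", "はにわが", "あります"]  (no entry for 「あります」)
```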

[0021] With the processing described above, the kana-kanji conversion result 「奥に埴輪があります」, which reflects the phrase-break learning information, was obtained with three searches.

[0022]

[Effects of the Invention] As described above, in the phrase-break learning information retrieval method according to the present invention, when the phrase-break learning information file is searched, multi-phrase conversion is performed without using the learning information, and the learning information is searched using combinations of the resulting phrases. This reduces both the number of search reading strings and the number of searches. In addition, because the search uses phrase-unit readings, learning information whose reading string only partially matches is never retrieved even if it is registered, so conversion with incorrect phrase breaks does not occur. Phrase-break learning information can therefore be searched efficiently, and the processing speed of kana-kanji conversion can be improved.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of a kana-kanji conversion device to which the phrase-break learning information retrieval method according to the present invention is applied.

FIG. 2 is a flowchart showing the processing procedure of the phrase-break learning information retrieval method according to the present invention.

FIG. 3 is a flowchart showing the processing procedure of the phrase-break learning information retrieval method according to the present invention.

FIG. 4 is a flowchart showing the processing procedure of the conventional phrase-break learning information retrieval method during kana-kanji conversion.

FIG. 5 is an explanatory diagram showing the contents of a phrase-break learning file.

FIG. 6 is an explanatory diagram showing the contents of a phrase-break learning file.

[Explanation of symbols]

1: input unit, 2: kana-kanji conversion unit, 3: display unit, 4: phrase-break moving unit, 5: word dictionary unit, 6: phrase-break learning unit

Claims

(57) [Claims]

A phrase-break learning information retrieval method in a kana-kanji conversion device, the device comprising: input means for inputting a kana string or a romaji string; kana-kanji conversion means for converting the input kana string or romaji string into a mixed kana-kanji string; display means for displaying the mixed kana-kanji string output from the kana-kanji conversion means; phrase-break moving means for moving a phrase-break position of the mixed kana-kanji string output by the kana-kanji conversion means; and phrase-break learning means for storing, as phrase-break learning information, the readings of the two phrases involved in the movement of the phrase-break position by the phrase-break moving means and the phrase-break position after the movement, wherein, when kana-kanji conversion is performed using the phrase-break learning information, multi-phrase conversion is performed internally without using the phrase-break learning information stored by the phrase-break learning means, the learning information stored by the phrase-break learning means is searched using reading strings taken two phrases at a time from the conversion result, and the phrase-break position of the conversion result is corrected internally in accordance with the phrase-break position information of the phrase-break learning information matched by the search.