JP2975613B2

JP2975613B2 - Kana-Kanji conversion method and device

Info

Publication number: JP2975613B2
Application number: JP1212175A
Authority: JP
Inventors: 聡木下; 和広木村; 一男住田
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1989-08-18
Filing date: 1989-08-18
Publication date: 1999-11-10
Anticipated expiration: 2014-11-10
Also published as: JPH0375865A

Description

DETAILED DESCRIPTION OF THE INVENTION [Object of the Invention] (Field of Industrial Application) The present invention relates to a kana-kanji conversion method and device for converting a Japanese sentence input in kana into a sentence written in mixed kanji and kana.

(Prior Art) Japanese has many homophones: words that share the same reading but are written differently in kanji. Techniques have therefore been developed around the question of how, when a reading (kana) is converted to kanji, to present as the first candidate the word that the user actually intends among the homophones for that reading. One conventional approach exploits co-occurrence relations between words, at either the surface level or the semantic level. For example, the kana word "なく" (naku) has two kanji spellings, "泣く" (to weep) and "鳴く" (to cry, of an animal), and the correct one depends on the subject: "人が泣く" (a person weeps) but "犬が鳴く" (a dog barks). A conversion scheme that uses surface-level co-occurrence stores these relations in advance as a table of the form "人" ... "泣く", "犬" ... "鳴く", and consults the table during conversion to select the most appropriate homophone. With surface-level co-occurrence, however, kana input such as "にほんじんがなく" (nihonjin ga naku) or "あきたけんがなく" (akitaken ga naku) cannot be converted correctly, because the subject word differs from those stored in the table.

By contrast, co-occurrence at the semantic level treats at least one of the two words as a concept. Using semantic features, for example, the semantic information of "人間" (human) and "犬" (dog) can be expressed as [+animal, +human] and [+animal, −human], respectively. At the semantic level, the table above can therefore be written as follows.

[+animal, +human] ... "泣く"; [+animal, −human] ... "鳴く". Furthermore, "日本人" (Japanese person) and "秋田犬" (Akita dog) are subordinate concepts of "人" and "犬", respectively, so their semantic information can likewise be written as [+animal, +human] and [+animal, −human]. Consequently, even when "にほんじんがなく" or "あきたけんがなく" is input in kana, "なく" can be converted correctly to match the subject.
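The contrast between the two levels can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the table names, the feature dictionaries, and the function names are hypothetical and do not come from the specification; only the words and their feature values follow the text above.

```python
# Surface-level table: exact subject word -> kanji spelling of "なく".
SURFACE = {"人": "泣く", "犬": "鳴く"}

# Hypothetical feature store (True stands for "+", False for "-").
FEATURES = {
    "人": {"animal": True, "human": True},
    "犬": {"animal": True, "human": False},
    "日本人": {"animal": True, "human": True},
    "秋田犬": {"animal": True, "human": False},
}

# Semantic-level table: feature pattern -> kanji spelling of "なく".
SEMANTIC = [
    ({"animal": True, "human": True}, "泣く"),
    ({"animal": True, "human": False}, "鳴く"),
]


def convert_surface(subject):
    """Return the spelling stored for this exact subject word, or None."""
    return SURFACE.get(subject)


def convert_semantic(subject):
    """Return the spelling whose feature pattern matches the subject, or None."""
    feats = FEATURES.get(subject)
    if feats is None:
        return None
    for pattern, spelling in SEMANTIC:
        if all(feats.get(name) == value for name, value in pattern.items()):
            return spelling
    return None
```

With these tables, `convert_surface("日本人")` finds nothing because the word is not in the surface table, whereas `convert_semantic("日本人")` matches the [+animal, +human] pattern.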

However, such co-occurrence relations are extremely varied and numerous. Examining every possible combination of words in advance and preparing complete co-occurrence information is therefore impossible in practice. Conventional conversion devices that use co-occurrence relations accordingly add to their stock of examples by learning: when the operator selects a homophone by key operation, the device learns the surface-level co-occurrence relation between the selected word and, for example, the word preceding it. Even so, co-occurrence relations at the semantic level were not learned. As a result, such learning often did little in practice to improve the conversion rate.

(Problems to be Solved by the Invention) As described above, conventional kana-kanji conversion methods and devices do not learn co-occurrence relations at the semantic level, so that even though surface-level co-occurrence relations are learned, the conversion rate often fails to improve.

Accordingly, an object of the present invention is to improve the efficiency of the user's text input by raising the subsequent conversion rate: when kana is converted to kanji and a homophone is selected, the invention finds a word that co-occurs with the selected word, learns not only the surface-level co-occurrence relation but also the semantic-level one, and uses the results in later conversions.

[Configuration of the Invention]

(Means for Solving the Problems) The kana-kanji conversion device of the present invention comprises: word information storage means storing word information; co-occurrence relation storage means storing co-occurrence relation information between words at the surface level and the semantic level; kana-kanji conversion means which, based on reading information input for conversion, identifies the word information read from the word information storage means by using the co-occurrence relation information read from the co-occurrence relation storage means, and converts the input reading information into kanji-mixed information; next-candidate selection means for selecting another homophone for a word in the converted kanji-mixed information that has homophones; and co-occurrence learning control means which, when another homophone has been selected by the next-candidate selection means, determines whether a new co-occurrence relation holds between the selected homophone and another word in the kanji-mixed information, and, if it does, newly stores in the co-occurrence relation storage means the surface-level and semantic-level co-occurrence relation information for that combination of words. The device is characterized in that new co-occurrence relations are learned by newly storing, in the co-occurrence relation storage means, surface-level and semantic-level co-occurrence relation information for combinations of words not yet stored there.

The kana-kanji conversion method of the present invention is a kana-kanji conversion method that uses word information storage means storing word information and co-occurrence relation storage means storing co-occurrence relation information between words at the surface level and the semantic level, and that, based on reading information input for conversion, identifies the word information read from the word information storage means by using the co-occurrence relation information read from the co-occurrence relation storage means, and converts the input reading information into kanji-mixed information. In this method, when, for a word in the converted kanji-mixed information that has homophones, another homophone among the candidates is selected, it is determined whether a new co-occurrence relation holds between the selected homophone and another word in the kanji-mixed information, and, if it does, the surface-level and semantic-level co-occurrence relation information for that combination of words is newly stored in the co-occurrence relation storage means.

(Operation) In the present invention, it is determined whether a new co-occurrence relation holds between the selected homophone and another word in the kanji-mixed information. If it does, the surface-level and semantic-level co-occurrence relation information for that combination of words is newly stored. In other words, a new co-occurrence relation is learned.

(Embodiment) FIG. 1 is a block diagram showing an embodiment of the present invention. In the figure, 1 is a reading information input unit, 2 is a kana-kanji conversion control unit, 3 is a text editing unit, 4 is a display unit, 5 is a co-occurrence learning control unit, 6 is a word dictionary storage unit, and 7 is a co-occurrence relation storage unit.

The kana string input from the reading (kana) information input unit 1 is sent sequentially to the kana-kanji conversion control unit 2. The kana-kanji conversion control unit 2 searches the word dictionary storage unit 6 and the co-occurrence relation storage unit 7 and converts the supplied kana string into a sentence in mixed kanji and kana. The text editing unit 3 is supplied with the result of the conversion by the kana-kanji conversion control unit 2, the kana string before conversion, the list of other candidates for each homophone, and so on; it stores these temporarily and sends the character strings to be displayed to the display unit 4. The text editing unit 3 also receives, from the reading information input unit 1 via the kana-kanji conversion control unit 2, input for ordinary editing operations such as cursor movement, character string deletion, and homophone selection, and performs the predetermined editing action. When a homophone selection operation is performed in the text editing unit 3, the co-occurrence learning control unit 5 checks the co-occurrence relation between words and stores it in the co-occurrence relation storage unit 7.

FIG. 2 shows an example of the format of the word dictionary stored in the word dictionary storage unit 6, and FIG. 3 shows an example of dictionary information expressed in that format. The "reading" field holds the reading of a word (for example, "いぬ"), and the "headword" field holds a word that has that reading (for example, "犬"). The "grammatical information" field stores grammatical information such as the part of speech of the word (for example, "noun"), and the number in the "headword number" field (for example, "101") is an identification number assigned to each headword. A word expressed by a headword often has more than one meaning. For example, the word "犬" means both "a mammal of the dog family" and "an informer" or "a spy". The numbers in the "concept number" fields are numbers assigned to the concepts that the headword expresses. FIGS. 2 and 3 use three concept numbers, but the format is not limited to this and may be a variable-length format in which the number of concept numbers is not fixed.
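As a rough illustration of this dictionary format, the record below uses the variable-length variant for the concept numbers. The class and function names are hypothetical. The headword numbers 101, 102, 103 and 105 and the concept numbers 201 to 203 are the ones quoted elsewhere in this description; the readings of the other entries and the concept numbers 204 and 205 are invented placeholders, since FIG. 3 itself is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DictEntry:
    reading: str                 # "reading" field, e.g. "いぬ"
    headword: str                # "headword" field, e.g. "犬"
    grammar: str                 # "grammatical information" field
    headword_no: int             # "headword number" field
    concept_nos: List[int] = field(default_factory=list)  # variable length


WORD_DICT = [
    DictEntry("いぬ", "犬", "noun", 101, [201, 202]),  # animal sense, spy sense
    DictEntry("なく", "鳴く", "verb", 102, [203]),
    DictEntry("なく", "泣く", "verb", 103, [204]),      # 204 is a placeholder
    DictEntry("ひと", "人", "noun", 105, [205]),        # 205 is a placeholder
]


def lookup_by_reading(reading: str) -> List[DictEntry]:
    """Return every entry (homophone candidate) that has the given reading."""
    return [entry for entry in WORD_DICT if entry.reading == reading]
```

Looking up "なく" returns both "鳴く" and "泣く", which is exactly the homophone set that the co-occurrence information has to disambiguate.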

FIG. 4 shows an example of the format of the co-occurrence dictionary stored in the co-occurrence relation storage unit, and FIG. 5 shows an example of co-occurrence dictionary information expressed in that format. Numbers placed side by side (for example, "105" and "103") are in a co-occurrence relation with each other; each number is a headword number if its type is "0" and a concept number if its type is "1". For example, the numbers "105" and "103" both have type "0", that is, both are headword numbers; since "105" denotes "人" (person) and "103" denotes "泣く" (to weep), the entry indicates that "人" and "泣く" co-occur.

In the example on the second row, the number "201" has type "1" and is therefore a concept number. The co-occurrence relation between "201" and "102" shows that for "犬" (dog) or "猫" (cat) the appropriate kanji spelling is "鳴く".
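One way to picture the co-occurrence dictionary is as a set of four-field records, each number tagged with its type. The sketch below is an assumption-laden illustration: the record layout, the names `CoocEntry`, `keys` and `cooccurs`, and the headword number 104 used for "猫" in the usage note are hypothetical. The two stored rows are the ones described in the text.

```python
from typing import List, NamedTuple, Set, Tuple


class CoocEntry(NamedTuple):
    type1: int   # 0 = headword number, 1 = concept number
    num1: int
    type2: int
    num2: int


# The two rows described above: 人 x 泣く (surface), concept 201 x 鳴く.
COOC: Set[CoocEntry] = {
    CoocEntry(0, 105, 0, 103),
    CoocEntry(1, 201, 0, 102),
}


def keys(headword_no: int, concept_nos: List[int]) -> List[Tuple[int, int]]:
    """All (type, number) pairs under which a word can appear in the dictionary."""
    return [(0, headword_no)] + [(1, c) for c in concept_nos]


def cooccurs(word1: Tuple[int, List[int]], word2: Tuple[int, List[int]]) -> bool:
    """True if any headword/concept pairing of the two words is registered."""
    return any(
        CoocEntry(t1, n1, t2, n2) in COOC
        for t1, n1 in keys(*word1)
        for t2, n2 in keys(*word2)
    )
```

For instance, `cooccurs((101, [201, 202]), (102, [203]))` succeeds through the concept-level row, and so would a hypothetical entry for "猫" such as `(104, [201])`, even though no surface-level row mentions either noun.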

FIG. 6 is an example of a flowchart of the processing in the text editing unit 3. In outline, the operation is as follows. If the character sent to the editing unit 3 is not the code of the next-candidate key, which requests another candidate for a homophone, the processing corresponding to that code is performed. If the input is the next-candidate key but no other homophone candidate can be selected, that is, if the word has no homophones in the first place or has already been confirmed so that the information on its other homophones has been lost, the processing simply ends. If another candidate can be selected, the selection operation continues for as long as keys that request the display of another candidate, such as the next-candidate key or the previous-candidate key, arrive from the kana-kanji conversion control unit. The selection operation eventually ends with the input of, for example, a key that confirms the candidate or a key that moves the cursor; once it has ended, a search is made for a word that co-occurs with the selected word. If no co-occurring word is found, the processing ends.

In more detail, it is first determined whether the next-candidate key has been input (S1). If not, other editing processing is performed (S2) and the processing ends. If it has, it is determined whether the word has homophones (S3). If it has none, the processing ends. If it has, the homophone selection operation is performed (S4). Next, a search is made for a co-occurrence relation between the selected word and another word (S5). If no co-occurring word is found, the processing ends. If one is found, the co-occurrence dictionary is updated (S7) and the processing ends.
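The control flow of FIG. 6 can be sketched as a single function. Everything here other than the step labels is an assumption made for illustration: the key code `NEXT_CANDIDATE`, the callback parameters, and the returned status strings are hypothetical, and the interactive selection loop of S4 is reduced to one callback.

```python
NEXT_CANDIDATE = "NEXT"  # hypothetical key code for the next-candidate key


def edit_step(key, candidates, select, find_cooccurrence,
              update_dictionary, other_edit):
    """One pass through the FIG. 6 flow; returns a status string."""
    if key != NEXT_CANDIDATE:            # S1: not the next-candidate key
        other_edit(key)                  # S2: ordinary editing
        return "edited"
    if len(candidates) < 2:              # S3: no other homophone to offer
        return "no-homophones"
    chosen = select(candidates)          # S4: operator picks a homophone
    pair = find_cooccurrence(chosen)     # S5: search for a co-occurring word
    if pair is None:                     # nothing found: end
        return "no-cooccurrence"
    update_dictionary(pair)              # S7: learn the new relation
    return "learned"
```

Passing the search and update steps in as callbacks keeps the sketch independent of how the dictionary is actually stored.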

FIG. 7 is a flowchart showing in detail an example of S5 in FIG. 6, that is, the search for a co-occurring word. In outline, the flowchart of FIG. 7 is as follows. The variable W in the flowchart is assumed to hold the word selected by the homophone selection operation of FIG. 6. The variable F records whether a co-occurring word has been found: when the processing of FIG. 7 ends, a value of 1 means that a co-occurrence relation was found and 0 means that none was found. The variables W1 and W2 hold the two co-occurring words and are referred to as word 1 and word 2 in the flowchart of FIG. 8; here W1 is restricted to nouns (taigen) and W2 to predicates (yogen, that is, inflecting words such as verbs and adjectives).

First, F is set to 0. If the selected word W is a noun, it is checked whether W is immediately preceded by a predicate in the attributive form, or immediately followed by a predicate that is itself immediately followed by a full stop; if so, W1 is set to the selected word W, W2 to that predicate, and F to 1, and the processing ends. If W is a predicate, the check is the converse of the noun case: if W is in the attributive form and is immediately followed by a noun, or if W is immediately followed by a full stop and a noun precedes W in the sentence, W1 is set to that noun, W2 to W itself, and finally F to 1, and the processing ends. In all other cases the processing ends immediately.

In more detail, F is first set to 0 (S11). It is determined whether the selected word W is a noun (S12). If it is, it is determined whether the word immediately before W is a predicate in the attributive form (S13). If so, W1 is set to W and W2 to that preceding predicate (S14), F is set to 1 (S15), and the processing ends.

If it is determined in S13 that the word immediately before W is not a predicate in the attributive form, it is determined whether the word immediately after W is a predicate (S16). If not, the processing ends. If it is, it is determined whether that predicate is immediately followed by a full stop (S17). If not, the processing ends. If so, W1 is set to W and W2 to the predicate immediately following W (S18), F is set to 1 (S15), and the processing ends.

If it is determined in S12 that W is not a noun, it is determined whether W is a predicate (S19). If not, the processing ends. If it is, it is determined whether the word immediately after W is a noun (S20). If it is a noun, it is determined whether W is in the attributive form (S21). If not, the processing ends. If so, W1 is set to the noun immediately following W, W2 to W, and F to 1, and the processing ends.

If it is determined in S20 that the word immediately after W is not a noun, it is determined whether W is immediately followed by a full stop (S23). If not, the processing ends. If so, a noun preceding W in the sentence is searched for (S24), and it is determined whether one exists (S25). If there is none, the processing ends. If there is one, W1 is set to that preceding noun and W2 to W (S26), F is set to 1 (S15), and the processing ends.
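The search of FIG. 7 might look as follows in Python. This is a sketch under several assumptions that are not in the specification: the `Token` type and its field names are hypothetical, the sentence is assumed to be pre-segmented into nouns, predicates and full stops with particles and adverbs already dropped, and S24 is taken to mean the nearest preceding noun. Returning `None` corresponds to F = 0, and returning a pair corresponds to F = 1 with (W1, W2).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Token:
    text: str
    pos: str                   # "noun", "predicate", or "period"
    attributive: bool = False  # True for a predicate in the attributive form


def find_cooccurring(tokens: List[Token], i: int) -> Optional[Tuple[str, str]]:
    """Return (W1 noun, W2 predicate) for the selected token i, or None."""
    w = tokens[i]
    prev = tokens[i - 1] if i > 0 else None
    nxt = tokens[i + 1] if i + 1 < len(tokens) else None
    nxt2 = tokens[i + 2] if i + 2 < len(tokens) else None

    if w.pos == "noun":                                            # S12
        if prev and prev.pos == "predicate" and prev.attributive:  # S13
            return (w.text, prev.text)                             # S14
        if nxt and nxt.pos == "predicate":                         # S16
            if nxt2 and nxt2.pos == "period":                      # S17
                return (w.text, nxt.text)                          # S18
        return None

    if w.pos == "predicate":                                       # S19
        if nxt and nxt.pos == "noun":                              # S20
            if w.attributive:                                      # S21
                return (nxt.text, w.text)
            return None
        if nxt and nxt.pos == "period":                            # S23
            for t in reversed(tokens[:i]):                         # S24
                if t.pos == "noun":                                # S25
                    return (t.text, w.text)                        # S26
        return None

    return None
```

Under these assumptions, selecting "鳴く" in either a sentence-final position after "犬" or an attributive position before "犬" yields the same pair.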

With the processing described in the flowchart of FIG. 7, when kana input has been converted to, for example, "犬がわんわん泣く" or "わんわん泣く犬を見た", and the user corrects "泣く" to "鳴く" by homophone selection, it is found that "犬" and "鳴く" co-occur.

When two co-occurring words, word 1 (for example, "犬") and word 2 (for example, "鳴く"), have been found in this way, the co-occurrence dictionary update of S7 in FIG. 6 is performed. Its details are shown in the flowchart of FIG. 8, which is a flowchart of the processing that registers a co-occurrence relation in the co-occurrence relation storage unit. In this flowchart, the co-occurrence relations at the surface level and the semantic level are registered in the co-occurrence relation storage unit 7, and the processing ends. Four kinds of co-occurrence data are thereby registered: (1) headword number of word 1 × headword number of word 2; (2) headword number of word 1 × concept number of word 2; (3) concept number of word 1 × headword number of word 2; (4) concept number of word 1 × concept number of word 2. Data (1) to (4) may include pairs that are not truly in a co-occurrence relation, but here all of them are treated as co-occurring and all are registered. If the numbers of concepts of word 1 and word 2 are N1 and N2, respectively, the total number of data items is (N1 + 1) × (N2 + 1). For example, when word 1 is "犬" and word 2 is "鳴く", FIG. 3 gives N1 = 2 and N2 = 1, so (N1 + 1) × (N2 + 1) = 6 and six data items are obtained: (1) 101×102; (2) 101×203; (3) 201×102, 202×102; (4) 201×203, 202×203.

This is described in detail below with reference to the flowchart of FIG. 8.

Consider two arrays, A1 and A2, for temporarily holding the headword numbers and concept numbers of word 1 and word 2 read from the word dictionary storage unit 6. First, the headword numbers of word 1 and word 2 are stored in A1(0) and A2(0), the concept numbers of word 1 are stored in order in A1(1) to A1(N1), and those of word 2 in A2(1) to A2(N2) (S30). The variables I and J in the flowchart are indices into A1 and A2, respectively. The variables II and JJ are set to 0 when I and J, respectively, are 0 and to 1 otherwise, and are used as the type values in the format of FIG. 4. The variable X is a structured variable whose data structure is exactly the format of FIG. 4. I and II are set to 0 (S31), and then J and JJ are likewise set to 0 (S32). Next, the information to be registered is set in X (S33). Because A1(0) and A2(0) hold headword numbers, the first relation set in X is the surface-level co-occurrence relation. It is checked whether the data set in X is already stored in the co-occurrence relation storage unit (S34). If not, it is registered at that point and the processing moves to S36; if it is, the processing moves directly to S36. In S36, 1 is added to J and JJ is set to 1. The value of J after the addition is then examined (S37). If it is no greater than N2, the number of concept numbers of word 2, the processing returns to S33, a new value is set in X, and the co-occurrence relations not yet registered are registered one after another. When it is determined in S37 that J has exceeded N2, 1 is added to I and II is set to 1 (S38), and the above operation is repeated until I exceeds N1 (S39).

As described above, these operations register up to (N1 + 1) × (N2 + 1) data items of the four kinds: (1) headword number × headword number, (2) headword number × concept number, (3) concept number × headword number, and (4) concept number × concept number.
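The double loop of FIG. 8 can be sketched as below. The function name, the tuple layout `(type1, num1, type2, num2)` and the use of a Python set as the co-occurrence store are assumptions made for illustration; the variable names `a1`, `a2`, `ii` and `jj` mirror A1, A2, II and JJ in the description.

```python
def register_all(store, word1, word2):
    """Register every headword/concept pairing of two co-occurring words.

    word1 and word2 are (headword_no, [concept_nos]) pairs; store is a set of
    (type1, num1, type2, num2) tuples, where type 0 marks a headword number
    and type 1 a concept number. Returns the number of newly added records.
    """
    a1 = [word1[0]] + list(word1[1])   # S30: A1(0) headword, A1(1..N1) concepts
    a2 = [word2[0]] + list(word2[1])   # S30: A2(0) headword, A2(1..N2) concepts
    added = 0
    for i, num1 in enumerate(a1):      # outer loop over I (S31, S38, S39)
        ii = 0 if i == 0 else 1        # type flag for word 1
        for j, num2 in enumerate(a2):  # inner loop over J (S32, S36, S37)
            jj = 0 if j == 0 else 1    # type flag for word 2
            x = (ii, num1, jj, num2)   # S33: record in the FIG. 4 format
            if x not in store:         # S34: skip records already stored
                store.add(x)
                added += 1
    return added
```

For "犬" (headword 101, concepts 201 and 202) and "鳴く" (headword 102, concept 203), a first call on an empty store adds (2 + 1) × (1 + 1) = 6 records, and a repeated call adds none.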

The processing of the flowchart of FIG. 8 registers all of data (1) to (4) as co-occurring. As noted earlier, however, data (1) to (4) may include pairs that are not truly in a co-occurrence relation. Instead of registering all of them, it is therefore also possible to register only those that truly co-occur. That is, when the selected word 1 or word 2 has more than one concept (concept number), the single concept number used in the context can be identified, and more precise co-occurrence data between word 1 and word 2 can be obtained and registered. FIGS. 9 to 11 illustrate this processing.

FIG. 9 is a flowchart of the processing that registers only true co-occurrence relations in the co-occurrence relation storage unit 7. First, the headword numbers of word 1 and word 2 are stored in the variables M1 and M2, and the numbers of concept numbers of word 1 and word 2 are stored in the variables N1 and N2 (S41). It is determined whether N1 is 1 (S42). If N1 is 1, the sole concept number is stored in the variable G1 (S43). If N1 is not 1, the operator is asked to select the concept number of the sense in which word 1 is used in the sentence, and that number is stored in G1 (S44). Word 2 is handled in much the same way: it is determined whether N2 is 1 (S45); if so, the sole concept number is stored in the variable G2 (S46); if not, the operator is asked to select the concept number of the sense in which word 2 is used in the sentence, and that number is stored in G2 (S47). In this way exactly one headword number and one concept number are fixed for each of word 1 and word 2: for word 1, M1 holds the headword number and G1 the concept number; for word 2, M2 holds the headword number and G2 the concept number. Based on these, the four kinds of co-occurrence data (1) to (4) are registered in S48 to S51, each only if not already registered. S48 registers the surface-level co-occurrence relation (1), S49 and S50 register the co-occurrence relations (2) and (3) that combine the surface and semantic levels, and S51 registers the semantic-level co-occurrence relation (4).
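A minimal sketch of the FIG. 9 variant follows, assuming each word has at least one concept number. The function name, the `choose` callback that stands in for the operator's selection, and the tuple layout `(type1, num1, type2, num2)` are hypothetical; the variable names follow M1, M2, G1 and G2 in the description.

```python
def register_true(store, word1, word2, choose):
    """Register only the four records for the senses actually used.

    word1 and word2 are (headword_no, [concept_nos]) pairs with at least one
    concept each; choose(word) returns the concept number picked by the
    operator when a word has several. Returns the newly added records.
    """
    m1, concepts1 = word1                                         # S41
    m2, concepts2 = word2
    g1 = concepts1[0] if len(concepts1) == 1 else choose(word1)   # S42 to S44
    g2 = concepts2[0] if len(concepts2) == 1 else choose(word2)   # S45 to S47
    candidates = [
        (0, m1, 0, m2),   # (1) headword x headword (surface level)
        (0, m1, 1, g2),   # (2) headword x concept
        (1, g1, 0, m2),   # (3) concept x headword
        (1, g1, 1, g2),   # (4) concept x concept (semantic level)
    ]
    added = [x for x in candidates if x not in store]  # register only new ones
    store.update(added)
    return added
```

Compared with registering every pairing, this keeps the store at four records per learned pair regardless of how many senses each word has, at the cost of one operator prompt per ambiguous word.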

第10A図は、第９図のS44の詳細を示すフローチャート
である。先ず、オペレータが概念番号を選択するのに使
用する画面を表示部４に設定する（S61）。変数Ｉを１
に設定する（S62）。語１のＩ番目概念番号をＧにセッ
トする（S63）。概念番号としてＧをもつ語を探し出
し、Ｗにセットする（S64）。Ｗにセットされた語が語
１と同じかどうかを判断する（S65）。同じ場合は、S64
に戻って、上記の探す動作を繰り返す。異なる場合は、
番号Ｉと、語１をＷで置き換えた文字列とを表示する
（S66）。以上の動作により、例えば、「犬が鳴く」
が、共起関係として選択された際に、そこで使用されて
いる「犬」の概念の概念番号を求める際には、第11図に
示されているように、「1.猫が鳴く」が表示される。次に、変数Ｉに１を加え
る（S67）。次に、ＩがN1より小さいかどうかを判断す
る（S68）。小さい場合にはS64に戻って、上記の動作を
繰り返す。これにより、例えば、第11図からわかるよう
に、「2.スパイが鳴く」が表示される。つまり、語１のN1個
の概念番号のそれぞれに関し、同一の概念番号を持つ語
１以外の語Ｗを辞書記憶部６より検索し、順次表示す
る。S68において、ＩがN1より大きくなったとき、即
ち、例えば、「犬」の有する全ての概念と同じ概念の語
が表示し終ったら、オペレータは、第11図の画面を見な
がら、画面の指示に従って、1,2のいずれかの番号
（Ｎ）を入力する（S69）。入力された番号（Ｎ）が適
正かどうかを判断する（S70）。適正のときには、G1に
語１のＮ番目の概念番号をセットし（S71）、画面を復
帰する（S72）。例えば、第11図の例において、番号１
を入力すると、「犬が鳴く」の場合の「犬」の概念の概
念番号は「201」と決定される。FIG. 10A is a flowchart showing the details of S44 in FIG. 9. First, the screen the operator uses to select a concept number is set up on the display unit 4 (S61). Variable I is set to 1 (S62). The I-th concept number of word 1 is set in G (S63). A word having G as a concept number is searched for and set in W (S64). It is determined whether the word set in W is the same as word 1 (S65). If it is the same, the process returns to S64 and the search is repeated. If it is different, the number I and the character string obtained by replacing word 1 with W are displayed (S66). Through these steps, when, for example, "the dog cries" has been selected as a co-occurrence relation and the concept number of the concept of "dog" used there is to be determined, "1. The cat cries" is displayed, as shown in FIG. 11. Next, 1 is added to variable I (S67). It is then determined whether I is smaller than N1 (S68). If it is, the process returns to S64 and the above steps are repeated. As a result, "2. The spy cries" is displayed, as can be seen in FIG. 11. In other words, for each of the N1 concept numbers of word 1, a word W other than word 1 that has the same concept number is retrieved from the dictionary storage unit 6 and displayed in turn. In S68, when I becomes greater than N1, that is, for example, once words sharing each of the concepts of "dog" have all been displayed, the operator looks at the screen of FIG. 11 and, following the instructions on the screen, enters one of the numbers 1 or 2 (N) (S69). It is determined whether the entered number (N) is valid (S70). If it is valid, the N-th concept number of word 1 is set in G1 (S71) and the screen is restored (S72). For example, in the example of FIG. 11, entering the number 1 determines the concept number of the concept of "dog" in "the dog cries" to be "201".
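The selection screen of FIG. 10A can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the patent's code: the dictionary is modeled as a plain mapping from a word to its list of concept numbers, the function names are hypothetical, and the concept number 305 used for the second sense of 「犬」 is invented for the example (only 201 appears in the text).

```python
from typing import Dict, List, Optional, Tuple


def build_menu(dictionary: Dict[str, List[int]],
               word: str, sentence: str) -> List[Tuple[int, str]]:
    """S62-S68: for each concept number of `word`, show the sentence with
    `word` replaced by another word that carries the same concept number."""
    menu: List[Tuple[int, str]] = []
    for index, concept in enumerate(dictionary[word], start=1):
        # S64-S65: search for a word other than `word` with this concept number.
        substitute = next(
            (w for w, concepts in dictionary.items()
             if w != word and concept in concepts),
            None,
        )
        if substitute is not None:
            # S66: display the number and the rewritten string.
            menu.append((index, sentence.replace(word, substitute, 1)))
    return menu


def select_concept(dictionary: Dict[str, List[int]],
                   word: str, choice: int) -> Optional[int]:
    """S69-S71: validate the operator's number and return the chosen concept."""
    concepts = dictionary[word]
    if not 1 <= choice <= len(concepts):
        return None  # S70: invalid input
    return concepts[choice - 1]


# Toy dictionary; 305 is a made-up concept number for the "spy" sense.
toy = {"犬": [201, 305], "猫": [201], "スパイ": [305]}
menu = build_menu(toy, "犬", "犬が鳴く")
```

With this toy data, `menu` holds the two numbered lines of the FIG. 11 example, and `select_concept(toy, "犬", 1)` yields 201, matching the outcome described in the text.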

また、S47における語２の概念番号の選択も、上述の
語１の場合とほぼ同様に行われる。その詳細を第10B図
に示す。The selection of the concept number of word 2 in S47 is performed in substantially the same manner as in the case of word 1 described above. The details are shown in FIG. 10B.

以上の操作により、語１、語２がそれぞれ複数の概念
番号を持つときに、文中で用いられた概念番号を利用者
に選択させることで、より正確な共起関係の学習を行う
ことができ、その後の変換率も大いに向上する。With the above operations, when words 1 and 2 each have multiple concept numbers, the user is asked to select the concept number used in the sentence, so co-occurrence relations are learned more accurately and the subsequent conversion rate improves considerably.

なお、この実施例では、利用者（オペレータ）に各後
の概念を選択させる際に、その概念番号と同じ信号をも
つ語を辞書より検索して表示したが、第３図に示した単
語情報において、各概念番号に通常の国語辞典などに記
述されているような自然言語による説明を付加してお
き、それを表示して概念番号を選ぶようにすることも可
能である。In this embodiment, when the user (operator) is asked to select the concept of each word, words having the same concept number are retrieved from the dictionary and displayed. Alternatively, in the word information shown in FIG. 3, each concept number can be given a natural-language description of the kind found in an ordinary Japanese dictionary, and that description can be displayed for selecting the concept number.

より適切な変換が行われる共起関係データを求めるに
は、第４図に示した共起辞書のフォーマットを、頻度情
報の項を含む形に拡張することもできる。このために
は、第８図のS34,S35の作業を次のように変更すればよ
い。即ち、まず、変数Ｘにセットされた共起関係情報が
既に登録済みの場合には、その頻度情報の値を頻度情報
の値の最大値の範囲で１加算し、未登録の場合には、初
期値を１として登録する。また、かな漢字変換制御部２
においては、変換の際に、頻度情報が定められた値以上
のデータのみを利用するようにすれば、より精度の高い
変換が可能となる。なお、通常の単語学習のように、あ
る一定時間変換に使用されなかったり、値が更新されな
かった共起関係は、共起関係記憶部７より削除するよう
にすれば、共起辞書の領域を圧縮し、さらには、より高
速な変換が可能となる。To obtain co-occurrence relation data that yield more appropriate conversions, the format of the co-occurrence dictionary shown in FIG. 4 can be extended to include a frequency information field. For this purpose, the operations in S34 and S35 of FIG. 8 may be changed as follows. If the co-occurrence relation information set in variable X has already been registered, its frequency value is incremented by 1, up to the maximum value of the frequency information; if it has not been registered, it is registered with an initial value of 1. If the kana-kanji conversion control unit 2 then uses, at conversion time, only data whose frequency information is at or above a predetermined value, more accurate conversion becomes possible. Furthermore, as in ordinary word learning, if co-occurrence relations that have not been used for conversion, or whose values have not been updated, for a certain period are deleted from the co-occurrence relation storage unit 7, the co-occurrence dictionary area can be compressed and conversion becomes faster.
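The frequency-based extension can be sketched in Python as below. This is a minimal sketch under assumptions: the store is modeled as a mapping from a relation to its frequency, and the names `learn` and `usable`, the cap of 255, and the threshold of 2 are hypothetical values chosen for illustration. The time-based deletion of unused relations is not shown.

```python
from typing import Dict, Hashable, Set

MAX_FREQUENCY = 255        # assumed upper bound of the frequency field
MIN_FREQUENCY_FOR_USE = 2  # assumed threshold applied at conversion time


def learn(store: Dict[Hashable, int], relation: Hashable,
          max_frequency: int = MAX_FREQUENCY) -> None:
    """Increment the frequency of a known relation (capped at the maximum),
    or register a new relation with an initial frequency of 1."""
    if relation in store:
        store[relation] = min(store[relation] + 1, max_frequency)
    else:
        store[relation] = 1


def usable(store: Dict[Hashable, int],
           threshold: int = MIN_FREQUENCY_FOR_USE) -> Set[Hashable]:
    """Return only the relations whose frequency meets the threshold."""
    return {relation for relation, freq in store.items() if freq >= threshold}
```

A relation learned once stays below the assumed threshold and is ignored during conversion; it becomes usable only after it has been confirmed again, which is the effect the passage attributes to the frequency field.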

[Effects of the Invention]

以上説明したように、本発明によれば、同音異義語の
選択が行われた際に、その語と共起関係にある語を求
め、表層レベルの共起関係に加え、意味レベルの共起関
係も学習するようにしたので、その後の適正なかな漢字
変換率を向上させて、オペレータの文章入力の能率を向
上させることができる。As described above, according to the present invention, when a homonym is selected, words that co-occur with it are found, and semantic-level co-occurrence relations are learned in addition to surface-level ones. This raises the rate of correct kana-kanji conversion thereafter and improves the efficiency of the operator's text input.

[Brief description of the drawings]

第１図はこの発明の１実施例を示すブロック図、第２図
は単語辞書記憶部に記憶される単語辞書のフォーマット
の一例を示す説明図、第３図は上記フォーマットに従っ
て表現された単語辞書中の情報の一例を示す図表、第４
図は共起関係記憶部に記憶される共起辞書のフォーマッ
トの一例を示す説明図、第５図は上記フォーマットに従
って表現された共起辞書中の情報の一例を示す図表、第
６図乃至第８図は文章編集部の処理の一例を示すフロー
チャート、第９図および第10図は共起辞書更新の処理の
一例を示すフローチャート、第11図は共起辞書更新にお
いて利用者に表示される画面の一構成例を示す説明図で
ある。１……読み情報入力部、２……かな漢字変換制御部、３
……文章編集部、４……表示部、５……共起学習制御
部、６……単語辞書記憶部。FIG. 1 is a block diagram showing one embodiment of the present invention; FIG. 2 is an explanatory diagram showing an example of the format of the word dictionary stored in the word dictionary storage unit; FIG. 3 is a table showing an example of information in the word dictionary expressed in that format; FIG. 4 is an explanatory diagram showing an example of the format of the co-occurrence dictionary stored in the co-occurrence relation storage unit; FIG. 5 is a table showing an example of information in the co-occurrence dictionary expressed in that format; FIGS. 6 to 8 are flowcharts showing an example of the processing of the text editing unit; FIGS. 9 and 10 are flowcharts showing an example of the co-occurrence dictionary update processing; and FIG. 11 is an explanatory diagram showing an example of the screen displayed to the user during the co-occurrence dictionary update.
1 ... reading information input unit, 2 ... kana-kanji conversion control unit, 3 ... text editing unit, 4 ... display unit, 5 ... co-occurrence learning control unit, 6 ... word dictionary storage unit.

フロントページの続き (58)調査した分野(Int.Cl.⁶，ＤＢ名) G06F 17/21 Continuation of front page (58) Field surveyed (Int.Cl. ⁶ , DB name) G06F 17/21

Claims

(57) [Claims]

1. A kana-kanji conversion apparatus comprising: word information storage means for storing word information; co-occurrence relation storage means for storing surface-level and semantic-level co-occurrence relation information between words; kana-kanji conversion means for converting reading information input as a conversion target into kanji-kana mixed information by specifying the word information read from the word information storage means using the co-occurrence relation information read from the co-occurrence relation storage means; next candidate selecting means for selecting another homonym for a word that has homonyms among the words of the kanji-kana mixed information converted by the kana-kanji conversion means; and co-occurrence learning control means for determining, in response to the selection of another homonym by the next candidate selecting means, whether a new co-occurrence relation holds between the selected homonym and another word of the kanji-kana mixed information other than the selected homonym, and, when the co-occurrence relation holds, newly storing the co-occurrence relation information in the co-occurrence relation storage means; the apparatus being characterized in that a new co-occurrence relation is learned by storing the co-occurrence relation information in the co-occurrence relation storage means.

2. A kana-kanji conversion method that uses word information storage means storing word information and co-occurrence relation storage means storing surface-level and semantic-level co-occurrence relation information between words, and that converts reading information input as a conversion target into kanji-kana mixed information by specifying the word information read from the word information storage means using the co-occurrence relation information read from the co-occurrence relation storage means, the method being characterized in that, when another homonym is selected from the homonym candidates for a word among the words of the converted kanji-kana mixed information, it is determined whether a new co-occurrence relation holds between the selected homonym and another word of the kanji-kana mixed information, and, when the co-occurrence relation holds, the surface-level and semantic-level co-occurrence relation information for the combination of the words so determined is newly stored in the co-occurrence relation storage means.