JP3386147B2

JP3386147B2 - Roman character converter

Info

Publication number: JP3386147B2
Application number: JP26980091A
Authority: JP
Inventors: 栄宏佐藤; 仁樹樋口
Original assignee: Fujitsu FIP Corp
Current assignee: Fujitsu FIP Corp
Priority date: 1991-10-18
Filing date: 1991-10-18
Publication date: 2003-03-17
Anticipated expiration: 2018-03-17
Also published as: JPH05108609A

Description

[Detailed Description of the Invention]

[0001] [Industrial Field of Application] The present invention relates to a Roman character conversion device that converts Roman character notation into kana notation.

[0002] When Japanese is input in Roman characters, the Roman character notation of a word is not necessarily unique, and the commonly used notation sometimes cannot be converted into the intended kana notation. Taking personal names as an example, the kana notation "イトウ" is written in Roman characters as "ITO", "ITOU", "ITOH", and so on. It is desirable to convert such notations, including those in common use, into the correct kana notation.

[0003] [Prior Art] Conventionally, when kana is input in Roman characters on a word processor or the like, Roman character notation and kana notation normally correspond one to one. However, personal names and similar items do not always have a single notation; as shown in FIG. 9, several Roman character notations may correspond to one kana notation. Roman character notation therefore does not necessarily correspond one to one with kana notation, nor is it necessarily a phonetic transcription. For example, when "ITO" is input in Roman characters and converted to kana, "イト" is generated, and "イトウ" cannot be generated.

[0004] [Problem to Be Solved by the Invention] As described above, when Roman character notation is converted to kana notation with a conventional Roman-character-to-kana conversion algorithm, some inputs are not converted into the intended kana notation. This is because the Roman character notation was treated as a simple phonetic transcription. It is desirable to solve this problem and convert Roman character notation into appropriate kana notation.

[0005] An object of the present invention is to expand a Roman character notation into a plurality of Roman character notations, and further to suppress the unnecessary ones among them, so as to generate the appropriate Roman character notations for the purpose.

[0006] [Means for Solving the Problem] The means for solving the problem are described with reference to FIG. 1. In FIG. 1, the extended Roman character conversion device 2 applies the Roman character expansion rule 3 and the Roman character suppression rule 4 to an input Roman character notation 1.

[0007] The Roman character expansion rule 3 is a rule that takes characters in order from the head of the Roman character notation and expands the notation into a plurality of Roman character notations. The Roman character suppression rule 4 is a rule that takes characters in order from the head of each notation expanded by the expansion rule 3 and determines whether or not that notation is to be suppressed.

[0008] [Operation] As shown in FIG. 1, the extended Roman character conversion device 2 takes characters in order from the head of the input Roman character notation 1 and applies the expansion rule 3 to expand it into a plurality of Roman character notations.

[0009] In addition, after expanding the input notation 1 in this way, the device 2 applies the suppression rule 4 to the expanded notations and generates only those determined not to be suppressed. The expanded or generated notations are then converted to kana, and a kana-kanji dictionary is searched with the converted kana to obtain kanji.

[0010] Accordingly, by expanding a Roman character notation into a plurality of notations and suppressing the unnecessary ones among them, appropriate expanded notations can be generated from the input notation 1 and converted to kana notation.

[0011] [Embodiment] Next, the configuration and operation of an embodiment of the present invention are described in detail, in order, with reference to FIGS. 1 to 8.

[0012] FIG. 1 shows the configuration of one embodiment of the present invention together with an explanatory diagram. FIG. 1(a) is the configuration diagram. In FIG. 1(a), the Roman character notation 1 is a notation input in Roman characters: for example, one typed at a keyboard on a word processor for kana-kanji conversion, or one produced during machine translation when a foreign-language item (for example, a personal name or other proper noun) is rendered in Japanese Roman characters. In the latter case in particular, because the present invention expands the input into a plurality of notations, the kana-kanji conversion dictionary does not need kana (kanji) entries for every possible Roman character notation. It is enough to register the kana (kanji) for a representative notation, which reduces the number of dictionary entries.

[0013] The extended Roman character conversion device 2 applies the expansion rule 3 and the suppression rule 4 to the input notation 1 and generates appropriate expanded notations. The device 2 is implemented by applying the expansion rule 3 and the suppression rule 4 to the input notation 1 on an inference/control engine, which is hardware.
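The expand-then-suppress flow outlined above can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the actual rule tables are in FIGS. 2 and 3, which are not reproduced in this text, so `EXPANSION_RULES` and `SUPPRESSION_RULES` below are assumed stand-ins. The expansions of "I" and "O" follow the examples given in the description, while the suppression entries are hypothetical, and the function names are invented for this sketch.

```python
from itertools import product

# Assumed expansion table: target character -> output strings.
# "I" -> "I", "II" and "O" -> "O", "OU", "OO" follow examples in the
# description; the full table (FIG. 2) is not available here.
EXPANSION_RULES = {"I": ["I", "II"], "O": ["O", "OU", "OO"]}

# Hypothetical suppression table: (target, allowed previous characters,
# allowed following characters). None means "any context".
SUPPRESSION_RULES = [("OO", None, None), ("II", None, None)]


def expand(romaji):
    """Expand each character of the input independently into its variants."""
    choices = [EXPANSION_RULES.get(ch, [ch]) for ch in romaji]
    return ["".join(parts) for parts in product(*choices)]


def is_suppressed(candidate):
    """Scan from the head; report whether any suppression rule matches."""
    for i in range(len(candidate)):
        for target, prev_set, next_set in SUPPRESSION_RULES:
            if not candidate.startswith(target, i):
                continue
            end = i + len(target)
            prev_ch = candidate[i - 1] if i > 0 else ""
            next_ch = candidate[end] if end < len(candidate) else ""
            prev_ok = prev_set is None or prev_ch in prev_set
            next_ok = next_set is None or next_ch in next_set
            if prev_ok and next_ok:
                return True
    return False


def candidates(romaji):
    """Expand the input, then keep only the spellings that are not suppressed."""
    return [c for c in expand(romaji) if not is_suppressed(c)]


print(candidates("ITO"))  # ['ITO', 'ITOU']
```

Under these assumed tables, the input "ITO" keeps the spelling "ITOU", which a plain Roman-character-to-kana converter could turn into "イトウ". Because only the characters of the original input are varied, characters introduced by an expansion are not expanded again.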
[0014] The Roman character expansion rule 3 is applied to the input Roman character notation 1 to generate a plurality of Roman character notations. For example, as shown on the right side of the figure, the notation 1 "TOKYO" is expanded into

"TOKYO, TOKYOO, TOKYOU, TOOKYO, TOOKYOO, TOOKYOU, TOUKYO, TOUKYOO, TOUKYOU" ... (1)

(described later with reference to FIGS. 4 and 5).

[0015] The Roman character suppression rule 4 is applied to the intermediate candidates obtained by applying the expansion rule 3 to the input notation 1. It suppresses the unnecessary candidates and generates the notations suited to the purpose. For example, applying the suppression rule 4 of FIG. 3 to the intermediate candidates in (1) above suppresses them as shown on the right side of the figure, and as a result "TOKYO, TOKYOU, TOUKYO, TOUKYOU" are generated as the candidate Roman character notations.

[0016] The Roman character/kana conversion unit 5 converts the candidates that remain after suppression by the suppression rule 4 into kana. For example, as shown on the right side of the figure, the four candidates are converted into the kana "トキョ、トキョウ、トウキョ、トウキョウ".

[0017] FIG. 1(b) shows a concrete example. In FIG. 1(b), the intermediate candidates are the notations obtained by applying the expansion rule 3 to the notation 1 "TOKYO".

[0018] The candidates are the notations that remain after the suppression rule 4 is applied to the intermediate candidates. The kana are the candidates converted into kana.

[0019] FIG. 2 shows an example of the Roman character expansion rules of the present invention. The "rule number" on the left is the number of the rule. The "target character" in the center is the character to be expanded, taken one character at a time from the head of the input notation 1; when it matches, the output characters on the right are generated. The "output characters" on the right are the characters output for the target character. For example, when the target character is "I", the two outputs "I" and "II" are produced.

[0020] FIG. 3 shows an example of the Roman character suppression rules of the present invention. The "target character" on the left is the character (or character string), taken from the head of an intermediate candidate produced by the expansion rule 3, to which the suppression rule is applied. The pre-connection rule determines whether or not to suppress the target according to its relationship with the preceding character. The post-connection rule determines whether or not to suppress the target according to its relationship with the following character.

[0021] Next, the application of the expansion rule 3 of FIG. 2 to the input notation "TOKYO" in the configuration of FIG. 1(a) is described in detail, following the flowchart of FIG. 4 and with reference to FIG. 5.

[0022] In FIG. 4, S1 inputs the Roman character notation, for example "TOKYO" as shown under "Roman character notation" in FIG. 5. S2 lays out the notation input in S1 as a branch of a notation tree. As in the top row "T-O-K-Y-O" of the notation tree in FIG. 5, the characters T, O, K, Y, O become nodes and each "-" becomes an edge.

[0023] S3 extracts one branch from the notation tree, here from the top row "T-O-K-Y-O" of the tree laid out in S2. S4 determines whether processing is finished (that is, whether all branches have been extracted). If YES, processing ends. If NO, processing proceeds to S5.

[0024] S5 extracts one character from the branch (together with the following character, as needed for rule application). S6 determines whether character extraction is finished. If YES, character extraction from the branch is complete, so processing repeats from S3 for the next branch. If NO, processing proceeds to S7.

[0025] S7 determines whether, on the second or a later branch, the character is the first character of the branch. This check keeps the rules from being applied to characters generated by expansion. If YES, the character was generated by expansion, so processing returns to S5.
If NO, the character was not generated by expansion, so processing proceeds to S8.

[0026] S8 applies the rules in order from the top of the expansion rule 3 of FIG. 2. S9 determines whether the rules are exhausted. If YES, processing returns to S5 and moves on to the next branch. If NO, processing proceeds to S10.

[0027] S10 determines whether the rule can be applied. If YES, a branch is attached at the matched character in the notation tree, and the characters that follow the matched character are copied onto the branch. For example, for the second character "O" of the input notation "T-O-K-Y-O" in FIG. 5, rule number 55 of the expansion rule 3 of FIG. 2 is applied and the output characters "O, OU, OO" are obtained. For the second output, "OU", as shown in FIG. 5, a branch is attached from the O of the original notation, U is placed at its end, and the characters that follow O in the original notation, namely "K-Y-O", are copied after it to extend the tree. "OO" is expanded in the same way, as shown in FIG. 5. S5 and the subsequent steps are then repeated.

[0028] The above processing is applied repeatedly to the notation "TOKYO" of FIG. 5 to build the notation tree of FIG. 5. Written out as Roman character notations, the expanded tree gives the expanded Roman character notations (the intermediate candidates) listed on the right side of FIG. 5.

[0029] FIG. 5 shows an application example of the Roman character expansion rules of the present invention. In FIG. 5, the Roman character notation 1 is the input notation, here "TOKYO".

[0030] The inference/control engine 21 applies the expansion rule 3 to the notation 1. The expansion rule 3 is the rule shown in FIG. 2 and expands the notation 1.

[0031] The expanded Roman character notations are the notations obtained by applying the expansion rule 3 to the notation 1. The notation tree takes each character of the notation as a node and connects the nodes with branches (arcs) that express their relationship. Here, the original tree "T-O-K-Y-O" branches into O, OU, and OO at the second character O, and each of those branches again splits into O, OU, and OO at the final O, giving nine Roman character notations in total.

[0032] Next, the application of the suppression rule 4 of FIG. 3 to the group of notations expanded by the flowchart of FIG. 4 (the expanded notations of FIG. 5), in the configuration of FIG. 1(a), is described in detail, following the flowchart of FIG. 6 and with reference to FIG. 7.

[0033] In FIG. 6, S21 inputs a Roman character notation. For example, the nine notations shown as the expanded notations of FIG. 5 are input one at a time. S22 extracts one character from the notation input in S21 (together with the following character, as needed for rule application).

[0034] S23 determines whether character extraction is finished. If YES, all characters have been extracted and processing ends. If NO, processing proceeds to S24.
S24 applies the rules in order from the top of the Roman character suppression rules.

[0035] S25 determines whether the rules are exhausted. If YES, S27 outputs the notation, which has been judged necessary. If NO, S26 determines whether the rule applies. If it does (YES), the notation is to be deleted, so it is deleted by simply not being output, and processing ends. If it does not (NO), S22 and the subsequent steps are repeated.

[0036] When the above processing is applied to the expanded notations of FIG. 7, the notations marked with × match the suppression rule 4 of FIG. 3, are judged unnecessary, and are suppressed (deleted). The remaining four notations are generated.

[0037] FIG. 7 shows an application example of the Roman character expansion and suppression rules of the present invention. In FIG. 7, the Roman character notation 1 is the input notation, here "TOKYO".

[0038] The inference/control engine 21 applies the expansion rule 3 and the suppression rule 4 to the notation 1. The expansion rule 3 is the rule shown in FIG. 2 and expands the notation 1.

[0039] The suppression rule 4 is the rule shown in FIG. 3. It suppresses (deletes) the unnecessary notations among those expanded by the expansion rule 3 and generates the necessary ones.

[0040] FIG. 8 shows another concrete example of the present invention, in which the Roman character notation "TODO" is typed at the keyboard in order to enter the kanji "藤堂". Applying the expansion rule 3 of FIG. 2 to "TODO" gives the nine expanded notations listed under "expansion" in the figure. Applying the suppression rule 4 of FIG. 3 to these nine notations suppresses (deletes) those marked × in the figure, leaving the output notations "TODO, TODOU, TOUDO, TOUDOU" shown on the right side. Converting these to kana gives "トド、トドウ、トウド、トウドウ". If the kanji "藤堂" is registered once, under one of these kana such as "トウドウ", kana-kanji conversion succeeds. According to this embodiment, the other kana "トド", "トドウ", and "トウド" need not be registered in the kana-kanji conversion dictionary. This is particularly effective for proper nouns such as personal names and place names: a kanji does not have to be registered under many different kana, and registering it under a single kana is enough.

[0041] [Effects of the Invention] As described above, the present invention expands a Roman character notation into a plurality of notations, suppresses the unnecessary ones to generate appropriate notations, converts these to kana, and searches a kana-kanji dictionary with the resulting kana. Appropriate expanded notations can therefore be generated from the input notation 1, and the kana obtained from them can be converted to kanji. As a result: (1) From a single input notation, the multiple candidate notations considered necessary for kana-kanji conversion are generated automatically. Because these are converted to kana and looked up in the kana-kanji conversion dictionary, each kanji needs to be registered under only a few kana readings, which simplifies the dictionary.

[0042] (2) In particular, for proper nouns such as personal names and geographic names, where the number of words to register is enormous, the kana readings registered in the dictionary can be kept to a minimum. The input notation is turned into the several necessary notations by the processing of the present invention, these are converted to kana, and the dictionary is searched with the kana to find the corresponding kanji. This is convenient when a foreign language is machine-translated into Japanese: a Roman character notation such as a personal name is converted into multiple notations, converted to kana, and looked up in the dictionary to find the kanji.
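The dictionary saving described for the "TODO" example can be illustrated with a short Python sketch. It takes the four surviving candidates stated in the description, converts each to kana, and looks the readings up in a dictionary that holds only one registered reading. The table `KANA`, the greedy longest-match strategy, and the function names are assumptions made for this illustration; the patent does not specify how the Roman character/kana conversion unit 5 works internally, and a real converter would need the full syllabary.

```python
# Assumed romaji-to-kana table covering only the syllables used below.
KANA = {"TO": "ト", "DO": "ド", "KYO": "キョ", "U": "ウ", "I": "イ"}

# Kana-kanji dictionary with a single registered reading, as in the example.
KANJI_DICT = {"トウドウ": "藤堂"}


def to_kana(romaji):
    """Greedy longest-match conversion; raises ValueError on unknown input."""
    out = []
    i = 0
    while i < len(romaji):
        for size in (3, 2, 1):
            chunk = romaji[i:i + size]
            if chunk in KANA:
                out.append(KANA[chunk])
                i += len(chunk)  # advance by what was actually matched
                break
        else:
            raise ValueError(f"no kana for {romaji[i:]!r}")
    return "".join(out)


def lookup(candidates):
    """Return the kanji for every candidate whose kana reading is registered."""
    readings = [to_kana(c) for c in candidates]
    return [KANJI_DICT[r] for r in readings if r in KANJI_DICT]


survivors = ["TODO", "TODOU", "TOUDO", "TOUDOU"]
print([to_kana(c) for c in survivors])  # ['トド', 'トドウ', 'トウド', 'トウドウ']
print(lookup(survivors))                # ['藤堂']
```

Only the reading "トウドウ" is registered, yet typing "TODO" still reaches "藤堂" because one of the generated candidates produces that reading.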

[Brief Description of the Drawings]
FIG. 1 is a configuration and explanatory diagram of one embodiment of the present invention.
FIG. 2 is an example of the Roman character expansion rules of the present invention.
FIG. 3 is an example of the Roman character suppression rules of the present invention.
FIG. 4 is a flowchart of the Roman character expansion rules of the present invention.
FIG. 5 is an application example of the Roman character expansion rules of the present invention.
FIG. 6 is a flowchart of the Roman character suppression rules of the present invention.
FIG. 7 is an application example of the Roman character expansion and suppression rules of the present invention.
FIG. 8 is another concrete example of the present invention.
FIG. 9 is an example of kana notation and Roman character notation.

[Explanation of Reference Numerals]
1: Roman character notation
2: Extended Roman character conversion device
21: Inference/control engine
3: Roman character expansion rule
4: Roman character suppression rule
5: Roman character/kana conversion unit

Continuation of the front page: (58) Fields searched (Int. Cl.⁷, DB name): G06F 17/21 - 17/26

Claims

(57) [Claims]
1. A Roman character conversion device for converting a Roman character notation into a plurality of Roman character notations, comprising:
a Roman character expansion rule for taking out, in order from the head of a Roman character notation, one character together with its succeeding character and expanding the notation into a plurality of Roman character notations;
a Roman character suppression rule for taking out, in order from the head of each of the plurality of Roman character notations expanded by the Roman character expansion rule, one character together with at least one of its pre-connection and its post-connection and determining whether or not to suppress the notation;
means for taking out one character and its succeeding character in order from the head of an input Roman character notation and applying the Roman character expansion rule to expand the input into a plurality of Roman character notations; and
means for taking out, in order from the head of each of the expanded Roman character notations, one character and at least one of its pre-connection and its post-connection, applying the Roman character suppression rule, and generating only the Roman character notations determined not to be suppressed.