JP4191805B2

JP4191805B2 - Character string conversion apparatus and method having proofreading support function

Info

Publication number: JP4191805B2
Application number: JP32516095A
Authority: JP
Inventors: 博喜阿望; 健井上
Original assignee: 株式会社ジャストシステム
Priority date: 1995-11-20
Filing date: 1995-11-20
Publication date: 2008-12-03
Anticipated expiration: 2015-11-20
Also published as: JPH09146953A

Description

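[0001]
[Technical Field of the Invention]
The present invention relates to character string conversion, and in particular to a technique for supporting proofreading of the converted character string.
[0002]
[Prior Art and Its Problems]
Systems that convert a given kana character string into a character string containing kanji are in practical use. In Japanese word processors and the like, such a system serves as a kana-kanji conversion system: the user enters kana characters, and the system converts them into mixed kanji-kana text.
[0003]
In such a kana-kanji conversion system, it has been left to the user to judge whether the converted kanji-kana text is appropriate Japanese. The user made that judgment either while entering text through the kana-kanji conversion system or after printing the text.
[0004]
Such proofreading, however, is not always easy, and some users are unable to proofread adequately.
[0005]
An object of the present invention is to solve these conventional problems and to provide a character string conversion apparatus and method with which proofreading can be performed easily.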
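[0006]
[Means for Solving the Problems]
The character string conversion apparatus of the present invention comprises:
a dictionary that stores converted character strings and their parts of speech in association with kana character strings;
conversion means that refers to the dictionary, divides a given kana character string into phrases (bunsetsu), converts it into converted character strings, and outputs the result as a converted-string candidate, and that, on receiving a confirmation command, outputs the selected converted-string candidate as a confirmed character string; and
proofreading support means that produces proofreading support output based on the part of speech of each phrase in the converted-string candidate.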
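[0007]
In the character string conversion apparatus of the present invention, the proofreading support means determines whether the attached word (fuzokugo) in each phrase of the converted-string candidate is a particle and, when phrases having the same particle occur consecutively a predetermined number of times or more, produces proofreading support output indicating that a particle is repeated.
[0008]
In the character string conversion apparatus of the present invention, the proofreading support means determines whether each phrase in the converted-string candidate is used as an adnominal modifier and, when phrases in adnominal-modifier usage occur consecutively a predetermined number of times or more, produces proofreading support output indicating that modifiers are repeated.
[0009]
In the character string conversion apparatus of the present invention, when the proofreading support means finds a phrase consisting of any of a) an adnominal (rentaishi), b) the attributive form of a declinable word (yougen), or c) a combination of a noun-equivalent word and a particle-equivalent expression, it judges that this phrase is the first in a run of adnominal-modifier usage. It then judges the length of the run by how many phrases follow that consist of any of a) an adnominal, b) the attributive form of a declinable word, c) a combination of a noun-equivalent word and a particle-equivalent expression, or d) a combination of a noun and the particle 「の」.
[0010]
In the character string conversion apparatus of the present invention, the proofreading support means judges, based on the parts of speech of the phrases in the converted-string candidate, whether the text is in plain style (joutai) or polite style (keitai), and produces proofreading support output based on that judgment.
[0011]
In the character string conversion apparatus of the present invention, the proofreading support means judges the style to be polite when a phrase of the converted-string candidate is 「です」, 「ます」, or 「ございます」, or another conjugated form of these, or when it contains 「ください」, 「なさい」, or 「おっしゃい」.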
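[0012]
In the character string conversion apparatus of the present invention, the proofreading support means judges the style to be plain when, among the phrases of the converted-string candidate, the phrase immediately before a punctuation mark or symbol is any of the following.
[0013]
a) A combination of a nominal (taigen) and the sentence-final particle 「か」
b) A combination of a nominal, 「の」, and a sentence-final particle
c) The last declinable word or auxiliary verb of the phrase is in the imperative form
d) The last declinable word of the phrase is a combination of the terminal form and a conjunctive particle
e) The last declinable word of the phrase is a combination of the terminal form and a sentence-final particle
f) The last declinable word of the phrase is a combination of the attributive form and a conjunctive particle
g) The last declinable word of the phrase is a combination of the terminal form and a sentence-final particle
h) The last declinable word of the phrase is a combination of the attributive form, 「の」, and a sentence-final particle
In the character string conversion apparatus of the present invention, when a style determined in advance ("plain" or "polite") differs from the style judged by the proofreading support means, output is produced indicating that the style differs.
[0014]
In the character string conversion apparatus of the present invention, when the proofreading support means judges that the styles of the phrases in a one-sentence converted-string candidate are not all the same, it produces proofreading support output indicating that the style has changed.
[0015]
In the character string conversion apparatus of the present invention, when a phrase contains the auxiliary verb 「れる」 or 「られる」, the proofreading support means produces proofreading support output indicating a passive expression.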
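[0016]
The character string conversion apparatus of the present invention comprises:
a dictionary that stores converted character strings and proofreading-related information in association with kana character strings;
conversion means that refers to the dictionary, divides a given kana character string into phrases, converts it into converted character strings, and outputs the result as a converted-string candidate, and that, on receiving a confirmation command, outputs the selected converted-string candidate as a confirmed character string; and
proofreading support means that produces proofreading support output based on the proofreading-related information that the conversion means obtained from the dictionary for each phrase.
[0017]
The character string conversion apparatus of the present invention displays the converted-string candidate for a given kana character string on a display screen and, together with it, displays or outputs by voice the proofreading support information for that candidate.
[0018]
The character string conversion apparatus of the present invention displays the proofreading support information near the converted-string candidate, on a line different from that of the candidate.
[0019]
The character string conversion apparatus of the present invention displays the proofreading support information on the same line as the converted-string candidate.
[0020]
The character string conversion apparatus of the present invention displays the proofreading support information at a predetermined position that does not depend on where the converted-string candidate is displayed.
[0021]
The character string conversion apparatus of the present invention displays the converted-string candidate for a given kana character string on a display screen, marks each phrase of the candidate for which proofreading support information exists, and displays the proofreading support information only when the cursor is moved to a phrase so marked.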
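[0022]
The character string conversion method of the present invention stores converted character strings and their parts of speech in a dictionary in advance in association with kana character strings, refers to the dictionary to divide a given kana character string into phrases and convert it, outputs the result as a converted-string candidate, and, on receiving a confirmation command, outputs the selected converted-string candidate as a confirmed character string. In this method, proofreading support output is produced based on the part of speech of each phrase in the converted-string candidate.
[0023]
In the character string conversion method of the present invention, it is determined whether the attached word in each phrase of the converted-string candidate is a particle, and when phrases having the same particle occur consecutively a predetermined number of times or more, proofreading support output is produced indicating that a particle is repeated.
[0024]
In the character string conversion method of the present invention, it is determined whether each phrase in the converted-string candidate is used as an adnominal modifier, and when phrases in adnominal-modifier usage occur consecutively a predetermined number of times or more, proofreading support output is produced indicating that modifiers are repeated.
[0025]
In the character string conversion method of the present invention, whether the text is in plain style or polite style is judged from the parts of speech of the phrases in the converted-string candidate, and proofreading support output is produced based on that judgment.
[0026]
In the character string conversion method of the present invention, when a phrase in the converted-string candidate contains the auxiliary verb 「れる」 or 「られる」, proofreading support output is produced indicating a passive expression.
[0027]
The character string conversion method of the present invention stores converted character strings and proofreading-related information in a dictionary in advance in association with kana character strings, refers to the dictionary to divide a kana character string into phrases and convert it, outputs the result as a converted-string candidate, and, on receiving a confirmation command, outputs the selected converted-string candidate as a confirmed character string. In this method, proofreading support output is produced based on the proofreading-related information obtained from the dictionary for each phrase.
[0028]
The character string conversion method of the present invention displays the converted-string candidate for a given kana character string on a display screen and, together with it, displays or outputs by voice the proofreading support information for that candidate.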
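[0029]
The terms used to describe the present invention have the following meanings.
[0030]
A "kana character string" covers not only a character string consisting solely of kana characters but also a character string that contains alphabetic characters, katakana, symbols, and the like.
[0031]
A "converted character string" is a character string made up of kanji, kana, katakana, alphabetic characters, symbols, and the like, alone or in combination.
[0032]
A "converted-string candidate" is a candidate for the converted character string. The term covers not only a candidate that is entirely unconfirmed but also one that is partly confirmed.
[0033]
"One sentence" means a single sentence that ends with a period (「。」) or the like.
[0034]
A "cursor" is an on-screen indication that identifies an input area or an area under control. The focused-phrase cursor CK of the embodiment is an example.
[0035]
"Output of proofreading support information" means displaying or printing the proofreading support information, passing it as data to other software, and so on. It also covers notifying a person by sound, vibration, or the like.
[0036]
A "storage medium" is a computer-readable medium on which a program can be fixed, such as a flexible disk, a hard disk, or a CD-ROM.
[0037]
A "program executable by a computer" covers not only a program stored on a storage medium that can be executed directly, but also programs that are executable indirectly, such as a program that becomes executable after installation or one that becomes executable in combination with another program.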
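[0038]
[Effects of the Invention]
The character string conversion apparatus and method of the present invention produce proofreading support output based on the part of speech of each phrase in the converted-string candidate. Proofreading support information can therefore be given during the conversion operation itself.
[0039]
The character string conversion apparatus and method of the present invention produce proofreading support output indicating that a particle is repeated when phrases having the same particle occur consecutively a predetermined number of times or more. An inappropriate run of particles can therefore be reported as proofreading support information during the conversion operation.
[0040]
The character string conversion apparatus and method of the present invention produce proofreading support output indicating that modifiers are repeated when phrases in adnominal-modifier usage occur consecutively a predetermined number of times or more in the converted-string candidate. An inappropriate run of modifiers can therefore be reported as proofreading support information during the conversion operation.
[0041]
In the character string conversion apparatus of the present invention, when the proofreading support means finds a phrase consisting of any of a) an adnominal, b) the attributive form of a declinable word, or c) a combination of a noun-equivalent word and a particle-equivalent expression, it judges that this phrase is the first in a run of adnominal-modifier usage, and it judges the length of the run by how many phrases follow that consist of any of a) an adnominal, b) the attributive form of a declinable word, c) a combination of a noun-equivalent word and a particle-equivalent expression, or d) a combination of a noun and the particle 「の」. An inappropriate run of modifiers can therefore be detected more accurately.
[0042]
The character string conversion apparatus and method of the present invention judge, from the parts of speech of the phrases in the converted-string candidate, whether the text is in plain style or polite style, and produce proofreading support output based on that judgment. The style can therefore be reported as proofreading support information during the conversion operation.
[0043]
The character string conversion apparatus of the present invention judges the style to be polite when a phrase of the converted-string candidate is 「です」, 「ます」, or 「ございます」, or another conjugated form of these, or when it contains 「ください」, 「なさい」, or 「おっしゃい」. Polite style can therefore be detected more accurately.
[0044]
The character string conversion apparatus of the present invention judges the style to be plain when the phrase immediately before a punctuation mark or symbol is any of the following: a) a combination of a nominal and the sentence-final particle 「か」; b) a combination of a nominal, 「の」, and a sentence-final particle; c) the last declinable word or auxiliary verb of the phrase is in the imperative form; d) the last declinable word of the phrase is a combination of the terminal form and a conjunctive particle; e) the last declinable word of the phrase is a combination of the terminal form and a sentence-final particle; f) the last declinable word of the phrase is a combination of the attributive form and a conjunctive particle; g) the last declinable word of the phrase is a combination of the terminal form and a sentence-final particle; h) the last declinable word of the phrase is a combination of the attributive form, 「の」, and a sentence-final particle. Plain style can therefore be detected more accurately.
[0045]
The character string conversion apparatus of the present invention produces output indicating that the style differs when the style determined in advance ("plain" or "polite") differs from the style judged by the proofreading support means. The fact that text in a style other than the intended one has been entered can therefore be reported as proofreading support information during conversion processing.
[0046]
The character string conversion apparatus of the present invention produces proofreading support output indicating that the style has changed when it judges that the styles of the phrases in a one-sentence converted-string candidate are not all the same. The fact that a sentence with a change of style has been entered can therefore be reported as proofreading support information during conversion processing.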
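[0047]
The character string conversion apparatus and method of the present invention produce proofreading support output indicating a passive expression when a phrase in the converted-string candidate contains the auxiliary verb 「れる」 or 「られる」. The presence of a passive expression can therefore be reported as proofreading support information during conversion processing.
[0048]
The character string conversion apparatus and method of the present invention produce proofreading support output based on the proofreading-related information obtained from the dictionary for each phrase. Proofreading support information based on information from the dictionary can therefore be given during conversion processing.
[0049]
The character string conversion apparatus and method of the present invention display, or output by voice, the proofreading support information for the converted-string candidate. Proofreading support information can therefore be obtained easily alongside the displayed candidate.
[0050]
The character string conversion apparatus of the present invention displays the proofreading support information near the converted-string candidate, on a line different from that of the candidate. The proofreading support information is therefore easy to check visually.
[0051]
The character string conversion apparatus of the present invention displays the proofreading support information on the same line as the converted-string candidate. The proofreading support information is therefore easy to check visually.
[0052]
The character string conversion apparatus of the present invention displays the proofreading support information at a predetermined position that does not depend on where the converted-string candidate is displayed. The display position of the proofreading support information is therefore fixed, which makes it easy to check.
[0053]
The character string conversion apparatus of the present invention marks each phrase of the converted-string candidate for which proofreading support information exists, and displays the proofreading support information only when the cursor is moved to a phrase so marked. The presence of proofreading support information is therefore easy to see, and the area needed to display its content can be kept small.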
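[0054]
[Embodiments of the Invention]
FIG. 1 shows the overall configuration of a character string conversion apparatus according to one embodiment of the present invention. In this embodiment, conversion means 2 consists of kana string storage unit 4, phrase candidate storage unit 6, phrase generation means 14, and phrase candidate selection means 16. Dictionary 12 consists of independent-word dictionary 8 and attached-word dictionary 10.
[0055]
The kana character string given to conversion means 2 is stored in kana string storage unit 4. Phrase generation means 14 refers to dictionary 12 and generates the possible phrases from the kana character string stored in kana string storage unit 4. Phrase generation means 14 stores the generated phrases in phrase candidate storage unit 6. That is, for the independent word and the attached words of each phrase, it stores the kanji, part of speech, conjugated form, proofreading-related information, and so on obtained from the dictionary. Phrase candidate selection means 16 combines the phrases stored in phrase candidate storage unit 6 to find the possible phrase paths. When more than one phrase path is possible, a predetermined selection process narrows them down to a single phrase path (the converted-string candidate).
[0056]
Phrase candidate selection means 16 sends this converted-string candidate to display means 20 for display. When a change command is given, it has display means 20 show a different converted-string candidate. When a confirmation command is given, it outputs the converted-string candidate as the confirmed character string.
[0057]
Proofreading support means 18 generates proofreading support information based on the parts of speech of the phrases stored in phrase candidate storage unit 6 and outputs it to display means 20 for display. It also generates proofreading support information based on the proofreading-related information stored for those phrases in phrase candidate storage unit 6 and outputs it to display means 20 for display.
[0058]
FIG. 2 shows the hardware configuration when the character string conversion apparatus of FIG. 1 is implemented with a CPU. Connected to bus line 34 are CPU 22, hard disk 24, display 20 (the display means), memory 28, flexible disk controller (FDD) 30, and keyboard 32. Hard disk 24 stores independent-word dictionary 8, character string conversion program 40, proofreading support program 42, and so on. These were installed from flexible disk (FD) 36 through FDD 30. They may of course be loaded from a CD-ROM or the like, or downloaded over a communication line.
[0059]
Memory 28 contains kana string storage unit 4, which stores the kana character string, and phrase candidate storage unit 6, which stores the phrase candidates. Attached-word dictionary 10 is also held in memory 28.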
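[0060]
FIG. 3 shows the data structure of independent-word dictionary 8. An independent word is a word that can form a phrase on its own. Independent-word dictionary 8 stores the kanji corresponding to each character string to be converted (the reading), together with the corresponding part of speech and proofreading-related information. The "reading" field generally holds kana characters, but part or all of it may be alphabetic. Likewise, the "kanji" field may hold kana and alphabetic characters as well as kanji.
[0061]
FIG. 4 shows the data structure of attached-word dictionary 10. An attached word is a word that cannot form a phrase on its own. Attached-word dictionary 10 stores each attached word together with its part of speech and conjugated form. The parts of speech are classified into particles, auxiliary verbs, and conjugational endings of declinable words. Particles are further classified into case particles, conjunctive particles, adverbial particles, and sentence-final particles. For conjugational endings of declinable words, the conjugated form (irrealis, continuative, terminal, ... imperative) is stored. The dictionary also records which words each attached word may follow (the independent words and attached words that may precede it).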
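The two dictionaries described above can be pictured as simple record types. The following is only a minimal sketch: the field names, the sample entries, and the part-of-speech labels assigned to them are assumptions made for illustration, not the layout of FIG. 3 or FIG. 4.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class IndependentEntry:
    """One row of a hypothetical independent-word dictionary."""
    reading: str                      # kana (may contain alphabetic characters)
    surface: str                      # "kanji" field; may also hold kana
    pos: str                          # part of speech
    proof_info: Optional[str] = None  # proofreading-related information


@dataclass(frozen=True)
class AttachedEntry:
    """One row of a hypothetical attached-word dictionary."""
    surface: str
    pos: str                          # particle / auxiliary verb / ending
    subtype: Optional[str] = None     # e.g. case particle
    form: Optional[str] = None        # conjugated form, for endings
    may_follow: Tuple[str, ...] = ()  # what may precede this attached word


# Sample data: illustrative only.
INDEPENDENT: Dict[str, List[IndependentEntry]] = {
    "かんしん": [
        IndependentEntry("かんしん", "感心", "sa-hen noun"),
        IndependentEntry("かんしん", "関心", "general noun"),
    ],
    "たべれ": [IndependentEntry("たべれ", "食べれ", "verb", "ら抜き表現")],
}

ATTACHED: Dict[str, AttachedEntry] = {
    "が": AttachedEntry("が", "particle", subtype="case particle",
                        may_follow=("noun",)),
}


def lookup_independent(reading: str) -> List[IndependentEntry]:
    """Return all candidates for a reading, in stored priority order."""
    return INDEPENDENT.get(reading, [])
```

Keeping the proofreading-related information on the dictionary entry itself means it travels with the candidate into the phrase store without any extra lookup.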
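[0062]
The part-of-speech classification used in this embodiment is as follows.
[0063]
General noun: an ordinary noun naming a thing or the state of a thing
Sa-hen noun: a noun that can be followed by the sa-row irregular verb 「する」
Za-hen noun: a noun that can be followed by the za-row irregular verb 「ずる」
Adjectival noun: a noun that expresses the appearance, state, or nature of things and can be followed by 「だ」 or 「な」
Standalone word (dokuritsugo): a word that takes no verb or auxiliary verb and has no connection with other words
Adnominal (rentaishi): a non-conjugating word that modifies only nominals
Conjunction: a word used to connect words and phrases
Interjection: a word that expresses emotion, a response, a call, and so on
Prefix: a word used by attaching it to the front of another word
Suffix: a word used by attaching it to the end of another word
Numeral: a word that expresses quantity, order, and so on
Verb: further subdivided by conjugation type
Adjective: a word that expresses the nature or state of things and whose sentence-final form ends in 「い」
Adjectival verb: a word that expresses the nature or state of things and whose sentence-final form ends in 「だ」 or 「な」
Adverb: a word that mainly modifies declinable words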
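[0064]
[Overview of Conversion Processing and Proofreading Support Processing]
FIG. 5 is a flowchart of character string conversion program 40 and proofreading support program 42. The two programs may be integrated, or part or all of them may be split into separate modules. The processing is described below for the conversion of the kana character string 「わたしはにほんごしょりにかんしんがある」. When this kana character string is entered from keyboard 32 (step S1), CPU 22 stores it in kana string storage unit 4 in memory 28 (see FIG. 6).
[0065]
Next, CPU 22 generates phrases by searching independent-word dictionary 8 and attached-word dictionary 10 for the kana character string in kana string storage unit 4, starting from its first kana character (step S2). All possible phrases are generated and stored in phrase candidate storage unit 6 in memory 28. FIG. 6 shows the phrases stored in phrase candidate storage unit 6 in this way, aligned with the kana character string. In the figure, α1, α2, α3, α4, β1, ... denote the generated phrases. Phrases α1 to α4 are the kanji (and attached words) corresponding to search strings that begin with 「わ」. The search string is one character long for α1, two for α2, three for α3, and four for α4. Only one phrase is stored as a candidate for each search-string length. For α1, for example, many kanji other than 「和」 are found, but only the kanji with the highest priority, 「和」, is selected, taking into account frequency of use, the relationship to neighbouring words, and so on. The same applies to α2, α3, and α4.
[0066]
Phrases β1, β2, and β3 are the kanji (and attached words) corresponding to search strings that begin with 「た」. In the same way, every possible phrase is stored in phrase candidate storage unit 6. As FIG. 7 shows, the part of speech, conjugated form, and proofreading-related information (described later) obtained from the dictionary are stored with each phrase. The position of the phrase, that is, which part of the kana character string it corresponds to, is also stored, but is omitted from FIG. 7.
[0067]
When the phrases have been generated and stored in phrase candidate storage unit 6, CPU 22 selects a candidate phrase path (step S3). That is, it finds phrases that can be combined in a way consistent with the kana character string. For example, the only phrases that can follow phrase α1, 「和」, are β1 to β3; no other phrase can follow it. This examination is carried through to the last phrase to find the possible combinations of phrases (called phrase paths). When more than one phrase path is found, the one with the highest priority is selected based on learning information, relationships between phrases, and so on. This selection yields, for example, the converted-string candidate 「私は／日本語／処理に／感心が／ある／。」 (／ marks a phrase boundary).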
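The path search of step S3 can be sketched as an enumeration over candidates that tile the kana string without gaps. This is a minimal illustration, not the patented procedure: the candidate list is made up, and ranking paths by fewest phrases is only a stand-in for the learning information and inter-phrase relationships mentioned above.

```python
from typing import Dict, List, Tuple

KANA = "わたしはにほんごしょりにかんしんがある"

# (start index, reading, converted surface); illustrative candidates only.
CANDIDATES: List[Tuple[int, str, str]] = [
    (0, "わ", "和"),
    (1, "た", "田"),
    (0, "わたしは", "私は"),
    (4, "にほんご", "日本語"),
    (8, "しょりに", "処理に"),
    (12, "かんしんが", "感心が"),
    (17, "ある", "ある"),
]


def enumerate_paths(kana: str,
                    candidates: List[Tuple[int, str, str]]) -> List[List[str]]:
    """Return every sequence of phrases that covers the whole kana string."""
    by_start: Dict[int, List[Tuple[int, str]]] = {}
    for start, reading, surface in candidates:
        if kana.startswith(reading, start):  # ignore misplaced candidates
            by_start.setdefault(start, []).append((start + len(reading), surface))

    paths: List[List[str]] = []

    def walk(pos: int, acc: List[str]) -> None:
        if pos == len(kana):
            paths.append(list(acc))
            return
        for end, surface in by_start.get(pos, []):
            acc.append(surface)
            walk(end, acc)
            acc.pop()

    walk(0, [])
    return paths


def select_path(paths: List[List[str]]) -> List[str]:
    """Assumed priority rule: prefer the path with the fewest phrases."""
    return min(paths, key=len)
```

In this sample, the branch that starts with 「和」 dies out because nothing continues after 「田」, so the only complete path is the one quoted in the text.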
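[0068]
Next, CPU 22 judges, based on the part of speech of each phrase of this converted-string candidate, whether proofreading support is needed (step S4). If it is, the proofreading support information is displayed together with the converted-string candidate on display 20 (step S5). If it is not, only the converted-string candidate is displayed on display 20 (step S6). FIG. 8A shows a display example. In the figure, the phrase 「私は」 is enclosed by cursor CK. This indicates that the phrase currently being processed (called the focused phrase) is 「私は」. The judgment of whether proofreading support is needed is described in detail later.
[0069]
After displaying the converted-string candidate, the apparatus waits for a command from the user (step S7). The user looks at the converted-string candidate shown on display 20 and enters commands from keyboard 32 (or from a mouse or the like).
When a command to move the focused phrase is entered in step S7, CPU 22 moves the focused phrase in the commanded direction and moves cursor CK with it. For example, if a command to move the focused phrase to the right is given in the state of FIG. 8A, cursor CK moves to the right and 「日本語」 becomes the focused phrase, as FIG. 8B shows. If the command to move right is given twice more, cursor CK moves to 「感心が」, as FIG. 8C shows.
[0070]
When an other-candidate command is given in step S7, other candidates for the independent word and attached word of the focused phrase 「感心が」 are selected from independent-word dictionary 8 and attached-word dictionary 10 (step S10). For example, 「関心が」 is selected. At this point CPU 22 rewrites the 「感心が」 entry in phrase candidate storage unit 6 shown in FIG. 7 to 「関心が」, and replaces the part of speech, conjugated form, proofreading-related information, and so on with the values newly obtained from the dictionary. It then judges again whether proofreading support is needed for the new converted-string candidate (step S4) and displays it (steps S5, S6). The result is the display shown in FIG. 8D.
[0071]
When a post-conversion command is given in step S7, the entire focused phrase is post-converted as the command specifies (step S12). For example, when post-conversion to katakana is commanded, the entire focused phrase is changed to 「カンシンガ」, as FIG. 8E shows. The content of phrase candidate storage unit 6 in FIG. 7 is rewritten accordingly, in the same way as above. Because this word was not obtained from the dictionary (it is called a post-converted word), its part-of-speech field is forcibly set to general noun.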
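As a small illustration of the katakana post-conversion in step S12, the sketch below relies on the fact that the Unicode hiragana block (U+3041 to U+3096) maps onto katakana by adding 0x60 to each code point. The `PhraseRecord` type and its field names are hypothetical; only the forced "general noun" part of speech comes from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PhraseRecord:
    """Hypothetical entry of the phrase candidate store."""
    surface: str
    pos: str
    proof_info: Optional[str] = None


def to_katakana(text: str) -> str:
    """Shift hiragana code points into the katakana block; keep the rest."""
    return "".join(
        chr(ord(ch) + 0x60) if "\u3041" <= ch <= "\u3096" else ch
        for ch in text
    )


def post_convert_to_katakana(reading: str) -> PhraseRecord:
    """Replace the whole focused phrase; the result is not a dictionary
    word, so its part of speech is forced to general noun."""
    return PhraseRecord(surface=to_katakana(reading), pos="general noun")
```

Because the record no longer carries a particle, a later proofreading pass would see the phrase as a bare noun, which is consistent with the note on 「ワタシノ」 in the particle-repetition section.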
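[0072]
When a command to change the phrase boundary is given in step S7, the focused phrase is lengthened (or shortened) (step S11), and a candidate phrase path is selected again based on the content of phrase candidate storage unit 6 (step S3). That is, a new converted-string candidate with a different phrase boundary is generated. The subsequent processing is the same as described above: the need for proofreading support is judged again and the result is displayed (steps S4, S5, S6).
[0073]
When a confirmation command is entered after processing of this kind, the converted-string candidate at that point is output as the confirmed character string (the converted character string) to an application such as word processor software (step S8). In this embodiment the proofreading support information is not output to the application. Because the proofreading support information may be useful on the application side, however, it may be output with an explicit indication that it is proofreading support information. In particular, when proofreading support information is attached, the application can determine, for example, that the user chose to use the expression even though it ran counter to the displayed proofreading support.
[0074]
In step S8, the on-screen display colour of the character string is also changed to show that it has been confirmed, and the proofreading support information is no longer displayed.
[0075]
If a sentence has ended at the point when the confirmation command is given and the confirmed character string is output (a sentence ends with 「。」), the content of phrase candidate storage unit 6 in FIG. 7 is cleared. If the confirmation command is issued in the middle of a sentence, the content of phrase candidate storage unit 6 is retained. In other words, in this embodiment the content of phrase candidate storage unit 6 is retained until every phrase of a sentence has been confirmed.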
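[0076]
[Particle Repetition Judgment]
The judgment in step S4 of whether proofreading support is needed is described next. In this embodiment, step S4 checks for particle repetition, modifier repetition, style, passive expressions, and so on. The particle repetition judgment is described first, with reference to the flowchart of FIG. 9.
[0077]
Particle repetition here means a case such as 「私の子の問題の・・・」 or 「急いで自転車で・・・」, in which a particle such as 「の」 or 「で」 is repeated and the text is likely to be unclear. In such a case the proofreading support message ｛助詞の連続｝ (particle repetition) is displayed, as FIG. 10A shows, to alert the user. The judgment covers phrases that contain a noun-equivalent word (general noun, proper noun, sa-hen noun, numeral, suffix, counter, or post-converted word) and whose attached word ends in the particle 「の」, 「で」, 「が」, or 「を」.
[0078]
First, in step S20, CPU 22 allocates in memory 28 a counter N for 「の」, a counter D for 「で」, a counter G for 「が」, and a counter WO for 「を」, and clears them. It then judges whether the first phrase of the converted-string candidate (「私の」 in FIG. 10A) contains a noun-equivalent word (step S21). This judgment is made by referring to the content of phrase candidate storage unit 6 (see FIG. 7). For 「私の」, "general noun, case particle" is stored, so the phrase is judged to contain a noun-equivalent word.
[0079]
If the phrase contains a noun-equivalent word, it is judged whether the phrase contains an attached word (step S22). As noted above, the attached word "case particle" is stored for 「私の」 in phrase candidate storage unit 6, so the phrase is judged to contain an attached word. It is then judged whether the attached word ends in a particle (step S23). Here it is judged to be a particle.
[0080]
When it is judged to be a particle, the type of particle is examined (step S24). Here it is 「の」, so processing proceeds to step S25 and counter N is incremented. Next it is judged whether counter N is 1 (step S26). At the start of a run of 「の」, counter N is 1, so processing passes through steps S27, S28, S29, S30, and S33 and returns to step S21. That is, the above processing is repeated for the next phrase, 「子の」. In this way, when the particle 「の」 is repeated, the number of repetitions is held in counter N.
[0081]
The flowchart omits the branches of step S24 for 「で」, 「が」, and 「を」. They are processed in the same way as 「の」.
[0082]
For the converted-string candidate shown in FIG. 10A, for example, the value of counter N is 3 when the last phrase has been processed. When step S30 then finds that no phrase follows, processing proceeds to step S31. Step S31 judges whether any of counters N, G, D, and WO is at or above its predetermined value. In this embodiment the predetermined value is 3 for counters N and G, and 2 for counters D and WO. That is, proofreading support is given for three or more repetitions of 「の」 or 「が」 and for two or more repetitions of 「で」 or 「を」. Here counter N is 3, which is at the predetermined value, so a flag indicating particle repetition is set for the current phrase. That is, as FIG. 11A shows, the particle-repetition item in the row of the flag table for phrase number 3 is set to 1.
[0083]
In step S5 of FIG. 5, the proofreading support information ｛助詞の連続｝ is displayed after the third phrase based on this flag (see FIG. 10A).
[0084]
In the particle repetition judgment of FIG. 9, the counters are cleared when a different particle follows (step S29). Therefore, when the particle 「い」 intervenes as FIG. 10B shows, the proofreading support information ｛助詞の連続｝ is not issued even though the particle 「の」 appears three times in total. However, when a noun-equivalent word with no attached word, 「即時」, lies in between, as FIG. 10C shows, the counters are not cleared and counting continues. That is, the processing path through steps S22, S30, S33, and S21 contains no counter clearing.
[0085]
In this embodiment, as FIG. 10D shows, the proofreading support information is displayed not immediately after the phrase at which the predetermined value is exceeded, but at the end of the run of phrases with that particle.
[0086]
Because a flag table such as that of FIG. 11A is provided, proofreading support information can also be displayed at a phrase in the middle of the converted-string candidate, as FIG. 10E shows. Although omitted from the flowchart, the counters are cleared when a punctuation mark, symbol, or the like appears partway through, as FIG. 10F shows.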
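The counting rules above can be condensed into a short sketch. It is a simplification under stated assumptions rather than the flowchart of FIG. 9: one run counter replaces the four counters N, D, G, and WO, the `Bunsetsu` fields are hypothetical, and any phrase that is neither a counted phrase nor a bare noun-equivalent (including punctuation) is assumed to end the run. The thresholds are the ones given in the text.

```python
from dataclasses import dataclass
from typing import List, Optional

# Thresholds from the text: 3 for の and が, 2 for で and を.
THRESHOLDS = {"の": 3, "が": 3, "で": 2, "を": 2}


@dataclass
class Bunsetsu:
    surface: str
    noun_equivalent: bool = False
    ending: Optional[str] = None  # last attached word; None if there is none


def particle_run_flags(sentence: List[Bunsetsu]) -> List[int]:
    """Return indices of phrases after which ｛助詞の連続｝ would be shown."""
    flags: List[int] = []
    particle: Optional[str] = None
    count, last = 0, -1

    def close_run() -> None:
        nonlocal particle, count, last
        if particle is not None and count >= THRESHOLDS[particle]:
            flags.append(last)  # flag the end of the run, not the threshold
        particle, count, last = None, 0, -1

    for i, b in enumerate(sentence):
        if b.noun_equivalent and b.ending is None:
            continue  # bare noun-equivalent: counting continues
        if b.noun_equivalent and b.ending in THRESHOLDS:
            if b.ending != particle:
                close_run()  # a different tracked particle starts a new run
                particle = b.ending
            count += 1
            last = i
        else:
            close_run()  # other particle, non-noun phrase, or punctuation
    close_run()
    return flags
```

Flagging when the run closes, rather than when the counter first reaches its threshold, is what places the message at the end of the run.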
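[0087]
Further, in step S23, processing also branches to step S35 when the phrase ends in 「〜だの」, 「〜もの」, 「〜ので」, 「〜ようで」, or 「〜そうで」. This too is omitted from the flowchart.
[0088]
As the overall flowchart of FIG. 5 makes clear, the need for proofreading support is judged again whenever the candidate or the phrase boundary is changed (step S4). For example, if 「子の」 is changed to the adnominal 「この」 in the state of FIG. 10A, the proofreading support information disappears, as FIG. 10G shows. Similarly, if 「私の」 is post-converted to 「ワタシノ」, its part of speech becomes noun only, so the proofreading support information disappears, as FIG. 10H shows.
[0089]
Phrase candidate storage unit 6 retains the part-of-speech information and so on until a sentence has been confirmed through to its end. The above processing can therefore cover phrases that have already been confirmed as well as unconfirmed ones. For example, in FIG. 10A, the proofreading support information is displayed in the same way even if 「私の」 and 「子の」 have been confirmed and only 「問題の」 is unconfirmed. This makes the proofreading support more practical.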
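[0090]
[Modifier Repetition Judgment]
The modifier repetition judgment is described next. Modifier repetition here means three or more consecutive phrases that are adnominal modifiers, as in 「白い／大きい／かごの／中の・・・」, where the modifiers pile up and the text is likely to be unclear. In such a case the proofreading support message ｛修飾語の連続｝ (modifier repetition) is displayed, as FIG. 13A shows, to alert the user.
[0091]
The modifier repetition judgment is described with reference to the flowchart of FIG. 12. First, step S40 searches the converted-string candidate for groups of phrases that form, in this order, a combination of a noun-equivalent word and a particle-equivalent expression (meaning 〜による, 〜における, or 〜に関する). Such a group of phrases is regarded as a single phrase in adnominal-modifier usage (step S40). This judgment can be made from the content of phrase candidate storage unit 6. For example, 「特許に関する」 consists of the phrase 「特許に」, a combination of a general noun and a case particle, and the phrase 「関する」, a combination of a verb stem and a conjugational ending. Step S40 regards these two phrases as a single phrase in adnominal-modifier usage, because they can be viewed as a combination of the noun-equivalent word 「特許」 and the particle-equivalent expression 「〜に関する」. They are treated as one phrase only in the modifier repetition judgment.
[0092]
Next, CPU 22 allocates a counter R for modifiers in memory 28 and clears it (step S41). It first judges whether the part of speech of the first phrase is an adnominal (あらゆる, たいした, とんだ, and so on) (step S42). This judgment can also be made by referring to the "part of speech" field of phrase candidate storage unit 6 (see FIG. 7). If the phrase is an adnominal, counter R is incremented (step S49), and the next phrase is analysed.
[0093]
If it is not an adnominal, it is judged whether the phrase is the attributive form of a declinable word (verb, adjective, or adjectival verb) (step S43). Whether it is a declinable word can be judged from the "part of speech" field of phrase candidate storage unit 6, and whether it is in the attributive form from the "conjugated form" field. If it is the attributive form of a declinable word (for example 白い, 元気な, 大きい), counter R is incremented (step S49) and the next phrase is analysed.
[0094]
If it is not the attributive form of a declinable word, it is judged whether the phrase is in the adnominal-modifier usage identified in step S40 (step S44). If it is, counter R is incremented (step S49) and the next phrase is analysed.
[0095]
If it is not, it is judged whether the phrase consists of a noun combined with the case particle 「の」 (step S45). If it does not (that is, if it matches none of the adnominal-modifier usages of steps S42 to S45), the number of consecutive adnominal-modifier phrases is judged from counter R (step S46). If counter R is 3 or more, the modifier repetition flag is set for the phrase immediately before the current phrase (see FIG. 11).
[0096]
When the last phrase is reached in step S52, it is likewise judged whether counter R is 3 or more (step S54). If it is, the modifier repetition flag of FIG. 11 is set for the current phrase. For the converted-string candidate shown in FIG. 13A, for example, the flag is set for the fourth phrase, as FIG. 11B shows.
[0097]
In step S5 of FIG. 5, the proofreading support information ｛修飾語の連続｝ is displayed after the fourth phrase based on this flag (see FIG. 13A).
[0098]
In this embodiment, a phrase consisting of a noun and 「の」 does not start a run of modifiers. That is, when counter R is 0, it is not incremented (step S50). No proofreading support information is therefore displayed for a converted-string candidate such as that of FIG. 13B. Once a run of modifiers has started, however, a phrase consisting of a noun and 「の」 does increment counter R (steps S50, S51). Proofreading support information is therefore displayed for a converted-string candidate such as that of FIG. 13A.
[0099]
In the modifier repetition judgment of FIG. 12, counter R is cleared when a phrase that is not in adnominal-modifier usage appears (steps S45, S46, S48, S49, S41). Therefore, when the phrase 「中に」 appears partway through, as FIG. 13C shows, the proofreading support information ｛修飾語の連続｝ is not issued even though there are three adnominal-modifier phrases in total. However, when a noun-equivalent word with no attached word, 「竹」, lies in between, as FIG. 13D shows, counter R is not cleared and counting continues, so the proofreading support information is displayed. This processing is omitted from the flowchart of FIG. 12.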
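The counter logic for modifier runs can be sketched as follows. The input is assumed to be already classified: each phrase is reduced to one of a few hypothetical kind labels, with the merging of step S40 done beforehand. The sketch flags the last counted phrase of a run, which is a simplification of the two flag positions described for steps S46 and S54.

```python
from typing import List

# Kinds that may start (and extend) a run; "noun+no" may only extend one.
STARTERS = {"adnominal", "attributive", "noun+particle-equivalent"}
THRESHOLD = 3  # three or more consecutive modifier phrases


def modifier_run_flags(kinds: List[str]) -> List[int]:
    """Return indices of phrases after which ｛修飾語の連続｝ would be shown."""
    flags: List[int] = []
    r, last = 0, -1

    def close_run() -> None:
        nonlocal r, last
        if r >= THRESHOLD:
            flags.append(last)
        r, last = 0, -1

    for i, kind in enumerate(kinds):
        if kind in STARTERS or (kind == "noun+no" and r > 0):
            r += 1
            last = i
        elif kind == "bare-noun":
            continue  # noun-equivalent without attached word: keep counting
        else:
            close_run()  # includes "noun+no" when no run has started
    close_run()
    return flags
```

For the example 「白い／大きい／かごの／中の」, the kinds would be two attributive forms followed by two noun+の phrases, and the flag lands on the fourth phrase.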
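[0100]
In this embodiment, as FIG. 13A shows, the proofreading support information is displayed not immediately after 「かごの」, the phrase at which the predetermined value is reached, but after 「中の」, the last phrase in the run of adnominal-modifier phrases.
[0101]
Because a flag table such as that of FIG. 11B is provided, proofreading support information can also be displayed at a phrase in the middle of the converted-string candidate, as FIG. 13E shows. Although omitted from the flowchart, the counter is cleared when a punctuation mark, symbol, or the like appears partway through, as FIG. 13F shows.
[0102]
As the overall flowchart of FIG. 5 makes clear, the need for proofreading support is judged again whenever the candidate or the phrase boundary is changed (step S4). This is the same as for particle repetition.
[0103]
Phrase candidate storage unit 6 retains the part-of-speech information and so on until a sentence has been confirmed through to its end. The above processing can therefore cover phrases that have already been confirmed as well as unconfirmed ones. This too is the same as for particle repetition.
[0104]
In the above embodiment, a phrase that satisfies the condition of any of steps S42, S43, S44, and S45 is treated as a phrase in adnominal-modifier usage. Depending on the processing speed and on how much proofreading support is required, however, any of these may be treated as not being adnominal-modifier usage. The conditions may also be relaxed or tightened.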
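[0105]
[Style Judgment]
The style judgment is described next. Style here means whether the text is written in polite expressions (polite style, keitai, the "desu/masu" style) or in ordinary expressions (plain style, joutai, the "de aru" style). In this embodiment the user specifies a style in advance, and when the style of the converted-string candidate differs from it, the proofreading support message ｛である調｝ or ｛ですます調｝ is displayed to alert the user.
[0106]
The style judgment is described with reference to the flowchart of FIG. 14. First, in step S60, CPU 22 sets up a polite-style flag K and a plain-style flag J in memory 28 and clears them. It then judges whether the first phrase of the converted-string candidate contains any of the auxiliary verbs 「です」, 「ます」, and 「ございます」 (in any conjugated form) (step S61). This can be judged from the content of phrase candidate storage unit 6. If the phrase contains one, the style is judged to be polite and polite-style flag K is set to 1 (step S65). If, for example, the converted-string candidate is 「吾輩は猫である。」 as FIG. 15A shows, its first phrase 「吾輩は」 contains none of these auxiliary verbs, so processing proceeds to step S62.
[0107]
Step S62 judges whether the phrase contains one of the imperative forms 「ください」, 「なさい」, or 「おっしゃい」 (ra-row special conjugation). This judgment can also be made by referring to the "part of speech" and "conjugated form" fields of phrase candidate storage unit 6. If the phrase contains one, the style is judged to be polite and polite-style flag K is set to 1 (step S65). In the example of FIG. 15A, the first phrase 「吾輩は」 contains none of these imperative forms, so processing proceeds to step S63.
[0108]
Step S63 judges whether the phrase immediately precedes a punctuation mark or symbol. 「吾輩は」 does not, so processing passes through steps S63 and S72 and repeats from step S61 for the next phrase.
[0109]
The next phrase, 「猫である」, matches neither step S61 nor step S62. Because 「猫である」 is the phrase immediately before the period, step S63 branches to step S64. Step S64 judges whether the style is plain. In this way, whether the style is plain is judged from the phrase immediately before a punctuation mark or symbol.
[0110]
In this embodiment the style is judged to be plain when any of the following conditions is satisfied.
[0111]
a1) The phrase contains no declinable word and is nominal + sentence-final particle 「か」. Example: 「問題か？」
a2) The phrase contains no declinable word and is nominal + 「の」 + sentence-final particle
b1) The last declinable word or auxiliary verb of the phrase is in the imperative form. Example: 「考えろ。」
c1) The last declinable word or auxiliary verb of the phrase is terminal form + conjunctive particle. Example: 「考えるが、」
c2) The last declinable word or auxiliary verb of the phrase is terminal form + sentence-final particle
c3) The last declinable word or auxiliary verb of the phrase is attributive form + conjunctive particle
c4) The last declinable word or auxiliary verb of the phrase is terminal form + sentence-final particle
d1) The last declinable word of the phrase is attributive form + 「の」 + sentence-final particle. Example: 「考えるのか？」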
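[0112]
Here 「猫である」 falls under the above, so the style is judged to be plain, and plain-style flag J is set to 1 (step S71).
[0113]
The next phrase is 「。」, which ends the sentence, so step S66 branches to step S67. Step S67 judges whether the preset style (stored in an area provided in memory 28) differs from the judged style. If the preset style is "polite", it differs from the judged style, so the style-error display flag is set (step S68).
[0114]
In step S5 of FIG. 5, the proofreading support information ｛である調｝ is displayed after the last phrase based on this flag (see FIG. 15B).
[0115]
If the preset style in the above case is "plain", it matches the judged style, so no proofreading support information is displayed, as FIG. 15A shows.
[0116]
If the preset style is "plain" and the converted-string candidate is as in FIG. 15C, the proofreading support information ｛ですます調｝ is displayed.
[0117]
Further, step S69 judges whether flags K and J are both 1. Both flags being 1 indicates that the style changed within a single sentence. For a converted-string candidate such as that of FIG. 15D, for example, both flags are 1. In this case the style-change display flag is set in step S70.
[0118]
In step S5 of FIG. 5, the proofreading support information ｛文体が変化｝ (style changed) is displayed after the last phrase based on this flag (see FIG. 15D).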
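The flag handling of the style judgment (flags K and J, the comparison with the preset style, and the style-change check) can be sketched as below. The linguistic tests themselves are assumed to have been done already: `polite_marker` and `plain_ending` are hypothetical booleans standing in for the checks of steps S61 to S62 and S64. Returning only the style-change message when both flags are set is also an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Bunsetsu:
    surface: str
    polite_marker: bool = False  # contains です/ます/ございます, ください, etc.
    plain_ending: bool = False   # satisfies one of the plain-style conditions
    before_punct: bool = False   # immediately precedes punctuation or a symbol


def judge_style(sentence: List[Bunsetsu], preset: str) -> List[str]:
    """Return the messages to show after the last phrase of one sentence.

    preset is "polite" or "plain" (the style chosen by the user).
    """
    k = j = False  # polite-style flag K, plain-style flag J
    for b in sentence:
        if b.polite_marker:
            k = True
        elif b.before_punct and b.plain_ending:
            j = True

    if k and j:
        return ["文体が変化"]  # style changed within the sentence
    if j and preset == "polite":
        return ["である調"]
    if k and preset == "plain":
        return ["ですます調"]
    return []
```

For 「吾輩は猫である。」 with the preset style "polite", only J is set, so the message is ｛である調｝; with the preset "plain" nothing is shown.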
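[0119]
In this embodiment a change of style is judged within a single sentence. If earlier sentences that have been confirmed through to their last phrase are kept in a buffer or the like, however, a change of style across several sentences can also be judged.
[0120]
As the overall flowchart of FIG. 5 makes clear, the need for proofreading support is judged again whenever the candidate or the phrase boundary is changed (step S4). This is the same as for particle repetition.
[0121]
Phrase candidate storage unit 6 retains the part-of-speech information and so on until a sentence has been confirmed through to its end. The above processing can therefore cover phrases that have already been confirmed as well as unconfirmed ones. This too is the same as for particle repetition.
[0122]
In the above embodiment, a phrase that satisfies any of conditions a1), a2), b1), c1), c2), c3), c4), and d1) is treated as plain style. Depending on the processing speed and on how much proofreading support is required, however, any of these may be treated as not being plain style. The conditions may also be relaxed or tightened. The same applies to polite style.
[0123]
[Passive Expression Judgment]
The passive expression judgment is described next. When a phrase containing a passive expression is found, the proofreading support information ｛受身表現｝ (passive expression) is displayed to draw the user's attention. The judgment is made according to whether any phrase contains the auxiliary verb 「れる」 or 「られる」. For example, when the converted-string candidate is 「このように思われる。」 as in FIG. 16A, the phrase 「思われる」 is detected and the passive display flag is set to 1. The proofreading support display is then produced as FIG. 16A shows.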
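A minimal sketch of the passive check follows. It assumes each phrase arrives with the base forms of the auxiliary verbs it contains, as they might be recorded in the phrase store; the `(surface, auxiliaries)` tuple layout is hypothetical.

```python
from typing import List, Sequence, Tuple

PASSIVE_AUXILIARIES = {"れる", "られる"}

# Each phrase: (surface, base forms of the auxiliary verbs it contains).
Phrase = Tuple[str, Sequence[str]]


def passive_flags(sentence: List[Phrase]) -> List[int]:
    """Return indices of phrases that would be marked ｛受身表現｝."""
    return [
        i for i, (_surface, auxiliaries) in enumerate(sentence)
        if PASSIVE_AUXILIARIES.intersection(auxiliaries)
    ]
```

The check keys on the auxiliary verb recorded for the phrase rather than on the surface string, so a word that merely ends in the characters れる is not flagged.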
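[0124]
[Judgment of Ra-less (Ra-nuki) Expressions]
The judgment of ra-less expressions is described next. In this embodiment, when a verb that constitutes a ra-less expression is registered in the independent-word dictionary, the fact that it is a "ra-less expression" (「ら抜き表現」) is stored as its proofreading-related information. As FIG. 3 shows, 「ら抜き表現」 is stored as the proofreading-related information for 「食べれ」. When CPU 22 generates phrases in step S2 of FIG. 5, it stores this proofreading-related information in phrase candidate storage unit 6. When the display is produced in steps S5 and S6, any proofreading-related information is displayed after the phrase concerned (see FIG. 16B).
[0125]
The proofreading-related information is not limited to 「ら抜き表現」; other information useful for proofreading can also be stored.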
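Because this check is driven entirely by data stored in the dictionary, the display step only has to append whatever note a phrase carries. The sketch below assumes a hypothetical `(surface, proof_info)` pair per phrase and uses full-width braces to match the notation of the text.

```python
from typing import List, Optional, Tuple

# Each phrase: (surface, proofreading-related information or None).
Phrase = Tuple[str, Optional[str]]


def render_with_notes(sentence: List[Phrase]) -> str:
    """Concatenate the phrases, appending ｛note｝ after any that carry one."""
    parts = []
    for surface, proof_info in sentence:
        parts.append(surface)
        if proof_info:
            parts.append("｛" + proof_info + "｝")
    return "".join(parts)
```

Any other dictionary-supplied note would be rendered the same way, which is what makes this mechanism open-ended.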
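[0126]
[Display of Proofreading Support Information]
In each of the above embodiments, the proofreading support display appears on the same line as the converted-string candidate it relates to. Other display methods that can be distinguished from the converted-string candidate may be used instead. For example, the proofreading support information may be displayed in a balloon 100, as FIG. 17 shows. The proofreading support information can then be displayed on a different line, so it does not impair the readability of the sentence being entered, and its association with the phrase it refers to remains clear. However, characters on lines near the input line are hidden and become harder to read.
[0127]
Alternatively, a proofreading support display area 102 may be provided so that the display is independent of the position of the input characters. This method does not have the above drawback, but it is harder to notice whether proofreading support information is present.
[0128]
The display may also be produced as FIG. 18 shows. When the focused-phrase cursor CK is not on a phrase for which proofreading support should be displayed, a mark such as ※ is displayed on that phrase (see FIG. 18A). When cursor CK moves to that phrase, the proofreading support information is displayed, as FIG. 18B shows. Instead of moving the focused-phrase cursor, the information may be displayed when the ※ mark is double-clicked with a mouse or the like.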
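The marker-based display can be sketched as a small rendering function. This is an illustration only: placing the ※ mark directly after the phrase and showing the note inline in full-width braces are assumptions, as are the function and parameter names.

```python
from typing import Dict, List


def render_marked(phrases: List[str], notes: Dict[int, str], cursor: int) -> str:
    """Show ※ on phrases that have a note; expand the note only when the
    focused-phrase cursor is on that phrase."""
    parts = []
    for i, surface in enumerate(phrases):
        note = notes.get(i)
        if note is None:
            parts.append(surface)
        elif i == cursor:
            parts.append(surface + "｛" + note + "｝")
        else:
            parts.append(surface + "※")
    return "".join(parts)
```

With the cursor on the first phrase of 「私の／子の／問題の」 and a note on the third, the third phrase shows only ※; moving the cursor there replaces the mark with the note.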
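[0129]
The information may be output not only on the screen but also by voice or the like. It may also be output as data to other software.
[0130]
Each of the above display methods may be used alone, or two or more may be used together.
[0131]
[Other]
In each of the above embodiments only proofreading support information is displayed, but correction candidates may also be displayed for the user to select from, with the selection taken as the corrected input.
[0132]
The user can choose, separately for each type of proofreading support described above, whether it is performed.
[0133]
In each of the above embodiments the functions of FIG. 1 are implemented with a CPU, but some or all of them may be implemented in hardware logic.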
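[Brief Description of the Drawings]
[FIG. 1] A diagram showing the overall configuration of a character string conversion apparatus with a proofreading support function according to one embodiment of the present invention.
[FIG. 2] A diagram showing the hardware configuration when the character string conversion apparatus of FIG. 1 is implemented with a CPU.
[FIG. 3] A diagram showing the data structure of independent-word dictionary 8.
[FIG. 4] A diagram showing the data structure of attached-word dictionary 10.
[FIG. 5] A flowchart showing the conversion processing and proofreading support processing as a whole.
[FIG. 6] A diagram illustrating the phrase generation processing conceptually.
[FIG. 7] A diagram showing the content of phrase candidate storage unit 6.
[FIG. 8] A diagram showing converted-string candidates during conversion processing.
[FIG. 9] A flowchart of the particle repetition judgment.
[FIG. 10] A diagram showing how particle repetition is displayed as proofreading support information.
[FIG. 11] A diagram showing the flag table.
[FIG. 12] A flowchart of the modifier repetition judgment.
[FIG. 13] A diagram showing how modifier repetition is displayed as proofreading support information.
[FIG. 14] A flowchart of the style judgment.
[FIG. 15] A diagram showing how proofreading support information about style is displayed.
[FIG. 16] A diagram showing the display of other proofreading support information.
[FIG. 17] A diagram showing a method of displaying proofreading support information.
[FIG. 18] A diagram showing another method of displaying proofreading support information.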
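[Explanation of Reference Signs]
2 ... Conversion means
4 ... Kana string storage unit
6 ... Phrase candidate storage unit
8 ... Independent-word dictionary
10 ... Attached-word dictionary
14 ... Phrase generation means
16 ... Phrase candidate selection means
18 ... Proofreading support means
20 ... Display means

[0001]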
[Technical Field of the Invention]
The present invention relates to character string conversion, and in particular to a technique for supporting proofreading of the converted character string.
[0002]
[Prior Art and Its Problems]
Systems that convert a given kana character string into a character string containing kanji are in practical use. In Japanese word processors and the like, such a system serves as a kana-kanji conversion system: the user enters kana characters, and the system converts them into mixed kanji-kana text.
[0003]
In such a kana-kanji conversion system, it has been left to the user to judge whether the converted kanji-kana text is appropriate Japanese. The user made that judgment either while entering text through the kana-kanji conversion system or after printing the text.
[0004]
Such proofreading, however, is not always easy, and some users are unable to proofread adequately.
[0005]
An object of the present invention is to solve these conventional problems and to provide a character string conversion apparatus and method with which proofreading can be performed easily.
[0006]
[Means for Solving the Problems]
The character string conversion apparatus of the present invention comprises:
a dictionary that stores converted character strings and their parts of speech in association with kana character strings;
conversion means that refers to the dictionary, divides a given kana character string into phrases (bunsetsu), converts it into converted character strings, and outputs the result as a converted-string candidate, and that, on receiving a confirmation command, outputs the selected converted-string candidate as a confirmed character string; and
proofreading support means that produces proofreading support output based on the part of speech of each phrase in the converted-string candidate.
[0007]
In the character string conversion apparatus of the present invention, the proofreading support means determines whether the attached word (fuzokugo) in each phrase of the converted-string candidate is a particle and, when phrases having the same particle occur consecutively a predetermined number of times or more, produces proofreading support output indicating that a particle is repeated.
[0008]
In the character string conversion apparatus of the present invention, the proofreading support means determines whether each phrase in the converted-string candidate is used as an adnominal modifier and, when phrases in adnominal-modifier usage occur consecutively a predetermined number of times or more, produces proofreading support output indicating that modifiers are repeated.
[0009]
In the character string conversion apparatus of the present invention, when the proofreading support means finds a phrase consisting of any of a) an adnominal (rentaishi), b) the attributive form of a declinable word (yougen), or c) a combination of a noun-equivalent word and a particle-equivalent expression, it judges that this phrase is the first in a run of adnominal-modifier usage. It then judges the length of the run by how many phrases follow that consist of any of a) an adnominal, b) the attributive form of a declinable word, c) a combination of a noun-equivalent word and a particle-equivalent expression, or d) a combination of a noun and the particle 「の」.
[0010]
In the character string conversion apparatus of the present invention, the proofreading support means judges, based on the parts of speech of the phrases in the converted-string candidate, whether the text is in plain style (joutai) or polite style (keitai), and produces proofreading support output based on that judgment.
[0011]
In the character string conversion apparatus of the present invention, the proofreading support means judges the style to be polite when a phrase of the converted-string candidate is 「です」, 「ます」, or 「ございます」, or another conjugated form of these, or when it contains 「ください」, 「なさい」, or 「おっしゃい」.
[0012]
In the character string conversion apparatus of the present invention, the proofreading support means judges the style to be plain when, among the phrases of the converted-string candidate, the phrase immediately before a punctuation mark or symbol is any of the following.
[0013]
a) A combination of a nominal (taigen) and the sentence-final particle 「か」
b) A combination of a nominal, 「の」, and a sentence-final particle
c) The last declinable word or auxiliary verb of the phrase is in the imperative form
d) The last declinable word of the phrase is a combination of the terminal form and a conjunctive particle
e) The last declinable word of the phrase is a combination of the terminal form and a sentence-final particle
f) The last declinable word of the phrase is a combination of the attributive form and a conjunctive particle
g) The last declinable word of the phrase is a combination of the terminal form and a sentence-final particle
h) The last declinable word of the phrase is a combination of the attributive form, 「の」, and a sentence-final particle
In the character string conversion apparatus of the present invention, when a style determined in advance ("plain" or "polite") differs from the style judged by the proofreading support means, output is produced indicating that the style differs.
[0014]
In the character string conversion apparatus of the present invention, when the proofreading support means judges that the styles of the phrases in a one-sentence converted-string candidate are not all the same, it produces proofreading support output indicating that the style has changed.
[0015]
In the character string conversion apparatus of the present invention, when a phrase contains the auxiliary verb 「れる」 or 「られる」, the proofreading support means produces proofreading support output indicating a passive expression.
[0016]
The character string conversion apparatus of the present invention comprises:
a dictionary that stores converted character strings and proofreading-related information in association with kana character strings;
conversion means that refers to the dictionary, divides a given kana character string into phrases, converts it into converted character strings, and outputs the result as a converted-string candidate, and that, on receiving a confirmation command, outputs the selected converted-string candidate as a confirmed character string; and
proofreading support means that produces proofreading support output based on the proofreading-related information that the conversion means obtained from the dictionary for each phrase.
[0017]
The character string conversion apparatus of the present invention displays the converted-string candidate for a given kana character string on a display screen and, together with it, displays or outputs by voice the proofreading support information for that candidate.
[0018]
The character string conversion apparatus of the present invention displays the proofreading support information near the converted-string candidate, on a line different from that of the candidate.
[0019]
The character string conversion apparatus of the present invention displays the proofreading support information on the same line as the converted-string candidate.
[0020]
The character string conversion apparatus of the present invention displays the proofreading support information at a predetermined position that does not depend on where the converted-string candidate is displayed.
[0021]
The character string conversion apparatus of the present invention displays the converted-string candidate for a given kana character string on a display screen, marks each phrase of the candidate for which proofreading support information exists, and displays the proofreading support information only when the cursor is moved to a phrase so marked.
[0022]
This invention In the character string conversion method, the conversion character string and its part of speech are stored in a dictionary in advance in association with the kana character string, and the given kana character string is divided into phrases and converted characters by referring to the dictionary. In a character string conversion method for outputting a conversion character string candidate converted into a string and receiving a confirmation command and outputting the selected conversion character string candidate as a confirmation character string, based on the part of speech of each phrase in the conversion character string candidate It is characterized by the fact that output for calibration support is performed.
[0023]
This invention The character string conversion method of (2) determines whether or not the ancillary word in each clause in the converted character string candidate is a particle, and if a phrase having the same particle continues for a predetermined number of times, the particle indicates that the particle is continuous. It is characterized by performing calibration support output.
[0024]
This invention In the conversion string candidate, it is determined whether or not each clause is in a linkage modification usage, and if the linkage modification usage clause continues more than a predetermined number of times, the modifier continues. It is characterized in that it provides a calibration support output to the effect.
[0025]
This invention The character string conversion method of the method determines whether the sentence is normal or respected based on the part of speech of the phrase in the converted character string candidate, and performs proofreading support output based on the determination. It is a feature.
[0026]
The character string conversion method of this invention is characterized in that, when a phrase in the conversion character string candidate includes the auxiliary verb “reru” or “rareru”, a proofreading support output indicating a passive expression is performed.
[0027]
In the character string conversion method of this invention, conversion character strings and proofreading-related information are stored in a dictionary in advance in association with kana character strings. A given kana character string is divided into phrases and converted by referring to the dictionary, the resulting conversion character string candidate is output, and, on receipt of a confirmation command, the selected conversion character string candidate is output as a confirmed character string. The method is characterized in that output for proofreading support is performed based on the proofreading-related information obtained from the dictionary for each phrase.
[0028]
The character string conversion method of this invention is characterized in that a conversion character string candidate for a given kana character string is displayed on a display screen, and proofreading support information for the conversion character string candidate is also displayed or output by voice.
[0029]
The terminology used to describe this invention is as follows.
[0030]
The “kana character string” is a concept that includes not only a character string consisting solely of kana characters but also a character string containing alphabetic characters, katakana, symbols, and the like.
[0031]
The “conversion character string” refers to a character string composed of kanji, kana, katakana, alphabet, symbols, etc. alone or in combination.
[0032]
“Conversion character string candidate” refers to a candidate for a conversion character string. It is a concept that includes not only candidates that are entirely unconfirmed but also candidates that contain confirmed portions.
[0033]
“One sentence” refers to one sentence that ends with a punctuation mark or the like.
[0034]
The “cursor” refers to a display on the screen for clarifying the input area and the control target area. The attention phrase cursor CK in the embodiment corresponds to this.
[0035]
“Output of proofreading support information” means displaying or printing the proofreading support information or giving it to other software as data. Further, it is a concept including notifying a human by voice or vibration.
[0036]
The “storage medium” refers to a computer-readable medium such as a flexible disk, a hard disk, or a CD-ROM that can fix a program.
[0037]
“Computer-executable program” includes not only a program stored in the storage medium that can be executed directly, but also programs that are executable indirectly, such as a program that becomes executable once installed or a program that is executable in combination with other programs.
[0038]
【Effect of the Invention】
The character string conversion device and the character string conversion method of this invention are characterized in that output for proofreading support is performed based on the part of speech of each phrase in the conversion character string candidate. Accordingly, proofreading support information can be provided during the character string conversion operation.
[0039]
The character string conversion device and the character string conversion method of this invention are characterized in that, when phrases having the same particle continue a predetermined number of times or more, a proofreading support output indicating that the particle is repeated is performed. Therefore, an inappropriate run of particles can be reported as proofreading support information during the character string conversion operation.
[0040]
The character string conversion device and the character string conversion method of this invention are characterized in that, when phrases used as adnominal modifiers continue a predetermined number of times or more in the conversion character string candidate, a proofreading support output indicating that modifiers are repeated is performed. Therefore, an inappropriate run of modifiers can be reported as proofreading support information during the character string conversion operation.
[0041]
In this invention, when the proofreading support means finds a phrase composed of any of a) an adnominal, b) the adnominal form of a predicate, or c) a combination of a noun equivalent and a particle equivalent, it judges that adnominal modification has occurred once. It then judges the continuation of adnominal-modifier phrases based on how many phrases composed of any of a) an adnominal, b) the adnominal form of a predicate, c) a combination of a noun equivalent and a particle equivalent, or d) a combination of a noun and the particle “no” follow that phrase. Therefore, an inappropriate run of modifiers can be detected more accurately.
[0042]
The character string conversion device and the character string conversion method of this invention determine whether the text is in plain style or polite style based on the parts of speech of the phrases in the conversion character string candidate, and perform a proofreading support output based on that determination. Therefore, the style can be reported as proofreading support information during the character string conversion operation.
[0043]
The character string conversion device of this invention is characterized in that the text is judged to be in polite style when a phrase of the conversion character string candidate is “desu”, “masu”, or “gozaimasu”, or another conjugated form of these, or when it includes “kudasai”, “nasai”, or “osshai”. Therefore, polite style can be detected more accurately.
[0044]
The character string conversion device of this invention is characterized in that the text is judged to be in plain style when, among the phrases of the conversion character string candidate, the phrase immediately before a punctuation mark or symbol is any of the following.
a) a combination of a nominal and the final particle “ka”
b) a combination of a nominal, “no”, and a final particle
c) the last predicate or auxiliary verb of the phrase is in the imperative form
d) the last predicate of the phrase is a combination of a terminal form and a conjunctive particle
e) the last predicate of the phrase is a combination of a terminal form and a final particle
f) the last predicate of the phrase is a combination of an adnominal form and a particle
g) the last word of the phrase is a combination of a terminal form and a final particle
h) the last predicate of the phrase is a combination of an adnominal form, “no”, and a final particle
Therefore, plain style can be detected more accurately.
[0045]
The character string conversion device of this invention is characterized in that, when the style set in advance (“plain” or “polite”) differs from the style judged by the proofreading support means, an output indicating that the style differs is performed. Therefore, the fact that text in a style different from the intended style has been entered can be reported as proofreading support information during character string conversion processing.
[0046]
The character string conversion device of this invention is characterized in that, when the styles of the phrases in a one-sentence conversion character string candidate are judged not to be the same, a proofreading support output indicating that the style has changed is performed. Accordingly, the fact that a sentence with a changed style has been entered can be reported as proofreading support information during character string conversion processing.
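The style checks described above can be pictured with a minimal sketch. It assumes that each phrase is a dictionary carrying the base forms of its words in a hypothetical `base_forms` field (so that conjugated forms of “desu” and “masu” have already been reduced to their base form) and a hypothetical `plain_ending` flag that stands in for the plain-style rules a) to h). The field names and message strings are illustrative only and do not come from the embodiment.

```python
# Words that mark polite style, taken from the description above (romanized).
POLITE_WORDS = {"desu", "masu", "gozaimasu", "kudasai", "nasai", "osshai"}


def phrase_style(phrase):
    """Return "polite", "plain", or None (no evidence) for one phrase."""
    if POLITE_WORDS & set(phrase.get("base_forms", ())):
        return "polite"
    if phrase.get("plain_ending"):  # assumed to be computed elsewhere
        return "plain"
    return None


def style_messages(phrases, expected=None):
    """Collect proofreading messages about style for one sentence."""
    styles = {s for s in map(phrase_style, phrases) if s is not None}
    messages = []
    if len(styles) > 1:
        # Phrases of one sentence disagree about the style.
        messages.append("style changed within sentence")
    if expected is not None and any(s != expected for s in styles):
        # The detected style is not the one set in advance.
        messages.append("style differs from expected")
    return messages
```

Keeping the per-phrase judgment separate from the sentence-level comparison mirrors the split between detecting a style and reporting a mismatch.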
[0047]
The character string conversion device and the character string conversion method of this invention are characterized in that, when a phrase in the conversion character string candidate includes the auxiliary verb “reru” or “rareru”, a proofreading support output indicating a passive expression is performed. Therefore, information indicating a passive expression can be reported as proofreading support information during character string conversion processing.
[0048]
The character string conversion device and the character string conversion method of this invention are characterized in that output for proofreading support is performed based on proofreading-related information obtained from the dictionary for each phrase. Therefore, proofreading support information can be provided based on information obtained from the dictionary during character string conversion processing.
[0049]
The character string conversion device and the character string conversion method of this invention are characterized in that the proofreading support information for the conversion character string candidate is displayed or output by voice. Therefore, proofreading support information corresponding to the displayed conversion character string candidate can be obtained easily.
[0050]
The character string conversion device of this invention is characterized in that the proofreading support information is displayed near the conversion character string candidate, on a different line from it. Therefore, the proofreading support information is easy to confirm visually.
[0051]
The character string conversion device of this invention is characterized in that the proofreading support information is displayed on the same line as the conversion character string candidate. Therefore, the proofreading support information is easy to confirm visually.
[0052]
The character string conversion device of this invention is characterized in that the proofreading support information is displayed at a predetermined position regardless of the display position of the conversion character string candidate. Therefore, the display position of the proofreading support information is fixed and easy to confirm.
[0053]
The character string conversion device of this invention indicates, for each phrase of the conversion character string candidate, whether proofreading support information exists, and displays the proofreading support information only when the cursor is moved to a phrase for which such information exists. Therefore, the presence or absence of proofreading support information can be confirmed easily, and the area needed to display its contents can be reduced.
[0054]
DETAILED DESCRIPTION OF THE INVENTION
FIG. 1 shows the overall configuration of a character string conversion apparatus according to an embodiment of the present invention. In this embodiment, the kana character string storage unit 4, the phrase candidate storage unit 6, the phrase generation unit 14, and the phrase candidate selection unit 16 constitute the conversion unit 2. Further, a dictionary 12 is constituted by the independent word dictionary 8 and the attached word dictionary 10.
[0055]
The “kana character string” given to the conversion means 2 is stored in the kana character string storage unit 4. The phrase generation unit 14 refers to the dictionary 12 and generates the possible phrases from the “kana character string” stored in the kana character string storage unit 4. The phrase generation unit 14 stores the generated phrases in the phrase candidate storage unit 6. That is, the kanji, part of speech, conjugation form, proofreading-related information, and so on acquired from the dictionary are stored for the independent word and the attached words of each phrase. The phrase candidate selection means 16 finds possible phrase paths by combining the phrases stored in the phrase candidate storage unit 6. When there are a plurality of possible phrase paths, they are narrowed down to a single phrase path (conversion character string candidate) by a predetermined selection process.
[0056]
The phrase candidate selection means 16 sends this converted character string candidate to the display means 20 for display. Further, when a change command is given, different conversion character string candidates are displayed on the display means 20. When a confirmation command is given, the converted character string candidate is output as a confirmed character string.
[0057]
The proofreading support means 18 generates proofreading support information based on the part of speech of the phrase stored in the phrase candidate storage unit 6 and outputs it to the display means 20 for display. Further, based on the proofreading related information related to the phrase stored in the phrase candidate storage unit 6, the proofreading support information is generated and output to the display means 20 for display.
[0058]
FIG. 2 shows a hardware configuration when the character string conversion apparatus of FIG. 1 is realized using a CPU. Connected to the bus line 34 are a CPU 22, a hard disk 24, a display 20 as a display means, a memory 28, a flexible disk controller (FDD) 30, and a keyboard 32. The hard disk 24 stores an independent word dictionary 8, a character string conversion processing program 40, a proofreading support processing program 42, and the like. These are installed from the flexible disk (FD) 36 via the FDD 30. Of course, it may be taken from a CD-ROM or the like. Alternatively, it may be downloaded via a communication line.
[0059]
The memory 28 is provided with a kana character string storage unit 4 for storing kana character strings and a phrase candidate storage unit 6 for storing phrase candidates. An attached word dictionary 10 is also provided.
[0060]
FIG. 3 shows the data structure of the independent word dictionary 8. An independent word is a word that can be a phrase by itself. The independent word dictionary 8 stores kanji corresponding to the character string (reading) to be converted. Corresponding to these, parts of speech and proofreading related information are stored. Note that kana characters are generally stored in the “reading” column, but some or all of them may include alphabets. In the “kanji” column, not only kanji but also kana and alphabets may be stored.
[0061]
FIG. 4 shows the data structure of the attached word dictionary 10. An attached word is a word that cannot form a phrase by itself. The attached word dictionary 10 stores each attached word together with its part of speech and conjugation form. The parts of speech are classified and stored as particles, auxiliary verbs, and inflectional endings of inflecting words. The particles are further classified and stored as case particles, conjunctive particles, adverbial particles, and final particles. For the inflectional endings of inflecting words, the conjugation forms (irrealis form, continuative form, terminal form, ..., imperative form) are stored. The dictionary also stores which words each attached word can follow (the independent words and attached words that can precede it).
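The two dictionary layouts can be pictured with a minimal sketch. The class names, field names, and the two sample entries below are assumptions made for illustration; the actual record formats are those of FIG. 3 and FIG. 4.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class IndependentWord:
    """One row of the independent word dictionary (hypothetical layout)."""
    reading: str                      # kana, possibly with alphabetic characters
    surface: str                      # the "kanji" column; may hold kana or alphabet
    part_of_speech: str
    proofreading_info: Optional[str] = None


@dataclass(frozen=True)
class AttachedWord:
    """One row of the attached word dictionary (hypothetical layout)."""
    word: str
    part_of_speech: str               # e.g. "particle" or "auxiliary verb"
    particle_class: Optional[str] = None
    conjugation_forms: Tuple[str, ...] = ()
    may_follow: Tuple[str, ...] = ()  # parts of speech allowed to precede


# Illustrative entries only.
INDEPENDENT = [IndependentWord("わたし", "私", "general noun")]
ATTACHED = [AttachedWord("の", "particle", "case particle",
                         may_follow=("general noun",))]


def lookup_independent(reading):
    """Return every independent word stored under the given reading."""
    return [w for w in INDEPENDENT if w.reading == reading]


def can_attach(independent, attached):
    """True if the attached word may follow the independent word."""
    return independent.part_of_speech in attached.may_follow
```

Storing the permitted predecessors on the attached word is what allows phrase generation to reject impossible combinations without consulting a separate grammar table.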
[0062]
The part of speech classification used in this embodiment is shown below.
[0063]
General nouns: common nouns that name objects and states
Sa-hen nouns: nouns that can be followed by the sa-row irregular verb “suru”
Za-hen nouns: nouns that can be followed by the za-row irregular verb “zuru”
Adjectival nouns: nouns that express the shape, state, nature, etc. of an object and can be followed by “da” or “na”
Independent words: words that take no verb or auxiliary verb and have no connection to other words
Adnominals: words that do not conjugate and only modify nominals
Conjunctions: words used to connect words
Interjections: words expressing emotion, response, calling, etc.
Prefixes: words used in front of other words
Suffixes: words used after other words
Numerals: words that count and express quantity
Verbs: further subdivided by conjugation type
Adjectives: words that express the nature and state of a thing and end with “i”
Adjectival verbs: words that express the nature and state of a thing and end with “da” or “na”
Adverbs: words that mainly modify predicates
[0064]
[Outline of conversion process and proofreading support process]
FIG. 5 shows a flowchart of the character string conversion program 40 and the proofreading support program 42. Both programs may be integrated, or a part or all of them may be modularized and separated. According to this flowchart, the process of converting the kana character string “I have Japanese language” will be explained. First, when the kana character string is input from the keyboard 32 (step S1), the CPU 22 stores it in the kana character string storage unit 4 in the memory 28 (see FIG. 6).
[0065]
Next, the CPU 22 searches the independent word dictionary 8 and the attached word dictionary 10, starting from the first kana character of the kana character string in the kana character string storage unit 4, and generates phrases (step S2). That is, all possible phrases are generated and stored in the phrase candidate storage unit 6 in the memory 28. FIG. 6 shows the phrases stored in the phrase candidate storage unit 6 in this way, in association with the kana character string. In the figure, phrases α1, α2, α3, α4, β1, ... are generated. The phrases α1 to α4 are kanji (and attached words) corresponding to search character strings starting with “wa”. The search character string is one character long for phrase α1, two characters for α2, three for α3, and four for α4. For each search character string length, only one phrase is stored as a candidate. For phrase α1, for example, many kanji other than “sum” are found, but only the single highest-priority kanji, “sum”, is selected in consideration of frequency of use and the relationship with the preceding and following words. The same applies to the phrases α2, α3, and α4.
[0066]
The phrases β1, β2, and β3 are kanji (and attached words) corresponding to search character strings starting with “ta”. In the same way, all possible phrases are stored in the phrase candidate storage unit 6. As shown in FIG. 7, the part of speech, conjugation form, and proofreading-related information (described later) acquired from the dictionary are stored with each phrase. Phrase position information, that is, which part of the kana character string each phrase corresponds to, is also stored, but is omitted from the figure.
[0067]
When phrase generation and storage in the phrase candidate storage unit 6 are complete as described above, the CPU 22 selects a phrase path candidate (step S3). In other words, it finds phrases that can be combined to cover the kana character string. For example, phrases β1 to β3 are the only phrases that can follow the “sum” of phrase α1; no other phrase can follow it. This examination is carried through to the last phrase, and the possible combinations of phrases (called phrase paths) are found. When a plurality of phrase paths are found, the single highest-priority phrase path is selected based on learning information and the relations between phrases. By selecting the phrase path in this manner, a conversion character string candidate such as “I am / Japanese / processing / impressed / is /.” is selected (/ represents a phrase break).
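Phrase path selection can be pictured as enumerating every sequence of stored phrases that covers the kana character string from its first character to its last, and then keeping the best one. The sketch below is a minimal illustration under stated assumptions: each phrase is a dictionary with hypothetical `start`, `end`, `name`, and `score` fields, and the priority of a path is simply the sum of the scores. The embodiment selects by learning information and relations between phrases, which this sketch does not model.

```python
def phrase_paths(phrases, length):
    """Yield every sequence of phrases that exactly covers positions 0..length."""
    by_start = {}
    for phrase in phrases:
        by_start.setdefault(phrase["start"], []).append(phrase)

    def walk(position, path):
        if position == length:
            yield list(path)
            return
        # Only phrases that begin where the previous one ended can follow.
        for phrase in by_start.get(position, []):
            path.append(phrase)
            yield from walk(phrase["end"], path)
            path.pop()

    yield from walk(0, [])


def best_path(phrases, length):
    """Return the highest-scoring phrase path, or None if nothing covers the string."""
    paths = list(phrase_paths(phrases, length))
    if not paths:
        return None
    return max(paths, key=lambda path: sum(p["score"] for p in path))
```

For example, with a one-character phrase `a1` at positions 0 to 1, a phrase `b3` at positions 1 to 4, and a four-character phrase `a4` at positions 0 to 4, the two possible paths are `a1` followed by `b3`, and `a4` alone.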
[0068]
Next, the CPU 22 determines whether proofreading support is necessary based on the part of speech of each phrase of the conversion character string candidate (step S4). If proofreading support is necessary, the proofreading support information is displayed together with the conversion character string candidate on the display 20 (step S5). If proofreading support is not necessary, only the conversion character string candidate is displayed on the display 20 (step S6). A display example is shown in FIG. 8A. In the figure, the phrase “I am” is surrounded by a cursor CK. This indicates that the phrase currently being processed (referred to as the focused phrase) is “I am”. The process of determining whether proofreading support is necessary is described in detail later.
[0069]
After displaying the converted character string candidates, the process waits for a command from the user (step S7). The user views the conversion character string candidates displayed on the display 20 and inputs various commands from the keyboard 32 (may be a mouse or the like).
In step S7, when a command to move the focused phrase is input, the CPU 22 moves the focused phrase in the commanded direction and moves the cursor CK accordingly. For example, in the state of FIG. 8A, when a command to move the focused phrase to the right is given, the cursor CK moves to the right and “Japanese” becomes the focused phrase, as shown in FIG. 8B. When the command to move right is given twice more, the cursor CK moves to “I'm impressed”, as shown in FIG. 8C.
[0070]
In step S7, when a command for another candidate is given, other candidates for the independent word and the attached word of the focused phrase “I'm impressed” are selected from the independent word dictionary 8 and the attached word dictionary 10 (step S10). For example, “I am interested” is selected. At this time, the CPU 22 rewrites the “impressed” entry in the phrase candidate storage unit 6 shown in FIG. 7 to “interested”, and replaces the part of speech, conjugation form, proofreading-related information, and so on with those newly acquired from the dictionary. The new conversion character string candidate is then checked again for the need for proofreading support (step S4) and displayed (steps S5 and S6), giving the display shown in FIG. 8D.
[0071]
When a post-conversion command is given in step S7, the entire focused phrase is post-converted according to the content of the command (step S12). For example, when a command for post-conversion to katakana is given, the entire focused phrase is changed to “kansinga” as shown in FIG. 8E. The contents of the phrase candidate storage unit 6 in FIG. 7 are rewritten accordingly, in the same manner as described above. However, since this word (referred to as a post-conversion word) is not obtained from the dictionary, its part of speech is forcibly set to general noun.
[0072]
When a phrase break change command is given in step S7, the focused phrase is lengthened (or shortened) (step S11), and a phrase path candidate is selected again based on the stored contents of the phrase candidate storage unit 6 (step S3). That is, a new conversion character string candidate with a different phrase break position is generated. Subsequent processing is the same as described above: the need for proofreading support is determined again and the result is displayed (steps S4, S5, and S6).
[0073]
When the confirmation command is input after the above processing, the conversion character string candidate at that time is output as a confirmed character string (conversion character string) to an application such as word processor software (step S8). In this embodiment, the proofreading support information is not output to the application. However, since the proofreading support information may be useful on the application side, it may be output after being clearly marked as proofreading support information. In particular, when the proofreading support information is attached, the application side can determine that the user chose to use the expression despite the displayed proofreading support.
[0074]
In step S8, the display color on the screen is changed to make clear that the character string has been confirmed. The display of the proofreading support information is also stopped.
[0075]
If one sentence is over when the confirmation command is given and a confirmed character string is output (the sentence ends with “.”), The contents of the phrase candidate storage unit 6 in FIG. 7 are cleared. When a confirmation command is issued in the middle of a sentence, the contents of the phrase candidate storage unit 6 are held as they are. That is, in this embodiment, the contents of the phrase candidate storage unit 6 are held until all the phrases of one sentence are confirmed.
[0076]
[Particle repetition determination process]
Next, the process of determining in step S4 whether proofreading support is necessary will be described. In this embodiment, the determination in step S4 covers repetition of particles, repetition of modifiers, style, passive expressions, and the like. First, the particle repetition determination process will be described with reference to the flowchart of FIG. 9.
[0077]
Repetition of particles here refers to cases in which the same particle, such as “no” or “ga”, appears in consecutive phrases, as in “My child's problem…” or “Hurry up on the bicycle…”, so that the sentence is likely to be poorly written. In such a case, as shown in FIG. 10A, the proofreading support message {continuous particle} is displayed to alert the user. The determination covers phrases that include a noun equivalent (general nouns, proper nouns, the noun subclasses, numerals, suffixes, classifiers, and post-conversion words) and whose attached words end in the particle “no”, “de”, “ga”, or “wo”.
[0078]
First, in step S20, the CPU 22 secures in the memory 28 a counter N for “no”, a counter D for “de”, a counter G for “ga”, and a counter WO for “wo”, and clears their contents. Next, it determines whether the first phrase of the conversion character string candidate (“my” in FIG. 10A) includes a noun equivalent (step S21). This determination is made by referring to the stored contents of the phrase candidate storage unit 6 (see FIG. 7). For example, since “my” is stored as “general noun, case particle”, the phrase is determined to include a noun equivalent.
[0079]
If the phrase includes a noun equivalent, it is determined whether the phrase includes an attached word (step S22). As described above, for “my”, the attached word “case particle” is stored in the phrase candidate storage unit 6, so the phrase is determined to include an attached word. It is then determined whether the attached word ends in a particle (step S23). Here, it is determined to be a particle.
[0080]
If it is determined that the particle is a particle, the type of the particle is examined (step S24). Here, since it is “no”, the process proceeds to step S25, and the counter N is incremented. Next, it is determined whether or not the counter N is “1” (step S26). Since the counter N is “1” at the beginning of the sequence of “no”, the process returns to step S21 via steps S27, S28, S29, S30, and S33. That is, the above process is repeated for the next phrase “child”. Thereby, when the particle “no” continues, the number of times can be stored in the counter N.
[0081]
Note that the branches of step S24 for “de”, “ga”, and “wo” are omitted from the flowchart. In these cases, the same processing as for “no” is performed.
[0082]
For example, for the conversion character string candidate shown in FIG. 10A, the value of the counter N is “3” when processing of the last phrase is finished. When it is then determined in step S30 that no further phrase follows, the process proceeds to step S31. In step S31, it is determined whether any of the counters N, G, D, and WO is equal to or greater than its predetermined value. In this embodiment, the predetermined value for counters N and G is “3”, and the predetermined value for counters D and WO is “2”. That is, proofreading support is performed when “no” or “ga” is repeated three or more times, or when “de” or “wo” is repeated two or more times. Here, since the counter N is “3” and thus equal to or greater than the predetermined value, a flag indicating that the particle is repeated is set for the current phrase. That is, as shown in FIG. 11A, the continuous particle item is set to “1” in the column for phrase number 3 in the flag table.
[0083]
In step S5 of FIG. 5, based on this flag, the proofreading support information {continuous particle} is displayed after the third phrase (see FIG. 10A).
[0084]
In the particle repetition determination process of FIG. 9, the counter is cleared when a different particle follows (step S29). Therefore, as shown in FIG. 10B, when a different particle is inserted in the middle, the proofreading support information {continuous particle} is not output even if the particle “no” appears three times in total. However, as illustrated in FIG. 10C, when there is a noun equivalent with no attached word, such as “immediate”, the counter is not cleared and counting continues. That is, no counter clearing is provided on the processing path through steps S22, S30, S33, and S21.
[0085]
Further, in this embodiment, as shown in FIG. 10D, the proofreading support information is displayed at the end of the continuous phrase of the particle rather than being displayed immediately after the phrase exceeding the predetermined value.
[0086]
Since the flag table as shown in FIG. 11A is provided, the proofreading support information can be displayed even in the middle of the converted character string candidate as shown in FIG. 10E. Although omitted in the flowchart, as shown in FIG. 10F, the counter is cleared when there are punctuation marks or symbols in the middle.
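The counting rules described so far can be summarized in a minimal sketch. It assumes that each phrase is a dictionary with hypothetical fields `noun_equivalent`, `particle` (the romanized particle that ends the attached words, or `None` when there is no attached word), and `punctuation`. It keeps a single running count for the particle currently being repeated rather than the four counters N, D, G, and WO, which is equivalent here because the count is cleared whenever a different particle appears. Treating a phrase without a noun equivalent as the end of a run is an assumption of this sketch.

```python
# Repetition thresholds from the embodiment: 3 for "no"/"ga", 2 for "de"/"wo".
THRESHOLDS = {"no": 3, "ga": 3, "de": 2, "wo": 2}


def flag_particle_runs(phrases):
    """Return the indices of phrases at which a {continuous particle} flag is set."""
    flags = []
    current = None  # particle currently being counted
    count = 0
    last = None     # index of the last phrase that contributed to the run

    def close_run():
        nonlocal current, count, last
        if current is not None and count >= THRESHOLDS[current]:
            flags.append(last)  # flag the end of the run, not the threshold point
        current, count, last = None, 0, None

    for index, phrase in enumerate(phrases):
        if phrase.get("punctuation") or not phrase.get("noun_equivalent"):
            close_run()
            continue
        particle = phrase.get("particle")
        if particle is None:
            continue  # bare noun equivalent: the count is kept, not cleared
        if particle not in THRESHOLDS:
            close_run()  # a particle outside the checked set breaks the run
            continue
        if particle != current:
            close_run()
            current = particle
        count += 1
        last = index
    close_run()
    return flags
```

Because the flags are collected per phrase index, a run that ends in the middle of the candidate can still be reported at the right place, in the manner of the flag table.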
[0087]
Further, in step S23, even if the particles "~ dano", "~ mono", "~ no de", "~ yo de", and "~ so de" come to the end, the process branches to step S35. This is also omitted in the flowchart.
[0088]
As is clear from the overall flowchart of FIG. 5, when a candidate or a phrase break is changed, the need for proofreading support is determined again (step S4). For example, in the state shown in FIG. 10A, when “child” is changed to the adnominal “this”, the proofreading support information is no longer displayed, as shown in FIG. 10G. Similarly, if “my” is changed to “watashino” by post-conversion, its part of speech becomes noun only, so the proofreading support information is no longer displayed, as shown in FIG. 10H.
[0089]
The phrase candidate storage unit 6 holds the part-of-speech information and the like until one sentence is confirmed. Therefore, the above processing can be performed not only on unconfirmed phrases but also on phrases that have already been confirmed. For example, in FIG. 10A, even when “my” and “child” have been confirmed and only “problem” is unconfirmed, the proofreading support information is displayed in the same manner. This makes the proofreading support more practical.
[0090]
[Modifier repetition determination process]
Next, the modifier repetition determination process will be described. Repetition of modifiers here refers to cases in which three or more phrases used as adnominal modifiers appear consecutively, as in “white / large / cage / inside ...”, so that the sentence is likely to be unclear. In such a case, as shown in FIG. 13A, the proofreading support message {continuation of modifiers} is displayed to alert the user.
[0091]
The modifier repetition determination process will be described with reference to the flowchart of FIG. 12. First, in step S40, the conversion character string candidate is searched for sets of phrases that combine, in this order, a noun equivalent and a particle equivalent (such as “according to” or “in terms of”). Such a set of phrases is regarded as a single adnominal-modifier phrase (step S40). This determination can be made based on the stored contents of the phrase candidate storage unit 6. For example, the expression “about a patent” consists of a phrase that combines the general noun “patent” with a case particle, and a phrase that combines a verb stem with an inflectional ending. In step S40, these two phrases are regarded as a single adnominal-modifier phrase, because the expression can be regarded as a combination of the noun equivalent “patent” and the particle equivalent “about”. However, they are regarded as a single phrase only within the modifier repetition determination process.
[0092]
Next, the CPU 22 secures a counter R for modifiers in the memory 28 and clears its contents (step S41). It then determines whether the part of speech of the first phrase is an adnominal word (any, great, bad, etc.) (step S42). This determination can also be made by referring to the “part of speech” column of the phrase candidate storage unit 6 (see FIG. 7). If it is an adnominal word, the counter R is incremented (step S49), and the next phrase is analyzed.
[0093]
If it is not an adnominal word, it is determined whether the phrase is the attributive form of a declinable word (verb, adjective, or adjectival verb) (step S43). Whether it is a declinable word can be determined by referring to the “part of speech” column of the phrase candidate storage unit 6, and whether it is in the attributive form by referring to the “conjugation” column. If it is in the attributive form (for example, white, energetic, large, etc.), the counter R is incremented (step S49), and the next phrase is analyzed.
[0094]
If it is not the attributive form of a declinable word, it is determined whether the phrase is one of the adnominal-modifying phrases identified in step S40 (step S44). If so, the counter R is incremented (step S49), and the next phrase is analyzed.
[0095]
If it is not, it is determined whether the phrase is composed of a combination of a noun and the case particle “no” (step S45). If it is not (that is, if the phrase matches none of the adnominal-modifying usages in steps S42 to S45), the number of consecutive adnominal-modifying phrases is judged from the counter R (step S46). If the counter R is “3” or more, a modifier continuous flag is set for the phrase immediately preceding the current phrase (see FIG. 11).
[0096]
If the last phrase is reached in step S52, it is similarly determined whether the value of the counter R is “3” or more (step S54). If the number is 3 or more, the modifier continuous flag in FIG. 11 is set for the current phrase. For example, in the case of a conversion candidate character string as shown in FIG. 13A, a flag is set for the fourth phrase as shown in FIG. 11B.
[0097]
In step S5 in FIG. 5, based on this flag, proofreading support information {continuation of modifiers} is displayed after the fourth clause (see FIG. 13A).
[0098]
In this embodiment, the continuation of modifiers is not started by a clause composed of a combination of a noun and “no”. That is, when the counter R is “0”, the counter R is not incremented (step S50). Therefore, the proofreading support information is not displayed for the conversion candidate character string as shown in FIG. 13B. However, once the continuation of modifiers starts, the counter R is incremented due to the presence of a clause with a combination of a noun and “no” (steps S50 and S51). Therefore, the proofreading support information is displayed for the conversion candidate character string as shown in FIG. 13A.
[0099]
In the modifier continuation determination process of FIG. 12, if a phrase that is not in adnominal-modifying usage appears, the counter R is cleared (steps S45, S46, S48, S49, S41). Therefore, as shown in FIG. 13C, if the phrase “in” appears in the middle, the proofreading support information {continuation of modifiers} is not output even though there are three adnominal-modifying phrases in total. However, as shown in FIG. 13D, when there is a noun equivalent with no attached word, such as “bamboo”, the counter R is not cleared and counting continues, so the proofreading support information is displayed. This processing is omitted from the flowchart.
[0100]
Further, in this embodiment, as shown in FIG. 13A, the proofreading support information is displayed not immediately after the phrase “cage”, at which the predetermined value is reached, but after “in”, the last of the consecutive adnominal-modifying phrases.
[0101]
Since the flag table as shown in FIG. 11B is provided, the proofreading support information can be displayed even in the middle of the converted character string candidate as shown in FIG. 13E. Although omitted in the flowchart, as shown in FIG. 13F, the counter is cleared when there are punctuation marks or symbols in the middle.
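The counting procedure of steps S41 to S54 can be pictured with the following minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation: the `Phrase` class, the kind labels, and the function name `modifier_flags` are hypothetical, and each phrase is assumed to have been classified already from the part-of-speech information in the phrase candidate storage unit 6. A noun equivalent with no attached word is assumed to leave the counter unchanged, and punctuation is treated like any other phrase that breaks the run.

```python
from dataclasses import dataclass

# Hypothetical phrase kinds (assumed to be derived from the stored part of speech).
ADNOMINAL = "adnominal"                    # step S42
ATTRIBUTIVE = "attributive"                # step S43
NOUN_PARTICLE_EQUIV = "noun+particle-eq"   # steps S40 / S44
NOUN_NO = "noun+no"                        # step S45
BARE_NOUN = "bare-noun"                    # noun equivalent with no attached word
OTHER = "other"                            # anything else, including punctuation


@dataclass
class Phrase:
    surface: str
    kind: str


def modifier_flags(phrases, threshold=3):
    """Return indices of phrases that receive the modifier continuous flag."""
    flags = []
    r = 0  # counter R
    for i, phrase in enumerate(phrases):
        if phrase.kind in (ADNOMINAL, ATTRIBUTIVE, NOUN_PARTICLE_EQUIV):
            r += 1                      # these kinds may start or extend a run
        elif phrase.kind == NOUN_NO:
            if r > 0:                   # noun + "no" extends a run but never starts one
                r += 1
        elif phrase.kind == BARE_NOUN:
            pass                        # assumption: counter kept, not incremented
        else:
            if r >= threshold:          # run ended: flag the preceding phrase
                flags.append(i - 1)
            r = 0
    if r >= threshold:                  # run reaches the last phrase
        flags.append(len(phrases) - 1)
    return flags
```

For the four phrases “white / large / cage / inside”, classified as two attributive forms followed by two noun + “no” phrases, the function returns `[3]`, so the flag lands on the fourth phrase. A sequence made up only of noun + “no” phrases yields no flag because the counter never starts.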
[0102]
As is clear from the overall flowchart of FIG. 5, when there is a change in candidate or a change in phrase break, the necessity for proofreading support is determined again (step S4). This is the same as in the case of continuous particles.
[0103]
The phrase candidate storage unit 6 holds part-of-speech information and the like until one sentence is finalized. Therefore, the above-described processing can be performed not only for an undetermined phrase but also for an already confirmed phrase. This is also the same as the case of particle continuation.
[0104]
In the above-described embodiment, a phrase that satisfies the condition of any of steps S42, S43, S44, and S45 is treated as a phrase in adnominal-modifying usage. However, depending on the processing speed and the degree of proofreading support required, any of these conditions may be excluded. The conditions may also be relaxed or tightened.
[0105]
[Style determination]
Next, the style determination process will be described. Here, style refers to whether the text uses polite expressions (polite style, the “desu/masu” style) or plain expressions (plain style, the “de aru” style). In this embodiment, the user specifies a style in advance, and if the style of the converted character string candidate differs from it, a proofreading support display of {de-aru style} or {desu-masu style} is produced to warn the user.
[0106]
The style determination process will be described with reference to the flowchart. First, in step S60, the CPU 22 secures a polite style flag K and a plain style flag J in the memory 28 and clears their contents. Next, it is determined whether the first phrase of the converted character string candidate includes any of the auxiliary verbs “desu”, “masu”, and “gozaimasu” (including all of their conjugated forms) (step S61). This can be determined based on the stored contents of the phrase candidate storage unit 6. If it does, the phrase is judged to be in the polite style, and the polite style flag K is set to “1” (step S65). For example, as shown in FIG. 15A, when the converted character string candidate is “I am a cat”, the first phrase “student is” does not include any of these auxiliary verbs. Accordingly, the process proceeds to step S62.
[0107]
In step S62, it is determined whether the phrase includes any of the imperative forms “kudasai”, “nasai”, and “osshai” (special ra-row conjugation). This determination can also be made with reference to the “part of speech” and “conjugation” columns of the phrase candidate storage unit 6. If it does, the phrase is judged to be in the polite style, and the polite style flag K is set to “1” (step S65). In the example of FIG. 15A, the first phrase “student is” does not include any of these imperative forms, so the process proceeds to step S63.
[0108]
In step S63, it is determined whether or not the clause is immediately before a punctuation mark or symbol. Since “student” does not correspond to this, after steps S63 and S72, step S61 and subsequent steps are repeated for the next phrase.
[0109]
The next phrase “is a cat” matches neither step S61 nor step S62. Since “is a cat” is the phrase immediately before a punctuation mark, the process branches from step S63 to step S64. In step S64, it is determined whether the style is plain. In this way, whether the style is plain is determined based on the phrase immediately before a punctuation mark or symbol.
[0110]
In this embodiment, the style is judged to be plain when any of the following conditions is satisfied.
[0111]
a1) When the phrase contains no declinable word and consists of a nominal + the sentence-final particle “ka”: example “Is it a problem?”
a2) When the phrase contains no declinable word and consists of a nominal + “no” + a sentence-final particle
b1) When the last declinable word or auxiliary verb of the phrase is in the imperative form: example “Think.”
c1) When the last declinable word or auxiliary verb of the phrase is a terminal form + conjunctive particle: example “Thinking,”
c2) When the last declinable word or auxiliary verb of the phrase is a terminal form + sentence-final particle
c3) When the last declinable word or auxiliary verb of the phrase is an attributive form + conjunctive particle
c4) When the last declinable word or auxiliary verb of the phrase is a terminal form + sentence-final particle
d1) When the last declinable word of the phrase is an attributive form + “no” + a sentence-final particle: example “Do you think?”
[0112]
Here, “is a cat” satisfies one of the above conditions, so the style is judged to be plain. Therefore, the plain style flag J is set to “1” (step S71).
[0113]
The next phrase is “.”, which ends the sentence, so the process branches from step S66 to step S67. In step S67, it is determined whether the preset style (stored in an area of the memory 28) differs from the determined style. If the preset style is the polite style, it differs from the determined style, so a style error display flag is set (step S68).
[0114]
In step S5 in FIG. 5, based on this flag, the proofreading support information {de-aru style} is displayed after the last phrase (see FIG. 15B).
[0115]
In the above case, if the preset style is the plain style, it matches the determined style, so the proofreading support information is not displayed, as shown in FIG. 15A.
[0116]
If the setting style is “normal” and the conversion character string candidate is as shown in FIG.
[0117]
Further, in step S69, it is determined whether or not the flags K and J are both “1”. The fact that both flags are “1” indicates that the style has changed in one sentence. For example, in the case of a converted character string candidate as shown in FIG. 15D, both flags are “1”. In this case, a style change display flag is set in step S70.
[0118]
In step S5 of FIG. 5, based on this flag, proofreading support information {changed style} is displayed after the last phrase (see FIG. 15D).
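The flag logic of steps S60 to S71 can be summarized in the following Python sketch. It is only an illustration: the `Phrase` fields, the function name `judge_style`, and the returned strings are hypothetical, and the per-phrase tests (polite auxiliary or imperative form present, plain-style ending present) are assumed to have been evaluated beforehand from the phrase candidate storage unit 6. As a simplification that the text does not state, a style change within the sentence is reported in preference to a mismatch with the preset style.

```python
from dataclasses import dataclass


@dataclass
class Phrase:
    surface: str
    polite_marker: bool = False  # result of steps S61/S62 (assumed precomputed)
    plain_ending: bool = False   # result of step S64 (assumed precomputed)
    before_punct: bool = False   # result of step S63


def judge_style(phrases, preset):
    """Return a proofreading message for one sentence, or None if none is needed.

    preset is "polite" or "plain" (the style chosen by the user in advance).
    """
    k = False  # polite style flag K
    j = False  # plain style flag J
    for phrase in phrases:
        if phrase.polite_marker:
            k = True                                  # step S65
        elif phrase.before_punct and phrase.plain_ending:
            j = True                                  # step S71
    if k and j:
        return "style changed"                        # steps S69, S70
    detected = "polite" if k else "plain" if j else None
    if detected is not None and detected != preset:
        return detected + " style"                    # steps S67, S68
    return None
```

With a sentence whose final phrase has a plain-style ending, `judge_style(..., "polite")` yields `"plain style"` while `judge_style(..., "plain")` yields `None`, mirroring the difference between FIG. 15B and FIG. 15A.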
[0119]
In this embodiment, a change in style is determined for one sentence. However, if the previous sentence that has been finalized to the last phrase is stored in a buffer or the like, it is possible to determine the style change over a plurality of sentences.
[0120]
As is clear from the overall flowchart of FIG. 5, when there is a change in candidate or a change in phrase break, the necessity for proofreading support is determined again (step S4). This is the same as in the case of continuous particles.
[0121]
The phrase candidate storage unit 6 holds part-of-speech information and the like until one sentence is finalized. Therefore, the above-described processing can be performed not only for an undetermined phrase but also for an already confirmed phrase. This is also the same as the case of particle continuation.
[0122]
In the above-described embodiment, a phrase satisfying any one of the aforementioned conditions a1), a2), b1), c1), c2), c3), c4), and d1) is treated as plain style. However, depending on the processing speed and the degree of proofreading support required, any of these conditions may be excluded. The conditions may also be relaxed or tightened. The same applies to the polite style.
[0123]
[Determination of passive expressions]
Next, the passive expression determination process will be described. When a phrase containing a passive expression is found, the proofreading support information {passive expression} is displayed to call the user's attention to it. This determination is made based on whether there is a phrase including the auxiliary verb “reru” or “rareru”. For example, as shown in FIG. 16A, for the converted character string candidate “I think this way”, the phrase “I think” is detected and the passive display flag is set to “1”. In response, the proofreading support display is produced as shown in FIG. 16A.
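The passive check amounts to testing each phrase for a passive auxiliary verb. The short Python sketch below illustrates this under assumptions: the two auxiliaries are taken to be “reru” and “rareru”, and the `Phrase` class, its `auxiliaries` field, and the function name are hypothetical stand-ins for what the phrase candidate storage unit 6 would provide.

```python
from dataclasses import dataclass, field

# Assumption: the passive auxiliaries, written here in romanized form.
PASSIVE_AUXILIARIES = {"reru", "rareru"}


@dataclass
class Phrase:
    surface: str
    auxiliaries: list = field(default_factory=list)  # auxiliary verbs in the phrase


def passive_phrase_indices(phrases):
    """Return indices of phrases for which the passive display flag would be set."""
    return [
        i for i, phrase in enumerate(phrases)
        if PASSIVE_AUXILIARIES & set(phrase.auxiliaries)
    ]
```

Each returned index marks a phrase after which {passive expression} would be shown.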
[0124]
[Determination of ra-omitted expressions]
Next, the process of determining ra-omitted expressions (so-called “ra-nuki” forms) will be described. In this embodiment, when a verb corresponding to a ra-omitted expression is registered in the independent word dictionary, the fact that it is a “ra-omitted expression” is stored as proofreading-related information. As shown in FIG. 3, “ra-omitted expression” is stored as the proofreading-related information for “eaten”. The CPU 22 stores this proofreading-related information in the phrase candidate storage unit 6 when generating phrases in step S2. Further, when displaying in steps S5 and S6, if proofreading-related information exists, it is displayed after the relevant phrase (see FIG. 16B).
[0125]
Note that the proofreading-related information is not limited to “ra-omitted expression”; other information useful for proofreading can also be stored.
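The dictionary-driven approach described above, in which a note stored with a dictionary entry is carried into the phrase candidate and shown after the phrase, can be sketched as follows. This is a hypothetical illustration: the dictionary contents, the note text, the `PhraseCandidate` class, and both function names are assumptions made for the example, not the data layout of FIG. 3.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical independent word dictionary; only the idea of an optional
# proofreading note per entry is taken from the description.
INDEPENDENT_WORD_DICTIONARY = {
    "eaten": {"part_of_speech": "verb", "proofreading_info": "ra-omitted expression"},
    "eat": {"part_of_speech": "verb", "proofreading_info": None},
}


@dataclass
class PhraseCandidate:
    surface: str
    part_of_speech: str
    proofreading_info: Optional[str]


def generate_phrase(word):
    """Copy the dictionary entry, including its note, into a phrase candidate."""
    entry = INDEPENDENT_WORD_DICTIONARY[word]
    return PhraseCandidate(word, entry["part_of_speech"], entry["proofreading_info"])


def display(candidates):
    """Render candidates, appending any note in braces after its phrase."""
    parts = []
    for c in candidates:
        note = "{" + c.proofreading_info + "}" if c.proofreading_info else ""
        parts.append(c.surface + note)
    return " ".join(parts)
```

Because the note travels with the dictionary entry, no analysis is needed at display time, and other kinds of notes can be added by editing the dictionary alone.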
[0126]
[Display of proofreading support information]
In each of the above embodiments, the proofreading support information is displayed on the same line as the target converted character string candidate. However, other display methods that allow it to be distinguished from the converted character string candidates may be used. For example, the proofreading support information may be displayed in a balloon 100. In this way, the proofreading support information is displayed on a different line, so it does not impair the readability of the input sentence, and its relationship to the phrase it concerns remains clear. However, characters on lines near the input line are hidden and become difficult to read.
[0127]
Alternatively, a proofreading support display area 102 may be provided so that the display is performed regardless of the position of the input characters. This method does not have the above-mentioned drawback, but it makes it harder to notice whether proofreading support information is present.
[0128]
Alternatively, the display may be performed as follows. When the attention-phrase cursor CK is not on the phrase for which proofreading support should be displayed, a mark such as * is displayed at that phrase (see FIG. 18A). When the cursor CK moves to that phrase, the proofreading support information is displayed as shown in FIG. 18B. Instead of moving the attention-phrase cursor, the information may also be displayed by double-clicking the * with a mouse or the like.
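The cursor-dependent display can be sketched in a few lines of Python. The function name `render_line`, the tuple layout, and the choice to append the mark or note directly after the phrase are assumptions made for illustration only.

```python
def render_line(phrases, cursor_index):
    """Render phrases; show a note in full only where the cursor is.

    phrases: list of (surface, note) pairs, where note is None when the
    phrase has no proofreading support information.
    cursor_index: index of the phrase holding the attention-phrase cursor.
    """
    parts = []
    for i, (surface, note) in enumerate(phrases):
        if note is None:
            parts.append(surface)
        elif i == cursor_index:
            parts.append(f"{surface}{{{note}}}")  # cursor on phrase: full note
        else:
            parts.append(f"{surface}*")           # cursor elsewhere: mark only
    return " ".join(parts)
```

This keeps the input line compact until the user moves to the marked phrase, which matches the trade-off between readability and visibility discussed for the other display methods.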
[0129]
In addition to display on the screen, information may be output by voice or the like. Furthermore, the data may be output to other software or the like.
[0130]
Each of the above display methods may be used alone or in combination of two or more methods.
[0131]
[Others]
In each of the above-described embodiments, only the proofreading support information is displayed. However, correction candidates may also be displayed so that the user can select one as a corrected input.
[0132]
Further, the user may be allowed to select, for each type of proofreading support described above, whether or not it is performed.
[0133]
In each of the above embodiments, each function of FIG. 1 is realized using a CPU, but a part or all of the functions may be configured by hardware logic.
[Brief description of the drawings]
FIG. 1 is a diagram showing an overall configuration of a character string conversion apparatus with a proofreading support function according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating a hardware configuration when the character string conversion device of FIG. 1 is realized using a CPU.
FIG. 3 is a diagram showing the data structure of the independent word dictionary 8.
FIG. 4 is a diagram showing the data structure of the attached word dictionary 10.
FIG. 5 is a flowchart showing the entire conversion process and proofreading support process.
FIG. 6 is a diagram conceptually illustrating a phrase generation process.
FIG. 7 is a diagram showing the contents stored in the phrase candidate storage unit 6.
FIG. 8 is a diagram showing character string conversion candidates in conversion processing.
FIG. 9 is a flowchart of particle continuation determination.
FIG. 10 is a diagram showing a form in which a series of particles is displayed as proofreading support information.
FIG. 11 is a diagram showing a flag table.
FIG. 12 is a flowchart of modifier continuous determination.
FIG. 13 is a diagram illustrating a form in which a series of modifiers is displayed as proofreading support information.
FIG. 14 is a flowchart of style determination.
FIG. 15 is a diagram showing a form for displaying proofreading support information related to style.
FIG. 16 is a diagram showing a display of other proofreading support information.
FIG. 17 is a diagram illustrating a method for displaying proofreading support information.
FIG. 18 is a diagram showing another method for displaying proofreading support information.
[Explanation of symbols]
2 ... Conversion means
4 ... Kana character string storage
6 ... Phrase candidate storage unit
8 ... Independent word dictionary
10 ... Attached word dictionary
14 ... Phrase generation means
16 ... Phrase candidate selection means
18 ... Proofreading support means
20 ... Display means

Claims

A character string conversion device that converts a given kana character string into a converted character string including kanji, comprising:
a dictionary that stores converted character strings and their parts of speech in association with kana character strings;
conversion means that, upon receiving a conversion command, refers to the dictionary, divides the given kana character string into phrases and converts it into converted character strings, records the resulting converted character string candidates together with their parts of speech in a phrase candidate storage unit, and outputs them; that, upon receiving a conversion command, a post-conversion command, or a phrase break change command for any phrase in the converted character string candidates, updates the converted character string candidate and part of speech of that phrase recorded in the phrase candidate storage unit to a new converted character string candidate and part of speech; and that, upon receiving a confirmation command, outputs the selected converted character string candidate as a confirmed character string; and
proofreading support means for producing a proofreading support output based on the part of speech of each phrase in the converted character string candidates recorded in the phrase candidate storage unit,
wherein:
the phrase candidate storage unit holds the converted character string candidates together with their parts of speech until all of the kana character strings given to form one sentence have received the confirmation command;
the proofreading support means, before all of the kana character strings given to form one sentence have received the confirmation command,
determines whether the attached word in each phrase of the converted character string candidates recorded in the phrase candidate storage unit is a particle, counts whether phrases having the same particle continue for a predetermined number of times or more, and, if they do, executes processing that produces a proofreading support output indicating that particles are continuous; and
when a conversion command, a post-conversion command, or a phrase break change command is received for any phrase in the converted character string candidates, executes the proofreading support output again for the phrase after the conversion or change, taking into account not only the unconfirmed phrases but also the phrases of that sentence that have already been output as confirmed character strings.

A character string conversion method using a computer, in which:
converted character strings and their parts of speech are stored in advance in a dictionary in association with kana character strings; and
conversion means of the computer, upon receiving a conversion command, refers to the dictionary, divides the given kana character string into phrases and converts it into converted character strings, and records the resulting converted character string candidates together with their parts of speech in a phrase candidate storage unit; upon receiving a conversion command, a post-conversion command, or a phrase break change command for any phrase in the converted character string candidates, updates the converted character string candidate and part of speech of that phrase recorded in the phrase candidate storage unit to a new converted character string candidate and part of speech; and, upon receiving a confirmation command, outputs the selected converted character string candidate as a confirmed character string,
the method being characterized in that:
the phrase candidate storage unit holds the converted character string candidates together with their parts of speech until all of the kana character strings given to form one sentence have received the confirmation command;
as proofreading support processing performed before all of the kana character strings given to form one sentence have received the confirmation command, proofreading support means of the computer
determines whether the attached word in each phrase of the converted character string candidates recorded in the phrase candidate storage unit is a particle, counts whether phrases having the same particle continue for a predetermined number of times or more, and, if they do, executes processing that produces a proofreading support output indicating that particles are continuous; and
when a conversion command, a post-conversion command, or a phrase break change command is received for any phrase in the converted character string candidates, executes the proofreading support output again for the phrase after the conversion or change, taking into account not only the unconfirmed phrases but also the phrases of that sentence that have already been output as confirmed character strings.