JP4039635B2

JP4039635B2 - Language information processing device

Info

Publication number: JP4039635B2
Application number: JP2004173156A
Authority: JP
Inventors: 直人中村; 智子瀬川; 郁子長澤; くにお松井; 誠塩津
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2004-06-10
Filing date: 2004-06-10
Publication date: 2008-01-30
Anticipated expiration: 2023-01-30
Also published as: JP2004303273A

Description

The present invention relates to a language information processing apparatus for use in an information processing system that handles linguistic expressions. Based on a character string representing a linguistic expression that is input by specifying a section, the apparatus performs processing such as registering rules and knowledge about the language, or searching with keywords.
Information processing systems that handle linguistic expressions include machine translation systems, text revision (proofreading) systems, and text feature extraction systems. In such systems, a language information processing apparatus receives a character string representing a linguistic expression, such as an idiom or a turn of phrase, that is input by specifying a section. By analyzing this character string, the system registers rules and knowledge about the language and puts them to use.

The language information processing apparatus can also be used to search a database, using a specific linguistic expression in a document as a keyword.

FIG. 7 shows an example configuration of a conventional language information processing apparatus.
In FIG. 7, the extraction processing unit 301 responds to an instruction from the user by extracting the character string indicated by the section designation from the whole character string representing the document. The decomposition processing unit 302 divides the extracted character string into processing units and sends them to the analysis processing unit 311.
The decomposition processing unit 302 decomposes the character string of the designated section into morphemes. It may do this, for example, based on a morpheme dictionary 303 that stores morphemes consisting of the stems of independent words and ending information such as inflectional endings.

The analysis processing unit 311 receives the set of morphemes obtained in this way and analyzes the connections between them. Using this result, the registration processing unit 312 registers idioms and phrases in the idiom dictionary 313 as sequences of morphemes linked according to predetermined rules.
Registration work for the idiom dictionary 313 often involves registering an enormous number of idioms at once.

In such a case, the user designates the relevant sections in the document one after another, and the character strings extracted by the extraction processing unit 301 are accumulated in sequence. After all sections have been designated, decomposition by the decomposition processing unit 302 and registration in the idiom dictionary 313 are carried out together as a batch.

As described above, the analysis processing unit 311 analyzes relationships between morphemes, so its input must already be decomposed into a sequence of morphemes.
For this reason, the conventional apparatus handled a failure of the decomposition processing unit 302 to decompose the entire character string of a designated section into morphemes as follows. Processing of that section's string was aborted at that point, and the user was notified, by an error message or the like, that the designated string could not be accepted.

In addition, registering an idiom or phrase as the stem of a verb or adjective requires a constraint such as "the designated section must begin with an independent word and end with an independent word."
Conventionally, such constraints were checked by the analysis processing unit 311. If the sequence of morphemes received from the decomposition processing unit 302 did not satisfy a constraint, processing of the string in that designated section was aborted immediately. As when a string could not be decomposed into morphemes, the user was then notified that the designated string could not be accepted.

The conventional apparatus therefore presupposes that the user knows its processing units and constraints and designates each character-string section precisely enough to conform to them.
To make full use of the apparatus, the user consequently needed sufficient knowledge of, and experience with, its processing units, such as morphemes.

However, ordinary users often lack such knowledge and experience. Processing units such as morphemes also do not coincide with the units of language as commonly understood. As a result, it is very difficult to designate precisely a section that matches the processing-unit boundaries and the constraints.
Even when a knowledgeable user designates the sections, designation errors become likely when a huge number of idioms and phrases are registered at once, and many designated sections are then rejected.

In the conventional apparatus, the only way to register the idiom or phrase corresponding to a rejected designated section was for the user to correct the designation and repeat the registration. This is tedious and places a heavy burden on the user.
An object of the present invention is to provide a language information processing apparatus that flexibly accepts input of linguistic expressions containing inconsistencies with the processing conditions.

FIG. 1 is a block diagram showing the principle of the language information processing apparatus according to the present invention.
The language information processing apparatus shown in FIG. 1 consists of the following means: a character string input means 111, a designated section input means 112, a decomposition means 113, a first determination means 114, a first correction means 115, a second determination means 121, a second correction means 122, and an extraction means 116.
The principle of the language information processing apparatus according to the present invention is as follows.

The apparatus receives a linguistic expression to be processed and executes predetermined processing on it. The character string input means 111 inputs a character string containing the linguistic expression to be processed. The designated section input means 112 inputs a designated section indicating the extent of the linguistic expression within that character string. The decomposition means 113 decomposes the character string input by the character string input means 111 into processing units, which are the units used in semantic analysis of linguistic expressions. The first determination means 114 judges whether the designated section is valid, based on whether its boundaries coincide with boundaries in the sequence of processing units obtained by the decomposition means 113. When the first determination means 114 judges the designated section invalid, the first correction means 115 corrects the section by moving its boundary positions so that they coincide with processing-unit boundaries.
The second determination means 121 judges whether the designated section corrected by the first correction means 115 is valid. It does so based on whether the order and types of the processing units in the section satisfy a constraint: either the first and last processing units are independent words, or the section contains exactly one independent word. When the second determination means 121 judges the designated section invalid, the second correction means 122 corrects it according to one of the following correction rules. If the first processing unit is a non-independent word, the section is extended toward the beginning of the sentence until an independent word appears. If the last processing unit is a non-independent word, the section is shrunk toward the beginning of the sentence until an independent word appears. Alternatively, every designation other than the first independent word is ignored. The extraction means 116 then extracts, from the sequence of processing units obtained by the decomposition means 113, the units corresponding to the character string in the corrected designated section.
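The second determination and correction can be illustrated with a minimal Python sketch. It assumes a hypothetical representation in which each processing unit is a `(surface, is_independent)` pair and a section is a half-open range `[start, end)` of unit indices. The function names are illustrative and not taken from the patent. `correct_section` combines the extend and shrink rules, and `keep_first_independent` shows the alternative "ignore all but the first independent word" rule.

```python
from typing import List, Tuple

# Hypothetical representation: one processing unit = (surface, is_independent).
Unit = Tuple[str, bool]

# Morphemes of the example sentence 「彼は腹を立てました。」 used later in the text.
EXAMPLE_UNITS: List[Unit] = [
    ("彼", True), ("は", False), ("腹", True), ("を", False),
    ("立て", True), ("ました", False), ("。", False),
]


def satisfies_constraint(units: List[Unit], start: int, end: int) -> bool:
    """Second determination for the unit range [start, end): valid if the first
    and last units are independent words, or if exactly one is independent."""
    section = units[start:end]
    if not section:
        return False
    independent_count = sum(1 for _, independent in section if independent)
    return (section[0][1] and section[-1][1]) or independent_count == 1


def correct_section(units: List[Unit], start: int, end: int) -> Tuple[int, int]:
    """Second correction, extend/shrink variant of the rules."""
    # Leading non-independent word: extend toward the sentence start
    # until the first unit of the section is an independent word.
    while start > 0 and not units[start][1]:
        start -= 1
    # Trailing non-independent word: shrink toward the sentence start
    # until the last unit of the section is an independent word.
    while end - 1 > start and not units[end - 1][1]:
        end -= 1
    return start, end


def keep_first_independent(units: List[Unit], start: int, end: int) -> Tuple[int, int]:
    """Alternative correction: ignore everything except the first independent word."""
    for i in range(start, end):
        if units[i][1]:
            return i, i + 1
    return start, end  # no independent word found: leave the section unchanged


if __name__ == "__main__":
    print(satisfies_constraint(EXAMPLE_UNITS, 2, 6))  # 腹を立てました -> False
    print(correct_section(EXAMPLE_UNITS, 2, 6))       # -> (2, 5), i.e. 腹を立て
```

The sketch checks the constraint only at the level of unit indices. It assumes the first correction has already aligned the section with unit boundaries.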

The language information processing apparatus configured in this way operates as follows.
In the invention of claim 1, the entire character string input by the character string input means 111 is decomposed by the decomposition means 113. This yields information not only about the character string in the designated section indicated by the designated section input means 112, but also about the character strings before and after it.
Therefore, when the first determination means 114 judges the designated section invalid, the first correction means 115 can move the section's boundaries within a range that includes the surrounding character strings, based on information about the section and those strings.

Suppose the first correction means 115 moves the section boundaries according to appropriate rules that decide whether a processing unit split by a boundary should be included in the section or excluded from it. The inconsistency between the section boundaries and the processing-unit boundaries is then eliminated. The processing units in the corrected section can be passed to subsequent processing as information about the linguistic expression to be processed.

That is, the processing units contained in the corrected designated section are input to the second determination means 121, which judges whether their arrangement satisfies the constraint.
When the second determination means 121 judges the designated section invalid in light of the constraint, the second correction means 122 moves the section's boundaries in steps of the processing units it contains, based on the constraint described above.

For example, suppose the result of the second determination means 121 shows that the designated section contains more than one independent word. The second correction means 122 can then move the boundary so as to ignore every designation except the first independent word. This resolves the inconsistency between the arrangement of processing units in the section and the structure of linguistic expressions specified by the constraint. The extraction means 116 can then extract information about the linguistic expression from a valid designated section and pass it on to processing such as registration.

The language information processing apparatus according to the present invention uses information about the character string in the user-designated section and the strings before and after it. From this information it detects two kinds of inconsistency: between the section boundaries and morpheme boundaries, and between the section and the constraints. It resolves them by moving the boundary positions of the section. Designated sections containing inconsistencies can thus be accepted flexibly for analysis and registration of linguistic expressions, which greatly reduces the user's workload.

Embodiments of the present invention are described in detail below with reference to the drawings.
FIG. 2 shows the configuration of an embodiment of the language information processing apparatus of the present invention.
In FIG. 2, the linguistic expression holding unit 201 stores linguistic expressions, such as sentences containing the idioms and phrases to be registered, each as one unit. The display data creation unit 202 creates display data for the linguistic expressions stored in the holding unit 201 and sends it to the display device 204 via the display memory 203.

The linguistic expression holding unit 201 may, for example, treat each sentence delimited by a full stop as one unit of linguistic expression and store the sentences with serial numbers.
The display data creation unit 202 may then read one or more sentences in turn from the linguistic expression holding unit 201. It creates display data in which the character codes are laid out to fit the number of rows and columns of the screen of the display device 204, and stores that data in the display memory 203.

In this case, the user looks at the sentences displayed on the display device 204 and operates the mouse 205 or the keyboard 206 to designate the sections of character strings that represent idioms or phrases in those sentences.
Information indicating each designated section is expressed as a range of positions on the display screen, for example as rows and columns. It is first sent via the input control unit 207 to the display data creation unit 202.

Based on this information, the display data creation unit 202 changes the attribute information of the characters in the designated section of the sentence. The characters in the section are then, for example, underlined, so that the user can confirm the designation.
At the same time, the input control unit 207 notifies the read processing unit 211 that a section has been designated. In response, the read processing unit 211 reads the character codes of all characters in the sentence from the display memory 203, together with their attribute information. It sends the character code string to the character string holding unit 212, where it is held, and sends the attribute information to the section information detection unit 213.

The section information detection unit 213 finds, among the received attribute information, the attributes that mark a section designation. From this result it creates section information indicating the extent of the designated section and sends it to the section information holding unit 214.
For this purpose, the section information detection unit 213 may receive information about the display format of the sentence from the display data creation unit 202, such as the number of columns per row. It uses this information to calculate the position of the designated section within the sentence as a number of characters from the beginning of the sentence. When one sentence contains several designated sections, each section may be numbered and held, together with its number, in the section information holding unit 214.

For example, FIG. 3(a) shows the sentence 「彼は腹を立てました。」 ("He got angry.") with an underlined section designated. The whole sentence, which is the storage unit for linguistic expressions, is sent to the character string holding unit 212. The section information detection unit 213 detects section information giving the extent of the underlined section, and, as shown in Table 1, the section information for section number 0 is stored in the section information holding unit 214.

In Table 1, the characters of the sentence are numbered in order, starting from 0 for the first character, and the extent of each section is given as the range of character positions it covers.
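Section detection can be sketched in Python as turning per-character underline flags into a Table 1 style section table. This is a minimal sketch under stated assumptions. The function name `detect_sections` is hypothetical. The attributes are assumed to arrive already in sentence order, so that a list index equals the character position counted from 0; the row and column conversion described in the text is not modelled. The designated range 2..7 in the usage line is also hypothetical, because Table 1 itself is not reproduced here.

```python
from typing import Dict, List, Optional, Tuple


def detect_sections(underlined: List[bool]) -> Dict[int, Tuple[int, int]]:
    """Turn per-character underline flags (in sentence order) into a section
    table {section_number: (first_position, last_position)}, numbering both
    sections and characters from 0, in the style of Table 1."""
    sections: Dict[int, Tuple[int, int]] = {}
    run_start: Optional[int] = None
    # A trailing False acts as a sentinel so that a run reaching the end is closed.
    for pos, flag in enumerate(underlined + [False]):
        if flag and run_start is None:
            run_start = pos
        elif not flag and run_start is not None:
            sections[len(sections)] = (run_start, pos - 1)
            run_start = None
    return sections


if __name__ == "__main__":
    sentence = "彼は腹を立てました。"
    # Hypothetical designation: characters 2..7 (腹を立てまし) underlined.
    flags = [2 <= i <= 7 for i in range(len(sentence))]
    print(detect_sections(flags))  # {0: (2, 7)}
```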

In this way, the read processing unit 211 reads the contents of the display memory 203 and separates them into a character code string and attribute information, and the section information detection unit 213 extracts section information from the attribute information. The apparatus can therefore accept two inputs: character information about the linguistic expression itself, which contains the idiom or similar expression, and section information giving the extent of the idiom or similar expression to be registered.
That is, in response to operation of the mouse 205 or the keyboard 206, the input control unit 207 controls the display data creation unit 202 and the read processing unit 211 to start the operations described above. Together, these units realize the functions of the character string input means 111 and the designated section input means 112.

The character information input in this way corresponds to one entire unit of linguistic expression stored in the linguistic expression holding unit 201. The decomposition processing unit 221, which corresponds to the decomposition means 113, decomposes this character information into morphemes with reference to the morpheme dictionary 222, in the same way as the conventional apparatus. This yields information not only about the linguistic expression in the designated section but also about the expressions before and after it.

As shown in FIG. 4, the morpheme dictionary 222 stores morphemes that are stems of independent words, such as 「彼」 ("he"), 「腹」 ("belly") and 「立て」 (stem of 立てる, "to raise"). It also stores morphemes that are non-independent words, such as 「は」 (topic particle), 「を」 (object particle), 「ました」 (polite past-tense ending) and 「。」 (full stop). Each entry carries information such as its attributes. In FIG. 4, part of the information for each morpheme is shown as a circle for an independent word and a cross for a non-independent word.
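The text says only that decomposition is done "in the same way as conventionally." The following Python sketch therefore swaps in a simple greedy longest-match segmentation against a toy dictionary modelled on FIG. 4. It is an assumption for illustration, not the patent's algorithm. Each unit carries its character span so that later steps can compare it with section boundaries. Runs of characters that match no entry are kept as single unknown units (flag `None`). All names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Toy dictionary modelled on FIG. 4: surface form -> True for an
# independent word (circle), False for a non-independent word (cross).
MORPHEME_DICT: Dict[str, bool] = {
    "彼": True, "腹": True, "立て": True,
    "は": False, "を": False, "ました": False, "。": False,
}

# (surface, start, end, is_independent); end is exclusive and is_independent
# is None for a run of characters that matched no dictionary entry.
Unit = Tuple[str, int, int, Optional[bool]]


def decompose(text: str, dictionary: Dict[str, bool] = MORPHEME_DICT) -> List[Unit]:
    """Greedy longest-match segmentation of a whole sentence."""
    max_len = max(len(word) for word in dictionary)
    units: List[Unit] = []
    pos = 0
    unknown_start: Optional[int] = None
    while pos < len(text):
        match = None
        # Try the longest candidate first.
        for length in range(min(max_len, len(text) - pos), 0, -1):
            if text[pos:pos + length] in dictionary:
                match = text[pos:pos + length]
                break
        if match is None:
            if unknown_start is None:
                unknown_start = pos  # start of an undecomposable run
            pos += 1
            continue
        if unknown_start is not None:  # close the pending unknown run
            units.append((text[unknown_start:pos], unknown_start, pos, None))
            unknown_start = None
        units.append((match, pos, pos + len(match), dictionary[match]))
        pos += len(match)
    if unknown_start is not None:
        units.append((text[unknown_start:], unknown_start, len(text), None))
    return units


if __name__ == "__main__":
    print([unit[0] for unit in decompose("彼は腹を立てました。")])
```

For the example sentence, the sketch produces the FIG. 3(b) split 彼-は-腹-を-立て-ました-。.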

For example, when the decomposition processing unit 221 decomposes the example sentence 「彼は腹を立てました。」 of FIG. 3(a), it obtains the morphemes shown separated by hyphens in FIG. 3(b). These are sent via the morpheme holding unit 223 to the inconsistency detection unit 224.
The inconsistency detection unit 224 corresponds to the first determination means 114. It collates the decomposition result from the decomposition processing unit 221 with the corresponding section information to determine whether the boundaries of the designated section coincide with morpheme boundaries. When they do not, it treats this as a detected inconsistency and starts the correction processing unit 225.

The inconsistency detection unit 224 makes two separate checks: whether the start position of the designated section coincides with the front boundary of a morpheme, and whether its end position coincides with the rear boundary of a morpheme.
For the example shown in FIG. 3, inconsistency detection shows that the start position of the designated section coincides with a morpheme boundary. The end position, however, falls inside the non-independent word 「ました」 and does not coincide with a morpheme boundary.
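The two boundary checks can be sketched in Python. This is a minimal sketch. The unit spans for the example sentence are written out by hand, the function name `find_mismatches` is hypothetical, and the designation 2..7 (inclusive character positions) is an assumed stand-in for the underlined section of FIG. 3(a), chosen so that its end falls inside 「ました」.

```python
from typing import List, Tuple

# Morphemes of 「彼は腹を立てました。」 as in FIG. 3(b), each with its character
# span [start, end) counted from 0 at the start of the sentence.
EXAMPLE_UNITS: List[Tuple[str, int, int]] = [
    ("彼", 0, 1), ("は", 1, 2), ("腹", 2, 3), ("を", 3, 4),
    ("立て", 4, 6), ("ました", 6, 9), ("。", 9, 10),
]


def find_mismatches(units: List[Tuple[str, int, int]], first: int, last: int) -> Tuple[bool, bool]:
    """First determination for a section covering characters first..last
    (inclusive). Returns (start_ok, end_ok): whether the section starts on the
    front boundary of some morpheme and ends on the rear boundary of one."""
    front_boundaries = {start for _, start, _ in units}
    rear_boundaries = {end for _, _, end in units}
    return first in front_boundaries, (last + 1) in rear_boundaries


if __name__ == "__main__":
    # Hypothetical designation 2..7 (腹を立てまし): the end falls inside ました.
    print(find_mismatches(EXAMPLE_UNITS, 2, 7))  # (True, False)
```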

In this case, the inconsistency detection unit 224 starts the correction processing unit 225, specifying the boundary of the designated section at which the inconsistency was found. It requests correction of the inconsistency between that boundary and the morpheme boundaries.
The correction processing unit 225 performs the correction processing described below according to the correction rules in the correction rule holding unit 226.

The correction rule holding unit 226 may hold, for example, the following two rules, rule (1) and rule (2). The rule applied depends on whether the morpheme containing the boundary of the designated section is an independent word.
Rule (1): If the morpheme is an independent word, the designated section is extended to cover the entire morpheme.

Rule (2): If the morpheme is a non-independent word, that morpheme is excluded from the designated section.
To correct the designated section of the example in FIG. 3, rule (2) is applied because the morpheme in question, 「ました」, is a non-independent word, and 「ました」 is excluded from the section. The correction processing unit 225 changes the section information for the corresponding section number in the section information holding unit 214 to character positions 2 to 5. As shown in FIG. 3(c), this moves the end of the designated section to just after 「立て」, the morpheme immediately preceding 「ました」.
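Rules (1) and (2) can be sketched in Python as a function that applies them independently to the start and end boundaries of a section given as inclusive character positions, in the style of Table 1. This is a minimal sketch. The function name `correct_boundaries` is hypothetical, and the input range 2..7 is an assumed designation whose end falls inside 「ました」. The sketch reproduces the corrected range 2 to 5 stated in the text.

```python
from typing import List, Tuple

# (surface, start, end, is_independent) for 「彼は腹を立てました。」; end is exclusive.
Unit = Tuple[str, int, int, bool]
EXAMPLE_UNITS: List[Unit] = [
    ("彼", 0, 1, True), ("は", 1, 2, False), ("腹", 2, 3, True),
    ("を", 3, 4, False), ("立て", 4, 6, True), ("ました", 6, 9, False),
    ("。", 9, 10, False),
]


def correct_boundaries(units: List[Unit], first: int, last: int) -> Tuple[int, int]:
    """Apply rule (1) or rule (2) to each boundary of the section first..last.
    A result with first > last means nothing remains of the section."""
    for _, start, end, independent in units:
        # The start boundary falls strictly inside this morpheme.
        if start < first < end:
            # Rule (1): extend to the whole morpheme; rule (2): exclude it.
            first = start if independent else end
        # The end boundary falls inside this morpheme but not on its last character.
        if start <= last < end - 1:
            last = end - 1 if independent else start - 1
    return first, last


if __name__ == "__main__":
    # Assumed designation 2..7 ends inside ました, so rule (2) drops ました.
    print(correct_boundaries(EXAMPLE_UNITS, 2, 7))  # (2, 5)
```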

By operating according to the correction rules in the correction rule holding unit 226, the correction processing unit 225 realizes the function of the first correction means 115 shown in FIG. 1 and resolves inconsistencies between section boundaries and morpheme boundaries.
In FIG. 2, the transfer processing unit 227 operates as the extraction means 116. It acts in response to either a detection result indicating no inconsistency or a notification that correction by the correction processing unit 225 has finished. Following the section information held in the section information holding unit 214, it reads the morphemes contained in the designated section from the morpheme holding unit 223 and sends them in order to the analysis processing unit 311.
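The transfer step can be sketched in Python as selecting, in order, the morphemes that lie wholly inside the corrected section. This is a minimal sketch. The function name `extract_units` is hypothetical, the unit spans are written out by hand, and the corrected section 2..5 is the one stated in the text for FIG. 3(c).

```python
from typing import List, Tuple

# Morphemes of 「彼は腹を立てました。」 with character spans [start, end).
EXAMPLE_UNITS: List[Tuple[str, int, int]] = [
    ("彼", 0, 1), ("は", 1, 2), ("腹", 2, 3), ("を", 3, 4),
    ("立て", 4, 6), ("ました", 6, 9), ("。", 9, 10),
]


def extract_units(units: List[Tuple[str, int, int]], first: int, last: int) -> List[str]:
    """Return, in sentence order, the morphemes lying wholly inside the section
    covering characters first..last (inclusive)."""
    return [surface for surface, start, end in units
            if first <= start and end - 1 <= last]


if __name__ == "__main__":
    print(extract_units(EXAMPLE_UNITS, 2, 5))  # ['腹', 'を', '立て']
```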

In this way, section designations containing inconsistencies can be accepted flexibly, the corresponding character string decomposed into morphemes, and the result passed to analysis and registration.
Every character string input to the analysis processing unit 311 has then been decomposed into morphemes. The analysis processing unit 311 and the registration processing unit 312 can therefore perform the same analysis and registration as in the conventional apparatus and register the idiom or phrase represented by the designated section's string in the idiom dictionary 313.

Because the boundaries of user-designated sections are corrected automatically as described above, users no longer need to be aware of the apparatus's processing units when registering linguistic expressions. The apparatus accepts the string section that the user has judged intuitively and reliably takes in the corresponding linguistic expression.
This spares the user from entering the same linguistic expression repeatedly and reduces the workload. The result is a language information processing apparatus that is easy to use even for users with little specialized knowledge.

The unit in which linguistic expressions are stored in the linguistic expression holding unit 201 is not limited to a grammatically complete "sentence". It may also be, for example, part of a sentence that contains an idiom to be registered. However, each stored unit of linguistic expression must be decomposable into morphemes in its entirety.
Another possible rule for correcting the boundaries of a designated section is rule (3) below.

Rule (3): If a boundary of the specified section falls in the middle of a character string that could not be decomposed into morphemes, the specified section is extended to cover that entire character string.
Rule (3) tries to capture the user's intention by treating a character string that could not be decomposed into morphemes as a proper noun and including the whole string in the specified section.
As a result, an incompletely specified section can be accepted flexibly even when it contains a proper noun or similar word that is not stored in the morpheme dictionary 222 of the language expression input device.
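
Rule (3) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation. It assumes the decomposition step returns the sentence as a list of hypothetical `(surface, known)` segments, where `known` is `False` for a string the morpheme dictionary could not decompose. It also assumes the section is given as character offsets with an exclusive end.

```python
from typing import List, Tuple

# Hypothetical segment: (surface text, True if the morpheme dictionary recognized it)
Segment = Tuple[str, bool]


def apply_rule3(segments: List[Segment], start: int, end: int) -> Tuple[int, int]:
    """Extend [start, end) so that no boundary falls inside an unrecognized string.

    Offsets are character positions in the concatenated sentence; end is exclusive.
    """
    pos = 0
    for surface, known in segments:
        seg_start, seg_end = pos, pos + len(surface)
        if not known:
            # Start boundary strictly inside an unknown string: move it to the string's start.
            if seg_start < start < seg_end:
                start = seg_start
            # End boundary strictly inside an unknown string: move it to the string's end.
            if seg_start < end < seg_end:
                end = seg_end
        pos = seg_end
    return start, end
```

Consider 「フジツウの製品」 and assume the made-up name 「フジツウ」 is not in the dictionary. A section that starts inside the name at offset 2 is then widened to start at offset 0, so the whole name is kept as one unit.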

In the embodiment described above, input, analysis and registration of language expressions proceed interactively. In that case, the correction result produced by the correction processing unit 225 can be shown on the display device 204 via the display data creation unit 202, so that the user can acquire specialized knowledge through experience.
Alternatively, a user may input a large number of language expressions at once and have them analyzed and registered as a batch.

FIG. 5 is a block diagram of another embodiment of the language information processing apparatus according to the present invention.
In FIG. 5, the language information processing apparatus includes a sentence information holding unit 215 and a read processing unit 228 in place of the character string holding unit 212 shown in FIG. 2.
In this embodiment, when the user specifies a section, the sentence information holding unit 215 holds sentence information indicating where the sentence containing that section is stored in the language expression holding unit 201. During analysis and registration, the read processing unit 228 uses this sentence information to read the corresponding sentence from the language expression holding unit 201 and sends all of its character strings to the decomposition processing unit 221.

For example, each language expression containing an idiom to be registered may be given a sentence number and stored in the language expression holding unit 201 under that number. In that case, the sentence information holding unit 215 receives the corresponding sentence number from the display data creation unit 202 and holds it as the sentence information described above.
The read processing unit 228 searches the language expression holding unit 201 with this sentence number and reads every character string that makes up the sentence. The decomposition processing unit 221 then decomposes the character strings before and after the specified section into morphemes, together with the section itself.
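
The FIG. 5 data flow can be sketched as follows. The sketch assumes the language expression holding unit 201 behaves like a mapping from sentence number to sentence text. `Selection` and `read_for_decomposition` are hypothetical names. The point is that the held record is only a sentence number plus offsets, and the full sentence is fetched again at batch time.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Selection:
    """Hypothetical record held by the sentence information holding unit:
    a sentence number plus the user's section as character offsets."""
    sentence_no: int
    start: int
    end: int  # exclusive


def read_for_decomposition(store: Dict[int, str],
                           pending: List[Selection]) -> List[Tuple[str, int, int]]:
    """Batch step: look up each held sentence number and return the whole
    sentence with the section offsets, so decomposition also sees the
    characters before and after the section."""
    batch = []
    for sel in pending:
        sentence = store[sel.sentence_no]  # full sentence, not just the section
        batch.append((sentence, sel.start, sel.end))
    return batch
```

Holding a sentence number instead of the selected characters keeps the stored record small. It also guarantees that the surrounding context is still available when decomposition finally runs.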

The apparatus therefore checks whether the boundaries of the specified section match the morpheme boundaries. It uses the character strings before and after the section as well as the string inside it. Any detected mismatch is resolved by moving the boundary of the specified section, so a section specification that contains such a mismatch can still be accepted flexibly and registered.
In this embodiment, decomposition and correction can also run as a batch together with analysis and registration. This makes effective use of the processing capacity of the apparatus's processor.
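
The consistency check and one way of resolving a mismatch can be sketched as follows. The outward-snapping policy is an assumption made for illustration: the start moves back and the end moves forward to the nearest morpheme boundary. The document's own boundary-correction rules (1) and (2) may differ.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple


def morpheme_boundaries(morphemes: List[str]) -> List[int]:
    """Sorted character offsets at which morphemes begin or end."""
    bounds = [0]
    for m in morphemes:
        bounds.append(bounds[-1] + len(m))
    return bounds


def is_valid(bounds: List[int], start: int, end: int) -> bool:
    """Check that both section boundaries coincide with morpheme boundaries."""
    return start in bounds and end in bounds


def snap_outward(bounds: List[int], start: int, end: int) -> Tuple[int, int]:
    """Assumed correction policy: move start back and end forward to the
    nearest morpheme boundary (0 <= start <= end <= bounds[-1])."""
    new_start = bounds[bisect_right(bounds, start) - 1]
    new_end = bounds[bisect_left(bounds, end)]
    return new_start, new_end
```

For example, 「彼は腹を立てました。」 decomposes into 彼 / は / 腹 / を / 立て / ました / 。. A section that stops inside 「立て」 (offsets 2 to 5) fails the check and is widened to offsets 2 to 6, which is 「腹を立て」.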

The apparatus can also check constraint conditions that must hold when, for example, an idiom or phrase is registered as the stem of a verb or adjective. It then corrects the boundaries of the specified section according to the result of that check.
FIG. 6 is a block diagram of yet another embodiment of the language information processing apparatus according to the present invention.
In FIG. 6, the apparatus shown in FIG. 2 is extended with a condition holding unit 231, a condition check unit 232, a correction processing unit 233 and a correction rule holding unit 234. These units run after the mismatches between the specified section and the morpheme boundaries described above have been detected and resolved. They send their result to the analysis processing unit 311 via the transfer processing unit 227.

In FIG. 6, the condition holding unit 231 holds constraint conditions on the order and types of the morphemes in the specified section, for example “the first and last morphemes are independent words”. The condition check unit 232 determines whether the received series of morphemes satisfies these conditions. Together, the condition holding unit 231 and the condition check unit 232 perform the function of the second determination means 121.
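
The two constraint conditions mentioned in this description can be written as simple predicates over the morpheme series. The sketch assumes each morpheme is a hypothetical `(surface, is_independent)` pair. How independence is decided from the morpheme dictionary is left out.

```python
from typing import List, Tuple

Morpheme = Tuple[str, bool]  # (surface form, True if it is an independent word)


def ends_are_independent(section: List[Morpheme]) -> bool:
    """Constraint: the first and last morphemes of the section are independent words."""
    return len(section) > 0 and section[0][1] and section[-1][1]


def has_single_independent(section: List[Morpheme]) -> bool:
    """Constraint for keyword search: exactly one independent word in the section."""
    return sum(1 for _, independent in section if independent) == 1
```

With 腹 / を / 立て (FIG. 3(c)), the first predicate holds. Once 「ました」 is appended (FIG. 3(d)), it fails.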

For example, suppose the correction result shown in FIG. 3(c) is input. The first morpheme 「腹」 (hara, “belly”) and the last morpheme 「立て」 (tate, “raise”) are both independent words. The condition check unit 232 therefore determines that the specified section satisfies the constraint condition and sends these morphemes to the analysis processing unit 311.
Now suppose the character string 「腹を立てました」 (hara o tatemashita, “got angry”) is the specified section, as shown in FIG. 3(d). Its start and end positions already coincide with morpheme boundaries. The condition check unit 232 therefore receives four morphemes, from the first morpheme 「腹」 to the last morpheme 「ました」 (mashita, a polite past-tense auxiliary).

Here the last morpheme is not an independent word. The condition check unit 232 therefore determines that the constraint condition is not satisfied and asks the correction processing unit 233 to correct the specified section.
The correction rule holding unit 234 holds, for example, the following two rules (4) and (5), which the correction processing unit 233 uses for the correction.
Rule (4): If the first morpheme is a non-independent word, the specified section is extended toward the beginning of the sentence until an independent word appears.

Rule (5): If the last morpheme is a non-independent word, the specified section is shrunk toward the beginning of the sentence until an independent word appears.
In the case shown in FIG. 3(d), the correction processing unit 233 applies rule (5) to correct the end position of the specified section. As shown in FIG. 3(e), it removes the morpheme 「ました」 and places the end position right after 「立て」. The resulting series of morphemes satisfies the constraint condition described above.
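
Rules (4) and (5) can be sketched as two loops over morpheme indices. The sketch rests on these assumptions:
- The sentence is a list of hypothetical `(surface, is_independent)` pairs, and the section is the non-empty index range `[i, j)`.
- Rule (4) stops at the start of the sentence.
- Rule (5) never shrinks the section below one morpheme. A section that still fails the check is left to the caller.

```python
from typing import List, Tuple

Morpheme = Tuple[str, bool]  # (surface form, True if it is an independent word)


def correct_section(sentence: List[Morpheme], i: int, j: int) -> Tuple[int, int]:
    """Apply rules (4) and (5) to sentence[i:j], assuming 0 <= i < j <= len(sentence)."""
    # Rule (4): while the first morpheme is not independent, extend toward the sentence start.
    while i > 0 and not sentence[i][1]:
        i -= 1
    # Rule (5): while the last morpheme is not independent, shrink toward the sentence start.
    while j - 1 > i and not sentence[j - 1][1]:
        j -= 1
    return i, j
```

Applied to the FIG. 3(d) section (indices 2 to 6 of 彼 / は / 腹 / を / 立て / ました / 。), the sketch drops 「ました」 and returns indices 2 to 5, which matches FIG. 3(e).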

In this way, the correction processing unit 233 corrects the section according to the rules in the correction rule holding unit 234. This realizes the function of the second correction means 122 shown in FIG. 1.
As a result, a specified section that conflicts with the constraint conditions can also be accepted flexibly, and analysis and registration can proceed. The input language expression is registered reliably, and the user does not have to enter the same expression again.

Since the user no longer needs to be aware of the constraint conditions, the user's workload is greatly reduced. The apparatus is also easy to use for users with little specialized knowledge.
Suppose the language expression to be registered satisfies the constraint condition above, “the first and last morphemes are independent words”. An inflectional ending or a prefix can then be attached to it directly, so the expression can be put to effective use. In particular, an expression to be registered as a verb or an adjective should satisfy this constraint condition.

The constraint check and correction functions described above are therefore particularly useful when registering expressions with inflecting endings, such as verbs and adjectives.
For keyword search and similar uses, a constraint condition such as “the specified section contains only one independent word” can be considered.
In this case, the correction rule holding unit 234 holds rule (6), “ignore designations other than the first independent word”. Only the first independent word is then sent to the search processing unit as the keyword.
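
Rule (6) amounts to picking the first independent word of the section as the search keyword. The sketch below uses the same hypothetical `(surface, is_independent)` representation. The description does not cover a section with no independent word, so returning `None` in that case is an assumption.

```python
from typing import List, Optional, Tuple

Morpheme = Tuple[str, bool]  # (surface form, True if it is an independent word)


def keyword_by_rule6(section: List[Morpheme]) -> Optional[str]:
    """Ignore every designation except the first independent word."""
    for surface, independent in section:
        if independent:
            return surface
    return None  # assumption: no keyword when the section has no independent word
```

For the section 腹 / を / 立て, the keyword sent to the search processing unit would be 「腹」.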

The language information processing apparatus according to the present invention applies to systems in which the user is expected to specify the expression to register or search according to the rules of each process. Examples are machine translation systems, text revision systems, text feature extraction systems, and applications that search a database using a specific language expression in a document as a keyword. In such systems, the apparatus flexibly accepts a specified section that conflicts with those rules and still analyzes and registers the language expression.

It can therefore greatly reduce the user's workload in systems that handle language expressions. These include machine translation systems, text revision systems, text feature extraction systems, and systems that search a database using a specific language expression in a document as a keyword. The apparatus is thus highly useful in the field of such information processing systems.

FIG. 1 is a block diagram illustrating the principle of the language information processing apparatus according to the present invention.
FIG. 2 is a block diagram of an embodiment of the language information processing apparatus according to the present invention.
FIG. 3 is a diagram explaining the correction of the specified section.
FIG. 4 is an explanatory diagram of the morpheme dictionary.
FIG. 5 is a block diagram of another embodiment of the language information processing apparatus according to the present invention.
FIG. 6 is a block diagram of yet another embodiment of the language information processing apparatus according to the present invention.
FIG. 7 is a diagram showing a configuration example of a conventional language information processing apparatus.

Explanation of symbols

111 Character string input means
112 Designated section input means
113 Decomposition means
114 First determination means
115 First correction means
116 Extraction means
121 Second determination means
122 Second correction means
201 Language expression holding unit
202 Display data creation unit
203 Display memory
204 Display device
205 Mouse
206 Keyboard
207 Input control unit
211 Read processing unit
212 Character string holding unit
213 Section information detection unit
214 Section information holding unit
215 Sentence information holding unit
221, 302 Decomposition processing unit
222, 303 Morpheme dictionary
223 Morpheme holding unit
224 Inconsistency detection unit
225, 233 Correction processing unit
226, 234 Correction rule holding unit
227 Transfer processing unit
228 Read processing unit
231 Condition holding unit
232 Condition check unit
301 Extraction processing unit
311 Analysis processing unit
312 Registration processing unit
313 Idiom dictionary

Claims

A language information processing apparatus that receives an input of a language expression to be processed and executes predetermined processing, comprising:
character string input means for inputting a character string including the language expression to be processed;
designated section input means for inputting a specified section indicating the range of the language expression within the character string;
decomposing means for decomposing the character string input by the character string input means into processing units, which are the units of semantic analysis processing of language expressions;
first determination means for determining the validity of the specified section based on whether the boundaries of the specified section coincide with boundaries of the series of processing units obtained by the decomposing means;
first correction means for, when the first determination means determines that the specified section is not valid, correcting the specified section by moving its boundary position so that it coincides with a boundary of the processing units;
second determination means for determining the validity of the specified section corrected by the first correction means, based on whether the order and types of the processing units it contains satisfy a constraint condition that the first and last processing units are independent words or that the specified section contains only one independent word;
second correction means for, when the second determination means determines that the specified section is not valid, correcting the specified section according to a correction rule that extends the specified section toward the beginning of the sentence until an independent word appears if the first processing unit is a non-independent word, shrinks the specified section toward the beginning of the sentence until an independent word appears if the last processing unit is a non-independent word, or ignores designations other than the first independent word; and
extraction means for extracting, from the series of processing units obtained by the decomposing means, the processing units corresponding to the character string included in the specified section obtained by the correction.