JP3043038B2

JP3043038B2 - Linguistic expression feature determination device

Info

Publication number: JP3043038B2
Application number: JP2214123A
Authority: JP
Inventors: 賢治今村; 芳史大山; 恒昭加藤; 雅博奥; 統之堀井
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: NTT Inc
Priority date: 1990-08-13
Filing date: 1990-08-13
Publication date: 2000-05-22
Anticipated expiration: 2015-05-22
Also published as: JPH0496860A

Description

[Field of Industrial Application] The present invention relates to a linguistic expression feature determination device that automatically determines the characteristics of the linguistic expression used in a sentence.

[Prior Art] Sentences with the same meaning can differ in linguistic expression. For example, a sentence that a man would phrase as 「おまえもがんばれよ」 (omae mo ganbare yo, "You hang in there too") would be phrased by a woman as something like 「あなたもがんばってね」 (anata mo ganbatte ne). Likewise, a sentence meaning 「そんなことはない」 ("That's not so") takes forms such as 「そんなことはねぇ」 (sonna koto wa nee) in the Tokyo dialect, 「そんなことあらへん」 (sonna koto arahen) in the Kansai dialect, and 「そげんこつなか」 (sogen kotsu naka) in the Kyushu dialect.

Because different people express the same meaning in different ways, when a user selects a preferred sentence from a message collection such as a book of sample telegram messages, it is desirable to give the user guidance on whether an expression is male or female speech, which dialect it belongs to, and so on.

[Problem to Be Solved by the Invention] Conventional message-collection database search devices and the corresponding database construction devices are configured as shown in FIG. 15. To display the type of expression with such a search device, the expression type must be entered into the message-collection database in advance, and conventionally the type of each linguistic expression had to be judged manually and then entered into the database.

An object of the present invention is to provide a linguistic expression feature determination device that determines the characteristics of a linguistic expression automatically, reducing the labor previously spent judging them by hand.

[Means for Solving the Problem] To achieve the above object, the linguistic expression feature determination device of the present invention comprises: feature expression dictionaries, each listing the morpheme sequences that make up characteristic linguistic expressions, where each morpheme is described by its notation (surface form), standard form, part of speech and inflected form; morphological analysis means that divides an arbitrary sentence into morphemes and represents it as a sequence of the notation, standard form, part of speech and inflected form of each morpheme; feature expression extraction means that matches this analysis result against a feature expression dictionary, extracts the expressions that characterize the sentence, and obtains the sum of the weights of those characteristic expressions; and feature determination means that determines the characteristics of the sentence from the differences among the weight sums obtained for the plural kinds of characteristic expression.

[Operation] The linguistic expression feature determination device of the present invention first performs morphological analysis on the input sentence to obtain the notation, standard form, part of speech and inflected form of each morpheme. Next, it searches a feature expression dictionary using the analysis result, extracts the characteristic expressions in the sentence together with their weights, and adds up those weights to obtain a feature expression weight total. This is repeated for each of the feature expression dictionaries, giving one weight total per dictionary. Finally, the difference between the largest weight total and each of the other weight totals is calculated, and every feature whose difference lies within a fixed value is determined to be a feature of the input sentence.

[Embodiment] An embodiment of the present invention is described below with reference to the drawings.

FIG. 1 shows the configuration of an embodiment of the linguistic expression feature determination device according to the present invention. In hardware terms the device body 6 consists of a CPU, memory and so on; functionally it consists of a morphological analysis unit 1, a feature expression extraction unit 2, a feature determination unit 3, a control unit 4, a changeover switch 5, and n feature expression dictionaries 10₁ to 10ₙ. Here n is an integer greater than 1 equal to the number of linguistic-expression features to be distinguished; for example, n = 2 when determining whether an expression is male or female speech.

FIG. 2 shows an example of the structure of one record of the feature expression dictionaries 10₁ to 10ₙ. Field 7 holds the number m of morphemes that make up the feature expression; it is followed, for each morpheme in turn, by the notation 101 (the morpheme exactly as written), the standard form 102, the part of speech 103 and the inflected form 104. The standard form 102 is the character string obtained by removing orthographic variation from the morpheme; for example, 「片仮名」, 「かたかな」 and 「カタカナ」 (all "katakana") are written differently but are the same morpheme with the same standard form. The inflected form 104 is the conjugation form of the morpheme when it is an inflecting word (a verb or adjective) or an auxiliary verb. A single morpheme is represented by these four items: notation 101, standard form 102, part of speech 103 and inflected form 104. Any of these items may be blank, but no morpheme has all four items blank. One record contains the first morpheme 20₁ through the m-th morpheme 20ₘ of the feature expression, and ends with the weight w of that feature expression (8); w is a positive number. In each of the feature expression dictionaries 10₁ to 10ₙ of FIG. 1, records are arranged in descending order of weight w, and records with equal weight are arranged in descending order of morpheme count m.
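The record layout described above can be modelled roughly as follows. This is a minimal Python sketch under stated assumptions: the class and function names (MorphemePattern, FeatureRecord, order_dictionary) are hypothetical, None stands in for a blank item, and the morpheme count m (field 7) is derived from the record rather than stored separately.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class MorphemePattern:
    """One morpheme slot of a record (items 101-104); None stands for a blank item."""
    surface: Optional[str] = None     # notation 101 (the morpheme as written)
    standard: Optional[str] = None    # standard form 102
    pos: Optional[str] = None         # part of speech 103
    inflection: Optional[str] = None  # inflected form 104

    def __post_init__(self) -> None:
        # Any item may be blank, but never all four of them.
        if all(v is None for v in (self.surface, self.standard, self.pos, self.inflection)):
            raise ValueError("a morpheme must have at least one non-blank item")


@dataclass(frozen=True)
class FeatureRecord:
    """One dictionary record: morphemes 20_1 .. 20_m followed by the weight w."""
    morphemes: Tuple[MorphemePattern, ...]
    weight: float

    def __post_init__(self) -> None:
        if not self.morphemes:
            raise ValueError("a record needs at least one morpheme")
        if self.weight <= 0:
            raise ValueError("the weight w must be positive")

    @property
    def m(self) -> int:
        """Number of morphemes (field 7), derived from the stored slots."""
        return len(self.morphemes)


def order_dictionary(records):
    """Dictionary order: descending weight w, ties broken by descending m."""
    return sorted(records, key=lambda r: (-r.weight, -r.m))
```

Keeping the ordering in one helper means a dictionary loaded in arbitrary order can be normalized once before any matching is done.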

Next, the operation is described with reference to the device configuration of FIG. 1, the record structure of FIG. 2, and the flowchart of FIG. 3.

In FIG. 3, step 1 corresponds to the operation of the morphological analysis unit 1 of FIG. 1. Likewise, steps 4 to 9 correspond to the feature expression extraction unit 2, steps 12 to 18 to the feature determination unit 3, and steps 2, 3, 10 and 11 to the control unit 4.

The morphological analysis unit 1 performs morphological analysis on the input sentence, dividing it into morphemes and attaching notation, standard-form, part-of-speech and inflected-form information to each (step 1). If the input message consists of l morphemes, the analysis result is a sequence of l entries, each giving a notation, standard form, part of speech and inflected form. The number of bunsetsu (phrases) in the analyzed input sentence is denoted P.

Next, counter i is set to 1 (step 2), and the changeover switch 5 is operated to connect the feature expression extraction unit 2 to feature expression dictionary 10ᵢ (step 3).

The feature expression extraction unit 2 (steps 4 to 9) searches feature expression dictionary 10ᵢ using the morphological analysis result, extracts the matching feature expressions, and calculates the sum of their weights.

First, as initialization, counter j is set to 1 and the feature expression weight total sum(i) is set to 0 (step 4). Next, the records of dictionary 10ᵢ are compared, in order from the top of the dictionary, with the morpheme sequence beginning at the j-th morpheme of the analysis result (step 5). If a record matches, step 7 is executed; if no record matches even after comparison to the end of the dictionary, step 8 is executed (step 6). A record consisting of m morphemes is regarded as matching when the notation, standard form, part of speech and inflected form of the j-th through (j + m − 1)-th morphemes of the analysis result all agree with those of the m morphemes recorded in the dictionary. If any of these items of a morpheme in a dictionary record is blank, that blank item is unconditionally treated as agreeing.
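The matching rule of steps 5 and 6 can be sketched as below, assuming (hypothetically) that each morpheme is encoded as a 4-tuple (notation, standard form, part of speech, inflected form) and that None marks a blank dictionary item. Indices are 0-based here, unlike the 1-based counter j of the text, and the function names are illustrative.

```python
from typing import Optional, Sequence, Tuple

# (notation, standard form, part of speech, inflected form); None = blank item
Morpheme = Tuple[Optional[str], Optional[str], Optional[str], Optional[str]]


def morpheme_matches(pattern: Morpheme, token: Morpheme) -> bool:
    """A blank (None) pattern item matches unconditionally; others must be equal."""
    return all(p is None or p == t for p, t in zip(pattern, token))


def record_matches(pattern_seq: Sequence[Morpheme],
                   analysis: Sequence[Morpheme], j: int) -> bool:
    """True if the m record morphemes match analysis morphemes j .. j+m-1 (0-based)."""
    m = len(pattern_seq)
    span = analysis[j:j + m]
    if len(span) < m:  # the record would run past the end of the sentence
        return False
    return all(morpheme_matches(p, t) for p, t in zip(pattern_seq, span))
```

A pattern such as (None, "がんばる", None, "imperative") would thus match any imperative form of がんばる regardless of how it is written, which is what the blank-item rule is for.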

Step 7 is executed when a match was found in step 5: m is added to counter j, advancing the morpheme sequence to be compared with the dictionary, and the weight w of the matched record is added to the weight total sum(i).

Step 8 is executed when no match was found in step 5: counter j is advanced by one.

If j is less than or equal to l, processing is repeated from step 5; if j is greater than l, the processing of the feature expression extraction unit 2 ends and control passes to step 10 (step 9).

In step 10, 1 is added to counter i. In step 11, if i is less than or equal to the number n of feature expression dictionaries, steps 3 to 10 are repeated; if i is greater than n, processing proceeds to step 12.
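Steps 2 to 11 amount to a greedy left-to-right scan repeated once per dictionary. The following Python sketch assumes each dictionary is a list of (pattern, weight) pairs already sorted as the text requires, so the first matching record is the heaviest (and, on ties, the longest) one. The names weight_sum and all_weight_sums are hypothetical, and the matching helper restates the blank-as-wildcard rule so that the block runs on its own.

```python
from typing import List, Optional, Sequence, Tuple

Morpheme = Tuple[Optional[str], Optional[str], Optional[str], Optional[str]]
# (pattern morphemes, weight w); dictionaries are pre-sorted by -w, then -m
Record = Tuple[Tuple[Morpheme, ...], float]


def _matches(pattern_seq, analysis, j) -> bool:
    """Blank (None) items match anything; the record must fit inside the sentence."""
    span = analysis[j:j + len(pattern_seq)]
    return len(span) == len(pattern_seq) and all(
        p is None or p == t
        for pat, tok in zip(pattern_seq, span)
        for p, t in zip(pat, tok)
    )


def weight_sum(dictionary: Sequence[Record], analysis: Sequence[Morpheme]) -> float:
    """Steps 4-9: scan left to right, taking the first matching record at each j."""
    j, total = 0, 0.0                      # step 4 (j is 0-based here)
    while j < len(analysis):               # step 9: continue while j <= l (1-based)
        for pattern_seq, w in dictionary:  # step 5: try records from the top
            if _matches(pattern_seq, analysis, j):
                j += len(pattern_seq)      # step 7: skip the m matched morphemes
                total += w                 #         and add the weight w
                break
        else:
            j += 1                         # step 8: no record matched at j
    return total


def all_weight_sums(dictionaries: Sequence[Sequence[Record]],
                    analysis: Sequence[Morpheme]) -> List[float]:
    """Steps 2, 3, 10, 11: repeat the scan for each of the n dictionaries."""
    return [weight_sum(d, analysis) for d in dictionaries]
```

Because matched morphemes are skipped, no morpheme contributes to more than one expression within a dictionary; a two-morpheme record that starts at the same position as a one-morpheme record of equal weight takes precedence.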

The feature determination unit 3 (steps 12 to 18) determines the features of the input sentence from the n feature expression weight totals sum(1) to sum(n) obtained in steps 1 to 11.

First, each of the n feature expression weight totals is divided by the bunsetsu count P to normalize it (step 12). This makes the totals sum(1) to sum(n) independent of sentence length.
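Written as a formula, with S_i standing for the raw total sum(i) from steps 4 to 9 and Ŝ_i for its normalized value (symbols introduced here only for convenience), step 12 is:

```latex
\hat{S}_i = \frac{S_i}{P}, \qquad i = 1, \dots, n
```

For instance, Specific Example 1 has S₁ = 2.0 and P = 2, so Ŝ₁ = 1.0.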

Next, the n normalized weight totals are sorted in descending order (step 13). In the sorted result the largest total is denoted sum(x₁), the second largest sum(x₂), and so on through sum(x₃), ..., sum(xₙ), where x₁, x₂, x₃, ..., xₙ are integers from 1 to n giving the numbers of the feature expressions.

Next, the largest weight total sum(x₁) is compared with the feature expression threshold s (step 14). If sum(x₁) < s, no determination can be made; that is, the input sentence is judged to have none of the features.

In step 15, counter k is set to 1. While k ≤ n and sum(xₖ) > sum(x₁) − s (step 16), 1 is added to k (step 17) and step 16 is repeated. When the condition of step 16 no longer holds, processing proceeds to step 18. The feature expression threshold s is a predetermined positive number.

In step 18, expressions x₁, x₂, ..., xₖ₋₁ are determined to be features of the input sentence.
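Steps 12 to 18 can be condensed into the Python sketch below. It is an illustration, not the patent's implementation: the function name determine_features and the convention that an empty list means "cannot be determined" are assumptions. It applies the step-14 test literally (a maximum below s is undeterminable, so a maximum exactly equal to s is still decided, as in Specific Example 3) and returns 1-based feature numbers in the sorted order x₁, x₂, and so on.

```python
from typing import List, Sequence


def determine_features(sums: Sequence[float], P: int, s: float) -> List[int]:
    """Steps 12-18: 1-based feature numbers attributed to the sentence, best first.
    An empty list means the features cannot be determined."""
    normalized = [v / P for v in sums]                        # step 12
    order = sorted(range(len(normalized)),                    # step 13: x_1, x_2, ...
                   key=lambda i: normalized[i], reverse=True)
    top = normalized[order[0]]
    if top < s:                                               # step 14
        return []
    k = 0                                                     # step 15 (0-based k)
    while k < len(order) and normalized[order[k]] > top - s:  # step 16
        k += 1                                                # step 17
    return [i + 1 for i in order[:k]]                         # step 18: x_1 .. x_{k-1}
```

With the raw totals of the three specific examples below (2.0 and 0; 0 and 0.3; 0, 1.0 and 0), P = 2 and s = 0.5, this returns [1], [] and [2] respectively.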

FIG. 4 is a schematic diagram of the operation of steps 14 to 18; for n = 3 feature expression dictionaries, it plots the normalized weight totals sum(1) to sum(3) on a number line. Cases (1) and (2) in FIG. 4 are those with sum(x₁) > s: every feature whose normalized weight total is greater than sum(x₁) − s and at most sum(x₁) is determined to be a feature of the input sentence. Case (3) is the case sum(x₁) ≤ s, in which the feature of the linguistic expression is taken to be undeterminable.

Next, specific examples are described.

Specific Example 1: This example shows how the device determines whether the linguistic expression of the sentence 「おまえもがんばれよ。」 ("You hang in there too.") is male or female speech. Here the number of feature expression dictionaries is n = 2; the feature expression dictionary 10₁ for male expressions is as shown in FIG. 5 and the dictionary 10₂ for female expressions as shown in FIG. 6, where "−" in FIGS. 5 and 6 indicates a blank item. Hereafter, dictionary 10₁ is called the male expression dictionary and dictionary 10₂ the female expression dictionary. Likewise, the number of morphemes making up a male (female) feature expression is called the male (female) expression morpheme count, its weight the male (female) expression weight, and the weight total for male (female) expressions the male (female) expression weight total. The feature expression threshold is s = 0.5.

First, the morphological analysis unit 1 analyzes the input sentence and attaches notation, standard-form, part-of-speech and inflected-form information to each morpheme. In this example the message is divided into l = 5 morphemes written 「おまえ」 (omae), 「も」 (mo), 「がんばれ」 (ganbare), 「よ」 (yo) and 「。」 (full stop), with P = 2 bunsetsu (step 1). FIG. 7 shows an example of the analysis result.

Next, counter i is set to 1 (step 2), and the changeover switch 5 is operated to connect the feature expression extraction unit 2 to the male expression dictionary 10₁ (step 3).

The feature expression extraction unit 2 sets counter j to 1 and the male expression weight total sum(1) to 0 (step 4). It then searches the male expression dictionary 10₁ for an expression matching the morpheme sequence from j = 1 onward, namely 「おまえ」「も」「がんばれ」「よ」「。」. With the dictionary of FIG. 5, the fourth record matches 「おまえ」 (steps 5 and 6), so the male expression morpheme count m = 1 is added to j and the male expression weight w = 1.0 is added to sum(1) (step 7).

Next, the male expression dictionary of FIG. 5 is searched for an expression matching the sequence from j = 2 onward, 「も」「がんばれ」「よ」「。」 (step 5). Nothing matches (step 6), so 1 is added to j (step 8).

Next, the dictionary is searched for an expression matching the sequence from j = 3 onward, 「がんばれ」「よ」「。」 (step 5). This matches the first record (step 6), so the male expression morpheme count m = 2 is added to j and the male expression weight w = 1.0 is added to sum(1) (step 7).

Next, the dictionary is searched for an expression matching the sequence from j = 5 onward, 「。」 (step 5). Nothing matches (step 6), so 1 is added to j (step 8). Now j = 6 > l, so the processing of the feature expression extraction unit 2 ends (step 9), giving the male expression weight total sum(1) = 2.0.

In step 10, 1 is added to counter i. Since i = 2 ≤ n, processing jumps to step 3 (step 11), and the changeover switch 5 is operated to connect the feature expression extraction unit 2 to the female expression dictionary 10₂ (step 3).

The feature expression extraction unit 2 then performs on the female expression dictionary 10₂ the same processing it performed on the male expression dictionary 10₁ (steps 4 to 9). With the female expression dictionary of FIG. 6, nothing matches the analysis result, so the female expression weight total is sum(2) = 0.

When 1 is then added to counter i (step 10), i = 3 > n, so the processing of the feature determination unit 3 is performed.

The feature determination unit 3 first divides every weight total by the bunsetsu count P (= 2) to normalize it (step 12), giving normalized totals sum(1) = 1.0 and sum(2) = 0.

Next, the n = 2 weight totals are sorted in descending order (step 13). Since sum(1) > sum(2) here, the expression with the largest total is x₁ = 1 and the one with the second largest is x₂ = 2.

Next, the largest weight total sum(x₁) is compared with the threshold s (step 14). Here sum(1) ≥ s, so processing proceeds to step 15.

Next, counter k is set to 1 (step 15), and 1 is added to k while k ≤ n and sum(xₖ) > sum(x₁) − s (steps 16 and 17). The condition fails at k = 2, so the input sentence 「おまえもがんばれよ。」 is determined to have the feature of expression x₁ = 1, that is, to be a sentence with the features of male speech (step 18).
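Specific Example 1 can be reproduced end to end with the Python sketch below. FIGS. 5 to 7 are not reproduced in this text, so the dictionaries contain only the records the text mentions (with only their notation filled in and all other items blank), and the standard forms, parts of speech and inflected forms in the analysis are illustrative guesses. The helper surface() and all variable names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# (notation, standard form, part of speech, inflected form); None = blank ("−")
Morpheme = Tuple[Optional[str], Optional[str], Optional[str], Optional[str]]
Record = Tuple[Tuple[Morpheme, ...], float]


def surface(*forms: str) -> Tuple[Morpheme, ...]:
    """Hypothetical helper: a pattern that fixes only the notation of each morpheme."""
    return tuple((f, None, None, None) for f in forms)


# Toy stand-ins for FIG. 5 and FIG. 6: only the records the text mentions,
# already in dictionary order (descending w, then descending m).
MALE: List[Record] = [
    (surface("がんばれ", "よ"), 1.0),  # the "first record" (m = 2)
    (surface("おまえ"), 1.0),          # the "fourth record" (m = 1)
]
FEMALE: List[Record] = [
    (surface("あなた"), 0.3),          # the "sixth record" cited in Specific Example 2
]


def weight_sum(dictionary: Sequence[Record], analysis: Sequence[Morpheme]) -> float:
    """Steps 4-9: greedy left-to-right scan; the first matching record wins."""
    j, total = 0, 0.0
    while j < len(analysis):
        for pats, w in dictionary:
            span = analysis[j:j + len(pats)]
            if len(span) == len(pats) and all(
                p is None or p == t
                for pat, tok in zip(pats, span)
                for p, t in zip(pat, tok)
            ):
                j, total = j + len(pats), total + w  # step 7
                break
        else:
            j += 1  # step 8
    return total


# Assumed analysis of 「おまえもがんばれよ。」 (l = 5, P = 2); only the notations
# come from the text, the other items are illustrative.
analysis: List[Morpheme] = [
    ("おまえ", "おまえ", "pronoun", None),
    ("も", "も", "particle", None),
    ("がんばれ", "がんばる", "verb", "imperative"),
    ("よ", "よ", "particle", None),
    ("。", "。", "symbol", None),
]

sums = [weight_sum(MALE, analysis), weight_sum(FEMALE, analysis)]  # steps 2-11
P, s = 2, 0.5
normalized = [v / P for v in sums]                                  # step 12
top = max(normalized)
# Steps 14-18, collapsed: every feature within s of the maximum, if the maximum >= s.
features = [i + 1 for i, v in enumerate(normalized) if v > top - s] if top >= s else []
print(sums, features)  # [2.0, 0.0] [1]
```

The printed totals and the result [1] (male expression) agree with the walk-through above.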

Specific Example 2: This example shows how the device determines whether the linguistic expression of the sentence 「あなたもがんばってください。」 ("Please do your best too.") is male or female speech. As in Specific Example 1, n = 2, the male expression dictionary 10₁ is as in FIG. 5, the female expression dictionary 10₂ is as in FIG. 6, and the threshold is s = 0.5.

First, the morphological analysis unit 1 analyzes the input message, producing a morpheme sequence with l = 6 morphemes and P = 2 bunsetsu. FIG. 8 shows the analysis result.

Next, as in Specific Example 1, the feature expression extraction unit 2 searches the male expression dictionary 10₁ of FIG. 5 for matches. Nothing matches in this example, so sum(1) = 0.

The unit 2 then searches the female expression dictionary 10₂ of FIG. 6 in the same way. Here only the morpheme 「あなた」 (anata, "you") matches, namely the sixth record of the female expression dictionary, so sum(2) = 0.3.

The feature determination unit 3 first normalizes sum(1) and sum(2) by the bunsetsu count P, obtaining sum(1) = 0 and sum(2) = 0.15. It then sorts them and compares the maximum normalized total, sum(2) = 0.15, with the threshold s = 0.5. Since sum(2) < s, it cannot be determined whether the input sentence 「あなたもがんばってください。」 is male or female speech.
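The arithmetic of Specific Example 2, written out with Ŝᵢ denoting the normalized total sum(i) (notation introduced here):

```latex
\hat{S}_1 = \frac{0}{2} = 0, \qquad
\hat{S}_2 = \frac{0.3}{2} = 0.15, \qquad
\max_i \hat{S}_i = 0.15 < s = 0.5 \;\Rightarrow\; \text{undeterminable}
```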

Specific Example 3: This example shows how the device determines whether the linguistic expression of the sentence 「そんなことあらへん。」 ("That's not so.") belongs to the Tokyo, Kansai or Kyushu dialect. Here n = 3; the Tokyo dialect feature expression dictionary 10₁ is as in FIG. 9, the Kansai dialect feature expression dictionary 10₂ as in FIG. 10, and the Kyushu dialect feature expression dictionary 10₃ as in FIG. 11, where "−" in FIGS. 9, 10 and 11 indicates a blank item. As in Specific Example 1, these are hereafter called the Tokyo, Kansai and Kyushu dialect dictionaries, and the corresponding morpheme counts, feature expression weights and weight totals are called the Tokyo (Kansai, Kyushu) dialect morpheme count, weight and weight total. The threshold is s = 0.5.

First, the morphological analysis unit 1 analyzes the input message, producing a morpheme sequence with l = 6 morphemes and P = 2 bunsetsu. FIG. 12 shows the analysis result.

Next, as in Specific Example 1, the feature expression extraction unit 2 searches the Tokyo dialect dictionary 10₁ of FIG. 9 for matches. Nothing matches in this example, so the Tokyo dialect weight total is sum(1) = 0.

The unit 2 then searches the Kansai dialect dictionary 10₂ of FIG. 10 in the same way. Here only the morpheme sequence 「あら」「へ」「ん」 matches, namely the second record of the Kansai dialect dictionary 10₂, so the Kansai dialect weight total is sum(2) = 1.0.

Next, the unit 2 searches the Kyushu dialect dictionary 10₃ of FIG. 11 in the same way. Nothing matches in this example, so the Kyushu dialect weight total is sum(3) = 0.

The feature determination unit 3 first normalizes sum(1) to sum(3) by the bunsetsu count P, obtaining sum(1) = 0, sum(2) = 0.5 and sum(3) = 0. It then sorts the normalized totals, obtaining the maximum sum(2) = 0.5. Since sum(2) ≥ s, it looks for expressions whose normalized totals exceed sum(2) − s (= 0); only expression 2, the Kansai dialect, qualifies, so the input sentence 「そんなことあらへん。」 is determined to have the features of the Kansai dialect.
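Likewise for Specific Example 3, with Ŝᵢ the normalized total sum(i) and x₁ the index of the largest one (notation introduced here):

```latex
\begin{aligned}
(\hat{S}_1, \hat{S}_2, \hat{S}_3) &= \left(\tfrac{0}{2},\ \tfrac{1.0}{2},\ \tfrac{0}{2}\right) = (0,\ 0.5,\ 0) \\
\hat{S}_{x_1} &= \hat{S}_2 = 0.5 \ge s = 0.5 \\
\{\, i : \hat{S}_i > \hat{S}_{x_1} - s = 0 \,\} &= \{2\} \quad\Rightarrow\quad \text{Kansai dialect}
\end{aligned}
```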

[Effects of the Invention] As described above, the present invention makes it possible to determine the features of linguistic expressions without human judgment, reducing the labor previously required to determine them manually. Accordingly, if in the message-collection database search device of FIG. 15 each retrieved message is, as shown in FIG. 13, automatically classified by the linguistic expression feature determination device of the present invention (for example, as male or female speech) and its expression type is displayed, there is no need to determine and store the expression type in the message-collection database in advance. Also, as shown in FIG. 14, if the feature determination device of the present invention automatically determines the features of a message's linguistic expression when the message is registered in the message-collection database, and the result is registered as well, the type of linguistic expression need not be judged manually at registration time.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of an embodiment of the linguistic expression feature determination device according to the present invention; FIG. 2 shows an example of the structure of one record of its feature expression dictionary; FIG. 3 is a flowchart showing the operation of the linguistic expression feature determination device of FIG. 1; FIG. 4 schematically illustrates the operation of the feature determination unit 3 when the number of feature expression dictionaries is n = 3; FIG. 5 shows an example of a feature expression dictionary for male expressions; FIG. 6 shows an example of a feature expression dictionary for female expressions; FIG. 7 shows an example of the morphological analysis result for the sentence 「おまえもがんばれよ。」 ("You do your best too"); FIG. 8 shows an example of the morphological analysis result for the sentence 「あなたもがんばってください。」 ("Please do your best too"); FIG. 9 shows an example of a feature expression dictionary for the Tokyo dialect; FIG. 10 shows an example of a feature expression dictionary for the Kansai dialect; FIG. 11 shows an example of a feature expression dictionary for the Kyushu dialect; FIG. 12 shows an example of the morphological analysis result for the sentence 「そんなことあらへん。」 ("There's no such thing"); FIG. 13 is a block diagram showing a message collection database search device that uses the linguistic expression feature determination device according to the present invention; FIG. 14 is a block diagram showing an example in which the device of the present invention is applied to a message collection database construction device; and FIG. 15 is a block diagram showing a conventional message collection database search device and construction device.

1 ... morphological analysis unit, 2 ... feature expression extraction unit, 3 ... feature determination unit, 4 ... control unit, 5 ... changeover switch, 6 ... linguistic expression feature determination device, 7 ... number of morphemes constituting a feature expression, 8 ... feature expression weight, 10₁ ... feature expression dictionary 1, 10₂ ... feature expression dictionary 2, 10_n ... feature expression dictionary n, 20₁ ... morpheme information 1, 20_m ... morpheme information m, 101 ... notation of the morpheme, 102 ... standard form of the morpheme, 103 ... part of speech of the morpheme, 104 ... inflected form of the morpheme.

(Continued from front page)
(72) Inventor: Masahiro Oku, c/o Nippon Telegraph and Telephone Corporation, 1-1-6 Uchisaiwaicho, Chiyoda-ku, Tokyo
(72) Inventor: Noriyuki Horii, c/o Nippon Telegraph and Telephone Corporation, 1-1-6 Uchisaiwaicho, Chiyoda-ku, Tokyo
(56) References cited: JP-A-63-278174 (JP, A); JP-A-63-68973 (JP, A)
(58) Fields searched (Int. Cl.⁷, DB name): G06F 17/20 - 17/28

Claims

(57) [Claims]

1. A plurality of feature expression dictionaries, one for each type of feature expression, each storing the characteristic linguistic expressions (feature expressions) as sequences of morphemes, including their notations, standard forms, and parts of speech, together with a weight for each feature expression; a morphological analysis unit that divides an input sentence into morphemes and forms a sequence of the notation, standard form, part of speech, and inflected form of each morpheme; a feature expression extraction unit that matches this sequence against the feature expression dictionaries, extracts the expression parts that characterize the sentence, and obtains the total of the feature expression weights; a feature determination unit that determines a feature of the sentence from the differences among the plurality of feature expression weight totals obtained by the feature expression extraction unit;
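The extraction step in this claim can be pictured as sliding each dictionary entry's morpheme sequence over the analyzed sentence and adding up the weights of the matches. The Python sketch below is illustrative only: every class and function name is hypothetical, treating a `None` field in a dictionary entry as a wildcard is an assumption, counting every occurrence of an entry is an assumption, and the sample analysis of 「そんなことあらへん。」 with a weight of 1.0 is invented for illustration rather than taken from FIG. 10 or FIG. 12. The fields loosely mirror items 7, 8, and 101 to 104 of FIG. 2.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Morpheme:
    """One morpheme from the morphological analysis unit."""
    notation: str          # surface notation (cf. 101)
    standard_form: str     # standard (base) form (cf. 102)
    pos: str               # part of speech (cf. 103)
    inflection: str = ""   # inflected form (cf. 104)


@dataclass(frozen=True)
class PatternMorpheme:
    """One morpheme slot of a dictionary entry; None matches anything (assumption)."""
    notation: Optional[str] = None
    standard_form: Optional[str] = None
    pos: Optional[str] = None
    inflection: Optional[str] = None

    def matches(self, m: Morpheme) -> bool:
        pairs = ((self.notation, m.notation), (self.standard_form, m.standard_form),
                 (self.pos, m.pos), (self.inflection, m.inflection))
        return all(want is None or want == got for want, got in pairs)


@dataclass(frozen=True)
class FeatureExpression:
    """A dictionary record: morpheme sequence (length cf. 7) and weight (cf. 8)."""
    morphemes: Tuple[PatternMorpheme, ...]
    weight: float


def weight_total(sentence: Sequence[Morpheme],
                 dictionary: Sequence[FeatureExpression]) -> float:
    """Sum the weights of every position where an entry's sequence matches."""
    total = 0.0
    for entry in dictionary:
        k = len(entry.morphemes)
        if k == 0:
            continue  # an empty pattern would otherwise match everywhere
        for start in range(len(sentence) - k + 1):
            window = sentence[start:start + k]
            if all(p.matches(m) for p, m in zip(entry.morphemes, window)):
                total += entry.weight
    return total


def extract_totals(sentence: Sequence[Morpheme],
                   dictionaries: Dict[str, Sequence[FeatureExpression]]) -> Dict[str, float]:
    """Weight total per feature expression dictionary."""
    return {name: weight_total(sentence, d) for name, d in dictionaries.items()}


if __name__ == "__main__":
    # Hypothetical analysis of 「そんなことあらへん。」 (punctuation omitted).
    sentence = [Morpheme("そんな", "そんな", "adnominal"), Morpheme("こと", "こと", "noun"),
                Morpheme("あら", "ある", "verb", "irrealis"), Morpheme("へん", "へん", "auxiliary")]
    dictionaries = {
        "Tokyo": [FeatureExpression((PatternMorpheme(notation="ねぇ"),), 1.0)],
        "Kansai": [FeatureExpression((PatternMorpheme(standard_form="ある", pos="verb"),
                                      PatternMorpheme(notation="へん", pos="auxiliary")), 1.0)],
    }
    print(extract_totals(sentence, dictionaries))  # prints {'Tokyo': 0.0, 'Kansai': 1.0}
```

The per-dictionary totals returned by `extract_totals` are the quantities the feature determination unit compares; matching on all four morpheme fields, with wildcards, is one plausible reading of a dictionary that stores notation, standard form, and part of speech for each morpheme.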