JP4849596B2

JP4849596B2 - Question answering apparatus, question answering method, and question answering program

Info

Publication number: JP4849596B2
Application number: JP2005354207A
Authority: JP
Inventors: Masaki Murata; Qing Ma; Hitoshi Isahara
Original assignee: National Institute of Information and Communications Technology
Current assignee: National Institute of Information and Communications Technology
Priority date: 2005-12-08
Filing date: 2005-12-08
Publication date: 2012-01-11
Anticipated expiration: 2025-12-08
Also published as: JP2007157006A

Description

The present invention relates to question answering technology, in which a computer-based natural language processing system outputs answers to questions expressed in natural language. More particularly, it relates to a question answering apparatus, a question answering method, and a question answering program that expand the set of input keywords using a keyword extraction technique, and then automatically find and output answers to multiple questions composed from the expanded keywords.

A question answering apparatus is an apparatus that, given a question in natural language, outputs the answer itself. For example, given the question "Which part of the brain contains the cells whose death is associated with the signs of Parkinson's disease?", it searches a large body of electronic text, including data from the Web, newspaper articles, and encyclopedias, finds a sentence such as "Parkinson's disease is said to develop when melanin-containing cells in the substantia nigra of the midbrain degenerate and dopamine, a neurotransmitter produced in the substantia nigra cells, is depleted", and outputs the precise answer "substantia nigra".

Because a question answering apparatus can extract answers from ordinary sentences (text data) written in natural language rather than from logical formulas or databases, it can make use of large amounts of existing document data. Moreover, unlike an information retrieval system, in which users must look for the answer themselves in the articles retrieved by keyword search, a question answering apparatus outputs the answer itself, so users obtain the answer more quickly. Because question answering apparatuses are useful in this way, there is demand for practical apparatuses that are easier to use.

A typical question answering apparatus (or question answering system) consists, broadly, of three processing stages: answer expression estimation, document retrieval, and answer extraction.

Answer expression estimation estimates the form the answer should take, based on cues such as the interrogative expression in the input question. An answer expression is the category of linguistic expression the desired answer belongs to: for example, a category based on the meaning of the answer expression (answer type) or a category based on its written form (answer expression type). The question answering apparatus estimates the answer type for the input question by consulting correspondences that specify which question expressions call for which answer expressions. For example, if the input question is "How large is Japan's area?", the apparatus consults a predetermined correspondence and, from the expression "how large" in the question, estimates that the answer type is "numeric expression". If the question is "Who is the prime minister of Japan?", it estimates from the expression "who" that the answer type is "proper noun (person name)".
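The correspondence lookup described above can be pictured as a small ordered rule table. This is a minimal sketch, not the patent's implementation: the rule list `ANSWER_TYPE_RULES` and the function `estimate_answer_type` are hypothetical, the rules match English substrings rather than Japanese interrogatives, and the first matching rule wins.

```python
from typing import Optional

# Hypothetical correspondence table: interrogative expression -> answer type.
# Rules are checked in order; the first substring match wins.
ANSWER_TYPE_RULES = [
    ("how many", "numeric expression"),
    ("how much", "numeric expression"),
    ("how large", "numeric expression"),
    ("who", "proper noun (person name)"),
    ("where", "proper noun (place name)"),
    ("when", "time expression"),
]


def estimate_answer_type(question: str) -> Optional[str]:
    """Return the answer type implied by the first matching interrogative, or None."""
    q = question.lower()
    for expression, answer_type in ANSWER_TYPE_RULES:
        if expression in q:
            return answer_type
    return None  # no rule applies: the answer type stays unknown


print(estimate_answer_type("How large is Japan's area?"))          # numeric expression
print(estimate_answer_type("Who is the prime minister of Japan?"))  # proper noun (person name)
```

Plain substring matching keeps the sketch short; a real system would match morphologically analyzed tokens so that, for instance, "somewhere" does not trigger the "where" rule.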

Document retrieval extracts keywords from the question, uses them to search the document collection in which answers are to be found, and extracts the documents likely to contain the answer. For example, if the input question is "Where is the capital of Japan?", the apparatus extracts "Japan" and "capital" from the question as keywords and retrieves, from the target document collection, documents containing both "Japan" and "capital".
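A minimal sketch of this keyword extraction and conjunctive search, assuming English text, a hypothetical stopword list, and plain substring matching in place of a real full-text index. The names `STOPWORDS`, `extract_keywords`, and `search_documents` are illustrative only.

```python
import re
from typing import List

# Hypothetical stopword list: function words and interrogatives are not used as keywords.
STOPWORDS = {"a", "an", "the", "is", "of", "where", "who", "what", "how"}


def extract_keywords(question: str) -> List[str]:
    """Lowercase the question, split it into words, and drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", question.lower()) if w not in STOPWORDS]


def search_documents(keywords: List[str], documents: List[str]) -> List[str]:
    """Keep only the documents that contain every keyword (a conjunctive search)."""
    return [d for d in documents if all(k in d.lower() for k in keywords)]


docs = [
    "Tokyo is the capital of Japan.",
    "Mount Fuji is the highest mountain in Japan.",
    "Paris is the capital of France.",
]
keywords = extract_keywords("Where is the capital of Japan?")
print(keywords)                          # ['capital', 'japan']
print(search_documents(keywords, docs))  # ['Tokyo is the capital of Japan.']
```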

Answer extraction extracts, from the documents containing the keywords found in the document retrieval stage, linguistic expressions that match the estimated answer type, and outputs them as answers. For example, from the retrieved documents containing the keywords "Japan" and "capital", the apparatus extracts "Tokyo", an expression matching the answer type "proper noun (place name)" estimated in the answer expression estimation stage, and takes it as the answer.
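The extraction step can be sketched as a type filter over the retrieved text. In this illustration a hypothetical dictionary `GAZETTEER` stands in for a named-entity tagger, candidates are counted across documents, and the question's own keywords are skipped; none of these details come from the patent.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical gazetteer standing in for a named-entity tagger that assigns answer types.
GAZETTEER: Dict[str, str] = {
    "Tokyo": "proper noun (place name)",
    "Japan": "proper noun (place name)",
    "Koizumi": "proper noun (person name)",
}


def extract_answers(documents: List[str], answer_type: str,
                    keywords: List[str]) -> List[Tuple[str, int]]:
    """Count expressions of the requested answer type, skipping the question's own keywords."""
    counts: Counter = Counter()
    for doc in documents:
        for token in re.findall(r"[A-Za-z]+", doc):
            if token.lower() in keywords:
                continue  # a question keyword is not an answer to its own question
            if GAZETTEER.get(token) == answer_type:
                counts[token] += 1
    return counts.most_common()


retrieved = [
    "Tokyo is the capital of Japan.",
    "Koizumi spoke in Tokyo, the capital of Japan.",
]
print(extract_answers(retrieved, "proper noun (place name)", ["capital", "japan"]))
# [('Tokyo', 2)]
```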

Through the processing described above, the question answering apparatus outputs the answer "Tokyo" to the question "Where is the capital of Japan?".

As a specific example of prior art relating to question answering apparatuses (or systems), Non-Patent Document 1 below describes a question answering system that estimates answers using multiple articles: the scores of the answer candidates obtained from the individual articles are added while being gradually reduced, and the candidate with the highest total score is output as the answer.

Non-Patent Document 1: Masaki Murata and Hitoshi Isahara, "Use of Multiple Article Information Based on the Decreasing-Point Addition Method in a Question Answering System" (in Japanese), IPSJ SIG Natural Language Processing, 2004-NL-160, 2004, Kyushu University.
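The citation describes this scoring only qualitatively. Under one possible reading, assumed here, each candidate's per-article scores are sorted in descending order and the i-th score (counting from 0) is weighted by k^i before summing, so repeated evidence helps but with diminishing returns. The geometric decay, the default `k = 0.3`, and the function name `decreasing_addition` are illustrative assumptions, not the cited paper's exact formula.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def decreasing_addition(scored: Iterable[Tuple[str, float]],
                        k: float = 0.3) -> Tuple[str, Dict[str, float]]:
    """Sum each candidate's per-article scores with geometrically decreasing weights.

    `scored` holds one (candidate, score) pair per article in which the candidate
    was found. Returns the best candidate and the total for every candidate.
    """
    per_candidate: Dict[str, List[float]] = defaultdict(list)
    for candidate, score in scored:
        per_candidate[candidate].append(score)
    totals = {}
    for candidate, scores in per_candidate.items():
        scores.sort(reverse=True)  # the strongest evidence gets the full weight
        totals[candidate] = sum(s * k ** i for i, s in enumerate(scores))
    best = max(totals, key=totals.get)
    return best, totals


# "A" is found in three articles with score 10, "B" in one article with score 12.
best, totals = decreasing_addition([("A", 10), ("A", 10), ("A", 10), ("B", 12)])
print(best)  # A  (10 + 3 + 0.9 = 13.9 beats 12)
```

With `k = 0.1` the total for "A" drops to 11.1 and the single strong article "B" wins instead, which shows how k trades breadth of supporting evidence against the strength of the best article.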

A conventional question answering apparatus extracts from the retrieved documents the linguistic expressions that could be answers, as answer candidates, and determines the answer type of each candidate. It then rates more highly the candidates whose answer type is the same as or similar to the answer type estimated from the question and, as a rule, outputs as the answer a candidate of the same answer type that has obtained a predetermined score.

However, a conventional question answering apparatus outputs only the answer to the question that was actually asked; it cannot output answers to any other questions.

An object of the present invention is to solve the above problem of the prior art by providing a question answering apparatus, a question answering method, and a question answering program that output both the answer to the question asked and answers to questions other than the question asked.

To solve the above problem, the present invention provides a question answering apparatus that outputs answers to question data expressed in natural language, comprising: keyword input means that receives a plurality of keywords as input keywords; keyword increasing means that, based on the input keywords, extracts more keywords than the number of input keywords and outputs them as output keywords; answer candidate extraction means that extracts answer candidates, that is, candidate answers to questions composed of the output keywords, from a prestored document collection searched for answer candidates; and answer table output means that outputs, as an answer table, a table in which each extracted answer candidate is associated with its question.
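Drawing on the first/second keyword embodiments described later, one way to picture the answer table is as a grid: rows are keywords derived from one input keyword, columns are keywords derived from another, and each cell holds the answer to the question formed by that pair. The sketch assumes that reading. `expand` and `lookup` are hypothetical stand-ins for the keyword increasing means and the answer candidate extraction means, and the toy `FACTS` dictionary replaces a real document search.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Maps (row keyword, column keyword) to the answer found for that question, if any.
Table = Dict[Tuple[str, str], Optional[str]]


def build_answer_table(first: str, second: str,
                       expand: Callable[[str], List[str]],
                       answer: Callable[[str, str], Optional[str]]) -> Table:
    """Expand both input keywords and answer the question formed by every pair."""
    return {(row, col): answer(row, col)
            for row in expand(first) for col in expand(second)}


def format_table(table: Table, rows: List[str], cols: List[str]) -> str:
    """Render the answer table as tab-separated text, with '-' for missing answers."""
    lines = ["\t".join([""] + cols)]
    for row in rows:
        lines.append("\t".join([row] + [table.get((row, c)) or "-" for c in cols]))
    return "\n".join(lines)


# Hypothetical stand-ins for the keyword increasing and answer extraction means.
EXPANSIONS = {"Japan": ["Japan", "France", "Italy"], "capital": ["capital", "currency"]}
FACTS = {("Japan", "capital"): "Tokyo", ("France", "capital"): "Paris",
         ("Italy", "capital"): "Rome", ("Japan", "currency"): "yen",
         ("France", "currency"): "euro"}


def expand(keyword: str) -> List[str]:
    return EXPANSIONS.get(keyword, [keyword])


def lookup(row: str, col: str) -> Optional[str]:
    return FACTS.get((row, col))


table = build_answer_table("Japan", "capital", expand, lookup)
print(format_table(table, expand("Japan"), expand("capital")))
```

Two input keywords yield six questions here, which is the sense in which the apparatus answers questions beyond the one literally asked.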

Further, in the above question answering apparatus, the keyword increasing means may comprise: pattern extraction means that performs a full-text search for the input keywords in a keyword extraction database storing document data for keyword extraction, and extracts the patterns that appear around the input keywords; and keyword extraction means that performs a full-text search for the patterns extracted by the pattern extraction means in the keyword extraction database, extracts the expressions captured by those patterns, and outputs the extracted expressions as output keywords.
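A simplified sketch of this two-step expansion, assuming whitespace-tokenized text and treating a "pattern" as the words immediately to the left and right of the keyword (the exact pattern shape is not fixed here). The corpus and the names `context` and `expand_by_patterns` are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

# A pattern is the (left context, right context) word window around a position.
Pattern = Tuple[Tuple[str, ...], Tuple[str, ...]]


def context(tokens: List[str], i: int, window: int) -> Pattern:
    """Return the `window` words on each side of position i."""
    return tuple(tokens[max(0, i - window):i]), tuple(tokens[i + 1:i + 1 + window])


def expand_by_patterns(keyword: str, corpus: List[str],
                       window: int = 1, top_n: int = 5) -> List[str]:
    """Learn the contexts of `keyword`, then harvest other words seen in those contexts."""
    sentences = [s.split() for s in corpus]
    # Step 1: patterns around the input keyword, with their frequencies.
    patterns = Counter(context(toks, i, window)
                       for toks in sentences
                       for i, tok in enumerate(toks) if tok == keyword)
    # Step 2: other words occurring in the same patterns, weighted by pattern frequency.
    found: Counter = Counter()
    for toks in sentences:
        for i, tok in enumerate(toks):
            ctx = context(toks, i, window)
            if tok != keyword and ctx in patterns:
                found[tok] += patterns[ctx]
    return [w for w, _ in found.most_common(top_n)]


# Hypothetical keyword-extraction corpus (pre-tokenized, lowercase).
corpus = [
    "the prime minister of japan visited",
    "the prime minister of france visited",
    "the prime minister of italy visited",
    "a map of japan shows",
    "a map of spain shows",
]
print(expand_by_patterns("japan", corpus))
```

Widening `window` makes patterns more specific (fewer but cleaner expansions); a one-word window is the loosest useful setting.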

Further, in the above question answering apparatus, the keyword increasing means may extract words belonging to the same field as the input keywords from a database storing correspondences between words and their fields, and output them as output keywords.
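A minimal sketch of field-based expansion, assuming the word-to-field correspondences fit in a dictionary. The database contents and the name `expand_by_field` are hypothetical.

```python
from typing import Dict, List

# Hypothetical word-to-field database.
FIELD_DB: Dict[str, str] = {
    "baseball": "sports", "soccer": "sports", "tennis": "sports",
    "piano": "music", "violin": "music",
}


def expand_by_field(keyword: str, field_db: Dict[str, str]) -> List[str]:
    """Return the other words registered under the same field as `keyword`."""
    field = field_db.get(keyword)
    if field is None:
        return []  # unknown word: no field, so no expansion
    return [w for w, f in field_db.items() if f == field and w != keyword]


print(expand_by_field("soccer", FIELD_DB))  # ['baseball', 'tennis']
```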

Further, in the above question answering apparatus, the keyword increasing means may comprise: similarity calculation means that calculates the similarity between the input keywords and the words in thesaurus data, which is stored in advance in a database and classifies words by semantic similarity; and keyword extraction means that extracts keywords according to the magnitude of the calculated similarity and outputs them as output keywords.
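The similarity measure is not specified here. A common choice for hierarchical thesauri whose entries carry classification codes is the length of the shared code prefix: the deeper two words share a branch, the more similar they are. The sketch assumes that measure; the codes in `THESAURUS` are made up and do not come from any real thesaurus, and the function names are illustrative.

```python
from typing import Dict, List


def code_similarity(a: str, b: str) -> int:
    """Number of leading characters two classification codes share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def expand_by_thesaurus(keyword: str, thesaurus: Dict[str, str], top_n: int = 2) -> List[str]:
    """Rank the other thesaurus words by code similarity to `keyword` and keep the top_n."""
    code = thesaurus[keyword]
    scored = [(code_similarity(code, c), w) for w, c in thesaurus.items() if w != keyword]
    scored.sort(key=lambda p: (-p[0], p[1]))  # highest similarity first, then alphabetical
    return [w for _, w in scored[:top_n]]


# Hypothetical classification codes.
THESAURUS = {"Japan": "1.2590", "France": "1.2590", "Tokyo": "1.2530", "apple": "1.5502"}
print(expand_by_thesaurus("Japan", THESAURUS))  # ['France', 'Tokyo']
```

Selecting by rank (`top_n`) is one option; thresholding on a minimum similarity is an equally valid reading of "based on the magnitude of the similarity".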

Further, in the above question answering apparatus, a first keyword and a second keyword may be input to the keyword input means as the input keywords; the keyword increasing means outputs a third keyword as an output keyword based on the input first keyword, and a fourth keyword as an output keyword based on the input second keyword; and the answer candidate extraction means uses a large number of question-answer pairs prepared in advance to learn which answers correspond to which questions and, based on the learning result, extracts answer candidates for the question composed of the output third and fourth keywords.

Further, in the above question answering apparatus, a first keyword and a second keyword may be input to the keyword input means as the input keywords; the keyword increasing means outputs a third keyword as an output keyword based on the input first keyword, and a fourth keyword as an output keyword based on the input second keyword; and the answer candidate extraction means retrieves documents containing the output third and fourth keywords from a large document collection stored in advance in storage means and, using the frequency with which expressions appear in that large document collection, extracts from the linguistic expressions in the retrieved documents answer candidates for the question composed of the output third and fourth keywords.
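The claim only states that corpus-wide frequency is used. One plausible realization, assumed here, is a tf-idf-like score: count how often each word co-occurs with both keywords in the retrieved documents and multiply by log(N / df), where N is the number of documents and df the number containing the word, so that words frequent everywhere (such as "the") are discounted. The corpus, the function name, and the scoring formula are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple


def frequency_candidates(k3: str, k4: str, corpus: List[str],
                         top_n: int = 3) -> List[Tuple[str, float]]:
    """Score words co-occurring with both keywords, discounting words common in the corpus."""
    docs = [d.split() for d in corpus]
    n_docs = len(docs)
    # Document frequency of every word across the whole collection.
    df = Counter(w for toks in docs for w in set(toks))
    # How often each other word appears in documents containing both keywords.
    local = Counter(w for toks in docs if k3 in toks and k4 in toks
                    for w in toks if w not in (k3, k4))
    scores = {w: n * math.log(n_docs / df[w]) for w, n in local.items()}
    return sorted(scores.items(), key=lambda p: p[1], reverse=True)[:top_n]


corpus = [
    "paris is the capital of france",
    "the capital of france is paris",
    "france borders spain",
    "the capital of spain is madrid",
    "the weather is mild",
]
print(frequency_candidates("france", "capital", corpus)[0][0])  # paris
print(frequency_candidates("spain", "capital", corpus)[0][0])   # madrid
```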

Further, in the above question answering apparatus, a first keyword and a second keyword may be input to the keyword input means as the input keywords, and the apparatus may comprise: interrogative pronoun input means that receives an interrogative pronoun associated with the second keyword; and answer type estimation means that, based on the interrogative pronoun input to the interrogative pronoun input means, estimates the answer type, that is, the category of linguistic expression of candidate answers to a question composed of the output keywords output by the keyword increasing means. The keyword increasing means outputs a third keyword as an output keyword based on the input first keyword and outputs the input second keyword as an output keyword. The answer candidate extraction means retrieves documents containing the third keyword and the second keyword output by the keyword increasing means from the document collection searched for answer candidates and, from the documents retrieved, extracts linguistic expressions matching the answer type estimated by the answer type estimation means as answer candidates for the question composed of the third and second keywords.

Further, in the above question answering apparatus, a first keyword and a second keyword may be input to the keyword input means as the input keywords, and the apparatus may comprise answer type estimation means that, based on a predetermined interrogative pronoun associated with the second keyword, estimates the answer type, that is, the category of linguistic expression of candidate answers to a question composed of the output keywords output by the keyword increasing means. The keyword increasing means outputs a third keyword as an output keyword based on the input first keyword and outputs the input second keyword as an output keyword. The answer candidate extraction means retrieves documents containing the third keyword and the second keyword output by the keyword increasing means from the document collection searched for answer candidates and, from the documents retrieved, extracts linguistic expressions matching the answer type estimated by the answer type estimation means as answer candidates for the question composed of the third and second keywords.

Further, in the above question answering apparatus, a first keyword and a second keyword may be input to the keyword input means as the input keywords, and the apparatus may comprise: interrogative pronoun input means that receives an interrogative pronoun; and answer type estimation means that, based on the interrogative pronoun input to the interrogative pronoun input means, estimates the answer type, that is, the category of linguistic expression of candidate answers to a question composed of the output keywords output by the keyword increasing means. The keyword increasing means outputs a third keyword as an output keyword based on the input first keyword, and a fourth keyword as an output keyword based on the input second keyword. The answer candidate extraction means retrieves documents containing the third keyword and the second keyword output by the keyword increasing means from the document collection searched for answer candidates and, from the documents retrieved, extracts linguistic expressions matching the answer type estimated by the answer type estimation means as answer candidates for the question composed of the third and fourth keywords.

Further, in the above question answering apparatus, a first keyword and a second keyword may be input to the keyword input means as the input keywords, and the apparatus may comprise answer type estimation means that, based on a predetermined interrogative pronoun, estimates the answer type, that is, the category of linguistic expression of candidate answers to a question composed of the output keywords output by the keyword increasing means. The keyword increasing means outputs a third keyword as an output keyword based on the input first keyword, and a fourth keyword as an output keyword based on the input second keyword. The answer candidate extraction means retrieves documents containing the third keyword and the second keyword output by the keyword increasing means from the document collection searched for answer candidates and, from the documents retrieved, extracts linguistic expressions matching the answer type estimated by the answer type estimation means as answer candidates for the question composed of the third and fourth keywords.

Further, the above question answering apparatus may comprise answer type input means that receives the answer type, that is, the category of linguistic expression of candidate answers to a question composed of the output keywords output by the keyword increasing means. A first keyword and a second keyword are input to the keyword input means as the input keywords; the keyword increasing means outputs a third keyword as an output keyword based on the input first keyword, and a fourth keyword as an output keyword based on the input second keyword; and the answer candidate extraction means retrieves documents containing the third and fourth keywords output by the keyword increasing means from the document collection searched for answer candidates and, from the documents retrieved, extracts linguistic expressions matching the answer type input to the answer type input means as answer candidates for the question composed of the output third and fourth keywords.

Further, in the above question answering apparatus, a first keyword and a second keyword may be input to the keyword input means as the input keywords; the keyword increasing means outputs a third keyword as an output keyword based on the input first keyword, and a fourth keyword as an output keyword based on the input second keyword; and the answer candidate extraction means retrieves documents containing the third and fourth keywords output by the keyword increasing means from the document collection searched for answer candidates and, from the documents retrieved, extracts linguistic expressions matching a predetermined answer type as answer candidates for the question composed of the output third and fourth keywords.

Further, in the above question answering apparatus, a first keyword and a second keyword may be input to the keyword input means as the input keywords, and the apparatus may comprise answer type input means that receives an answer type associated with the second keyword input to the keyword input means, the answer type being the category of linguistic expression of candidate answers to a question composed of the output keywords output by the keyword increasing means. The keyword increasing means outputs a third keyword as an output keyword based on the input first keyword and outputs the input second keyword as an output keyword. The answer candidate extraction means retrieves documents containing the third keyword and the second keyword output by the keyword increasing means from the document collection searched for answer candidates and, from the documents retrieved, extracts linguistic expressions matching the answer type input to the answer type input means as answer candidates for the question composed of the output third and second keywords.

Further, in the above question answering apparatus, a first keyword and a second keyword may be input to the keyword input means as the input keywords; the keyword increasing means outputs a third keyword as an output keyword based on the input first keyword and outputs the input second keyword as an output keyword; and the answer candidate extraction means retrieves documents containing the third keyword and the second keyword output by the keyword increasing means from the document collection searched for answer candidates and, from the documents retrieved, extracts linguistic expressions matching a predetermined answer type associated with the second keyword as answer candidates for the question composed of the output third and second keywords.

Further, in the above question answering apparatus, a first keyword and a second keyword may be input to the keyword input means as the input keywords, and the apparatus may comprise answer type input means that receives answer types associated with the second keywords input to the keyword input means, an answer type being the category of linguistic expression of candidate answers to a question composed of the output keywords output by the keyword increasing means. The keyword increasing means outputs a third keyword as an output keyword based on the input first keyword, and a fourth keyword as an output keyword based on the input second keyword. The apparatus further comprises similar keyword determination means that, for each output fourth keyword, determines which of the second keywords is similar to it and takes that second keyword as its similar keyword. The answer candidate extraction means retrieves documents containing the third and fourth keywords output by the keyword increasing means from the document collection searched for answer candidates and, from the documents retrieved, extracts linguistic expressions matching the answer type that was input to the answer type input means in association with the similar keyword of the output fourth keyword, as answer candidates for the question composed of the output third and fourth keywords.
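The similar keyword determination step can be sketched as follows, assuming several second keywords, each carrying its own answer type, and some similarity measure between keywords. Here the measure is the shared-prefix length of made-up thesaurus codes; the codes, keyword lists, and function names are all hypothetical.

```python
from typing import Dict, List, Tuple


def shared_prefix(a: str, b: str) -> int:
    """Number of leading characters two classification codes share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def assign_answer_types(second: Dict[str, str], fourth: List[str],
                        codes: Dict[str, str]) -> Dict[str, Tuple[str, str]]:
    """Map each fourth keyword to (most similar second keyword, that keyword's answer type)."""
    result = {}
    for k4 in fourth:
        # The similar keyword is the second keyword whose code shares the longest prefix.
        similar = max(second, key=lambda k2: shared_prefix(codes[k2], codes[k4]))
        result[k4] = (similar, second[similar])
    return result


SECOND = {"capital": "proper noun (place name)", "president": "proper noun (person name)"}
CODES = {"capital": "1.2530", "largest city": "1.2535",
         "president": "1.2410", "prime minister": "1.2411"}
print(assign_answer_types(SECOND, ["largest city", "prime minister"], CODES))
```

The point of this step is that a user only needs to attach answer types to the few second keywords they typed; every expanded fourth keyword inherits an answer type from its nearest second keyword.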

Further, in the above question answering apparatus, a first keyword and a second keyword may be input to the keyword input means as the input keywords; the keyword increasing means outputs a third keyword as an output keyword based on the input first keyword, and a fourth keyword as an output keyword based on the input second keyword; the apparatus comprises similar keyword determination means that, for each output fourth keyword, determines which of the second keywords is similar to it and takes that second keyword as its similar keyword; and the answer candidate extraction means retrieves documents containing the third and fourth keywords output by the keyword increasing means from the document collection searched for answer candidates and, from the documents retrieved, extracts linguistic expressions matching the answer type associated in advance with the similar keyword of the output fourth keyword, as answer candidates for the question composed of the output third and fourth keywords.

Further, in the above question answering apparatus, a first keyword and a second keyword may be input to the keyword input means as the input keywords, and the apparatus may comprise: interrogative pronoun input means that receives interrogative pronouns associated with the second keywords; and answer type estimation means that, based on the interrogative pronouns input to the interrogative pronoun input means, estimates the answer type, that is, the category of linguistic expression of candidate answers to a question composed of the output keywords output by the keyword increasing means. The keyword increasing means outputs a third keyword as an output keyword based on the input first keyword, and a fourth keyword as an output keyword based on the input second keyword. The apparatus further comprises similar keyword determination means that, for each output fourth keyword, determines which of the second keywords is similar to it and takes that second keyword as its similar keyword. The answer candidate extraction means retrieves documents containing the third and fourth keywords output by the keyword increasing means from the document collection searched for answer candidates and, from the documents retrieved, extracts linguistic expressions matching the answer type that the answer type estimation means estimated from the interrogative pronoun input to the interrogative pronoun input means in association with the similar keyword of the output fourth keyword, as answer candidates for the question composed of the output third and fourth keywords.

In the question answering apparatus described above, the present invention is also characterized as follows. A first keyword and a second keyword are input to the keyword input means as the input keywords. The apparatus comprises answer type estimation means, which uses a predetermined interrogative pronoun associated with the second keyword to estimate the answer type, that is, the category of linguistic expression that candidate answers to a question composed of the output keywords of the keyword increasing means should belong to. The keyword increasing means outputs third keywords as output keywords on the basis of the input first keyword, and fourth keywords as output keywords on the basis of the input second keyword. The apparatus further comprises similar keyword determination means, which determines, for each of the fourth keywords, the second keyword similar to that fourth keyword as its similar keyword. The answer candidate extraction means searches the document data group that is the search target for answer candidates for document data containing the third and fourth keywords output by the keyword increasing means. From the document data retrieved, it extracts linguistic expressions that match the answer type that the answer type estimation means estimated from the interrogative pronoun associated with the similar keyword of the output fourth keyword. These expressions are the answer candidates to the question composed of the output third and fourth keywords.

In the question answering apparatus described above, the similar keyword determination means extracts co-occurrence words from a large document data group stored in advance in storage means. Co-occurrence words are words that appear together with the fourth keywords output by the keyword extraction means. For each fourth keyword, the similar keyword determination means computes a co-occurrence vector whose elements are the number of times that keyword co-occurs with each extracted co-occurrence word in the document data group. It then computes the degree of similarity between the co-occurrence vector of each fourth keyword and the co-occurrence vector of the fourth keyword identical to a second keyword input to the keyword input means. The fourth keyword identical to the second keyword that this degree of similarity identifies as similar to each fourth keyword is taken as the similar keyword.
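The co-occurrence step can be sketched as follows. This is a minimal illustration only. It assumes that co-occurrence means appearing in the same sentence and that cosine similarity is the similarity measure; the source specifies neither. The function names and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def cooccurrence_vector(keyword, sentences, vocab):
    """For each word in `vocab`, count the sentences it shares with `keyword`.

    Assumption: sentence-level co-occurrence (the source leaves the window open).
    """
    counts = Counter()
    for words in sentences:
        if keyword in words:
            for w in set(words):
                if w != keyword:
                    counts[w] += 1
    return [counts[w] for w in vocab]


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def similar_keywords(fourth_keywords, second_keywords, sentences):
    """Map each fourth keyword to the second keyword with the most similar vector."""
    vocab = sorted({w for s in sentences for w in s})
    vecs = {k: cooccurrence_vector(k, sentences, vocab)
            for k in set(fourth_keywords) | set(second_keywords)}
    return {f: max(second_keywords, key=lambda s: cosine(vecs[f], vecs[s]))
            for f in fourth_keywords}


# Hypothetical tokenized corpus.
corpus = [
    ["area", "km2", "large"],
    ["size", "km2", "large"],
    ["population", "million", "people"],
    ["inhabitants", "million", "people"],
]
print(similar_keywords(["area", "size", "inhabitants"], ["area", "population"], corpus))
```

In this toy corpus, "size" shares its context words with "area", and "inhabitants" shares its context words with "population", so each is mapped to the corresponding second keyword.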

In the question answering apparatus described above, the similar keyword determination means comprises similarity calculation means. For each fourth keyword output by the keyword increasing means, the similarity calculation means calculates the similarity between the word identical to that fourth keyword and the word identical to a second keyword input to the keyword input means. The calculation is based on thesaurus data stored in advance in a database, which classifies words by semantic similarity. The second keyword that the calculated similarity identifies as similar to the fourth keyword is taken as the similar keyword.
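The following is one possible realization of a thesaurus-based similarity, offered as a sketch under stated assumptions. It assumes the thesaurus assigns each word a path of categories from the root down and measures similarity as the share of leading categories two paths have in common. The source does not specify the thesaurus or the measure. `THESAURUS` and its entries are hypothetical.

```python
def path_similarity(path_a, path_b):
    """Share of category levels, counted from the top, that two paths have in common."""
    shared = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        shared += 1
    return shared / max(len(path_a), len(path_b))


def similar_keyword(fourth, second_keywords, thesaurus):
    """Pick the second keyword whose thesaurus path is closest to `fourth`."""
    return max(second_keywords,
               key=lambda s: path_similarity(thesaurus[fourth], thesaurus[s]))


# Hypothetical toy thesaurus: word -> category path from the root down.
THESAURUS = {
    "area": ("abstract", "quantity", "extent", "area"),
    "latitude": ("abstract", "quantity", "position", "latitude"),
    "capital": ("concrete", "place", "city", "capital"),
    "city": ("concrete", "place", "city", "city"),
}
print(similar_keyword("latitude", ["area", "capital"], THESAURUS))
```

Here "latitude" shares two of four levels with "area" and none with "capital", so "area" is chosen.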

The present invention is also a question answering method for outputting answers to question data expressed in natural language. The method comprises the following steps:
1. Inputting a plurality of keywords as input keywords.
2. Extracting, on the basis of the input keywords, more keywords than were input and outputting them as output keywords.
3. Extracting answer candidates, that is, candidate answers to questions composed of the output keywords, from a document data group stored in advance as the search target for answer candidates.
4. Outputting, as an answer table, a table in which each extracted answer candidate is associated with its question.

The present invention is also a question answering program to be executed by a computer provided in a question answering apparatus that outputs answers to question data expressed in natural language. The program causes the computer to execute the following processes:
1. Inputting a plurality of keywords as input keywords.
2. Extracting, on the basis of the input keywords, more keywords than were input and outputting them as output keywords.
3. Extracting answer candidates to questions composed of the output keywords from a document data group stored in advance as the search target for answer candidates.
4. Outputting, as an answer table, a table in which each extracted answer candidate is associated with its question.

The question answering apparatus of the present invention outputs answers not only to the question actually asked but also to other questions, with each answer associated with its question. Users need to enter only a few keywords from the domain they are interested in. The apparatus then automatically returns answers to many questions composed of keywords expanded from those inputs.

For example, suppose the user inputs a first keyword and a second keyword. On the basis of the first keyword, the question answering apparatus of the present invention extracts more third keywords than there are first keywords. On the basis of the second keyword, it extracts more fourth keywords than there are second keywords. It then uses machine learning to automatically output answers to the questions composed of the extracted third and fourth keywords.

As another example, suppose the user inputs a first keyword, a second keyword, and an interrogative pronoun associated with the second keyword. The apparatus extracts more third keywords than there are first keywords and estimates the answer type from the input interrogative pronoun. Using the estimated answer type, it then automatically outputs answers to the questions composed of the third keywords, the second keyword, and the interrogative pronoun.

As another example, suppose the user inputs a first keyword, a second keyword, and an answer type. On the basis of the first keyword, the apparatus extracts more third keywords than there are first keywords. On the basis of the second keyword, it extracts more fourth keywords than there are second keywords. Using the input answer type, it then automatically outputs answers to the questions composed of the extracted third and fourth keywords.

As yet another example, suppose the user inputs a first keyword, a second keyword, and an answer type associated with the second keyword. On the basis of the first keyword, the apparatus extracts more third keywords than there are first keywords. On the basis of the second keyword, it extracts more fourth keywords than there are second keywords. For each extracted fourth keyword, it then determines the similar second keyword (more precisely, the fourth keyword identical to that second keyword) as the similar keyword. Answers to the questions composed of the extracted third and fourth keywords can then be output automatically, using the answer type associated with the determined similar keyword.

Before the embodiments of the present invention are described, this section explains the technique of Non-Patent Document 1. Non-Patent Document 1 describes how a question answering system can use information from multiple articles by means of the decreasing addition method. Its contents are summarized below.

A question answering system outputs the answer to a given question. Take the question "Where is the capital of Japan?" as an example. The system searches electronic text, such as Web pages and newspaper articles, and finds a sentence like "Tokyo is the capital of Japan and the country's largest and most important city; Tokyo is one of Japan's 47 prefectures." It then answers "Tokyo". Question answering systems are expected to become an important alternative to information retrieval and a basic component of future artificial intelligence systems.

To improve the accuracy of question answering systems, Non-Patent Document 1 proposes a new method that adds up the scores of answer candidates obtained from multiple articles while progressively decreasing each added score. This method is called the decreasing addition method.

The answer to a question is often found in several articles. In that case, an answer estimated from several articles is likely to be better than one estimated from a single article. One natural way to use this information is to add up the scores of the answer candidates obtained from the different articles. However, simply adding the scores can degrade system performance.

To address the problem caused by simple addition, Non-Patent Document 1 decreases the scores as it adds them. Specifically, the score of the i-th occurrence of an answer candidate is multiplied by the weight k^(i-1) before the scores are summed, and the final answer is determined from the total. For example, suppose "Tokyo" is extracted as an answer candidate from three articles with scores 26, 21, and 20, and k is 0.3. The total score of "Tokyo" is then 34.1 (= 26 + 21 × 0.3 + 20 × 0.3^2). Every candidate's score is computed in this way, and the candidate with the highest total score is taken as the answer.
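The weighting rule can be sketched as follows. This is a minimal illustration, not the implementation of Non-Patent Document 1. It assumes that the i-th occurrence of a candidate is its i-th highest score, which matches ranking by score. The default k = 0.3 is taken from the worked example.

```python
from collections import defaultdict


def decreasing_addition(candidates, k=0.3):
    """Sum each answer's scores, weighting its i-th highest score by k**(i-1).

    `candidates` is a list of (answer, score) pairs collected from several articles.
    """
    by_answer = defaultdict(list)
    for answer, score in candidates:
        by_answer[answer].append(score)
    totals = {}
    for answer, scores in by_answer.items():
        scores.sort(reverse=True)  # enumerate index 0 is i = 1, weighted by k**0
        totals[answer] = sum(s * k ** i for i, s in enumerate(scores))
    return totals


totals = decreasing_addition([("Tokyo", 26), ("Tokyo", 21), ("Tokyo", 20)])
print(round(totals["Tokyo"], 1))  # 26 + 21*0.3 + 20*0.3**2
```

This prints 34.1. With k = 1, the function reduces to simple addition. With k = 0, only each answer's best score survives, so the ranking is the same as the original method, which adds no scores.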

This section describes in detail how Non-Patent Document 1 applies the decreasing addition method to multiple articles. Suppose the question "Where is the capital of Japan?" is given; the expected answer is "Tokyo". As shown in FIG. 21, a typical question answering system can output a list of answer candidates with their scores. It can also output the number of the article from which each candidate was extracted. The rank shown in the figure is the rank by score.

In the list of FIG. 21, the highest-scoring candidate is "Kyoto", so the system would output a wrong answer. A method that simply adds up the scores of the answer candidates has already been proposed. Applied to the list of FIG. 21, it gives the result shown in FIG. 22.

In FIG. 22, "Tokyo" has the highest score, so the system correctly outputs "Tokyo" as the answer. By using information from multiple articles, simple addition obtains the correct answer here. However, simple addition tends to favor candidates that occur frequently. This is a serious problem, especially for high-performance systems. When a system already performs well, the original scores it outputs are often more reliable than the summed scores, so simple addition often degrades its performance.

To deal with this problem, Non-Patent Document 1 proposes a new method that decreases the scores as it adds them. Instead of simply adding the scores of the answer candidates, it adds them with weights that shrink each successive score. This weakens the undesirable tendency to pick frequent words while still improving system performance.

The following example shows the effectiveness of the method proposed in Non-Patent Document 1. Suppose the question "Where was the capital of Japan in the year 1000 AD?" is given and the system outputs the result shown in FIG. 23. As FIG. 23 shows, "Kyoto" has the highest score. "Kyoto" is the correct answer to this question, so the system answers correctly as long as the candidate scores are not simply added. With simple addition, however, the result becomes the table shown in FIG. 24, and the system outputs the wrong answer "Tokyo".

Now apply the new method of Non-Patent Document 1, which decreases the scores as it adds them. As a concrete setting, the score of the i-th candidate is multiplied by 0.3^(i-1). The score of "Tokyo" is then 2.8 (= 2.1 + 1.8 × 0.3 + 1.5 × 0.3^2 + 1.4 × 0.3^3), and the system output becomes the table shown in FIG. 25. "Kyoto" has the highest score, so the correct answer "Kyoto" is output.

The proposed method also obtains the correct answer in the first example, the question "Where is the capital of Japan?". There, the score of "Tokyo" is 4.3 (= 3.2 + 2.8 × 0.3 + 2.5 × 0.3^2 + 2.4 × 0.3^3), and the output becomes the table shown in FIG. 26. "Tokyo" has the highest score and is correctly output as the answer.

The method of Non-Patent Document 1 therefore improves accuracy by using information from multiple articles while limiting the tendency to favor frequent answer candidates.

The question answering system described in Non-Patent Document 1 consists of the following three basic components.
1. Estimation of the answer expression
The system estimates the answer expression, that is, what kind of linguistic expression the answer is, from cues such as the interrogative pronoun. For example, if the input question is "How large is the area of Japan?", the expression "how large" suggests that the answer is a numerical expression.
2. Document retrieval
The system extracts keywords from the question and uses them to retrieve documents that are likely to contain the answer. For example, for the question "How large is the area of Japan?", the system extracts "Japan" and "area" as keywords and retrieves documents containing them.
3. Answer extraction
From the documents likely to contain the answer, the system extracts linguistic expressions that match the estimated answer expression and outputs them as answers. For example, for the question "How large is the area of Japan?", the system searches the retrieved documents containing "Japan" and "area". From them, it extracts a linguistic expression of the estimated type, a numerical expression, as the answer.

The technique proposed in Non-Patent Document 1 is described in detail below.

(Estimation of the answer expression)
The answer expression is estimated with hand-crafted heuristic rules; 16 rules were created. Some of them are shown below.
・If the question contains the expression "誰" (who), the answer expression is a person's name.
・If the question contains the expression "いつ" (when), the answer expression is a time expression.
・If the question contains the expression "どのくらいの" (how much/how many), the answer expression is a numerical expression.
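The three rules listed above can be expressed as an ordered cue table. This is a minimal sketch that covers only the three rules shown, not all 16. It assumes that the first matching rule wins. The "unknown" fallback is a hypothetical addition, because the source does not say what happens when no rule fires.

```python
# Three of the 16 hand-written rules: (cue in the question, answer expression).
RULES = [
    ("誰", "person name"),
    ("いつ", "time expression"),
    ("どのくらいの", "numerical expression"),
]


def estimate_answer_type(question: str) -> str:
    """Return the answer expression of the first rule whose cue occurs in the question."""
    for cue, answer_type in RULES:
        if cue in question:
            return answer_type
    return "unknown"  # hypothetical fallback, not specified in the source


print(estimate_answer_type("どのくらいの人が住んでいますか"))
```

A cue table keeps the rules declarative, so adding one of the remaining rules means appending a pair rather than changing the matching logic.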

(Document retrieval)
Keywords for document retrieval are extracted with ChaSen, a publicly available Japanese morphological analysis tool. Function words such as particles and auxiliary verbs are excluded from the keywords. Document retrieval is performed as follows.

First, documents are retrieved with the following formula, and the top k_dr1 articles are extracted.

The symbols are defined as follows:
・d is an article.
・t is a keyword extracted from the question.
・tf(d, t) is the frequency of keyword t in article d.
・df(t) is the number of articles in which keyword t appears.
・N is the total number of articles.
・length(d) is the length of article d.
・Δ is the average article length.
・k_t and k_+ are constants determined experimentally.
This formula is based on Robertson's Okapi weighting (see, for example, References (1) and (2) below) and is widely used in information retrieval (see, for example, References (3) and (4) below). In question answering, however, it is important that many different keywords match, so a large value is used for k_t.

Reference (1): S. E. Robertson and S. Walker, "Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval," Proceedings of the Seventeenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 1994.
Reference (2): S. E. Robertson, S. Walker, S. Jones, M. M. Hancock-Beaulieu, and M. Gatford, "Okapi at TREC-3," TREC-3, 1994.
Reference (3): Masaki Murata, Kiyotaka Uchimoto, Hiromi Ozaku, Qing Ma, Masao Utiyama, and Hitoshi Isahara, "Information retrieval using location and category information," Journal of Natural Language Processing, Vol. 7, No. 2, 2000.
Reference (4): Masaki Murata, Qing Ma, and Hitoshi Isahara, "High performance information retrieval using many characteristics and many techniques," Proceedings of the Third NTCIR Workshop (CLIR), 2002.
Next, the articles are reranked with the following formula, and the top k_dr2 articles are extracted.

Here, T is the set of keywords, and dist(t1, t2) is the distance between keywords t1 and t2. For convenience, dist(t1, t2) = 0.5 when t1 = t2. w_dr2 is a function of t2 determined experimentally.

Question answering systems usually divide articles into smaller units such as paragraphs, so that the keywords extracted from the question are guaranteed to appear close to each other. The system of Non-Patent Document 1 does not need to do this. Its reranking formula raises the score when keywords appear close together, so articles can be used whole for document retrieval. The top 20 articles from this retrieval are passed to the answer extraction step that follows.
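The reranking formula itself is not reproduced in this text, so the code below is only a hypothetical illustration of proximity scoring. It uses the quantities defined above: dist(t1, t2) measured in tokens, dist(t1, t1) = 0.5, and a per-keyword weight standing in for w_dr2. The combination rule is an assumption, not the formula of Non-Patent Document 1. For each occurrence of a keyword t1, the sketch sums weight(t2) / dist over all keywords t2 and keeps the best occurrence. `proximity_score` and `weight` are hypothetical names.

```python
def proximity_score(tokens, keywords, weight):
    """Hypothetical proximity score: keywords that appear near each other score higher.

    Assumed form (not the source's formula): for each occurrence p of a keyword t1,
    sum weight(t2) / dist(p, t2) over keywords t2, and return the best such sum.
    dist is the gap in tokens to the nearest t2; dist(t1, t1) is fixed at 0.5.
    """
    positions = {t: [i for i, w in enumerate(tokens) if w == t] for t in keywords}
    best = 0.0
    for t1 in keywords:
        for p in positions[t1]:
            total = 0.0
            for t2 in keywords:
                if t2 == t1:
                    dist = 0.5
                elif positions[t2]:
                    dist = min(abs(p - q) for q in positions[t2])
                else:
                    continue  # keyword absent from the article: no contribution
                total += weight(t2) / dist
            best = max(best, total)
    return best


near = ["japan", "of", "area", "is"]
far = ["japan"] + ["x"] * 7 + ["area"]
print(proximity_score(near, ["japan", "area"], lambda t: 1.0))
```

With unit weights, the nearby pair scores 2.5 and the distant pair scores 2.125. An article can therefore be scored as a whole, without first splitting it into paragraphs.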

(Answer extraction)
Sequences of nouns and unknown words are extracted from the retrieved articles and used as answer candidates. Each candidate c receives two scores:
・Score_near(c), based on how close the candidate is to the keywords.
・Score_sem(c), based on whether the candidate satisfies the semantic constraints of the answer expression.
The candidate with the largest total of the two scores is taken as the answer.

Score_near(c) is given by the following equation.

Here, c is an answer candidate, and w_dr2 is a function determined experimentally.

The score Score_sem(c) reflects the semantic constraints of the answer expression and is given by hand-crafted rules. Non-Patent Document 1 created 45 such rules, some of which are shown below.
・A score of 1000 is given to a candidate that matches the estimated answer expression (person name, place name, etc.). To determine whether a candidate is a person name or a place name, a named entity extraction technique such as one based on support vector machines (SVMs) can be used. An example of named entity extraction is described later.
・When the answer expression is "country name", a score of 1000 is given if the candidate is a country name.
・When the question has the form "何 (what) + noun X", a score of 1000 is given to a candidate that ends with the noun X.
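Two of the rules listed above can be combined with Score_near as follows. This is a minimal sketch, not the system's code. It assumes that candidate types (for example "place") come from a separate named entity extractor and are passed in directly. The Score_near values are supplied by the caller, because the Score_near equation is not reproduced in this text. All names are hypothetical.

```python
def score_sem(candidate, candidate_type, answer_type, what_noun=None):
    """Two of the 45 rules; each adds 1000 when it fires."""
    score = 0
    if answer_type is not None and candidate_type == answer_type:
        score += 1000  # candidate matches the estimated answer expression
    if what_noun is not None and candidate.endswith(what_noun):
        score += 1000  # question "何 + noun X": candidate ends with X
    return score


def best_candidate(candidates, answer_type, what_noun=None):
    """candidates: (text, type, score_near) triples; return the text with the top total."""
    return max(candidates,
               key=lambda c: c[2] + score_sem(c[0], c[1], answer_type, what_noun))[0]


cands = [("東京", "place", 120.0), ("鈴木", "person", 300.0)]
print(best_candidate(cands, "place"))
```

This prints 東京. Because each rule adds 1000, a semantically matching candidate outranks one that only scores on proximity, as long as the proximity scores stay below 1000.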

The experiments in Non-Patent Document 1 use the following score addition methods.
(1) Original method
Scores are not added.
(2) Simple addition method
The scores of the answer candidates extracted from multiple articles are added as they are, and the answer is output on the basis of the resulting total.
(3) Decreasing addition method
The scores of the candidates extracted from multiple articles are added after the score of the i-th candidate is multiplied by k^(i-1). The result of the addition is given by the following equation:
$$\mathrm{Score}_{decreased} = \sum_{i=1}^{n} k^{\,i-1}\,\mathrm{score}_{original}(i)$$

Here, Score_decreased is the part of the final value below the thousands digit after addition, and score_original(i) is the part of the i-th original value below the thousands digit. n is the number of occurrences of the same answer candidate whose scores, obtained from multiple articles, share the same digits from the thousands place up. k is a constant determined experimentally.
(4) Fusion method
This method combines the original method, the simple addition method, and the decreasing addition method. It first checks on the training data which of these methods gives the highest accuracy, and then uses that method to solve the problems.
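The sketch below shows one reading of the thousands-digit rule in method (3). It assumes that candidates with the same answer and the same digits from the thousands place up are merged. Only the parts of their scores below 1000 are added with weights k^(i-1), and the shared thousands part is kept once. If the same answer occurs with different thousands parts, the sketch keeps the higher total; that choice is an assumption, because the source does not address it.

```python
from collections import defaultdict


def decreased_scores(candidates, k):
    """Apply decreasing addition to the part of each score below 1000.

    candidates: (answer, score) pairs from several articles.
    Pairs with the same answer and the same thousands part are merged.
    """
    groups = defaultdict(list)
    for answer, score in candidates:
        high, low = divmod(score, 1000)  # high = digits from the thousands place up
        groups[(answer, high)].append(low)
    merged = {}
    for (answer, high), lows in groups.items():
        lows.sort(reverse=True)  # i = 1 is the highest remainder
        total = high * 1000 + sum(low * k ** i for i, low in enumerate(lows))
        merged[answer] = max(merged.get(answer, float("-inf")), total)  # assumption
    return merged


print(decreased_scores([("Tokyo", 1026), ("Kyoto", 1030), ("Tokyo", 1021), ("Tokyo", 1020)], 0.3))
```

Here the three "Tokyo" scores share the thousands part 1, so their remainders combine to about 34.1, giving roughly 1034.1. This is ahead of "Kyoto" at 1030. The semantic score (multiples of 1000) is thus never inflated by repeated occurrences.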

This method has two benefits: the fusion improves accuracy, and it allows a fair evaluation.
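The selection step of the fusion method can be sketched as follows. This is a minimal illustration. It assumes that each method is a function mapping a question's candidate list to an answer and that accuracy is the fraction of training questions answered correctly. The toy methods and training data are hypothetical.

```python
def select_method(methods, training_data):
    """Return the name of the method with the highest accuracy on the training data.

    methods: name -> function(candidates) returning an answer
    training_data: list of (candidates, gold_answer) pairs
    """
    def accuracy(solve):
        correct = sum(1 for cands, gold in training_data if solve(cands) == gold)
        return correct / len(training_data)

    return max(methods, key=lambda name: accuracy(methods[name]))


methods = {"original": lambda c: c[0], "last": lambda c: c[-1]}
training = [(["a", "b"], "a"), (["c", "d"], "c")]
print(select_method(methods, training))
```

The chosen method is then applied unchanged to the test questions.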

Embodiments of the present invention are described below with reference to the drawings. FIG. 1 shows an example of the configuration of the question answering apparatus according to the first embodiment of the present invention. Suppose the first keyword "Japan" and the second keyword "area" are input. In the first embodiment, the first keyword "Japan" is expanded into three third keywords, for example "Japan", "USA", and "Germany". The second keyword "area" is likewise expanded into three fourth keywords, for example "area", "population", and "latitude". Each combination of a third keyword and a fourth keyword forms a question, such as "What is the area of Japan?", "What is the population of the USA?", or "What is the latitude of Germany?". Answer candidates for each question are obtained with a machine learning technique and output as answers.

The question answering apparatus 1 expands the input keywords and outputs answers to the questions composed of the expanded keywords. It comprises a keyword input unit 11, a keyword increasing unit 12, a question creation unit 13, an answer candidate extraction unit 14, an answer table output unit 15, and a keyword extraction database (DB) 16. In the figure, reference numeral 17 denotes a learning database (DB) that stores the results of machine learning (learning results) performed by the answer candidate extraction unit 14, described later.

Keywords are input to the keyword input unit 11, for example the first keyword "Japan" and the second keyword "area". The keyword increasing unit 12 uses a keyword extraction technique, described later, to extract from the keyword extraction DB 16 keywords in the same field as each input keyword. As a result, the total number of keywords increases. On the basis of the input first keyword, the keyword increasing unit 12 outputs more third keywords than there are first keywords. On the basis of the input second keyword, it outputs more fourth keywords than there are second keywords.

The question creating unit 13 creates a plurality of questions. Each question is formed from one of the expanded third keywords and one of the expanded fourth keywords. For example, let one third keyword be "X" and one fourth keyword be "Y". The unit then uses the possessive case particle "の" to create the question "ＸのＹは？" ("What is the Y of X?").
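Question creation amounts to taking the Cartesian product of the two expanded keyword lists and filling the template "ＸのＹは？" for each pair. The Python sketch below is illustrative only. `make_questions` is a hypothetical helper, not part of the patent, and the keyword lists reuse the example from FIG. 1.

```python
from itertools import product


def make_questions(third_keywords, fourth_keywords):
    """Return {(X, Y): "ＸのＹは？"} for every pair of a third and a fourth keyword."""
    return {(x, y): f"{x}の{y}は？" for x, y in product(third_keywords, fourth_keywords)}


countries = ["日本", "アメリカ", "ドイツ"]  # third keywords expanded from 日本 (Japan)
attributes = ["面積", "人口", "緯度"]      # fourth keywords expanded from 面積 (area)
questions = make_questions(countries, attributes)
print(len(questions))               # 9 questions: 3 x 3
print(questions[("日本", "面積")])  # 日本の面積は？
```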

The answer candidate extracting unit 14 uses the machine learning technique described later to extract answer candidates for the questions created by the question creating unit 13. The answer table output unit 15 outputs an answer table that associates each extracted answer candidate with its question, for example the answer table shown in FIG. 2.

In the example answer table of FIG. 2:
- The answer "A1" (km²) to the question "What is the area of Japan?" is stored in the cell where the row for the data item "Japan" meets the column for the data item "area".
- The answer "B2" (units of 10,000 people) to the question "What is the population of the USA?" is stored in the cell where the row for "USA" meets the column for "population".

In the embodiments of the present invention, an extracted answer candidate may be converted to a predetermined unit (for example, km²) before it is stored in the answer table. Alternatively, it may be stored in the unit in which it was extracted.

The answer table output by the present invention is not limited to the one shown in FIG. 2. For example, the apparatus may output an answer table in which each row stores one answer candidate linked to its question by an arrow, such as "What is the area of Japan? → A1 (km²)" or "What is the population of the USA? → B2 (10,000 people)".
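Both answer-table layouts can be generated from one mapping of (row item, column item) to answer. This is a minimal sketch under the following assumptions:
- `grid_table` and `arrow_table` are hypothetical names.
- The answers "A1" and "B2" are the placeholders shown in FIG. 2.
- The tab-separated grid layout is an illustrative choice.

```python
def grid_table(rows, cols, answers, blank="-"):
    """FIG. 2 style: one row per third keyword, one column per fourth keyword."""
    lines = ["\t".join([""] + cols)]
    for r in rows:
        cells = [answers.get((r, c), blank) for c in cols]
        lines.append("\t".join([r] + cells))
    return "\n".join(lines)


def arrow_table(answers):
    """Alternative style: one "question → answer" entry per row."""
    return [f"{r}の{c}は？→{a}" for (r, c), a in answers.items()]


answers = {("日本", "面積"): "A1（km²）", ("アメリカ", "人口"): "B2（万人）"}
print(grid_table(["日本", "アメリカ"], ["面積", "人口"], answers))
print(arrow_table(answers))
```

Cells with no extracted answer are filled with a blank marker, so the grid keeps its shape even when some questions go unanswered.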

The keyword extraction DB 16 is a database that stores a certain amount of document data, for example data extracted from newspapers, magazines, and Web data (data on the network). The learning DB 17 stores the learning results described later. Consider the problem "question: 'What is the capital of Japan?', answer: 'Tokyo'". The learning DB records which answer ("correct" or "incorrect") tends to result for the set of features extracted from that problem.

The keyword increasing unit 12 comprises a pattern extracting unit 121 and a keyword extracting unit 122.
- The pattern extracting unit 121 runs a full-text search of the keyword extraction DB 16 for the keywords input to the keyword input unit 11. It extracts patterns that appear around several of the input keywords.
- The keyword extracting unit 122 runs a full-text search of the keyword extraction DB 16 for the patterns found by the pattern extracting unit 121. It outputs, as keywords, the expressions those patterns extract.

In the embodiments of the present invention, the question creating unit 13 may be omitted from the configuration of FIG. 1. In that case, the answer candidate extracting unit 14 uses a machine learning technique to extract and output answer candidates for questions formed from the third and fourth keywords output by the keyword increasing unit 12. To do this, it learns from a large set of prepared problem-answer pairs which answer goes with which kind of problem. Based on those learning results, it then extracts answer candidates for the questions formed from the third and fourth keywords.

The keyword extraction process of the keyword increasing unit 12 works as follows.
1. The pattern extracting unit 121 runs a full-text search of the keyword extraction DB 16 for the small number of input keywords and extracts the patterns c_i that appear around them.
2. The keyword extracting unit 122 runs a full-text search of the keyword extraction DB 16 for each extracted pattern c_i and extracts the expressions exp that c_i matches.
3. The keyword extracting unit 122 sorts these expressions in descending order of Score (evaluation value) and outputs them as keywords.

In the embodiments of the present invention, the keyword extracting unit 122 may output as keywords a predetermined number of the extracted expressions exp, taken in descending order of Score. Alternatively, it may output as keywords every extracted expression exp whose Score is equal to or greater than a predetermined threshold.
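Both output policies, a fixed number of top expressions or a Score threshold, are simple cut-offs on the list sorted by Score. In this sketch, the function name and the Score values are hypothetical.

```python
def select_keywords(scores, top_n=None, threshold=None):
    """Rank expressions by Score (descending), then apply the optional cut-offs."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    if threshold is not None:
        ranked = [(exp, s) for exp, s in ranked if s >= threshold]
    if top_n is not None:
        ranked = ranked[:top_n]
    return [exp for exp, _ in ranked]


scores = {"ドイツ": 0.6, "米国": 0.9, "東京": 0.2}  # hypothetical Score values
print(select_keywords(scores, top_n=2))        # ['米国', 'ドイツ']
print(select_keywords(scores, threshold=0.5))  # ['米国', 'ドイツ']
```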

(Description of pattern examples)
The patterns extracted by the pattern extracting unit 121 are illustrated below for the case where the keywords are country names. The country name is written A in the patterns.

・Input keywords:
日本 (Japan)
中国 (China)
朝鮮 (Korea)
タイ (Thailand)
韓国 (South Korea)
・Example extraction patterns (1): (context on both sides is used; slow, but performs well)
日、Ａ軍 ("...day, A army")
人のＡ人女性 ("...person's A-national woman")
日本はＡと ("Japan, with A")
〔Ａ通信・ ("[A News Agency·")
省。駐Ａ大使な ("...ministry. Ambassador to A...")
・Example extraction patterns (2): (only one side is used, and the other side is hiragana characters; fast)
［..Ａ国］。 ("[...A-country].")
語。Ａ ("...language. A")
［..Ａ国］側 ("[...A-country] side")
［..Ａ国］伝来 ("introduced from [...A-country]")
Ａ語入力 ("A-language input")
Here, ［..Ａ..］ means that the bracketed string itself matches the country name A. For example, ［..Ａ国］ means that the matched term ends in 国 ("country").

(Detailed description of keyword extraction)
For the small set of input keywords, five terms that appear to be well known might be selected from the representative forms in the evaluation data, taking the most frequent in the Mainichi Shimbun first. For the keyword extraction DB 16, the CD-Mainichi Shimbun (the Mainichi Shimbun on compact disc), 1991–2000 editions, might be used. An example extraction procedure follows.

(1) Run a full-text search of the keyword extraction DB 16 for the small number of keywords. Extract as c_i the patterns that appear around several of the keywords; a pattern that appears around only one keyword is not extracted. What counts as a surrounding pattern is defined as appropriate. For example, suppose strings of up to three characters before and after (to the left and right of) a keyword are used as patterns. Each side can then be one, two, or three characters long, so one keyword occurrence yields 3 × 3 = 9 patterns. A pattern may also include the keyword itself.

(2) Next, run a full-text search of the keyword extraction DB 16 for each extracted pattern c_i, and extract the expressions exp matched by c_i.

(3) Sort the extracted expressions exp in descending order of Score and output them as keywords.
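Steps (1) and (2) can be sketched as follows. This is a minimal sketch under several assumptions:
- An in-memory list of strings stands in for the keyword extraction DB 16.
- A pattern is a (left, right) pair of context strings, each one to three characters long, and does not include the keyword itself.
- The regular expression matches the shortest string of at most `max_len` characters between the two contexts.

`extract_patterns` and `match_pattern` are hypothetical names.

```python
import re
from collections import defaultdict


def extract_patterns(corpus, keywords, width=3):
    """Step (1): collect (left, right) contexts, keeping those seen around 2+ keywords."""
    seen_with = defaultdict(set)  # (left, right) -> input keywords it surrounded
    for text in corpus:
        for kw in keywords:
            start = text.find(kw)
            while start != -1:
                end = start + len(kw)
                for l in range(1, width + 1):
                    for r in range(1, width + 1):
                        if start - l >= 0 and end + r <= len(text):
                            seen_with[(text[start - l:start], text[end:end + r])].add(kw)
                start = text.find(kw, start + 1)
    return {pat: kws for pat, kws in seen_with.items() if len(kws) >= 2}


def match_pattern(corpus, pattern, max_len=10):
    """Step (2): return every expression found between the pattern's two contexts."""
    left, right = pattern
    rx = re.compile(re.escape(left) + "(.{1,%d}?)" % max_len + re.escape(right))
    found = []
    for text in corpus:
        found.extend(m.group(1) for m in rx.finditer(text))
    return found


corpus = ["駐日本大使が来た。", "駐中国大使が来た。", "駐タイ大使が来た。"]
patterns = extract_patterns(corpus, ["日本", "中国"])
print(("駐", "大使") in patterns)             # True
print(match_pattern(corpus, ("駐", "大使")))  # ['日本', '中国', 'タイ']
```

In this toy corpus, the pattern 駐…大使 ("ambassador to …") is learned from two input keywords and then picks up the new country name タイ (Thailand).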

The following Score definitions can be used.

・Method 1 (decision list method)
Method 1 scores an extracted expression exp by the largest p_i among the patterns c_i that extracted it. Here, p_i is the proportion of input keywords among the expressions exp extracted by pattern c_i. It serves as the likelihood, or confidence, of the pattern.

For example, suppose a full-text search of the keyword extraction DB 16 for pattern c_1 extracts five expressions, exp1 through exp5. If three of them (exp1 to exp3) are input keywords, then p_1 = 3/5.
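Method 1 and the p_1 = 3/5 example can be reproduced as follows. This sketch assumes that p_i counts distinct expressions. The pattern names and expressions are hypothetical, and `Fraction` keeps the ratios exact.

```python
from fractions import Fraction


def pattern_precision(extracted, keywords):
    """p_i: share of the distinct expressions extracted by c_i that are input keywords."""
    exps = set(extracted)
    if not exps:
        return Fraction(0)
    return Fraction(len(exps & set(keywords)), len(exps))


def decision_list_scores(pattern_exps, keywords):
    """Method 1: Score(exp) = largest p_i among the patterns c_i that extracted exp."""
    p = {c: pattern_precision(exps, keywords) for c, exps in pattern_exps.items()}
    score = {}
    for c, exps in pattern_exps.items():
        for e in set(exps):
            score[e] = max(score.get(e, Fraction(0)), p[c])
    return p, score


pattern_exps = {"c1": ["exp1", "exp2", "exp3", "exp4", "exp5"], "c2": ["exp4", "exp6"]}
p, score = decision_list_scores(pattern_exps, ["exp1", "exp2", "exp3"])
print(p["c1"])        # 3/5
print(score["exp4"])  # 3/5, the better of c1 (3/5) and c2 (0)
```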

・Method 2 (Bayes method)
Method 2 scores an extracted expression exp by the product of p_i over all patterns c_i:
$$\mathrm{Score}(exp) = \prod_i p_i \qquad (8)$$

In practice, p_i = 0 occurs often, which would make the whole product in Equation (8) zero. The embodiments of the present invention may therefore use Equation (9) instead:
$$\mathrm{Score}(exp) = \prod_i \left( \frac{1-\Delta}{\Delta}\, p_i + 1 \right) \qquad (9)$$
Here, Δ is a small constant, for example 0.0001. Each factor equals (Δ + (1 − Δ) p_i) / Δ, so a pattern with p_i = 0 contributes a factor of 1 instead of forcing the product to zero.

For example, if the search for pattern c_i did not return the exp whose Score is being calculated, p_i = 0 is used for that pattern in Equation (9).
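Equation (9) can be computed directly. In the sketch below, patterns that did not return exp contribute p_i = 0, which is a factor of 1. The p_i values and pattern contents are hypothetical, and Δ = 0.0001 follows the text.

```python
from math import prod


def bayes_score(exp, pattern_exps, p, delta=1e-4):
    """Method 2, Equation (9): product over all c_i of ((1 - Δ)/Δ · p_i + 1).

    A pattern whose search did not return exp uses p_i = 0, i.e. a factor of 1.
    """
    return prod(
        (1 - delta) / delta * (p[c] if exp in exps else 0.0) + 1
        for c, exps in pattern_exps.items()
    )


p = {"c1": 0.6, "c2": 0.0, "c3": 0.5}  # hypothetical p_i values
pattern_exps = {"c1": {"x"}, "c2": {"x", "y"}, "c3": {"y"}}
print(bayes_score("x", pattern_exps, p))  # about 6000.4
print(bayes_score("y", pattern_exps, p))  # about 5000.5
```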

・Method 3 (similarity-based method)
Method 3 scores an extracted expression exp by the number (total) of patterns that extracted it. The more patterns extract an expression, the larger its Score.

・Method 4 (see Study (1) below)
Method 4 scores an extracted expression exp by the number of patterns that extracted it, with each pattern weighted by p_i.

Here, f_i is the number of input keywords around which pattern c_i appeared.

Study (1): Ellen Riloff and Rosie Jones, "Learning dictionaries for information extraction by multi-level bootstrapping," Proceedings of AAAI-99, 1999.

・Method 5 (see Reference (5) below)
Method 5 scores an extracted expression exp by the probability that at least one pattern is correct:
$$\mathrm{Score}(exp) = 1 - \prod_i (1 - p_i) \qquad (12)$$
Each factor (1 − p_i) is the probability that pattern c_i is not correct, so their product is the probability that none is correct. Subtracting this from 1 gives the probability that at least one is correct.

Reference (5): Masaki Murata and Hitoshi Isahara, "Automatic acquisition of knowledge about paraphrases based on matching synonymous texts," IPSJ SIG Natural Language Processing, 2001-NL-142, 2001.

Ties are broken as follows:
- In Methods 1, 2, 4, and 5, expressions with equal Scores are ordered by their Method 3 Score.
- In Method 3, expressions with equal Scores are ordered by their Method 5 Score.
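Methods 3 and 5 and the Method 3 tie-break can be combined in a single sort key. The sketch below makes two points explicit:
- Taking the product in Equation (12) over only the patterns that extracted exp gives the same result as taking it over all patterns, because a pattern with p_i = 0 contributes a factor of 1.
- The pattern data and p_i values are hypothetical.

```python
def count_score(exp, pattern_exps):
    """Method 3: number of patterns that extracted exp."""
    return sum(1 for exps in pattern_exps.values() if exp in exps)


def noisy_or_score(exp, pattern_exps, p):
    """Method 5, Equation (12): 1 - product of (1 - p_i) over patterns that extracted exp."""
    miss = 1.0
    for c, exps in pattern_exps.items():
        if exp in exps:
            miss *= 1.0 - p[c]
    return 1.0 - miss


def rank_method3(candidates, pattern_exps, p):
    """Sort by the Method 3 Score, breaking ties with the Method 5 Score (both descending)."""
    return sorted(
        candidates,
        key=lambda e: (count_score(e, pattern_exps), noisy_or_score(e, pattern_exps, p)),
        reverse=True,
    )


pattern_exps = {"c1": {"a", "b"}, "c2": {"a"}, "c3": {"b"}, "c4": {"c"}}
p = {"c1": 0.5, "c2": 0.2, "c3": 0.4, "c4": 0.9}  # hypothetical p_i values
print(rank_method3(["a", "b", "c"], pattern_exps, p))  # ['b', 'a', 'c']
```

Here "a" and "b" both have a Method 3 Score of 2. "b" wins the tie because its Method 5 Score is 0.7 against 0.6 for "a".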

FIG. 3 shows an example of precision and recall for keyword extraction results. Each pattern used in this extraction combined a one- to three-character string on the left with the corresponding string on the right. The left string lies either immediately to the left of the keyword or includes its first characters. Precision and recall were measured against correct-answer data, prepared in advance, covering a predetermined number of categories. FIG. 4 shows an example of such data for country names. Each country's names are stored on one line: the first entry is the representative form, and the remaining entries on that line are variant notations of it. Correct-answer data in the same format as FIG. 4 is prepared for many other categories as well, for example satellites, public holidays, planets of the solar system, and World Heritage sites.

In FIG. 3, AP is the mean of the average precision used in information retrieval (see Reference (6) below). Average precision is the average of the precision values computed each time a correct article is reached while going down the ranking from the top. In the present application, it is the average of the precision values computed each time a correct keyword is reached going down from the top. Input keywords are excluded from the correct keywords.

Reference (6): Masaki Murata, Qing Ma, Kiyotaka Uchimoto, Hiromi Ozaku, Masao Utiyama, and Hitoshi Isahara, "Information retrieval using location and category information," Journal of Natural Language Processing, Vol. 7, No. 2, 2000.

RP is the mean of the R-precision. R-precision is the proportion of correct articles among the top R retrieved articles, where R is the number of correct articles. In the present application, it is the proportion of correct keywords among the top R extracted keywords, where R is the number of correct keywords. Precision here means the same as accuracy: the proportion of extracted keywords that are correct. TP is the mean precision over the top five.
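The three evaluation measures can be computed as follows:
- AP averages the precision at each rank where a correct keyword appears.
- RP is the precision at rank R.
- TP is the precision at rank 5.

This sketch assumes that AP divides by the total number of correct keywords, so correct keywords that are never extracted count as zero. It also assumes the ranked list has no duplicates. The gold and ranked data are hypothetical.

```python
def precision_at(ranked, gold, k):
    """Share of the top-k ranked keywords that are correct."""
    if k == 0:
        return 0.0
    return sum(1 for e in ranked[:k] if e in gold) / k


def average_precision(ranked, gold):
    """Average of the precision values taken each time a correct keyword is reached."""
    hits, total = 0, 0.0
    for k, e in enumerate(ranked, start=1):
        if e in gold:
            hits += 1
            total += hits / k
    return total / len(gold) if gold else 0.0


def r_precision(ranked, gold):
    """Precision at R, where R is the number of correct keywords."""
    return precision_at(ranked, gold, len(gold))


inputs = {"日本", "中国"}
gold = {"日本", "中国", "米国", "タイ", "韓国"} - inputs  # input keywords excluded
ranked = ["米国", "東京", "タイ", "大阪", "韓国"]
print(round(average_precision(ranked, gold), 4))  # 0.7556
print(r_precision(ranked, gold))                  # 2/3, about 0.667
print(precision_at(ranked, gold, 5))              # 0.6 (TP: top five)
```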

(Description of constraint-based extraction methods)
(a) Method using character types and KR
In the example of FIG. 3, the extraction method also used character types and KR. Character types in Japanese are kanji, katakana, hiragana, symbols, digits, and so on. In English, they would be letters, digits, symbols, whether a word begins with a capital letter, and so on.

The character-type method does not extract any expression that contains a character type absent from the small set of input keywords (five in this example). For example, if none of the five input keywords contains hiragana, expressions containing hiragana are not extracted.
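The character-type constraint can be sketched as a subset test on character types. This is a coarse sketch:
- Character types come from basic Unicode block ranges.
- Characters such as 々, or kanji outside the basic CJK block, fall into "symbol".
- `char_type` and `filter_by_char_type` are hypothetical names, and the candidate list is made up.

```python
def char_type(ch):
    """Coarse character type of one character, by Unicode block range."""
    if "\u3041" <= ch <= "\u309f":
        return "hiragana"
    if "\u30a0" <= ch <= "\u30ff":
        return "katakana"
    if "\u4e00" <= ch <= "\u9fff":
        return "kanji"
    if ch.isdigit():
        return "digit"
    if ch.isalpha():
        return "alphabet"
    return "symbol"


def filter_by_char_type(input_keywords, candidates):
    """Drop candidates containing a character type that no input keyword contains."""
    allowed = {char_type(ch) for kw in input_keywords for ch in kw}
    return [c for c in candidates if {char_type(ch) for ch in c} <= allowed]


inputs = ["日本", "中国", "朝鮮", "タイ", "韓国"]  # kanji and katakana only
candidates = ["ドイツ", "あのね", "米国", "G7", "ブラジル"]
print(filter_by_char_type(inputs, candidates))  # ['ドイツ', '米国', 'ブラジル']
```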

The KR method replaces p_i with p_i × f_i / n_i. Its advantage is that two patterns with the same p_i can still receive different confidence values through f_i / n_i. Here, n_i is the number of input keywords. For Method 3 with KR, the per-pattern count of 1 was replaced by f_i. In the evaluation, variant notations of keywords were removed from the extraction results. Besides the character-type method, the following methods are also possible.

(b) Part-of-speech-based method
This method filters the output by part of speech. For example, if the input expressions contain only nouns, non-noun expressions are omitted from the output. If they contain only adjectives, non-adjective expressions are omitted. When an expression consists of several words, the part of speech of its last word (morpheme) can be used.

(Example 1)
Suppose the input keywords are:
「楽しい」(fun) 「哀しい」(sad) 「嬉しい」(happy) 「とても嬉しい」(very happy) 「とても哀しい」(very sad)
and the following are extracted:
「とても」(very) 「新しい」(new) 「美しい」(beautiful) 「とても美しい」(very beautiful) 「とても難しい」(very difficult)
The part of speech of the last word of each extracted expression is estimated. In every input keyword, the last word is an adjective. The adverb 「とても」(very) is the only extracted expression whose last word is not an adjective, so it is excluded from the output.

(Example 2)
Suppose the input keywords are:
「楽しい」(fun) 「歓喜」(joy) 「悲痛」(grief) 「悲しい」(sad)
These input keywords contain more than one part of speech, here adjectives and nouns. In that case, expressions with any of those parts of speech are output, and expressions with any other part of speech are not.

Part-of-speech information, such as the estimated part of speech of the last word (morpheme) described above, requires a morphological analysis system (morphological analysis means) such as the following.

・Morphological analysis system
To segment Japanese text into words, the keyword extracting unit 122 needs a morphological analysis system. ChaSen is described here. ChaSen is a morphological analyzer developed at the Nara Institute of Science and Technology and is available at http://chasen.aist-nara.ac.jp/index.html.jp.

ChaSen segments a Japanese sentence into words and also estimates the part of speech of each word. For example, the input 「学校へ行く」 ("go to school") yields the following result.

学校  ガッコウ  学校  名詞-一般
へ  ヘ  へ  助詞-格助詞-一般
行く  イク  行く  動詞-自立  五段・カ行促音便  基本形
EOS
The sentence is split so that each line holds one word. Each word is annotated with its reading and part of speech. The columns are surface form, reading, base form, part of speech, and, for conjugating words, conjugation type and conjugation form. In this example:
- 学校 ("school") is a general noun.
- へ ("to") is a case particle.
- 行く ("go") is an independent verb (godan conjugation, ka-row with sokuonbin) in its basic form.

(c) Method based on a common substring
If all the input expressions share the same ending, for example 「しい」, expressions that do not end in 「しい」 are omitted from the output. The same can be done with a common initial string instead of a common ending.

(Example)
Suppose the input keywords are:
「悲しい」(sad) 「楽しい」(fun) 「嬉しい」(happy)
and the following are extracted:
「歓喜」(joy) 「悲痛」(grief) 「美しい」(beautiful) 「新しい」(new)
The common substring of the input keywords is 「しい」. 「歓喜」 and 「悲痛」 do not end in 「しい」, so they are deleted, and only the remaining expressions are output.

(d) User-specified constraints
The methods above derive constraints automatically from the input expressions, but the user can also specify constraints. For example:
- If the user selects the option "kanji only", expressions containing any character type other than kanji are not output.
- If the user selects the option "ending: しい", expressions that do not end in 「しい」 are not output.
- If the user selects the option "part of speech: noun", expressions other than nouns are not output.

Next, the extraction of answer candidates is described. The questions ("ＸのＹは？") are those created by the question creating unit 13. When the question answering apparatus 1 has no question creating unit 13, they are the questions formed from the keywords output by the keyword increasing unit 12. The answer candidate extracting unit 14 extracts answer candidates using a machine learning technique.
(Machine learning technique)
A machine learning technique first prepares many problem-answer pairs. It learns from them which answer goes with which kind of problem, and then uses the learning results to infer the answer to a new problem (see, for example, Reference (7) below).

Reference (7): Masaki Murata, "Language processing based on machine learning," invited lecture, Faculty of Science and Technology, Ryukoku University, 2004. http://www2.nict.go.jp/jt/a132/members/murata/ps/rk1-siryou.pdf
To tell the machine what kind of problem it faces, the problem is expressed by features: the elements that make up the problem, i.e., the information used in the analysis. For example, consider estimating the tense of Japanese sentence-final expressions. Given the problem 「彼が話す。」 ("He speaks.") with the answer "present", examples of features are the strings 「彼が話す。」, 「が話す。」, 「話す。」, 「す」, and 「。」.

In other words, a machine learning technique prepares many pairs of a feature set and an answer and learns which answer goes with which feature set. For a new problem, it extracts the problem's feature set and uses the learning results to infer the answer for those features.

First, machine learning techniques in general are described. Commonly used techniques include the k-nearest neighbor method, the simple Bayes method, the decision list method, the maximum entropy method, and the support vector machine method.

The k-nearest neighbor method uses the k most similar cases instead of only the single most similar case, and determines the class (solution) by majority vote among those k cases. k is a predetermined integer; an odd number between 1 and 9 is generally used.
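A k-nearest neighbor classifier over feature sets can be sketched in a few lines. The text does not specify the similarity measure. This sketch assumes similarity is the number of shared features, and the toy tense data is hypothetical, built in the style of the 「彼が話す。」 example.

```python
from collections import Counter


def knn_classify(train, features, k=3):
    """Majority vote among the k training cases that share the most features with the input.

    train: list of (feature_set, label) pairs.
    """
    ranked = sorted(train, key=lambda ex: len(ex[0] & features), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


train = [({"話す。", "す", "。"}, "present"),
         ({"話した。", "た", "。"}, "past"),
         ({"走る。", "る", "。"}, "present"),
         ({"見た。", "た", "。"}, "past"),
         ({"来た。", "た", "。"}, "past")]
print(knn_classify(train, {"行った。", "た", "。"}, k=3))  # past
```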

The simple Bayes method estimates the probability of each class using Bayes' theorem and takes the class with the highest probability as the result.

In the simple Bayes method, the probability of outputting class a in context b is given by Equation (13):
$$p(a \mid b) \simeq \frac{\tilde{p}(a) \prod_{i=1}^{k} \tilde{p}(f_i \mid a)}{p(b)} \qquad (13)$$

Here, context b is a set of predefined features f_j (∈ F, 1 ≤ j ≤ k), and p(b) is the probability of context b. Because p(b) does not depend on class a and is constant, it is not calculated. p̃(a) and p̃(f_i | a) are probabilities estimated from the training data: p̃(a) is the probability of class a, and p̃(f_i | a) is the probability of having feature f_i given class a. The maximum-likelihood estimate of p̃(f_i | a) is often zero. The product then becomes zero, which can make it difficult to decide the class. Smoothing is therefore applied, and the values smoothed with Equation (15) are used.

In Equation (15), freq(f_i, a) is the number of cases that have feature f_i and belong to class a, and freq(a) is the number of cases that belong to class a.
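A simple Bayes classifier following Equation (13) can be sketched as follows. It compares log probabilities and drops the constant p(b). Equation (15) itself is not reproduced in this text, so the sketch substitutes an assumed additive smoothing, p̃(f_i | a) = (freq(f_i, a) + ε) / (freq(a) + ε). That formula is a stand-in, not the patent's Equation (15). The training data is hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_nb(examples):
    """Count freq(a) and freq(f_i, a) from (feature_set, label) pairs."""
    class_freq = Counter(label for _, label in examples)
    feat_freq = defaultdict(Counter)
    for feats, label in examples:
        for f in set(feats):
            feat_freq[label][f] += 1
    return class_freq, feat_freq


def classify_nb(model, feats, eps=0.01):
    """argmax_a of log p~(a) + sum_i log p~(f_i | a); p(b) is dropped as a constant.

    Assumed smoothing (not Equation (15)): p~(f_i | a) = (freq(f_i, a) + eps) / (freq(a) + eps).
    """
    class_freq, feat_freq = model
    total = sum(class_freq.values())
    best, best_lp = None, -math.inf
    for a, n_a in class_freq.items():
        lp = math.log(n_a / total)
        for f in set(feats):
            lp += math.log((feat_freq[a][f] + eps) / (n_a + eps))
        if lp > best_lp:
            best, best_lp = a, lp
    return best


train = [({"話す。", "す", "。"}, "present"),
         ({"走る。", "る", "。"}, "present"),
         ({"見た。", "た", "。"}, "past"),
         ({"来た。", "た", "。"}, "past")]
model = train_nb(train)
print(classify_nb(model, {"行った。", "た", "。"}))  # past
```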

The decision list method treats each pair of a feature and a class as a rule and stores the rules in a list in a predetermined priority order. When an input is given for classification, its data is compared with the rule features, starting from the highest-priority rule. The class of the first rule whose feature matches becomes the class of the input.

In the decision list method, the probability of each class is computed using only one of the predefined features f_j (∈ F, 1 ≤ j ≤ k) as the context. The probability of outputting class a in context b is given by

$$p(a \mid b) = p(a \mid f_{\max}) \qquad (16)$$
where f_max is given by
$$f_{\max} = \arg\max_{f_j \in F} \; \max_{a_i \in A} \; \tilde{p}(a_i \mid f_j) \qquad (17)$$

Here, p̃(a_i | f_j) is the proportion of class a_i among the cases whose context contains feature f_j.
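Equations (16) and (17) can be implemented by building one rule per feature and sorting the rules by max_a p̃(a | f). The first rule whose feature occurs in the input then plays the role of f_max. In this sketch, the names and training data are hypothetical, and ties in probability keep insertion order.

```python
from collections import Counter, defaultdict


def train_decision_list(examples):
    """One rule per feature f: (max_a p~(a|f), f, best class a), sorted by probability."""
    counts = defaultdict(Counter)  # feature -> class counts
    for feats, label in examples:
        for f in set(feats):
            counts[f][label] += 1
    rules = []
    for f, c in counts.items():
        label, n = c.most_common(1)[0]
        rules.append((n / sum(c.values()), f, label))
    rules.sort(key=lambda r: r[0], reverse=True)
    return rules


def classify_dl(rules, feats, default=None):
    """Apply the first (highest-probability) rule whose feature is in the input context."""
    for prob, f, label in rules:
        if f in feats:
            return label
    return default


train = [({"話す。", "す", "。"}, "present"),
         ({"走る。", "る", "。"}, "present"),
         ({"見た。", "た", "。"}, "past"),
         ({"来た。", "た", "。"}, "past")]
rules = train_decision_list(train)
print(classify_dl(rules, {"行った。", "た", "。"}))  # past (rule for た, probability 1.0)
```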

Let F be the set of predefined features f_j (1 ≤ j ≤ k). The maximum entropy method finds the probability distribution p(a, b) that maximizes the entropy in Equation (19) while satisfying the conditions of Equation (18). Of the class probabilities this distribution gives, the class with the highest probability is taken as the result.
$$\sum_{a \in A,\, b \in B} p(a,b)\, g_j(a,b) = \sum_{a \in A,\, b \in B} \tilde{p}(a,b)\, g_j(a,b) \quad \text{for all } f_j \; (1 \le j \le k) \qquad (18)$$
$$H(p) = -\sum_{a \in A,\, b \in B} p(a,b) \log p(a,b) \qquad (19)$$

Here, A and B are the sets of classes and contexts. g_j(a, b) is a function that equals 1 if context b contains feature f_j and the class is a, and 0 otherwise. p̃(a, b) is the proportion of occurrences of (a, b) in the known data.

Equation (18) computes the expected frequency of each output-feature pair by multiplying the probability p by the function g, which marks occurrences of that pair. The constraint is that the expected value in the known data (right-hand side) equals the expected value under the sought probability distribution (left-hand side). Under this constraint, entropy is maximized (which smooths the probability distribution) to obtain the probability distribution of outputs and contexts. The maximum entropy method is described in detail in References (8) and (9) below.

Reference (8): Eric Sven Ristad, "Maximum Entropy Modeling for Natural Language," ACL/EACL Tutorial Program, Madrid, 1997.
Reference (9): Eric Sven Ristad, "Maximum Entropy Modeling Toolkit, Release 1.6beta," http://www.mnemonic.com/software/memt, 1998.
The support vector machine method classifies data belonging to two classes by dividing the space with a hyperplane.

FIG. 27 shows the concept of margin maximization in the support vector machine method. In FIG. 27:
- white circles are positive examples;
- black circles are negative examples;
- solid lines are hyperplanes that divide the space;
- broken lines are the planes that bound the margin region.

FIG. 27(A) shows the case where the gap between positive and negative examples is narrow (small margin). FIG. 27(B) shows the case where the gap is wide (large margin).

Suppose the two classes consist of positive and negative examples. The larger the gap (margin) between the positive and negative examples in the training data, the less likely the classifier is to misclassify open data. Therefore, as shown in FIG. 27(B), the hyperplane that maximizes this margin is found and used for classification.

That is the basic idea. In practice, two extensions are usually applied:

- allowing a small number of training examples to fall inside the margin region;
- making the linear part of the hyperplane nonlinear (introducing a kernel function).

This extended method is equivalent to classification with the following discriminant function, Expression (20). The two classes are distinguished by whether the output value of the discriminant function is positive or negative.

Here, x is the context (set of features) of the example to be classified. x_i and y_i (i = 1, ..., l, y_i ∈ {1, −1}) are the context and class of the i-th training example. The function sgn is defined as
sgn(x) = 1 (x ≥ 0)
sgn(x) = −1 (otherwise)
Each α_i is the value that maximizes Expression (21) under the constraints of Expressions (22) and (23).
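Expressions (20) to (23) are referred to above but are not reproduced in this text. The block below writes them in the standard soft-margin SVM form that matches the symbols defined above: x, x_i, y_i, α_i, the kernel K, and the constant C introduced below. This is a reconstruction, not a quotation of the original expressions. The bias term b in particular is part of the standard form and is not described in the surrounding text.

```latex
\begin{align}
f(x) &= \operatorname{sgn}\Big( \sum_{i=1}^{l} \alpha_i y_i K(x_i, x) + b \Big) \tag{20} \\
b &= -\frac{\max_{i:\, y_i = -1} b_i + \min_{i:\, y_i = 1} b_i}{2},
  \qquad b_i = \sum_{j=1}^{l} \alpha_j y_j K(x_j, x_i) \notag \\
L(\alpha) &= \sum_{i=1}^{l} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \tag{21} \\
0 &\le \alpha_i \le C \quad (i = 1, \ldots, l) \tag{22} \\
\sum_{i=1}^{l} \alpha_i y_i &= 0 \tag{23}
\end{align}
```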

The function K is called a kernel function. Various kernel functions can be used; this embodiment uses the following polynomial kernel.

K(x, y) = (x · y + 1)^d    Expression (24)
C and d are constants set experimentally. In the specific example described later, C is fixed at 1 throughout all processing, and two values of d, 1 and 2, are tried. An x_i with α_i > 0 is called a support vector, and the summation in Expression (20) is normally computed over these examples only. In other words, only the training examples called support vectors are used in the actual analysis.
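The following Python sketch evaluates the discriminant function with the polynomial kernel of Expression (24), summing over the support vectors only. Its assumptions:

- The multipliers α_i, the labels y_i and the bias b come from an already trained model, for example one produced by a tool such as TinySVM in Reference (11). Training is not shown.
- The numbers in the usage are made up.
- The function names are hypothetical.

```python
def polynomial_kernel(x, y, d):
    """K(x, y) = (x . y + 1)^d, Expression (24)."""
    return (sum(a * b for a, b in zip(x, y)) + 1) ** d


def sgn(v):
    """sgn as defined in the text: 1 if v >= 0, otherwise -1."""
    return 1 if v >= 0 else -1


def decision_value(x, support_vectors, d, b):
    """Sum over support vectors (x_i, y_i, alpha_i) with alpha_i > 0, plus the bias b."""
    return sum(alpha * y * polynomial_kernel(sv, x, d)
               for sv, y, alpha in support_vectors) + b


def classify(x, support_vectors, d, b):
    """Expression (20): the sign of the decision value gives the class (1 or -1)."""
    return sgn(decision_value(x, support_vectors, d, b))


# Usage with two hypothetical support vectors and d = 2.
svs = [((1, 0), 1, 0.5), ((0, 1), -1, 0.5)]
print(classify((1, 0), svs, d=2, b=0.0))  # positive side
```

For a fixed trained model, the signed distance from the separating hyperplane equals the decision value divided by a constant norm. Ranking examples by `decision_value` is therefore the same as ranking them by distance from the plane.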

Details of the extended support vector machine method are described in References (10) and (11) below.

Reference (10): Nello Cristianini and John Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, 2000.
Reference (11): Taku Kudoh, TinySVM: Support Vector Machines, http://cl.aist-nara.ac.jp/taku-ku//software/Tiny SVM/index.html, 2000.
The support vector machine method handles data with two classes. Therefore, data with three or more classes is usually handled by combining it with a technique such as the pairwise method or the one-vs-rest method.

For data with n classes, the pairwise method works as follows:

1. Generate every pair of two different classes, n(n−1)/2 pairs in all.
2. For each pair, use a binary classifier, namely the support vector machine processing module, to decide which of the two classes is better.
3. Determine the final class by majority vote over the n(n−1)/2 binary classifications.
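The pairwise voting scheme can be sketched in Python as follows. Each entry of `binary_classifiers` stands in for a trained two-class SVM and is replaced here by a hypothetical callable. Ties are broken in favor of the class listed first; this is an assumption, since the text does not say how ties are handled.

```python
from collections import Counter
from itertools import combinations


def pairwise_classify(x, classes, binary_classifiers):
    """Majority vote over all n(n-1)/2 class pairs.

    binary_classifiers maps a pair (a, b), in the order produced by
    combinations(classes, 2), to a function returning a or b for input x.
    """
    votes = Counter()
    for a, b in combinations(classes, 2):
        votes[binary_classifiers[(a, b)](x)] += 1
    # max() returns the first maximal element, so ties go to the earlier class.
    return max(classes, key=lambda c: votes[c])
```

With three classes a, b and c, three binary decisions are made. The class with the most wins is returned.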

Suppose there are three classes a, b and c. The one-vs-rest method generates three sets: class a versus the rest, class b versus the rest, and class c versus the rest. A support vector machine is trained on each set.

The estimation step uses the learning results of these three support vector machines. The method examines how each of the three machines classifies the candidate binary relation to be estimated. The answer is the non-"rest" class of the machine for which the candidate lies farthest from the separating plane on the side of that class. For example, if a candidate lies farthest from the separating plane in the machine trained on the "class a versus the rest" set, its class is estimated to be a.
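The one-vs-rest decision can be sketched similarly. Each callable is assumed to return the signed decision value of the corresponding "class versus the rest" SVM. The class whose machine places the candidate farthest on the class side therefore wins. If no machine gives a positive value, this sketch still returns the class with the largest value; that fallback is an assumption, not something the text specifies.

```python
def one_vs_rest_classify(x, decision_functions):
    """decision_functions maps each class c to the decision value of the
    "c versus the rest" SVM. A larger positive value means x lies farther
    from the separating plane on the side of c."""
    return max(decision_functions, key=lambda c: decision_functions[c](x))
```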

Next, a specific method for extracting answer candidates in the embodiment of the present invention will be described.
(Answer candidate extraction method 1)
<Problem structure>
A large number of problem-solution pairs such as the following are created in advance:
Problem: question "What is Y1 of X1?", answer "Z1" --- solution: "correct"
Problem: question "What is Y2 of X2?", answer "Z2" --- solution: "correct"
Problem: question "What is Y3 of X3?", answer "Z3" --- solution: "incorrect"

Features expressing these problems are also prepared. Examples:
- The words Xi, Yi and Zi themselves
- The semantic classes of the words Xi, Yi and Zi
- The number of articles retrieved with Xi and Yi
- The number of articles retrieved with Xi and Yi in which Zi appears
- The number of articles in which Xi and Yi appear close to each other (within a certain number of words)
- The number of articles in which Xi, Yi and Zi appear close to each other (within a certain number of words)
- Whether Zi matches the most frequent word in the articles retrieved with Xi and Yi
- Whether Zi matches the j-th most frequent word in the articles retrieved with Xi and Yi
- Candidate-rank matches. Using Xi and Yi as keywords, the answer candidate extraction unit 14 retrieves articles likely to contain the answer from a document data group (not shown), such as newspaper article data and encyclopedia data. It extracts linguistic expressions in those articles as answer candidates and sorts them in a priority order, for example Score_near(c). The features are whether Zi matches the highest-ranked candidate and whether Zi matches the j-th candidate.
Through the above processing, a large number of pairs of a feature set and a solution are prepared.

Score_near(c), used here as the priority order, is described in Non-Patent Document 1 mentioned above. It is a score based on the proximity between an answer candidate and the keywords.

Next, semantic classes will be described. In general, there are tables that describe the semantic class of each word, and such a table can be used to obtain a word's semantic class. One example is the Bunrui Goi Hyo (classification vocabulary table).

In this table, each word is represented by a 10-digit number called a classification number, and words with more similar numbers are more similar in meaning. The first 3 or 5 digits of this number are used, for example, as the word's semantic class. For instance, the classification number of "villager" (村人) is 1230102050. The word therefore belongs to class 123 (race, nation, social class, etc.) and class 12301 (nation, residents, etc.).
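The following is a minimal sketch of deriving semantic classes from a 10-digit classification number held as a string. `semantic_classes` and `shared_prefix_length` are hypothetical helpers. The second number in the usage is made up for illustration.

```python
def semantic_classes(classification_number, prefix_lengths=(3, 5)):
    """Use leading digits of a 10-digit classification number as semantic classes."""
    if len(classification_number) != 10 or not classification_number.isdigit():
        raise ValueError("expected a 10-digit classification number")
    return [classification_number[:n] for n in prefix_lengths]


def shared_prefix_length(a, b):
    """Number of leading digits two classification numbers share;
    a longer shared prefix means more similar words."""
    n = 0
    for ca, cb in zip(a, b):
        if ca != cb:
            break
        n += 1
    return n


print(semantic_classes("1230102050"))  # "villager": classes 123 and 12301
```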

Once the problem structure and the features are defined, the rest can be handled with machine learning. The answer candidate extraction unit 14 uses the many prepared pairs of feature sets and solutions to learn which solution corresponds to which feature set. It then uses the learning result to infer the solution for the feature set of a new problem, and extracts answer candidates on that basis.

<Specific examples of problems and features>
Examples of problems:
Problem: question "What is the capital of Japan?", answer "Tokyo" --- solution: "correct"
Problem: question "What is the capital of Japan?", answer "Osaka" --- solution: "incorrect"
Problem: question "What is the capital of Japan?", answer "bread" --- solution: "incorrect"
Examples of features for the problem: question "What is the capital of Japan?", answer "Tokyo" --- solution: "correct"
- The word Xi itself: Japan
- The word Yi itself: capital
- The word Zi itself: Tokyo
- Semantic class of Xi: 12590 (place-name class)
- Semantic class of Yi: 12540 (urban-settlement class)
- Semantic class of Zi: 12590 (place-name class)
(The first five digits of the Bunrui Goi Hyo classification number are used as the semantic class.)
- Number of articles retrieved with Xi and Yi: the number of articles containing "Japan" and "capital", e.g., 1000
- Number of articles retrieved with Xi and Yi in which Zi appears: the number of articles containing "Japan", "capital" and "Tokyo", e.g., 100
- Number of articles in which Xi and Yi appear close to each other (within a certain number of words): for example, the number of articles in which "Japan" and "capital" occur within 10 words of each other, e.g., 500
- Number of articles in which Xi, Yi and Zi appear close to each other (within a certain number of words): for example, the number of articles in which "Japan", "capital" and "Tokyo" occur within 10 words of one another, e.g., 50
- Whether Zi matches the most frequent word in the articles retrieved with Xi and Yi: suppose only nouns are counted and the most frequent noun is "koto" (こと, "thing"). This feature is then "no", because "koto" and "Tokyo" do not match.
- Whether Zi matches the j-th most frequent word in the articles retrieved with Xi and Yi: suppose only nouns are counted and "Tokyo" is the second most frequent noun. The feature for j = 2 is then "yes".
- Candidate-rank matches: the answer candidate extraction unit 14 retrieves articles likely to contain the answer from a document data group such as newspaper article data and encyclopedia data, using Xi and Yi as keywords. It extracts linguistic expressions in those articles as answer candidates and sorts them in a priority order, for example Score_near(c). The features are whether Zi matches the highest-ranked candidate and whether Zi matches the j-th candidate. For example, if the first candidate is "koto" and the second is "Tokyo", the highest-rank match is "no" and the j-th rank match is "yes" for j = 2.
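The count-based and frequency-rank features above can be computed roughly as shown in this Python sketch. Its assumptions:

- Articles are already tokenized lists, for example by a morphological analyzer.
- "Close to each other" means that one occurrence of each word fits within a span of `window` tokens.
- Word frequencies are counted over all tokens of the retrieved articles, without the noun-only filtering used in the example.
- The Score_near(c) candidate-rank feature is omitted, because its definition is in Non-Patent Document 1.
- All names and the toy corpus in the test are hypothetical.

```python
from collections import Counter
from itertools import product


def close_together(tokens, words, window):
    """True if every word occurs and one occurrence of each fits within `window` tokens."""
    positions = [[i for i, t in enumerate(tokens) if t == w] for w in words]
    if any(not p for p in positions):
        return False
    return any(max(combo) - min(combo) <= window for combo in product(*positions))


def count_features(x, y, z, articles, window=10, top_j=10):
    """Article-count and frequency-rank features for the problem (X, Y, Z)."""
    hits = [a for a in articles if x in a and y in a]  # articles retrieved with X and Y
    freq = Counter(t for a in hits for t in a)
    ranked = [w for w, _ in freq.most_common(top_j)]
    features = {
        "n_xy": len(hits),
        "n_xy_with_z": sum(1 for a in hits if z in a),
        "n_xy_close": sum(1 for a in hits if close_together(a, [x, y], window)),
        "n_xyz_close": sum(1 for a in hits if close_together(a, [x, y, z], window)),
    }
    # rank1_is_z is "Zi matches the most frequent word"; rankj_is_z the j-th one.
    for j in range(1, top_j + 1):
        features[f"rank{j}_is_z"] = j <= len(ranked) and ranked[j - 1] == z
    return features
```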

When learning is performed with more examples, the answer candidate extraction unit 14 learns rules such as the following:
- If the word Yi itself is "capital", the semantic class of Zi is 12590 (place-name class), and either the highest-rank match is "yes" or the j-th rank match is "yes" for j = 2, the solution is "correct".
- If the word Yi itself is "capital" and the semantic class of Zi is anything other than 12590 (place-name class), the solution is "incorrect".
- If the word Yi itself is "capital", the highest-rank match is "no", and the j-th rank match is "no" for every j from 2 to 10, the solution is "incorrect".

The answer candidate extraction unit 14 stores the learning results in the learning DB 17. It then uses the learning result information in the learning DB 17 to judge new problems, that is, questions composed of the output keywords output by the keyword increase unit 12. Consider the question "What is the capital of France?" with the answer "Paris":
- the word Yi itself is "capital";
- the semantic class of Zi is 12590 (place-name class);
- the highest-rank match is "yes", or the j-th rank match is "yes" for j = 2.
The answer is therefore judged "correct".

Now take the new problem with the question "What is the capital of France?" and the answer "signal". The learning results show that the word Yi itself is "capital" but the semantic class of Zi is other than 12590 (place-name class). The answer is therefore judged "incorrect".

Machine learning can obtain not only the solution itself but also, at the same time, the degree to which the answer is likely to be correct and the degree to which it is likely to be incorrect.

First, the answer candidate extraction unit 14 uses the many prepared pairs of feature sets and solutions to learn which solution (correct or incorrect) corresponds to which feature set. It stores information describing this correspondence in the learning DB 17 as learning result information.

Next, the unit takes a new problem. This is a problem it creates itself from a question composed of the output keywords output by the keyword increase unit 12. It extracts a feature set from the new problem and obtains, from the learning result information in the learning DB 17, the degree to which that feature set is "likely to be correct".

The answer candidate extraction unit 14 then outputs an answer candidate to the answer table output unit 15. For example, this is the "answer" of the problem (question-answer pair) with the highest degree of being "likely to be correct". The answer table output unit 15 outputs, as the answer table, a table in which each answer candidate is associated with its question.

In the embodiment of the present invention, the answer candidates may instead be the "answers" of problems selected in one of these ways:
- a predetermined number of problems, in descending order of this degree;
- every problem whose degree is equal to or greater than a predetermined threshold;
- every problem whose degree is at least a predetermined proportion (for example, 90%) of the highest degree.
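The selection rules described above can be sketched as follows: the single best answer, a predetermined number of answers, an absolute threshold, and a proportion such as 90% of the best degree. The function name and the default values other than the 90% ratio are assumptions. The degrees are assumed to be non-negative, for example vote counts or probabilities, so that the proportional cutoff is meaningful.

```python
def select_answers(scored, mode="best", n=3, threshold=0.5, ratio=0.9):
    """Choose answer candidates from (answer, degree of being likely correct) pairs.

    "best": the highest-degree answer; "top_n": the n highest;
    "threshold": degree >= threshold; "ratio": degree >= ratio * highest degree.
    """
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    if not ranked:
        return []
    if mode == "best":
        return [ranked[0][0]]
    if mode == "top_n":
        return [answer for answer, _ in ranked[:n]]
    if mode == "threshold":
        return [answer for answer, degree in ranked if degree >= threshold]
    if mode == "ratio":
        cutoff = ratio * ranked[0][1]
        return [answer for answer, degree in ranked if degree >= cutoff]
    raise ValueError(f"unknown mode: {mode}")
```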

How the degree of being "likely to be correct" is obtained depends on the machine learning method used by the answer candidate extraction unit 14.

Consider first the k-nearest neighbor method. When the answer candidate extraction unit 14 uses this method, it uses the many prepared pairs of feature sets and solutions to define a similarity between problems. The similarity is based on the proportion of shared features, that is, on how many identical features two problems have. The unit stores the defined similarity and the problems (with their solutions) in the learning DB 17 as learning result information.

The answer candidate extraction unit 14 then creates a new problem. It uses the question created by the question creation unit 13, or, when the question answering apparatus 1 is configured without the question creation unit 13, the question composed of the output keywords output by the keyword increase unit 12.

Referring to the similarity and the problems stored in the learning DB 17, the unit selects the k stored problems (with solutions) most similar to the new problem. The class (correct or incorrect) decided by majority vote among those k problems becomes the solution for the new problem. With the k-nearest neighbor method, the degree of being "likely to be correct" is the number of votes the class "correct" obtains in this vote.
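The k-nearest neighbor variant can be sketched as follows. Its assumptions:

- Features are represented as sets of strings.
- "The proportion of shared features" is measured as the number of shared features divided by the size of the union of both feature sets. This is a Jaccard-style ratio; the text does not fix the normalization.
- Ties in the vote are resolved as "incorrect".
- The feature strings and function names are hypothetical.

```python
from collections import Counter


def overlap_similarity(a, b):
    """Shared features divided by all features of the two problems (Jaccard ratio)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def knn_correct_votes(features, training, k=3):
    """training: list of (feature set, "correct" or "incorrect").

    Returns the majority label among the k most similar problems and the
    number of votes for "correct", i.e. the degree of being likely correct.
    """
    neighbors = sorted(training, key=lambda t: overlap_similarity(features, t[0]),
                       reverse=True)[:k]
    votes = Counter(label for _, label in neighbors)
    label = "correct" if votes["correct"] > votes["incorrect"] else "incorrect"
    return label, votes["correct"]
```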

As described above, the answer candidate extraction unit 14 outputs to the answer table output unit 15, as the answer candidate, the "answer" of the problem (question-answer pair) with the highest degree of being "likely to be correct". Alternatively, it may select problems in one of the other ways described above: the top predetermined number, those at or above a predetermined threshold, or those at or above a predetermined proportion (for example, 90%) of the highest degree.

When the answer candidate extraction unit 14 uses the simple Bayes method, it stores the many prepared pairs of feature sets and solutions in the learning DB 17 as learning result information.

The answer candidate extraction unit 14 creates a new problem from the question created by the question creation unit 13. When the question answering apparatus 1 has no question creation unit 13, it uses instead the question composed of the output keywords output by the keyword increase unit 12. The unit extracts a feature set from the new problem.

Using the pairs of solutions and feature sets stored in the learning DB 17, the unit then applies Bayes' theorem to calculate the probability of each class given the extracted feature set. The class with the highest probability becomes the solution for the problem. With the simple Bayes method, the degree of being "likely to be correct" is the probability of the class "correct".
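The simple Bayes computation can be sketched as follows. Its assumptions:

- Features are conditionally independent given the class.
- P(feature | class) uses add-one smoothing.
- Only features present in the problem enter the product. This is one common simple Bayes variant and not necessarily the exact formula of the embodiment.
- Names and toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(training):
    """training: list of (feature set, label). Returns the counts used by Bayes' theorem."""
    label_counts = Counter(label for _, label in training)
    feature_counts = defaultdict(Counter)
    for features, label in training:
        feature_counts[label].update(features)
    return label_counts, feature_counts


def posterior(features, model):
    """P(label | features) proportional to P(label) * product of P(f | label) over present f."""
    label_counts, feature_counts = model
    total = sum(label_counts.values())
    log_scores = {}
    for label, n in label_counts.items():
        score = math.log(n / total)
        for f in features:
            score += math.log((feature_counts[label][f] + 1) / (n + 2))  # add-one smoothing
        log_scores[label] = score
    top = max(log_scores.values())
    exps = {label: math.exp(s - top) for label, s in log_scores.items()}
    z = sum(exps.values())
    return {label: e / z for label, e in exps.items()}
```

The class with the largest posterior is the solution. `posterior(...)["correct"]` is the degree of being "likely to be correct".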

When the answer candidate extraction unit 14 uses the decision list method, it stores in the learning DB 17 a list of rules arranged in a predetermined priority order. Each rule pairs features of the prepared problems with a class. The unit then creates a new problem from the question created by the question creation unit 13, or, when the question answering apparatus 1 has no question creation unit 13, from the question composed of the output keywords output by the keyword increase unit 12. It extracts a feature set from the new problem.

The answer candidate extraction unit 14 then goes through the list stored in the learning DB 17 in descending order of priority, comparing the features of each rule with the features extracted from the new problem. The class of the first rule whose features match becomes the solution for the problem. With the decision list method, the degree of being "likely to be correct" is the predetermined priority, or a numerical value or measure corresponding to it.
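Applying a decision list can be sketched as follows. Each rule is assumed to test a single feature and to carry a numeric priority. How the priorities are learned is not specified here, so the values in the usage are made up, as is the fallback returned when no rule matches.

```python
def decision_list_classify(features, rules, default=("incorrect", 0.0)):
    """rules: list of (feature, label, priority). The first rule, in descending
    priority order, whose feature is present decides the label; its priority
    is returned as the degree of confidence."""
    for feature, label, priority in sorted(rules, key=lambda r: r[2], reverse=True):
        if feature in features:
            return label, priority
    return default
```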

When the answer candidate extraction unit 14 uses the maximum entropy method, it first identifies the classes that can be solutions of the prepared problems. It then obtains the probability distribution over pairs of a feature set and a possible class that satisfies predetermined constraint equations and maximizes the expression for entropy. This distribution is stored in the learning DB 17.

The answer candidate extraction unit 14 then creates a new problem from the question created by the question creation unit 13, or, when the question answering apparatus 1 has no question creation unit 13, from the question composed of the output keywords output by the keyword increase unit 12. Using the probability distribution stored in the learning DB 17, it obtains the probability of each possible class for the feature set of the new problem. The class with the highest probability becomes the solution for the problem.

That is, with the maximum entropy method, the degree of being "likely to be correct" is the probability of the class "correct".

When the answer candidate extraction unit 14 uses the support vector machine method, it first identifies the classes that can be solutions of the prepared problems and divides them into positive and negative examples. It then works in the space whose dimensions are the problem features and follows a predetermined execution function that uses a kernel function. In that space it finds the hyperplane that separates the positive and negative examples while maximizing the margin between them, and stores this hyperplane in the learning DB 17.

The answer candidate extraction unit 14 then creates a new problem from the question created by the question creation unit 13, or, when the question answering apparatus 1 has no question creation unit 13, from the question composed of the output keywords output by the keyword increase unit 12. Using the hyperplane in the learning DB 17, it determines whether the feature set of the new problem lies on the positive-example side or the negative-example side of the space divided by the hyperplane. The class determined by this result becomes the solution for the problem.

That is, with the support vector machine method, the degree of being "likely to be correct" is the distance from the separating plane into the positive-example side.

More specifically, let problems whose solution is correct be positive examples and problems whose solution is incorrect be negative examples. A problem located on the positive-example side of the separating plane is judged to have a correct solution. The farther such a problem lies from the separating plane, the greater its degree of being "likely to be correct".

<Specific example>
As an example, consider solving the question "What is the capital of France?". The question is created by the question creation unit 13, or, when the question answering apparatus 1 has no question creation unit 13, composed of the output keywords output by the keyword increase unit 12.

First, the answer candidate extraction unit 14 retrieves documents containing the output keywords "France" and "capital" from a document data group such as newspaper article data and encyclopedia data. The keywords "France" and "capital" are extracted from the question "What is the capital of France?" using a technique such as morphological analysis.

The question answering apparatus 1 may instead omit the question creation unit 13, so that the answer candidate extraction unit 14 extracts answer candidates for the question composed of the output keywords from the keyword increase unit 12. In that case, there is no need to extract "France" and "capital" from the question by morphological analysis. The answer candidate extraction unit 14 uses the output keywords "France" and "capital" from the keyword increase unit 12 as they are, and retrieves documents containing them from the document data group.

解答候補抽出部１４は、キーワード「フランス」、「首都」を含む文書中の言語表現を、質問の答えの表現の候補として取り出す。この表現の取り出しには、例えば、前述の非特許文献１に記載された解の抽出の処理を用いる。 The answer candidate extraction unit 14 extracts linguistic expressions from the documents containing the keywords “France” and “capital” as candidate expressions for the answer to the question. For this extraction, for example, the solution extraction process described in Non-Patent Document 1 is used.

取り出された答えの表現の候補を、例えば、Ｓｃｏｒｅ_near（ｃ）の値の大きい順に並び替え、その値の上位何個かの候補を取り出し、その候補（候補１, 候補２，・・・）について、
問題『質問「フランスの首都は？」で答え「候補１」』
問題『質問「フランスの首都は？」で答え「候補２」』
・・・
を作成する。ここで、「候補１」、「候補２」は、上記の質問の「答え」の表現の候補を示している。 The extracted answer-expression candidates are sorted, for example, in descending order of Score_near(c), the top several candidates are taken, and for these candidates (candidate 1, candidate 2, ...) the following problems are created:
Problem: question “What is the capital of France?”, answer “candidate 1”
Problem: question “What is the capital of France?”, answer “candidate 2”
...
Here, “candidate 1” and “candidate 2” denote candidate expressions for the “answer” to the above question.

作成された問題（質問−答えの対）について、前述した機械学習の手法を適用し、「正解となりやすい」かの度合いが最も大きいときの、問題（質問−答えの対）における「答え」を、解答候補として、解答表出力部１５に対して出力する。 The machine learning method described above is applied to the created problems (question-answer pairs), and the “answer” of the problem (question-answer pair) with the greatest degree of being “likely to be correct” is output to the answer table output unit 15 as the answer candidate.
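The procedure above (sort by Score_near(c), keep the top few, pair each with the question, pick the most likely pair) can be sketched as follows. It assumes the Score_near(c) values are already available as a dict, and `likelihood` is a hypothetical stand-in for the trained classifier, for example the SVM distance.

```python
def build_problems(question, score_near, top_k):
    """Pair the question with the top_k candidates ranked by Score_near(c)."""
    ranked = sorted(score_near, key=score_near.get, reverse=True)[:top_k]
    return [(question, candidate) for candidate in ranked]

def best_answer(problems, likelihood):
    """Return the answer of the pair judged most 'likely to be correct'."""
    _question, answer = max(problems, key=likelihood)
    return answer

score_near = {"パリ": 0.9, "リヨン": 0.7, "ベルリン": 0.4}  # hypothetical scores
problems = build_problems("フランスの首都は？", score_near, top_k=2)
# Hypothetical stand-in for the learned model's confidence in each pair.
likelihood = lambda pair: {"パリ": 1.2, "リヨン": 0.1}.get(pair[1], -1.0)
print(best_answer(problems, likelihood))  # パリ
```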

解答表出力部１５は、解答表において、質問「フランスの首都は？」に対する解答が格納される枡目（例えば、データ項目「フランス」に対応する行とデータ項目「首都」に対応する列とが交差する枡目）に、対応する解答候補を格納する。 The answer table output unit 15 stores the corresponding answer candidate in the cell of the answer table that holds the answer to the question “What is the capital of France?” (for example, the cell where the row for the data item “France” intersects the column for the data item “capital”).
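One possible in-memory form of the answer table is a dict of dicts keyed by row item and column item. This is an illustrative assumption, not the data structure used by the answer table output unit 15.

```python
from collections import defaultdict

def make_answer_table():
    """Rows: first-keyword items (e.g. countries); columns: second-keyword items."""
    return defaultdict(dict)

def store(table, row, column, answer):
    """Put an answer candidate into the cell where row and column intersect."""
    table[row][column] = answer

def render(table, columns):
    """Tab-separated view of the table; unfilled cells are empty strings."""
    lines = ["\t".join([""] + columns)]
    for row, cells in table.items():
        lines.append("\t".join([row] + [cells.get(c, "") for c in columns]))
    return "\n".join(lines)

table = make_answer_table()
store(table, "フランス", "首都", "パリ")
store(table, "ドイツ", "首都", "ベルリン")
print(render(table, ["首都", "人口"]))
```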

（解答候補の抽出手法２）
＜問題の構成＞
解答候補抽出部１４は、
問題『質問「Ｘ１のＹ１は？」』−−−解答「地名」
問題『質問「Ｘ２のＹ２は？」』−−−解答「地名」
問題『質問「Ｘ３のＹ３は？」』−−−解答「人名」
問題『質問「Ｘ４のＹ４は？」』−−−解答「数値」
．．．
という、問題と解答の対を多数作成する。
素性としては、
Ｘｉ，Ｙｉの単語自体
Ｘｉ，Ｙｉの単語の意味クラス
などが考えられる。
問題構成と素性の定義をすれば、あとは機械学習の手法で扱える。 (Answer candidate extraction method 2)
<Problem structure>
The answer candidate extraction unit 14 creates many question-answer pairs such as:
Problem: question “What is Y1 of X1?” --- answer “place name”
Problem: question “What is Y2 of X2?” --- answer “place name”
Problem: question “What is Y3 of X3?” --- answer “person name”
Problem: question “What is Y4 of X4?” --- answer “numerical value”
...
Possible features include:
the words Xi and Yi themselves
the semantic classes of the words Xi and Yi
Once the problem structure and the features are defined, the rest can be handled by machine learning techniques.

＜問題や素性の具体例＞
問題の具体例：
問題『質問「日本の首都は？」』−−−解答「地名」
問題『質問「日本の首相は？」』−−−解答「人名」
問題『質問「日本の面積は？」』−−−解答「数値」
素性の具体例：
問題『質問「日本の首都は？」』−−−解答「地名」の場合、
・Ｘｉの単語自体：日本
・Ｙｉの単語自体：首都
・Ｘｉの意味クラス：１２５９０（地名のクラス）
・Ｙｉの意味クラス：１２５４０（都市集落のクラス）
（意味クラスとして分類語彙表の最初の５桁を利用）
もっと多くの事例で学習すると、例えば、
Ｙｉの単語自体：首都
だと、
解答「地名」
となるように学習し、
Ｙｉの単語自体：首相
だと、
解答「人名」、
Ｙｉの単語自体：面積
だと、
解答「数値」、
といったことを、解答候補抽出部１４が学習し、その学習結果を学習ＤＢ１７内に蓄積する。 <Specific examples of problems and features>
Examples of problems:
Problem: question “What is the capital of Japan?” --- answer “place name”
Problem: question “Who is the prime minister of Japan?” --- answer “person name”
Problem: question “What is the area of Japan?” --- answer “numerical value”
Examples of features:
For the problem: question “What is the capital of Japan?” --- answer “place name”,
- word Xi itself: Japan
- word Yi itself: capital
- semantic class of Xi: 12590 (place-name class)
- semantic class of Yi: 12540 (city/settlement class)
(The first five digits of the classification vocabulary table number are used as the semantic class.)
By learning from many more examples, the answer candidate extraction unit 14 learns, for example, that
word Yi itself: capital
leads to the answer “place name”,
word Yi itself: prime minister
leads to the answer “person name”, and
word Yi itself: area
leads to the answer “numerical value”,
and accumulates these learning results in the learning DB 17.
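A minimal sketch of learning the answer type from the features listed above. It swaps in a simple feature-vote counter for the SVM-style learner, so it only illustrates the problem setup, not the patent's classifier. The semantic classes 12590, 12540, and 12630 are the five-digit prefixes given in this document; no class is supplied for 首相 or フランス because the document does not give one.

```python
from collections import Counter, defaultdict

def features(x, y, x_class=None, y_class=None):
    """Features from the text: the words Xi, Yi and (if known) their semantic classes."""
    feats = [("X_word", x), ("Y_word", y)]
    if x_class is not None:
        feats.append(("X_class", x_class))
    if y_class is not None:
        feats.append(("Y_class", y_class))
    return feats

class AnswerTypeLearner:
    """Feature-vote classifier; a simple stand-in for an SVM-style learner."""

    def __init__(self):
        self.votes = defaultdict(Counter)  # feature -> Counter of answer types

    def train(self, examples):
        for feats, answer_type in examples:
            for f in feats:
                self.votes[f][answer_type] += 1

    def predict(self, feats):
        total = Counter()
        for f in feats:
            total.update(self.votes.get(f, Counter()))
        return total.most_common(1)[0][0] if total else None

learner = AnswerTypeLearner()
learner.train([
    (features("日本", "首都", "12590", "12540"), "地名"),
    (features("日本", "首相", "12590"), "人名"),
    (features("日本", "面積", "12590", "12630"), "数値"),
])
print(learner.predict(features("フランス", "首都")))  # 地名
```

The last line mirrors the France example below: the unseen X word contributes nothing, and the Y word 首都 decides the type.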

そして、解答候補抽出部１４は、学習ＤＢ１７内に蓄積された学習結果を用いて、解答を判断する。
例えば、新しい問題：
『質問「フランスの首都は？」』についての解答は、
Ｙｉの単語自体：首都
なので、「地名」と判断する。 The answer candidate extraction unit 14 then determines the answer using the learning results accumulated in the learning DB 17.
For example, for the new problem
question “What is the capital of France?”,
the word Yi itself is “capital”, so the answer is judged to be a “place name”.

まず、解答候補抽出部１４は、前述した機械学習の手法を利用して、
問題「フランスの首都は？」について、
解答が「地名」であるという結果を取得する。 First, using the machine learning technique described above, the answer candidate extraction unit 14 obtains, for the question “What is the capital of France?”, the result that the answer is a “place name”.

取得された解答「地名」を、解答表の枡目に格納する解答候補を抽出する際の解答タイプとして利用する。 The obtained answer “place name” is used as the answer type when extracting the answer candidates to be stored in the cells of the answer table.

すなわち、解答候補抽出部１４は、新聞記事データ・百科事典データなどの文書データ群から、質問作成部１３が作成した質問「フランスの首都は？」を構成するキーワード（「フランス」、「首都」）を含む文書を取り出し、取り出された文書に含まれる言語表現のうち、上記解答タイプに適合するものを解答候補として解答表出力部１５に対して出力する。解答候補抽出部１４は、質問応答装置１が質問作成部１３を備えない構成を採るときはキーワード増加部１２によって出力された出力キーワード（「フランス」、「首都」）を含む文書を上記文書データ群から取り出し、取り出された文書に含まれる言語表現のうち、上記解答タイプに適合するものを、出力キーワード「フランス」と「首都」とによって構成される質問「フランスの首都は？」に対する解答候補として解答表出力部１５に対して出力する。 That is, the answer candidate extraction unit 14 retrieves, from a document data group such as newspaper article data and encyclopedia data, documents containing the keywords (“France”, “capital”) that make up the question “What is the capital of France?” created by the question creating unit 13, and outputs to the answer table output unit 15, as answer candidates, the linguistic expressions in the retrieved documents that match the above answer type. When the question answering apparatus 1 does not include the question creating unit 13, the answer candidate extraction unit 14 retrieves from the document data group documents containing the output keywords (“France”, “capital”) output by the keyword increasing unit 12, and outputs to the answer table output unit 15 the linguistic expressions in the retrieved documents that match the answer type, as answer candidates for the question “What is the capital of France?” composed of the output keywords “France” and “capital”.

なお、本発明においては、例えば、「ＸのＹは？」という質問に対する解答候補を抽出する際に、例えば、解答候補抽出部１４が、機械学習の手法を用いるのではなく、新聞記事データ・百科事典データなどの大量の文書データ群（図示を省略）からキーワード「Ｘ」とキーワード「Ｙ」を含む記事群を取り出し、その取り出した記事群の言語表現のうち、上記文書データ群中に出現する頻度が所定の閾値以上のものを解答候補として出力する構成を採ることもできる。また、本発明の実施の形態においては、上記取り出した記事群の言語表現について、上記文書データ群中に出現する頻度の高い順に所定の個数取り出して、解答候補として出力する構成を採ることもできる。 In the present invention, when extracting answer candidates for a question such as “What is Y of X?”, the answer candidate extraction unit 14 may, instead of using machine learning, retrieve a group of articles containing the keywords “X” and “Y” from a large document data group (not shown) such as newspaper article data and encyclopedia data, and output as answer candidates those linguistic expressions in the retrieved articles whose frequency of occurrence in the document data group is at or above a predetermined threshold. Alternatively, in the embodiment of the present invention, a predetermined number of linguistic expressions from the retrieved articles may be taken in descending order of their frequency of occurrence in the document data group and output as answer candidates.
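Both frequency-based variants can be sketched as follows. The sketch assumes articles are already tokenized into lists of expressions and that the query keywords themselves are excluded from the candidates; both are assumptions, and the toy corpus is invented.

```python
from collections import Counter

def candidate_counts(corpus, keywords):
    """Corpus-wide frequency of the expressions found in the retrieved articles.

    corpus: list of articles, each a pre-tokenized list of expressions.
    Articles containing every keyword are retrieved; the keywords
    themselves are excluded from the candidates.
    """
    retrieved = [doc for doc in corpus if all(k in doc for k in keywords)]
    candidates = {tok for doc in retrieved for tok in doc} - set(keywords)
    return Counter(tok for doc in corpus for tok in doc if tok in candidates)

def above_threshold(counts, threshold):
    """Variant 1: every candidate whose frequency is at least the threshold."""
    return [exp for exp, n in counts.most_common() if n >= threshold]

def top_n(counts, n):
    """Variant 2: a fixed number of candidates in descending frequency."""
    return [exp for exp, _ in counts.most_common(n)]

corpus = [
    ["フランス", "首都", "パリ"],
    ["パリ", "観光"],
    ["フランス", "首都", "パリ", "人口"],
    ["ドイツ", "首都", "ベルリン"],
]
counts = candidate_counts(corpus, ["フランス", "首都"])
print(above_threshold(counts, 2))  # ['パリ']
print(top_n(counts, 2))            # ['パリ', '人口']
```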

ここで、上記の解答候補抽出部１４による、解答タイプを用いた解答候補の出力の際には、非特許文献１の説明において述べた固有表現抽出技術を用いる。固有表現とは、人名、地名、組織名などの固有名詞、金額などの数値表現といった、特定の事物・数量を意味する言語表現のことで、固有表現抽出とは、そういった固有表現を文章中から計算機で自動で抽出する技術である。例えば、「日本の首相は小泉純一郎である」という文に対して固有表現抽出を行なうと、固有表現の「日本」と「小泉純一郎」が地名、人名として、抽出される。本発明の実施の形態においては、解答候補抽出部１４が、抽出された固有表現が上記解答タイプに適合するかを判断し、適合する固有表現を、解答候補として出力する。 Here, when the answer candidate extraction unit 14 outputs answer candidates using the answer type, the named entity extraction technique mentioned in the description of Non-Patent Document 1 is used. A named entity is a linguistic expression denoting a specific thing or quantity, such as a proper noun (a person name, place name, or organization name) or a numerical expression such as a monetary amount; named entity extraction is the technique of automatically extracting such named entities from text by computer. For example, applying named entity extraction to the sentence “The prime minister of Japan is Junichiro Koizumi” extracts the named entities “Japan” as a place name and “Junichiro Koizumi” as a person name. In the embodiment of the present invention, the answer candidate extraction unit 14 determines whether each extracted named entity matches the answer type and outputs the matching named entities as answer candidates.
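The matching step can be sketched as a filter over entities that an extractor has already labeled. The mapping from answer types (地名, 人名, 数値) to entity labels is a hypothetical assumption; only LOCATION and PERSON appear as labels in this document.

```python
# Hypothetical mapping from answer types to named-entity labels.
ANSWER_TYPE_TO_NE = {
    "地名": {"LOCATION"},
    "人名": {"PERSON"},
    "数値": {"NUMBER", "MONEY", "PERCENT"},
}

def matching_candidates(entities, answer_type):
    """Keep the (text, NE label) pairs whose label fits the answer type."""
    allowed = ANSWER_TYPE_TO_NE.get(answer_type, set())
    return [text for text, ne_label in entities if ne_label in allowed]

entities = [("日本", "LOCATION"), ("小泉純一郎", "PERSON")]
print(matching_candidates(entities, "人名"))  # ['小泉純一郎']
```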

以下に、固有表現抽出の一般的な手法の例について説明する。
（１）機械学習を用いる手法
機械学習を用いて固有表現を抽出する手法がある（例えば、以下の文献（１２）参照）。 Examples of general named entity extraction techniques are described below.
(1) Method using machine learning
There are methods that extract named entities using machine learning (see, for example, reference (12) below).

文献（１２）：浅原正幸，松本裕治，日本語固有表現抽出における冗長的な形態素解析の利用情報処理学会自然言語処理研究会 NL153-7 2002
まず、例えば、「日本の首相は小泉さんです。」という文を、各文字に分割し、分割した文字について、以下のように、 B−LOCATION、 I−LOCATION等の正解タグを付与することによって、正解を設定する。以下の一列目は、分割された各文字であり、各文字の正解タグは二列目である。
日 B−LOCATION
本 I−LOCATION
の O
首 O
相 O
は O
小 B−PERSON
泉 I−PERSON
さ O
ん O
で O
す O
。 O
上記において、B −？？？は、ハイフン以下の固有表現の種類の始まりを意味するタグである。例えば、 B−LOCATIONは、地名という固有表現の始まりを意味しており、 B−PERSONは、人名という固有表現の始まりを意味している。また、I −？？？は、ハイフン以下の固有表現の種類の始まり以外を意味するタグであり、O はこれら以外である。従って、例えば、文字「日」は、地名という固有表現の始まりに該当する文字であり、文字「本」までが地名という固有表現である。 Reference (12): Masayuki Asahara, Yuji Matsumoto, Use of Redundant Morphological Analysis in Japanese Named Entity Extraction, IPSJ SIG Technical Report on Natural Language Processing, NL153-7, 2002
First, for example, the sentence “The prime minister of Japan is Mr. Koizumi.” (日本の首相は小泉さんです。) is split into individual characters, and the correct answer is set by assigning each character a correct tag such as B-LOCATION or I-LOCATION, as follows. The first column below is each character, and the second column is its correct tag.
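日 B-LOCATION (日本 “Japan”)
本 I-LOCATION
の O (の “of”)
首 O (首相 “prime minister”)
相 O
は O (は topic particle)
小 B-PERSON (小泉 “Koizumi”)
泉 I-PERSON
さ O (さん honorific “Mr.”)
ん O
で O (です “is”)
す O
。 O (period)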
In the above, B-??? is a tag marking the beginning of a named entity of the type written after the hyphen. For example, B-LOCATION marks the beginning of a place-name entity, and B-PERSON marks the beginning of a person-name entity. I-??? marks a character inside, but not at the beginning of, a named entity of the type written after the hyphen, and O marks every other character. Thus, for example, the character 日 is the first character of a place-name entity, and the entity extends through the character 本, so 日本 (Japan) is a place-name entity.

このように、各文字の正解を設定しておき、このようなデータから学習し、新しいデータでこの正解を推定し、この正解のタグから、各固有表現の始まりと、どこまでがその固有表現かを認識して、固有表現を推定する。 In this way, the correct tag is set for each character, a model is learned from such data and used to estimate the tags for new data, and from the estimated tags the start and the extent of each named entity are recognized, which yields the named entities.
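The last step, recovering entity spans from per-character tags, can be sketched as below. Treating a stray I- tag with no matching B- as O is a design assumption of this sketch.

```python
def decode_bio(chars, tags):
    """Turn per-character B-/I-/O tags into (entity text, type) spans.

    An I- tag that does not continue an open entity of the same type is
    treated like O (an assumption of this sketch).
    """
    entities, current, current_type = [], [], None
    for ch, tag in zip(chars, tags):
        if tag.startswith("B-"):
            if current:
                entities.append(("".join(current), current_type))
            current, current_type = [ch], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == current_type:
            current.append(ch)
        else:
            if current:
                entities.append(("".join(current), current_type))
            current, current_type = [], None
    if current:
        entities.append(("".join(current), current_type))
    return entities

chars = list("日本の首相は小泉さんです。")
tags = ["B-LOCATION", "I-LOCATION", "O", "O", "O", "O",
        "B-PERSON", "I-PERSON", "O", "O", "O", "O", "O"]
print(decode_bio(chars, tags))  # [('日本', 'LOCATION'), ('小泉', 'PERSON')]
```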

この各文字に設定された正解のデータから学習するときには、システムによってさまざまな情報を素性という形で利用する。例えば、
日 B−LOCATION
の部分は、
日本−B 名詞−B
などの情報を用いる。日本−B は、日本という単語の先頭を意味し、名詞−B は、名詞の先頭を意味する。単語や品詞の認定には、例えば前述したChaSenによる形態素解析を用いる。ChaSenを用いれば、入力された日本語を単語に分割することができる。例えば、ChaSenは、前述したように、日本語文を分割し、さらに、各単語の品詞も推定してくれる。例えば、「学校へ行く」を入力すると以下の結果を得ることができる。 When learning from the data in which the correct tag is set for each character, the system uses various kinds of information in the form of features. For example, for the part
日 B-LOCATION
information such as
日本-B 名詞-B
is used. 日本-B means the beginning of the word 日本 (Japan), and 名詞-B means the beginning of a noun (名詞). To identify words and parts of speech, for example, morphological analysis by ChaSen, described above, is used. ChaSen can split input Japanese text into words; as described above, it segments a Japanese sentence and also estimates the part of speech of each word. For example, entering 学校へ行く (“go to school”) yields the following result.

なお、例えば、上記の文献（１２）では、素性として、入力文を構成する文字の、文字自体（例えば、「小」という文字）、字種（例えば、ひらがなやカタカナ等）、品詞情報、タグ情報（例えば、「 B−PERSON」等）を利用している。 For example, reference (12) above uses as features, for each character of the input sentence, the character itself (e.g., the character 小), its character type (e.g., hiragana or katakana), part-of-speech information, and tag information (e.g., “B-PERSON”).

これら素性を利用して学習する。タグを推定する文字やその周辺の文字にどういう素性が出現するかを調べ、どういう素性が出現しているときにどういうタグになりやすいかを学習し、その学習結果を利用して新しいデータでのタグの推定を行なう。機械学習には、例えばサポートベクトルマシンを用いる。 Learning is performed using these features. The system examines which features appear in the character whose tag is being estimated and in the surrounding characters, learns which tags tend to occur when which features appear, and uses the learning results to estimate tags for new data. For the machine learning, a support vector machine, for example, is used.
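The feature collection described above can be sketched for a window of ±2 characters. The window width, the Unicode-block test for character type, and the part-of-speech labels in the example are all assumptions for illustration; the learner that consumes these features is not shown.

```python
def char_type(ch):
    """Coarse character type from the Unicode block (hiragana, katakana, kanji)."""
    code = ord(ch)
    if 0x3041 <= code <= 0x309F:
        return "hiragana"
    if 0x30A0 <= code <= 0x30FF:
        return "katakana"
    if 0x4E00 <= code <= 0x9FFF:
        return "kanji"
    return "other"

def window_features(chars, pos_tags, prev_tags, i, width=2):
    """Features for character i drawn from a +/- width window.

    pos_tags: one part-of-speech label per character.
    prev_tags: NE tags already estimated for positions before i.
    """
    feats = []
    for offset in range(-width, width + 1):
        j = i + offset
        if not 0 <= j < len(chars):
            continue
        feats.append(f"char[{offset}]={chars[j]}")
        feats.append(f"type[{offset}]={char_type(chars[j])}")
        feats.append(f"pos[{offset}]={pos_tags[j]}")
        if offset < 0:
            feats.append(f"tag[{offset}]={prev_tags[j]}")
    return feats

chars = list("小泉さん")
pos_tags = ["名詞-B", "名詞-I", "接尾-B", "接尾-I"]  # hypothetical labels
feats = window_features(chars, pos_tags, ["B-PERSON"], i=1)
```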

固有表現抽出には、上記の手法の他にも種々の手法がある。例えば、最大エントロピーモデルと書き換え規則を用いて固有表現を抽出する手法がある（文献（１３）参照）。 Besides the above method, there are various other methods of named entity extraction. For example, there is a method that extracts named entities using a maximum entropy model and rewriting rules (see reference (13)).

文献（１３）：内元清貴，馬青，村田真樹，小作浩美，内山将夫，井佐原均，最大エントロピーモデルと書き換え規則に基づく固有表現抽出，言語処理学会誌, Vol.7, No.2, 2000 Reference (13): Kiyotaka Uchimoto, Qing Ma, Masaki Murata, Hiromi Ozaku, Masao Utiyama, Hitoshi Isahara, Named Entity Extraction Based on a Maximum Entropy Model and Rewriting Rules, Journal of Natural Language Processing, Vol. 7, No. 2, 2000

また、例えば、以下の文献（１４）に、サポートベクトルマシンを用いて日本語固有表現抽出を行う手法について記載されている。 Also, for example, reference (14) below describes a method of Japanese named entity extraction using a support vector machine.

文献（１４）：山田寛康，工藤拓，松本裕治，Support Vector Machineを用いた日本語固有表現抽出，情報処理学会論文誌, Vol.43, No.1, 2002
（２）形態素解析を用いる手法
形態素解析システム（例えば、前述したChaSen）を用いれば、入力された日本語を単語に分割することができる。 Reference (14): Hiroyasu Yamada, Taku Kudo, Yuji Matsumoto, Japanese Named Entity Extraction Using Support Vector Machine, IPSJ Journal (Transactions of Information Processing Society of Japan), Vol. 43, No. 1, 2002
(2) Method using morphological analysis
A morphological analysis system (for example, ChaSen, described above) can split input Japanese text into words.

例えば、ChaSenは、前述したように、日本語文を分割し、さらに、各単語の品詞も推定してくれる。例えば、「学校へ行く」を入力すると以下の結果を得ることができる。 For example, ChaSen divides a Japanese sentence and estimates the part of speech of each word as described above. For example, if “go to school” is entered, the following results can be obtained.

具体的には、
入力：
日本の首都は東京です
出力：
日本 ニッポン 日本 名詞−固有名詞−地域−国
の ノ の 助詞−連体化
首都 シュト 首都 名詞−一般
は ハ は 助詞−係助詞
東京 トウキョウ 東京 名詞−固有名詞−地域−一般
です デス です 助動詞 特殊・デス 基本形
EOS
は chasen の出力であり、名詞−固有名詞−地域という品詞が出力される。
このシステムを使って、例えば地名の固有表現を取り出すことができる。 Specifically,
Input:
日本の首都は東京です (“The capital of Japan is Tokyo”)
Output:
日本 ニッポン 日本 名詞-固有名詞-地域-国 (noun-proper noun-region-country)
の ノ の 助詞-連体化 (particle-adnominal)
首都 シュト 首都 名詞-一般 (noun-general)
は ハ は 助詞-係助詞 (particle-binding particle)
東京 トウキョウ 東京 名詞-固有名詞-地域-一般 (noun-proper noun-region-general)
です デス です 助動詞 特殊・デス 基本形 (auxiliary verb, special desu type, basic form)
EOS
is the output of ChaSen, in which the part of speech 名詞-固有名詞-地域 (noun-proper noun-region) is output. Using this system, place-name named entities, for example, can be extracted.
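Picking entities out of analyzer output by part-of-speech prefix can be sketched as follows. The sketch assumes ChaSen-style lines with tab-separated fields (surface form, reading, base form, part of speech, then optional conjugation fields), as in the listing above. The prefix-to-label table is an illustrative assumption covering the place-name case here and the person-name case discussed next.

```python
POS_TO_NE = [  # illustrative part-of-speech prefix -> NE label mapping
    ("名詞-固有名詞-地域", "LOCATION"),
    ("名詞-固有名詞-人名", "PERSON"),
]

def entities_from_chasen(output):
    """Extract (surface, NE label) pairs from tab-separated analyzer lines."""
    entities = []
    for line in output.splitlines():
        if line == "EOS" or not line.strip():
            continue
        fields = line.split("\t")
        surface, pos = fields[0], fields[3]
        for prefix, ne_label in POS_TO_NE:
            if pos.startswith(prefix):
                entities.append((surface, ne_label))
                break
    return entities

sample = "\n".join([
    "日本\tニッポン\t日本\t名詞-固有名詞-地域-国",
    "の\tノ\tの\t助詞-連体化",
    "首都\tシュト\t首都\t名詞-一般",
    "は\tハ\tは\t助詞-係助詞",
    "東京\tトウキョウ\t東京\t名詞-固有名詞-地域-一般",
    "です\tデス\tです\t助動詞\t特殊・デス\t基本形",
    "EOS",
])
print(entities_from_chasen(sample))  # [('日本', 'LOCATION'), ('東京', 'LOCATION')]
```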

また、
入力：
村山首相が言った
出力：
村山 ムラヤマ 村山 名詞−固有名詞−人名−姓
首相 シュショウ 首相 名詞−一般
が ガ が 助詞−格助詞−一般
言っ イッ 言う 動詞−自立 五段・ワ行促音便 連用タ接続
た タ た 助動詞 特殊・タ 基本形
EOS
も chasen の出力であるが、これだと名詞−固有名詞−人名という品詞が出力される。このシステムを使って、例えば人名の固有表現を取り出すことができる。
（３）作成したルールを用いる手法
人手でルールを作って固有表現を取り出すという方法もある。 Also,
Input:
村山首相が言った (“Prime Minister Murayama said”)
Output:
村山 ムラヤマ 村山 名詞-固有名詞-人名-姓 (noun-proper noun-person name-surname)
首相 シュショウ 首相 名詞-一般 (noun-general)
が ガ が 助詞-格助詞-一般 (particle-case particle-general)
言っ イッ 言う 動詞-自立 五段・ワ行促音便 連用タ接続 (verb-independent, godan wa-row with geminate euphonic change, continuative ta-connection)
た タ た 助動詞 特殊・タ 基本形 (auxiliary verb, special ta type, basic form)
EOS
is also ChaSen output; here the part of speech 名詞-固有名詞-人名 (noun-proper noun-person name) is output. Using this system, person-name named entities, for example, can be extracted.
(3) Method using hand-crafted rules
There is also a method of writing rules by hand to extract named entities.

例えば、
名詞＋「さん」だと人名とする
名詞＋「首相」だと人名とする
名詞＋「町」だと地名とする
名詞＋「市」だと地名とする
などである。 For example, rules such as the following:
noun + 「さん」 (san, “Mr./Ms.”) → person name
noun + 「首相」 (shushō, “prime minister”) → person name
noun + 「町」 (machi, “town”) → place name
noun + 「市」 (shi, “city”) → place name
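The four rules above can be sketched over (word, part-of-speech) tokens. Whether the trigger word belongs to the entity is not stated. This sketch assumes that the honorific and title are excluded (村山 from 村山首相) and that 町/市 are kept (横浜市). The token list is a hypothetical analyzer output.

```python
# (trigger word, NE label, keep trigger in entity?). The third field is an
# assumption: titles are dropped, while 町/市 stay part of the place name.
RULES = [
    ("さん", "PERSON", False),
    ("首相", "PERSON", False),
    ("町", "LOCATION", True),
    ("市", "LOCATION", True),
]

def apply_rules(tokens):
    """tokens: (word, part of speech) pairs; tag a noun followed by a trigger."""
    entities = []
    for (word, pos), (next_word, _) in zip(tokens, tokens[1:]):
        if not pos.startswith("名詞"):
            continue
        for trigger, ne_label, keep in RULES:
            if next_word == trigger:
                entities.append((word + trigger if keep else word, ne_label))
                break
    return entities

tokens = [("村山", "名詞"), ("首相", "名詞"), ("が", "助詞"),
          ("横浜", "名詞"), ("市", "名詞"), ("に", "助詞")]
print(apply_rules(tokens))  # [('村山', 'PERSON'), ('横浜市', 'LOCATION')]
```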

図５は、本発明の第１の実施の形態における質問応答処理フローの一例を示す図である。キーワード入力部１１に、第１のキーワードと第２のキーワードを入力キーワードとして入力する（ステップＳ１）。例えば、第１のキーワード「日本」と、第２のキーワード「面積」とを入力する。 FIG. 5 shows an example of the question answering process flow in the first embodiment of the present invention. A first keyword and a second keyword are input to the keyword input unit 11 as input keywords (step S1). For example, the first keyword “Japan” and the second keyword “area” are input.

キーワード増加部１２のパターン抽出部１２１で、入力キーワードをキーワード抽出用ＤＢ１６で全文検索し、入力キーワードの周辺に出現したパターンをｃ_iとして抽出する（ステップＳ２）。周辺に出現するパターンの定義は適宜行なう。パターンｃ_iの抽出は、第１のキーワードと第２のキーワードそれぞれについて行う。 The pattern extraction unit 121 of the keyword increasing unit 12 performs a full-text search of the keyword extraction DB 16 for the input keywords and extracts the patterns appearing around the input keywords as c_i (step S2). How the surrounding patterns are defined is chosen as appropriate. Patterns c_i are extracted separately for the first keyword and for the second keyword.
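Because the text leaves the pattern definition open, the sketch below assumes one simple choice: a pattern is the pair of `width` characters immediately to the left and right of each keyword occurrence. The corpus is an invented stand-in for the keyword extraction DB 16.

```python
import re

def extract_patterns(corpus, keyword, width=2):
    """Collect (left, right) contexts of `width` characters around a keyword.

    Occurrences too close to either end of a document are skipped.
    """
    patterns = set()
    for doc in corpus:
        for m in re.finditer(re.escape(keyword), doc):
            if m.start() < width:
                continue
            left = doc[m.start() - width:m.start()]
            right = doc[m.end():m.end() + width]
            if len(right) == width:
                patterns.add((left, right))
    return patterns

corpus = ["日本の面積は約", "米国の面積は広い"]
print(extract_patterns(corpus, "面積", width=1))  # {('の', 'は')}
```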

キーワード増加部１２のキーワード抽出部１２２で、パターン抽出部１２１で抽出したパターンｃ_iをキーワード抽出用ＤＢ１６で全文検索し、パターンｃ_iによって抽出される表現ｅｘｐを抽出すると同時に、抽出した表現ｅｘｐをＳｃｏｒｅの値の大きい順にソートし、キーワードとして出力する（ステップＳ３）。ステップＳ３の処理によって、例えば、第１のキーワードが、「日本」、「アメリカ」、「ドイツ」という３つの第３のキーワードに増加し、第２のキーワードが、「面積」、「人口」、「緯度」の３つの第４のキーワードに増加する。 The keyword extraction unit 122 of the keyword increasing unit 12 performs a full-text search of the keyword extraction DB 16 for the patterns c_i extracted by the pattern extraction unit 121, extracts the expressions exp matched by the patterns c_i, sorts the extracted expressions exp in descending order of Score, and outputs them as keywords (step S3). Through step S3, for example, the first keyword is increased to three third keywords, “Japan”, “America”, and “Germany”, and the second keyword is increased to three fourth keywords, “area”, “population”, and “latitude”.
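Step S3 can be sketched by turning each (left, right) pattern into a regular expression and collecting what fills the slot. Score is defined elsewhere in this document and is not reproduced here, so this sketch ranks by raw match count as a stand-in. The pattern format and the `max_len` limit are likewise assumptions.

```python
import re
from collections import Counter

def extract_expressions(corpus, patterns, max_len=4):
    """Collect slot fillers for (left, right) patterns, most frequent first.

    Ranking by match count stands in for the document's Score.
    """
    counts = Counter()
    for left, right in patterns:
        regex = re.compile(re.escape(left) + "(.{1,%d}?)" % max_len + re.escape(right))
        for doc in corpus:
            for m in regex.finditer(doc):
                counts[m.group(1)] += 1
    return [exp for exp, _ in counts.most_common()]

corpus = ["日本の面積は狭い。日本の人口は多い。", "ドイツの人口は多い。"]
print(extract_expressions(corpus, {("の", "は")}))  # ['人口', '面積']
```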

次に、質問作成部１３が、出力されたキーワードにより構成される質問を作成する（ステップＳ４）。ステップＳ４においては、第３のキーワードと第４のキーワードとにより構成される質問を作成する。例えば、質問作成部１３は、第３のキーワード「アメリカ」と第４のキーワード「人口」とにより構成される質問「アメリカの人口は？」を作成する。質問応答装置１が質問作成部１３を備えない構成を採るときは、上記ステップＳ４の処理は、省略される。 Next, the question creating unit 13 creates questions composed of the output keywords (step S4). In step S4, questions composed of a third keyword and a fourth keyword are created. For example, the question creating unit 13 creates the question “What is the population of America?” from the third keyword “America” and the fourth keyword “population”. When the question answering apparatus 1 does not include the question creating unit 13, step S4 is omitted.
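Step S4 amounts to pairing every third keyword with every fourth keyword. The sketch below uses the “XのYは？” template from the problem examples earlier; using a single fixed template is an assumption.

```python
from itertools import product

def make_questions(third_keywords, fourth_keywords, template="{x}の{y}は？"):
    """One question per (third keyword, fourth keyword) pair, keyed by the pair."""
    return {(x, y): template.format(x=x, y=y)
            for x, y in product(third_keywords, fourth_keywords)}

questions = make_questions(["日本", "アメリカ", "ドイツ"], ["面積", "人口", "緯度"])
print(questions[("アメリカ", "人口")])  # アメリカの人口は？
```

Keying by the keyword pair lets each answer later go straight into the matching answer-table cell.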

次に、解答候補抽出部１４は、作成された各質問に対する解答候補を、上述した機械学習の手法を用いて抽出する（ステップＳ５）。質問応答装置１が質問作成部１３を備えない構成を採るときは、上記ステップＳ５において、解答候補抽出部１４は、キーワード増加部１２によって出力されたキーワードによって構成される質問に対する解答候補を、機械学習の手法を用いて抽出する。そして、解答表出力部１５が、解答表を出力する（ステップＳ６）。 Next, the answer candidate extraction unit 14 extracts answer candidates for each created question using the machine learning technique described above (step S5). When the question answering apparatus 1 does not include the question creating unit 13, in step S5 the answer candidate extraction unit 14 uses machine learning to extract answer candidates for the questions composed of the keywords output by the keyword increasing unit 12. The answer table output unit 15 then outputs the answer table (step S6).

図６は、本発明の第１の実施の形態における質問応答装置の構成の別の例を示す図である。質問応答装置１０は、入力されたキーワードを増加し、増加したキーワードにより構成される質問に対する解答を出力する装置である。図６中に示す質問応答装置１０が備える構成要素のうち、図１に示す質問応答装置１が備える構成要素と同一の符号が付けられたものは、当該質問応答装置１が備える構成要素と同様の機能を有する。 FIG. 6 shows another example of the configuration of the question answering apparatus in the first embodiment of the present invention. The question answering apparatus 10 increases the input keywords and outputs answers to questions composed of the increased keywords. Among the components of the question answering apparatus 10 shown in FIG. 6, those given the same reference numerals as components of the question answering apparatus 1 shown in FIG. 1 have the same functions as those components.

本発明の実施の形態においては、図６に示す構成から質問作成部１３を省略し、解答候補抽出部１４が、キーワード増加部６０によって出力された第３のキーワードと第４のキーワードとによって構成される質問に対する解答候補を抽出し、出力する構成を採ってもよい。 In the embodiment of the present invention, the question creating unit 13 is omitted from the configuration shown in FIG. 6, and the answer candidate extracting unit 14 is configured by the third keyword and the fourth keyword output by the keyword increasing unit 60. A configuration may be adopted in which answer candidates for a question to be asked are extracted and output.

質問応答装置１０のキーワード増加部６０は、キーワード入力部１１に入力されたキーワードを増加させる。すなわち、キーワード増加部６０は、例えば、キーワード入力部１１に入力された第１のキーワードに基づいて、第１のキーワードの数より多い第３のキーワードを出力する。また、キーワード増加部６０は、例えば、キーワード入力部１１に入力された第２のキーワードに基づいて、第２のキーワードの数より多い第４のキーワードを出力する。 The keyword increasing unit 60 of the question answering apparatus 10 increases the keywords input to the keyword input unit 11. That is, based on the first keywords input to the keyword input unit 11, the keyword increasing unit 60 outputs third keywords that outnumber the first keywords. Likewise, based on the second keywords input to the keyword input unit 11, it outputs fourth keywords that outnumber the second keywords.

単語データデータベース（ＤＢ）６１には、単語と単語の分野との対応情報が格納されている。例えば、図７に示すような、単語と単語の分野との対応情報が格納されている。例えば、「国名」という分野に対応する単語として、日本、アメリカ、ドイツ、・・・といった単語が格納されている。 The word data database (DB) 61 stores correspondence information between words and the fields they belong to, for example as shown in FIG. 7. For instance, words such as Japan, America, Germany, ... are stored as words corresponding to the field “country name”.

また、シソーラスデータベース（ＤＢ）６２には、意味的類似による単語の分類情報であるシソーラスデータが格納されている。例えば、シソーラスＤＢ６２には、図８に示すような、単語と単語に振られた１０桁の数字（分類番号）との対応情報がシソーラスデータとして格納されている。図８に示す例では、シソーラスデータが分類語彙表の形式で示されている。 The thesaurus database (DB) 62 stores thesaurus data, which is word classification information based on semantic similarity. For example, in the thesaurus DB 62, correspondence information between words and 10-digit numbers (classification numbers) assigned to the words as shown in FIG. 8 is stored as thesaurus data. In the example shown in FIG. 8, thesaurus data is shown in the form of a classification vocabulary table.

なお、分類語彙表とは、一般に、単語を意味に基づいて整理した表であり、各単語に対して分類番号という数字が付与されている。この１０桁の分類番号は、７レベルの階層構造を示しており、上位５レベルは分類番号の最初の５桁で表現され、６レベル目は次の２桁、最下層のレベルは最後の３桁で表現されている。 A classification vocabulary table is, in general, a table in which words are organized by meaning, and each word is assigned a number called a classification number. This 10-digit classification number encodes a 7-level hierarchy: the top five levels are represented by the first five digits, the sixth level by the next two digits, and the lowest level by the last three digits.

類似度算出部１００は、シソーラスＤＢ６２中のシソーラスデータに基づいて、キーワード入力部１１に入力されたキーワードとシソーラスデータ中の単語との類似度を算出する。キーワード抽出部１０１は、例えば、算出された類似度が予め定めた閾値以上の単語をキーワードとして抽出し、出力する。また、キーワード抽出部１０１は、例えば、算出された類似度が大きい順に所定の個数の単語をシソーラスデータ中から取り出して、キーワードとして出力する構成を採ることもできる。 The similarity calculation unit 100 calculates the similarity between the keyword input to the keyword input unit 11 and the word in the thesaurus data based on the thesaurus data in the thesaurus DB 62. The keyword extraction unit 101 extracts, for example, a word whose calculated similarity is equal to or greater than a predetermined threshold as a keyword and outputs the keyword. In addition, the keyword extraction unit 101 can take a configuration in which a predetermined number of words are extracted from the thesaurus data in the descending order of the calculated similarity and are output as keywords.

本発明の実施の形態においては、キーワード抽出部１０１は、単語データＤＢ６１中に格納された、単語と単語の分野との対応情報に基づいて、キーワード入力部１１に入力されたキーワードと同じ分野の単語をキーワードとして抽出し、出力する構成を採ることもできる。 In the embodiment of the present invention, the keyword extraction unit 101 has the same field as the keyword input to the keyword input unit 11 based on the correspondence information between the word and the field of the word stored in the word data DB 61. It is also possible to adopt a configuration in which words are extracted as keywords and output.

上記の質問応答装置１０を用いた場合の質問応答処理フローは、図５に示す質問応答処理フローと、ステップＳ２、ステップＳ３の処理が異なる以外は、同様である。質問応答装置１０を用いた場合の質問応答処理フローの一例においては、図５のステップＳ２およびステップＳ３の代わりに、キーワード増加部６０のキーワード抽出部１０１で、キーワード入力部１１に入力されたキーワードと同じ分野の単語を単語データＤＢ６１中から抽出し、キーワードとして出力する。 The question answering process flow when the above question answering apparatus 10 is used is the same as the question answering process flow shown in FIG. 5 except that the processes in steps S2 and S3 are different. In an example of the question answering process flow when the question answering apparatus 10 is used, the keyword input to the keyword input unit 11 by the keyword extracting unit 101 of the keyword increasing unit 60 instead of steps S2 and S3 in FIG. Are extracted from the word data DB 61 and output as keywords.

例えば、キーワード入力部１１に第１のキーワード「日本」が入力されたとすると、キーワード抽出部１０１は、図７に示す単語データＤＢ６１から、単語「日本」が対応する「国名」という分野に属する（対応する）単語である「日本」、「アメリカ」、「ドイツ」、・・・を抽出し、第３のキーワードとして出力する。また、例えば、キーワード入力部１１に第２のキーワード「面積」が入力されたとすると、キーワード抽出部１０１は、図７に示す単語データＤＢ６１から、単語「面積」が対応する「数値表現」という分野に属する（対応する）単語である「面積」、「人口」、「緯度」、・・・を抽出し、第４のキーワードとして出力する。 For example, if the first keyword “Japan” is input to the keyword input unit 11, the keyword extraction unit 101 belongs to the field “country name” to which the word “Japan” corresponds from the word data DB 61 shown in FIG. Corresponding) words “Japan”, “America”, “Germany”,... Are extracted and output as a third keyword. For example, if the second keyword “area” is input to the keyword input unit 11, the keyword extraction unit 101 reads the field “numerical expression” corresponding to the word “area” from the word data DB 61 illustrated in FIG. 7. “Area”, “population”, “latitude”,... That belong to (corresponding to) are extracted and output as a fourth keyword.
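The same-field lookup can be sketched as a search over a field-to-words table. The table below is an illustrative excerpt modeled on the FIG. 7 example, not its actual contents, and returning the first matching field is an assumption for words that could belong to several fields.

```python
WORD_FIELDS = {  # illustrative excerpt modeled on FIG. 7
    "国名": ["日本", "アメリカ", "ドイツ"],
    "数値表現": ["面積", "人口", "緯度"],
}

def same_field_words(keyword, word_fields):
    """Return all words in the (first) field that contains the keyword."""
    for _field, words in word_fields.items():
        if keyword in words:
            return list(words)
    return []

print(same_field_words("日本", WORD_FIELDS))  # ['日本', 'アメリカ', 'ドイツ']
```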

また、質問応答装置１０を用いた場合の質問応答処理フローの別の例においては、図５のステップＳ２およびステップＳ３の代わりに、例えば、キーワード増加部６０の類似度算出部１００が、キーワード入力部１１に入力されたキーワードとシソーラスＤＢ６２中の単語との類似度を算出し、キーワード増加部６０のキーワード抽出部１０１が、算出された類似度が予め定めた閾値以上の単語をキーワードとして出力する。 In another example of the question answering process flow when the question answering device 10 is used, for example, the similarity calculating unit 100 of the keyword increasing unit 60 performs keyword input instead of step S2 and step S3 in FIG. The similarity between the keyword input to the unit 11 and the word in the thesaurus DB 62 is calculated, and the keyword extraction unit 101 of the keyword increasing unit 60 outputs a word whose calculated similarity is equal to or greater than a predetermined threshold as a keyword. .

なお、例えば、キーワード抽出部１０１は、算出された類似度が大きい順に所定の個数の単語をシソーラスデータ中から取り出して、キーワードとして出力する構成を採ることもできる。 For example, the keyword extraction unit 101 can take a configuration in which a predetermined number of words are extracted from the thesaurus data in descending order of the calculated similarity and are output as keywords.

類似度算出部１００は、入力されたキーワードとシソーラスＤＢ６２中の単語との類似度を、例えば以下のようにして算出する。図８に示すシソーラスＤＢ６２内に格納されたシソーラスデータ（分類語彙表）中の各単語に振られた、１０桁の分類番号における各桁の数字の一致の割合を用いて、類似度を求める。すなわち、例えば、分類語彙表中の各単語に振られた分類番号について、キーワード入力部１１に入力されたキーワードと同一の単語に振られた分類番号との間での、各桁の数字の一致の割合を算出し、算出された値を類似度とする。なお、例えば、分類番号の６桁目と７桁目、および、８桁目と９桁目と１０桁目は、それぞれ連続した１つの数字として考える。 The similarity calculation unit 100 calculates the similarity between the input keyword and each word in the thesaurus DB 62, for example, as follows. The similarity is obtained from the digit-by-digit agreement between the 10-digit classification numbers assigned to the words in the thesaurus data (classification vocabulary table) stored in the thesaurus DB 62 shown in FIG. 8. That is, for the classification number of each word in the classification vocabulary table, the degree of digit agreement with the classification number of the word identical to the keyword input to the keyword input unit 11 is calculated, and the result is used as the similarity. The sixth and seventh digits, and the eighth, ninth, and tenth digits, are each treated as a single number. As the worked examples below show, digits are compared from the top of the hierarchy, and the similarity equals the number of leading digits that agree.

例えば、キーワード入力部１１に第１のキーワードとして入力されたキーワードが「日本」である場合、図８に示す分類語彙表中の単語「日本」と「アメリカ」には、それぞれ以下のような分類番号が振られている。以下では、分類番号の上位５レベルと、６レベル目と、最下層のレベルとの間を空白で区切って示す。 For example, if the keyword input as the first keyword in the keyword input unit 11 is “Japan”, the words “Japan” and “America” in the classification vocabulary table shown in FIG. The number is given. In the following, the upper five levels, the sixth level, and the lowest level of the classification number are shown separated by blanks.

日本：１２５９０ ０１ ０１２
アメリカ：１２５９０ ０４ １９２
例えば、両単語の分類番号の上位５レベルにおいて、最初の５桁が一致するので、算出されるキーワード「日本」と分類語彙表中の単語「アメリカ」との類似度は、類似度５である。 Japan: 12590 01 012
America: 12590 04 192
For example, the classification numbers of the two words agree in the first five digits (the top five levels), so the similarity calculated between the keyword “Japan” and the word “America” in the classification vocabulary table is 5.

また、例えば、キーワード入力部１１に第２のキーワードとして入力されたキーワードが「面積」である場合、分類語彙表中の単語「面積」と「人口」には、それぞれ以下のような分類番号が振られている。 Likewise, when the keyword input to the keyword input unit 11 as the second keyword is “area”, the words “area” and “population” in the classification vocabulary table are assigned the following classification numbers.

面積：１２６３０ １３ ０１５
人口：１２６３０ １０ ０１２
例えば、両単語の分類番号の上位５レベルにおいて、最初の５桁が一致するので、算出されるキーワード「面積」と分類語彙表中の単語「人口」との類似度は、類似度５である。 Area: 12630 13 015
Population: 12630 10 012
For example, the classification numbers of the two words agree in the first five digits (the top five levels) but differ at the sixth level, so the similarity calculated between the keyword “area” and the word “population” in the classification vocabulary table is 5.

また、例えば、キーワード入力部１１に第２のキーワードとして入力されたキーワードが「人口」である場合、分類語彙表中の単語「人口」と「緯度」には、それぞれ以下のような分類番号が振られている。 Likewise, when the keyword input to the keyword input unit 11 as the second keyword is “population”, the words “population” and “latitude” in the classification vocabulary table are assigned the following classification numbers.

人口：１２６３０ １０ ０１２
緯度：１２６３０ １０ ０１５
例えば、両単語の分類番号の上位５レベルにおいて、最初の５桁が一致し、また、６レベル目の２桁の数字「１０」が一致するので、算出されるキーワード「人口」と分類語彙表中の単語「緯度」との類似度は、類似度７である。 Population: 12630 10 012
Latitude: 12630 10 015
For example, the classification numbers of the two words agree in the first five digits (the top five levels) and also in the two-digit number “10” at the sixth level, so the similarity calculated between the keyword “population” and the word “latitude” in the classification vocabulary table is 7.

また、例えば、キーワード入力部１１に第２のキーワードとして入力されたキーワードが「人口」である場合、分類語彙表中の単語「人口」と「アメリカ」には、それぞれ以下のような分類番号が振られている。 Likewise, when the keyword input to the keyword input unit 11 as the second keyword is “population”, the words “population” and “America” in the classification vocabulary table are assigned the following classification numbers.

人口：１２６３０ １０ ０１２
アメリカ：１２５９０ ０４ １９２
例えば、両単語の分類番号の上位５レベルにおいて、最初の２桁が一致するため、算出されるキーワード「人口」と分類語彙表中の単語「アメリカ」との類似度は、類似度２である。 Population: 12630 10 012
America: 12590 04 192
For example, within the top five levels, the classification numbers of the two words agree only in the first two digits, so the similarity calculated between the keyword “population” and the word “America” in the classification vocabulary table is 2.
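The four worked examples (similarities 5, 5, 7, and 2) are consistent with the following reading. Compare the classification numbers group by group from the top, where each of the first five digits is one group, digits 6 to 7 form one group, and digits 8 to 10 form one group. Stop at the first group that differs and count the digits matched so far. The sketch below implements that reading, which is an interpretation of the examples rather than a formula stated in the text, together with the threshold-based keyword selection described earlier.

```python
# Digit groups of a 10-digit classification number: five one-digit levels,
# then digits 6-7 (level 6) and digits 8-10 (level 7) as single units.
GROUPS = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 7), (7, 10)]

def similarity(code_a, code_b):
    """Count leading digits that agree, comparing whole groups top-down."""
    score = 0
    for start, end in GROUPS:
        if code_a[start:end] != code_b[start:end]:
            break
        score += end - start
    return score

def similar_words(keyword, thesaurus, threshold):
    """Words whose similarity to the keyword is at or above the threshold."""
    base = thesaurus[keyword]
    return [w for w, code in thesaurus.items() if similarity(base, code) >= threshold]

thesaurus = {  # classification numbers taken from the examples above
    "日本": "1259001012", "アメリカ": "1259004192",
    "面積": "1263013015", "人口": "1263010012", "緯度": "1263010015",
}
print(similarity(thesaurus["人口"], thesaurus["緯度"]))  # 7
print(similar_words("人口", thesaurus, 5))  # ['面積', '人口', '緯度']
```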

図９は、本発明の第２の実施の形態における質問応答装置の構成の一例を示す図である。第２の実施の形態においては、例えば、第１のキーワード「日本」と、第２のキーワード「首都」＋疑問代名詞「はどこですか？」が入力されると、第１のキーワード「日本」に基づいて、第１のキーワードを、例えば「日本」、「アメリカ」、「ドイツ」の３つに増加させる。そして、増加後の第１のキーワードと、第２のキーワード「首都」＋疑問代名詞「はどこですか？」により構成される、例えば「日本の首都はどこですか？」、「アメリカの首都はどこですか？」、「ドイツの首都はどこですか？」という各質問に対する解答を出力する。 FIG. 9 shows an example of the configuration of the question answering apparatus in the second embodiment of the present invention. In the second embodiment, for example, when the first keyword “Japan” and the second keyword “capital” plus the interrogative pronoun 「はどこですか？」 (“where is ...?”) are input, the first keyword is increased, based on “Japan”, to three keywords, for example “Japan”, “America”, and “Germany”. The apparatus then outputs answers to each of the questions composed of an increased first keyword and the second keyword “capital” plus the interrogative pronoun, for example “Where is the capital of Japan?”, “Where is the capital of America?”, and “Where is the capital of Germany?”.

The question answering apparatus 2 expands the input keywords and outputs answers to the questions composed of the expanded keywords. Among the components of the question answering apparatus 2 shown in FIG. 9, the keyword input unit 11, answer table output unit 15, keyword extraction DB 16, pattern extraction unit 121, and keyword extraction unit 122 are the same as the identically numbered components of the question answering apparatus 1 shown in FIG. 1. In an embodiment of the present invention, the question creating unit 23 described later may be omitted from the configuration shown in FIG. 9, in which case the answer candidate extraction unit 24 extracts and outputs answer candidates for the questions composed of the keywords output by the keyword increasing unit 18.

Keywords are entered into the keyword input unit 11, for example the first keyword “Japan” and the second keyword “capital”. The interrogative pronoun input unit 21 receives the interrogative pronoun associated with the second keyword entered into the keyword input unit 11, for example “where is?”. Other interrogative pronouns that may be entered into the interrogative pronoun input unit 21 include “what time is?” and “who is?”. The interrogative pronoun may be entered based on a user's designation, or it may be supplied by a computer other than the question answering apparatus 2.

Based on the interrogative pronoun entered into the interrogative pronoun input unit 21, the answer type estimation unit 22 estimates the answer type, that is, the category of linguistic expression expected for answer candidates to the questions created by the question creating unit 23 described later (or, when the question answering apparatus 2 has no question creating unit 23, to the questions composed of the keywords output by the keyword increasing unit 18 described later). For example, when the entered interrogative pronoun is “where is?”, the answer type is estimated to be “proper noun (place name)”. In an embodiment of the present invention, the answer type estimation unit 22 may estimate the answer type from a predetermined interrogative pronoun instead of the one entered into the interrogative pronoun input unit 21.

Using a keyword extraction technique, the keyword increasing unit 18 extracts keywords from the same field as the input first keyword from the keyword extraction DB 16, thereby expanding the first keyword, and outputs the results as third keywords. In the second embodiment, the keyword increasing unit 18 passes the second keyword (for example, “capital”) to the question creating unit 23 without expanding it. When the question answering apparatus 2 has no question creating unit 23, the keyword increasing unit 18 outputs the third keywords and the second keyword to the answer candidate extraction unit 24.

The question creating unit 23 creates multiple questions from the third keywords output by the keyword increasing unit 18, the second keyword, and the interrogative pronoun entered into the interrogative pronoun input unit 21 (or a predetermined interrogative pronoun).

The knowledge database (DB) 25 stores the document data to be searched for answer candidates, for example newspaper articles and encyclopedia entries.

The answer candidate extraction unit 24 searches the knowledge DB 25 for document data containing the keywords that make up each question created by the question creating unit 23 (or the third keywords and the second keyword output by the keyword increasing unit 18), and from the retrieved document data extracts, as answer candidates, linguistic expressions that match the answer type estimated by the answer type estimation unit 22.

The answer table output unit 15 outputs, as an answer table, a table that associates each extracted answer candidate with its question, for example the answer table shown in FIG. 10.

In the answer table shown in FIG. 10, for example, the answer to the question “Where is the capital of Japan?”, namely “Tokyo”, is stored in the cell where the row for the data item “Japan” intersects the column for the data item “capital”. Likewise, the answer to “Where is the capital of America?”, “Washington”, is stored at the intersection of the “America” row and the “capital” column, and the answer to “Where is the capital of Germany?”, “Berlin”, is stored at the intersection of the “Germany” row and the “capital” column.
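To illustrate the row/column layout just described, the sketch below stores answers in a dictionary keyed by (row item, column item) and prints them as a plain-text grid. The function name `render_answer_table` and the text layout are assumptions for illustration; the formatting of the actual figure is not specified here.

```python
def render_answer_table(answers: dict[tuple[str, str], str]) -> str:
    """Lay out {(row item, column item): answer} as a plain-text grid."""
    rows = list(dict.fromkeys(r for r, _ in answers))  # row items, in first-seen order
    cols = list(dict.fromkeys(c for _, c in answers))  # column items, in first-seen order
    grid = [[""] + cols] + [[r] + [answers.get((r, c), "") for c in cols] for r in rows]
    widths = [max(len(line[i]) for line in grid) for i in range(len(cols) + 1)]
    return "\n".join(
        " | ".join(cell.ljust(w) for cell, w in zip(line, widths)).rstrip() for line in grid
    )


answers = {
    ("Japan", "capital"): "Tokyo",
    ("America", "capital"): "Washington",
    ("Germany", "capital"): "Berlin",
}
print(render_answer_table(answers))
```

Keying cells by the pair of data items keeps the table sparse, so further rows (expanded first keywords) or columns (expanded second keywords) can be added without restructuring.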

FIG. 11 shows an example of the question answering process flow according to the second embodiment of the present invention. The first keyword and the second keyword are entered into the keyword input unit 11 as input keywords (step S11), for example the first keyword “Japan” and the second keyword “capital”. The interrogative pronoun associated with the second keyword is then entered into the interrogative pronoun input unit 21 (step S12), for example “where is?”, which is associated with the second keyword “capital”.

The pattern extraction unit 121 of the keyword increasing unit 18 performs a full-text search of the keyword extraction DB 16 for the first keyword and extracts the patterns appearing around the first keyword as c_i (step S13). What counts as a pattern appearing around the keyword is defined as appropriate.

The keyword extraction unit 122 of the keyword increasing unit 18 performs a full-text search of the keyword extraction DB 16 for each pattern c_i extracted by the pattern extraction unit 121, extracts the expressions exp captured by the patterns c_i, sorts the extracted expressions exp in descending order of their Score values, and outputs them as third keywords (step S14). Through step S14, for example, the first keyword is expanded into three third keywords: “Japan”, “America”, and “Germany”.
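Steps S13 and S14 can be illustrated with a minimal sketch. It assumes that a “pattern” is the pair of fixed-width character strings immediately to the left and right of a keyword occurrence (the text leaves the definition open), and it uses the number of distinct patterns that capture an expression as a stand-in for Score, whose actual definition appears earlier in this description and is not reproduced here. The function names `extract_patterns` and `expand_keyword` and the toy corpus are hypothetical.

```python
import re
from collections import defaultdict


def extract_patterns(corpus: list[str], keyword: str, width: int = 3) -> set[tuple[str, str]]:
    """Step S13 (sketch): collect the `width`-character contexts left and right of each keyword hit."""
    patterns = set()
    for text in corpus:
        for m in re.finditer(re.escape(keyword), text):
            left = text[max(0, m.start() - width):m.start()]
            right = text[m.end():m.end() + width]
            if len(left) == width and len(right) == width:  # skip hits too close to the text edges
                patterns.add((left, right))
    return patterns


def expand_keyword(corpus: list[str], keyword: str, width: int = 3, max_len: int = 10) -> list[str]:
    """Step S14 (sketch): find expressions that fill the same contexts, ranked by a stand-in Score."""
    support = defaultdict(set)  # expression -> set of patterns that captured it
    for left, right in extract_patterns(corpus, keyword, width):
        regex = re.compile(re.escape(left) + "(.{1,%d}?)" % max_len + re.escape(right))
        for text in corpus:
            for m in regex.finditer(text):
                support[m.group(1)].add((left, right))
    # Stand-in Score: number of distinct patterns that captured the expression.
    return sorted(support, key=lambda exp: (-len(support[exp]), exp))


corpus = [
    "the capital of Japan is Tokyo.",
    "the capital of Germany is Berlin.",
    "the capital of America is Washington.",
    "trade between Japan and Germany grew.",
]
print(expand_keyword(corpus, "Japan"))  # ['Japan', 'America', 'Germany']
```

Because the original keyword fills its own contexts, it is returned together with the new keywords, which is consistent with “Japan” appearing among the third keywords in the example above.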

The answer type estimation unit 22 estimates the answer type based on the interrogative pronoun entered into the interrogative pronoun input unit 21 (step S15). For example, when the entered interrogative pronoun is “where is?”, the answer type estimation unit 22 estimates the answer type to be “proper noun (place name)”.
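A rule table is one simple way to realize step S15. The sketch below is hypothetical: only the mapping from “where is?” to “proper noun (place name)” comes from the text, while the “who is?” entry, the fallback type “noun”, and the name `estimate_answer_type` are illustrative assumptions. The document does not specify how the answer type estimation unit 22 is implemented.

```python
# Hypothetical rule table. Only the "where is?" entry is taken from the text;
# the "who is?" entry is an illustrative assumption.
ANSWER_TYPE_RULES = {
    "where is?": "proper noun (place name)",
    "who is?": "proper noun (person name)",
}


def estimate_answer_type(interrogative: str, default: str = "noun") -> str:
    """Map an interrogative pronoun to an expected answer type by table lookup."""
    return ANSWER_TYPE_RULES.get(interrogative.strip().lower(), default)


print(estimate_answer_type("Where is?"))  # proper noun (place name)
```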

The question creating unit 23 uses the interrogative pronoun entered into the interrogative pronoun input unit 21 to create questions composed of the third keywords and the second keyword (step S16). For example, from the third keyword “America” and the second keyword “capital”, the question creating unit 23 creates the question “Where is the capital of America?”. When the question answering apparatus 2 has no question creating unit 23, step S16 is omitted.
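Step S16 amounts to filling a template once per expanded keyword. The sketch below works on the original Japanese strings and assumes the template “<third keyword>の<second keyword><interrogative pronoun>”, which is inferred from the examples (日本 + 首都 + はどこですか？ gives 日本の首都はどこですか？, “Where is the capital of Japan?”); the function name `create_questions` is hypothetical.

```python
def create_questions(third_keywords: list[str], second_keyword: str, interrogative: str) -> dict[str, str]:
    """Build one question per third keyword by string concatenation."""
    # Assumed template, inferred from the examples: <third keyword>の<second keyword><interrogative>
    return {kw: f"{kw}の{second_keyword}{interrogative}" for kw in third_keywords}


questions = create_questions(["日本", "アメリカ", "ドイツ"], "首都", "はどこですか？")
print(questions["アメリカ"])  # アメリカの首都はどこですか？
```

Keying the result by third keyword makes it easy to place each answer in the matching row of the answer table later.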

Next, the answer candidate extraction unit 24 extracts answer candidates for each created question (step S17). Specifically, it searches the knowledge DB 25 for document data containing the keywords that make up each question created by the question creating unit 23, and from the retrieved document data extracts, as answer candidates, linguistic expressions that match the answer type estimated by the answer type estimation unit 22. When the question answering apparatus 2 has no question creating unit 23, in step S17 the answer candidate extraction unit 24 instead searches the knowledge DB 25 for document data containing the keywords output by the keyword increasing unit 18, and from the retrieved document data extracts linguistic expressions matching the estimated answer type as answer candidates for the questions composed of those keywords. Finally, the answer table output unit 15 outputs the answer table (step S18), for example the answer table shown in FIG. 10.
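Step S17 can be sketched as a keyword filter over documents followed by an answer-type check. This is a minimal sketch under stated assumptions: a tiny hypothetical gazetteer `PLACE_NAMES` stands in for recognizing “proper noun (place name)”, candidates are ranked by the number of supporting documents, and the toy `knowledge_db` and the function names are hypothetical. The ranking criterion actually used is not specified in this passage.

```python
from collections import Counter

# Hypothetical answer-type recognizer: a tiny gazetteer standing in for
# "proper noun (place name)". A real system would use a named-entity tagger.
PLACE_NAMES = {"Tokyo", "Washington", "Berlin", "Japan", "America", "Germany"}


def is_place_name(token: str) -> bool:
    return token in PLACE_NAMES


def extract_candidates(docs: list[str], keywords: list[str], matches_type) -> list[str]:
    """Rank expressions of the expected type found in documents containing every keyword."""
    counts = Counter()
    for doc in docs:
        if not all(kw in doc for kw in keywords):
            continue  # the document must contain all keywords of the question
        tokens = set(doc.replace(".", " ").replace(",", " ").split())
        for tok in tokens:
            if tok not in keywords and matches_type(tok):
                counts[tok] += 1  # one vote per supporting document
    return [cand for cand, _ in counts.most_common()]


knowledge_db = [
    "Tokyo is the capital of Japan.",
    "Washington is the capital of America.",
    "Berlin is the capital of Germany.",
    "Japan and America signed a treaty in Washington.",
]
answer_table = {
    country: extract_candidates(knowledge_db, [country, "capital"], is_place_name)
    for country in ["Japan", "America", "Germany"]
}
print(answer_table)
```

The last document mentions a place name but lacks the keyword “capital”, so it does not contribute a candidate; this shows why every question keyword is required.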

FIG. 12 shows an example configuration of Modification 1 of the second embodiment of the present invention. The question answering apparatus 20 expands the input keywords and outputs answers to the questions composed of the expanded keywords. Among the components of the question answering apparatus 20 shown in FIG. 12, those bearing the same reference numerals as components of the question answering apparatus 10 shown in FIG. 6 or of the question answering apparatus 2 shown in FIG. 9 have the same functions as those components.

The keyword increasing unit 63 of the question answering apparatus 20 expands the first keyword entered into the keyword input unit 11 and outputs the results as third keywords. It passes the second keyword entered into the keyword input unit 11 to the question creating unit 23 without expanding it. In an embodiment of the present invention, the question creating unit 23 may be omitted from the configuration shown in FIG. 12, in which case the answer candidate extraction unit 24 extracts and outputs answer candidates for the questions composed of the third keywords and the second keyword output by the keyword increasing unit 63.

The question answering process flow using the question answering apparatus 20 is the same as the flow shown in FIG. 11 except for steps S13 and S14. In one example of the flow using the question answering apparatus 20, instead of steps S13 and S14 of FIG. 11, the keyword extraction unit 101 of the keyword increasing unit 63 extracts words from the same field as the first keyword entered into the keyword input unit 11 from the word data DB 61 and outputs them as third keywords.
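A minimal sketch of the same-field lookup, assuming the word data DB can be viewed as a mapping from a field label to the words classified under it. The field labels, contents, and the name `same_field_words` are hypothetical; the document does not describe the internal structure of the word data DB 61.

```python
# Hypothetical word data DB: each field label maps to the words classified under it.
WORD_DATA_DB = {
    "country": ["Japan", "America", "Germany"],
    "city": ["Tokyo", "Washington", "Berlin"],
}


def same_field_words(keyword: str, db: dict[str, list[str]]) -> list[str]:
    """Return every word that shares at least one field with `keyword` (keyword included)."""
    result: list[str] = []
    for words in db.values():
        if keyword in words:
            for w in words:
                if w not in result:  # keep first-seen order, no duplicates
                    result.append(w)
    return result


print(same_field_words("Japan", WORD_DATA_DB))  # ['Japan', 'America', 'Germany']
```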

In another example of the flow using the question answering apparatus 20, instead of steps S13 and S14 of FIG. 11, the similarity calculation unit 100 of the keyword increasing unit 63 calculates the similarity between the first keyword entered into the keyword input unit 11 and each word in the thesaurus DB 62, and the keyword extraction unit 101 of the keyword increasing unit 63 outputs the words whose calculated similarity is equal to or greater than a predetermined threshold as third keywords.

Alternatively, the keyword extraction unit 101 may, for example, take a predetermined number of words from the thesaurus data in descending order of calculated similarity and output them as third keywords.
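The two selection rules, a similarity threshold and a fixed number of top-ranked words, can be contrasted in a short sketch. The similarity scores below are hypothetical values chosen for illustration, not outputs of the similarity calculation unit 100, and the function names are assumptions.

```python
def select_by_threshold(scores: dict[str, int], threshold: int) -> list[str]:
    """Words whose similarity is at least `threshold`, most similar first."""
    kept = [w for w, s in scores.items() if s >= threshold]
    return sorted(kept, key=lambda w: (-scores[w], w))


def select_top_n(scores: dict[str, int], n: int) -> list[str]:
    """The `n` most similar words; ties are broken alphabetically."""
    return sorted(scores, key=lambda w: (-scores[w], w))[:n]


# Hypothetical similarity scores between the first keyword "Japan" and thesaurus words.
scores = {"Japan": 7, "America": 6, "Germany": 6, "Tokyo": 3, "capital": 1}
print(select_by_threshold(scores, 5))  # ['Japan', 'America', 'Germany']
print(select_top_n(scores, 2))         # ['Japan', 'America']
```

A threshold yields a variable number of keywords depending on how many close words exist, whereas the top-N rule fixes the size of the expanded keyword set and therefore the number of generated questions.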

In Modification 2 of the second embodiment of the present invention, in the question answering apparatus 2 shown in FIG. 9 or the question answering apparatus 20 shown in FIG. 12, the interrogative pronoun entered into the interrogative pronoun input unit 21 is not associated with the keywords entered into the keyword input unit 11. The keyword increasing unit 18 of the question answering apparatus 2 (or the keyword increasing unit 63 of the question answering apparatus 20) outputs third keywords as output keywords based on the first keyword entered into the keyword input unit 11, and outputs fourth keywords as output keywords based on the second keyword entered into the keyword input unit 11. The answer type estimation unit 22 estimates the answer type based on the interrogative pronoun entered into the interrogative pronoun input unit 21. The answer candidate extraction unit 24 then searches the knowledge DB 25 for document data containing the keywords output by the keyword increasing unit 18, and from the retrieved document data extracts linguistic expressions matching the answer type estimated by the answer type estimation unit 22 as answer candidates for the questions composed of those keywords. Finally, the answer table output unit 15 outputs the answer table.
In Modification 2 of the second embodiment, the answer type estimation unit 22 may instead estimate the answer type from a predetermined interrogative pronoun rather than the one entered into the interrogative pronoun input unit 21.

FIG. 13 shows an example configuration of the question answering apparatus according to the third embodiment of the present invention. Unlike the first embodiment, the third embodiment does not use machine learning; instead, it extracts answer candidates using an input answer type (or a predetermined answer type).

In the third embodiment, when, for example, the first keyword “Japan”, the second keyword “capital”, and the answer type “proper noun (place name)” are entered, the first keyword “Japan” is expanded into, for example, three third keywords: “Japan”, “America”, and “Germany”. Likewise, the second keyword “capital” is expanded into, for example, three fourth keywords: “capital”, “former capital”, and “southernmost city”.

The apparatus then outputs answers to the questions composed of combinations of the expanded third and fourth keywords, such as “What is the capital of Japan?”, “What is the former capital of America?”, and “What is the southernmost city of Germany?”. More specifically, as described later, for the question “What is the capital of Japan?” the apparatus searches the document data to be searched for answer candidates for the third keyword “Japan” and the fourth keyword “capital”, extracts linguistic expressions in documents containing both keywords as answer candidates, and outputs as answers those candidates that match the answer type “proper noun (place name)”.
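If every third keyword is paired with every fourth keyword, the set of questions is a Cartesian product. This is an assumption consistent with the examples, which draw pairs across both lists. The sketch uses the original Japanese strings and a template “<third>の<fourth>は？” inferred from the examples (日本 + 首都 gives 日本の首都は？, “What is the capital of Japan?”); the function name `combine_questions` is hypothetical.

```python
from itertools import product


def combine_questions(third_keywords: list[str], fourth_keywords: list[str], suffix: str = "は？") -> list[str]:
    """Pair every third keyword with every fourth keyword (Cartesian product)."""
    # Assumed template, inferred from the examples: <third>の<fourth><suffix>
    return [f"{t}の{f}{suffix}" for t, f in product(third_keywords, fourth_keywords)]


questions = combine_questions(["日本", "アメリカ", "ドイツ"], ["首都", "旧首都", "最南端都市"])
print(len(questions))  # 9
print(questions[0])    # 日本の首都は？
```

With three third keywords and three fourth keywords, the two input keywords grow into nine questions, which is the multiplicative effect that keyword expansion is meant to provide.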

The question answering apparatus 3 expands the input keywords and outputs answers to the questions composed of the expanded keywords. Among the components of the question answering apparatus 3 shown in FIG. 13, the keyword input unit 11, keyword increasing unit 12, question creating unit 13, answer table output unit 15, keyword extraction DB 16, pattern extraction unit 121, and keyword extraction unit 122 are the same as the identically numbered components of the question answering apparatus 1 shown in FIG. 1, and the answer candidate extraction unit 24 and knowledge DB 25 are the same as the identically numbered components of the question answering apparatus 2 shown in FIG. 9. In an embodiment of the present invention, the question creating unit 13 may be omitted from the configuration shown in FIG. 13, in which case the answer candidate extraction unit 24 extracts and outputs answer candidates for the questions composed of the keywords output by the keyword increasing unit 12.

Keywords are entered into the keyword input unit 11, for example the first keyword “Japan” and the second keyword “capital”. The answer type input unit 31 receives the answer type of the answer candidates for the questions created by the question creating unit 13 (or, when the question answering apparatus 3 omits the question creating unit 13, for the questions composed of the keywords output by the keyword increasing unit 12), for example the answer type “proper noun (place name)”.

Other answer types that may be entered into the answer type input unit 31 include “proper noun (numerical value)”, “proper noun (person name)”, “katakana expression” (an expression written entirely in katakana), “noun”, and “verb”. The answer type may be entered based on a user's designation, or it may be supplied by a computer other than the question answering apparatus 3.
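Of these answer types, “katakana expression” is the easiest to test mechanically. The sketch below assumes that “written entirely in katakana” means every character lies in the Unicode Katakana block (U+30A0 to U+30FF), which also contains the long-vowel mark “ー” and the middle dot “・”; half-width katakana is deliberately not covered. The function name `is_katakana_expression` is hypothetical.

```python
import re

# Assumed definition: every character lies in the Unicode Katakana block (U+30A0 to U+30FF).
KATAKANA_ONLY = re.compile(r"[\u30A0-\u30FF]+")


def is_katakana_expression(text: str) -> bool:
    """True if `text` is non-empty and written entirely in full-width katakana."""
    return KATAKANA_ONLY.fullmatch(text) is not None


for word in ["ワシントン", "ベルリン", "東京", "アメリカ合衆国"]:
    print(word, is_katakana_expression(word))
```

Such a predicate can be passed directly as the answer-type check when filtering candidate expressions, in the same way a place-name recognizer would be.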

As described with reference to FIG. 1, the keyword increasing unit 12 uses a keyword extraction technique to extract, from the keyword extraction DB 16, keywords from the same field as each input keyword, thereby increasing the number of keywords.

For example, based on the input first keyword, the keyword increasing unit 12 outputs third keywords that outnumber the first keywords. Likewise, based on the input second keyword, it outputs fourth keywords that outnumber the second keywords.

The question creating unit 13 creates multiple questions composed of the third and fourth keywords. The answer candidate extraction unit 24 searches the knowledge DB 25 for documents containing the keywords that make up each question created by the question creating unit 13 (or the third and fourth keywords output by the keyword increasing unit 12), and among the linguistic expressions in the retrieved documents extracts, as answer candidates, those that match the answer type entered into the answer type input unit 31 (or a predetermined answer type).

The answer table output unit 15 outputs, as an answer table, a table that associates each extracted answer candidate with its question.

FIG. 14 shows an example of the question answering process flow according to the third embodiment of the present invention. The first keyword and the second keyword are entered into the keyword input unit 11 as input keywords (step S21), for example the first keyword “Japan” and the second keyword “capital”. The answer type of the answer candidates for the questions created by the question creating unit 13 is then entered into the answer type input unit 31 (step S22); for example, “proper noun (place name)” is entered. When the question answering apparatus 3 has no question creating unit 13, the answer type input unit 31 instead receives the answer type of the answer candidates for the questions composed of the keywords output by the keyword increasing unit 12.

The pattern extraction unit 121 of the keyword increasing unit 12 performs a full-text search of the keyword extraction DB 16 for the input keywords and extracts the patterns appearing around each input keyword as c_i (step S23). What counts as a pattern appearing around a keyword is defined as appropriate. Patterns c_i are extracted separately for the first keyword and for the second keyword.

The keyword extraction unit 122 of the keyword increasing unit 12 performs a full-text search of the keyword extraction DB 16 for each pattern c_i extracted by the pattern extraction unit 121, extracts the expressions exp captured by the patterns c_i, sorts the extracted expressions exp in descending order of their Score values, and outputs them as keywords (step S24). Through step S24, for example, the first keyword is expanded into three third keywords, “Japan”, “America”, and “Germany”, and the second keyword is expanded into three fourth keywords, “capital”, “former capital”, and “southernmost city”.

The question creating unit 13 creates questions composed of the output keywords (step S25), specifically questions composed of the output third and fourth keywords. For example, from the third keyword “America” and the fourth keyword “capital”, the question creating unit 13 creates the question “What is the capital of America?”. When the question answering apparatus 3 has no question creating unit 13, step S25 is omitted.

Next, the answer candidate extraction unit 24 extracts answer candidates for each created question (step S26). Specifically, it searches the knowledge DB 25 for document data containing the keywords that make up each question created by the question creating unit 13, and from the retrieved document data extracts, as answer candidates, linguistic expressions that match the answer type entered into the answer type input unit 31. When the question answering apparatus 3 has no question creating unit 13, in step S26 the answer candidate extraction unit 24 instead searches the knowledge DB 25 for document data containing the keywords output by the keyword increasing unit 12 (the third and fourth keywords), and from the retrieved document data extracts linguistic expressions matching the entered answer type as answer candidates for the questions composed of those third and fourth keywords. Finally, the answer table output unit 15 outputs the answer table (step S27), for example the answer table shown in FIG. 15.

FIG. 16 shows an example configuration of Modification 1 of the third embodiment of the present invention. The question answering apparatus 30 expands the input keywords and outputs answers to the questions composed of the expanded keywords. Among the components of the question answering apparatus 30 shown in FIG. 16, those bearing the same reference numerals as components of the question answering apparatus 1 shown in FIG. 1, the question answering apparatus 10 shown in FIG. 6, or the question answering apparatus 3 shown in FIG. 13 have the same functions as those components.

In an embodiment of the present invention, the question creating unit 13 may be omitted from the configuration shown in FIG. 16, in which case the answer candidate extraction unit 24 extracts and outputs answer candidates for the questions composed of the third and fourth keywords output by the keyword increasing unit 60.

The question answering process flow using the question answering apparatus 30 is the same as the flow shown in FIG. 14 except for steps S23 and S24. In one example of the flow using the question answering apparatus 30, instead of steps S23 and S24 of FIG. 14, the keyword extraction unit 101 of the keyword increasing unit 60 extracts words from the same field as the first keyword entered into the keyword input unit 11 from the word data DB 61 and outputs them as third keywords. The keyword extraction unit 101 likewise extracts words from the same field as the second keyword entered into the keyword input unit 11 from the word data DB 61 and outputs them as fourth keywords.

In another example of the flow using the question answering apparatus 30, instead of steps S23 and S24 of FIG. 14, the similarity calculation unit 100 of the keyword increasing unit 60 calculates the similarity between the first keyword entered into the keyword input unit 11 and each word in the thesaurus DB 62, and the keyword extraction unit 101 of the keyword increasing unit 60 outputs the words whose calculated similarity is equal to or greater than a predetermined threshold as third keywords. The similarity calculation unit 100 likewise calculates the similarity between the second keyword entered into the keyword input unit 11 and each word in the thesaurus DB 62, and the keyword extraction unit 101 outputs the words whose similarity is equal to or greater than the predetermined threshold as fourth keywords.

Alternatively, the keyword extraction unit 101 may, for example, take a predetermined number of words from the thesaurus data in descending order of calculated similarity and output them as the third and fourth keywords.
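The two selection rules just described (threshold and top-N) can be sketched as follows. This is a minimal illustration, assuming the similarity calculation unit 100 has already produced a score for each thesaurus word; the function names and the toy scores are hypothetical and not part of the described device.

```python
from typing import Dict, List


def expand_by_threshold(similarities: Dict[str, float], threshold: float) -> List[str]:
    """Keep thesaurus words whose similarity is >= threshold, most similar first."""
    kept = [(w, s) for w, s in similarities.items() if s >= threshold]
    kept.sort(key=lambda ws: (-ws[1], ws[0]))  # descending similarity, ties alphabetical
    return [w for w, _ in kept]


def expand_top_n(similarities: Dict[str, float], n: int) -> List[str]:
    """Keep the n thesaurus words with the highest similarity."""
    ranked = sorted(similarities.items(), key=lambda ws: (-ws[1], ws[0]))
    return [w for w, _ in ranked[:n]]


# Toy similarity scores of thesaurus words to the input keyword "Japan" (illustrative only).
sims = {"USA": 0.9, "Germany": 0.8, "France": 0.7, "Tokyo": 0.3}
print(expand_by_threshold(sims, 0.7))  # third keywords chosen by threshold
print(expand_top_n(sims, 2))           # third keywords chosen by top-N
```

The threshold rule may return a variable number of words, whereas the top-N rule bounds the number of generated questions regardless of how the similarity scores are distributed.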

In Modification 2 of the third embodiment of the present invention, for example in the question answering device 3 shown in FIG. 13 or the question answering device 30 shown in FIG. 16, the answer type input unit 31 receives an answer type associated with the second keyword input to the keyword input unit 11. An answer type is the category of linguistic expression that an answer candidate must take for a question formed from the output keywords of the keyword increasing unit 12 (or the keyword increasing unit 60). The keyword increasing unit 12 (or 60) outputs third keywords, derived from the first keyword input to the keyword input unit 11, as output keywords, and also outputs the second keyword input to the keyword input unit 11 as an output keyword. The answer candidate extraction unit 24 searches the knowledge DB 25 for document data containing the keywords output by the keyword increasing unit 12 (or 60), namely a third keyword and the second keyword. From the document data found by this search, it extracts the linguistic expressions that match the answer type input to the answer type input unit 31 for that second keyword, as answer candidates for the question formed from the third keyword and that second keyword. The answer table output unit 15 then outputs the answer table.

Note that in Modification 2, the answer candidate extraction unit 24 may instead search the knowledge DB 25 for document data containing the keywords output by the keyword increasing unit 12 (or 60), namely a third keyword and the second keyword, and extract from the retrieved document data the linguistic expressions that match a predetermined answer type associated with that second keyword, as answer candidates for the question formed from the third keyword and that second keyword.

In Modification 3 of the third embodiment of the present invention, for example in the question answering device 3 shown in FIG. 13 or the question answering device 30 shown in FIG. 16, the keyword input unit 11 receives a first keyword and second keywords organized into a plurality of groups. For example, the first keyword "Japan" is input together with the second keywords "prime minister" and "mayor", which belong to a person-name group, and the second keywords "capital" and "former capital", which belong to a place-name group. The answer type input unit 31 receives an answer type associated with each group to which the input second keywords belong: for example, the answer type "person name" for the person-name group and the answer type "place name" for the place-name group.

The keyword increasing unit 12 (or 60) outputs third keywords, derived from the first keyword, as output keywords. It also outputs fourth keywords, derived from the second keywords, as output keywords, separately for each group to which the second keywords belong. For example, "Japan", "Germany", and "USA" are output as third keywords; "prime minister", "mayor", and "Nobel laureate" are output as fourth keywords of the person-name group; and "capital", "former capital", and "southernmost city" are output as fourth keywords of the place-name group.

For questions formed from a third keyword output by the keyword increasing unit 12 (or 60) and a fourth keyword in the person-name group, the answer candidate extraction unit 24 extracts answer candidates using the answer type "person name" input to the answer type input unit 31. For example, candidates for the question "Who are Germany's Nobel laureates?" are extracted using the answer type "person name". Similarly, for questions formed from a third keyword and a fourth keyword in the place-name group, the answer candidate extraction unit 24 extracts answer candidates using the answer type "place name" input to the answer type input unit 31; for example, candidates for the question "What is the capital of the USA?" are extracted using "place name". The answer table output unit 15 then outputs the answer table.
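The pairing of keywords with per-group answer types in Modification 3 can be sketched as follows, assuming the expanded keywords are already available as plain lists. The function and variable names are hypothetical; the keywords and answer types are the ones used in the example above.

```python
from itertools import product
from typing import Dict, List, Tuple


def grouped_questions(third: List[str],
                      fourth_by_group: Dict[str, List[str]],
                      type_by_group: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Return (third keyword, fourth keyword, answer type) for every pairing.

    The answer type of each question is the one input for the group that the
    question's fourth keyword belongs to.
    """
    questions = []
    for group, fourth in fourth_by_group.items():
        answer_type = type_by_group[group]
        for k3, k4 in product(third, fourth):
            questions.append((k3, k4, answer_type))
    return questions


third = ["Japan", "Germany", "USA"]
fourth_by_group = {
    "person": ["prime minister", "mayor", "Nobel laureate"],
    "place": ["capital", "former capital", "southernmost city"],
}
type_by_group = {"person": "person name", "place": "place name"}
qs = grouped_questions(third, fourth_by_group, type_by_group)
print(len(qs))
```

Only one answer type per group has to be entered, yet every generated question (3 × 6 = 18 here) receives a suitable type.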

FIG. 17 shows an example configuration of the question answering device according to the fourth embodiment of the present invention. In the fourth embodiment, answer candidates are extracted using similarity relations between keywords.

In the fourth embodiment, for example, the first keywords "Japan", "USA", ..., the second keywords "area", "capital", ..., and the answer types "proper noun (numerical value)", "proper noun (place name)", ... are input. Each input answer type is associated with one of the input second keywords. For example, the answer type associated with the second keyword "area" is "proper noun (numerical value)", and the answer type associated with the second keyword "capital" is "proper noun (place name)".

In the fourth embodiment, for example, the input first keywords "Japan", "USA", ... are expanded into many third keywords (e.g., "Japan", "USA", "Germany", ...). Likewise, the keyword increasing unit 12 expands the second keywords "area", "capital", ... into many fourth keywords (e.g., "area", "capital", "former capital", ...).

Next, for each fourth keyword, a second keyword that is similar to it (each second keyword also appears among the fourth keywords) is determined as its similar keyword. For example, the second keyword "capital" is determined as the similar keyword of the fourth keyword "former capital".

Answer candidates for each question formed by combining a third keyword and a fourth keyword are then extracted using the answer type associated with the similar keyword of that question's fourth keyword, and an answer table is output. For example, candidates for the question "What was the former capital of Japan?" are extracted using the answer type "proper noun (place name)", which is associated with the similar keyword "capital", and an answer table is output.

The question answering device 4 expands the input keywords and outputs answers to the questions formed from the expanded keywords. Among the components of the question answering device 4 shown in FIG. 17, the keyword input unit 11, the keyword increasing unit 12, the answer table output unit 15, the keyword extraction DB 16, the pattern extraction unit 121, and the keyword extraction unit 122 are the same as the identically numbered components of the question answering device 1 shown in FIG. 1. The knowledge DB 25 is the same as the knowledge DB 25 of the question answering device 2 shown in FIG. 9, and the answer type input unit 31 is the same as the answer type input unit 31 of the question answering device 3 shown in FIG. 13.

Keywords are input to the keyword input unit 11, for example the first keywords "Japan", "USA", ... and the second keywords "area", "capital", .... The answer type input unit 31 receives the answer types of answer candidates for the questions created by the question creation unit 42 or, when the question answering device 4 omits the question creation unit 42, for the questions formed from the keywords output by the keyword increasing unit 12. In particular, each input answer type is associated with a second keyword.

For example, the answer type "proper noun (numerical value)" is input to the answer type input unit 31 for the second keyword "area", and the answer type "proper noun (place name)" is input for the second keyword "capital".

As described with reference to FIG. 1, the keyword increasing unit 12 uses a keyword extraction technique to extract, from the keyword extraction DB 16, keywords in the same field as each input keyword, thereby expanding the keyword set. Through this processing, third keywords are output from the first keywords and fourth keywords are output from the second keywords.

For each fourth keyword, the similar keyword determination unit 41 determines, as its similar keyword, a second keyword input to the keyword input unit 11 (which also appears among the fourth keywords) that is similar to it. Methods for determining similar keywords are described below.

(Method using co-occurrence vectors (1))
For each fourth keyword, the number of times it appears in the keyword extraction DB 16 together with each pattern $c_i$ extracted by the keyword increasing unit 12 is counted, and a vector whose elements are these counts (hereinafter, a "co-occurrence vector") is obtained.

For example, suppose that during the keyword extraction process of the keyword increasing unit 12, fourth keyword (1) appeared 0 times together with pattern $c_1$, once together with pattern $c_2$, ..., and once together with pattern $c_n$. The co-occurrence vector of fourth keyword (1) is then $(0, 1, \ldots, 1)$. Co-occurrence vectors for the other fourth keywords (fourth keyword (2), fourth keyword (3), ...) are obtained in the same way.

Next, the degree of similarity is computed between the co-occurrence vector of each fourth keyword that is identical to a second keyword input to the keyword input unit 11 and the co-occurrence vector of the fourth keyword whose similar keyword is being sought. For example, if the former vector is $(a_1, a_2, a_3, \ldots, a_n)$ and the latter is $(b_1, b_2, b_3, \ldots, b_n)$, the value $\sum_{i=1}^{n} (a_i - b_i)^2 = (a_1 - b_1)^2 + (a_2 - b_2)^2 + (a_3 - b_3)^2 + \cdots + (a_n - b_n)^2$ is calculated. This value, the squared Euclidean distance between the two co-occurrence vectors, indicates how similar they are: the lower the value, the higher the similarity.

The fourth keyword identical to an input second keyword that yields the lowest value is taken as the similar keyword of the fourth keyword whose similar keyword is being sought.
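A minimal sketch of the comparison step in method (1), assuming the co-occurrence vectors have already been counted over the same ordered list of patterns. The names and the counts are hypothetical illustrations, not output of the described device.

```python
from typing import Dict, Sequence


def squared_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """(a_1-b_1)^2 + ... + (a_n-b_n)^2; a lower value means more similar vectors."""
    if len(a) != len(b):
        raise ValueError("co-occurrence vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def similar_keyword(target: Sequence[int], input_vectors: Dict[str, Sequence[int]]) -> str:
    """Return the input second keyword whose vector is closest to the target vector."""
    return min(input_vectors, key=lambda k: squared_distance(target, input_vectors[k]))


# Toy counts over patterns c_1..c_4 (illustrative only).
input_vectors = {"area": [0, 1, 3, 0], "capital": [4, 0, 1, 2]}
former_capital = [3, 0, 1, 1]
print(similar_keyword(former_capital, input_vectors))
```

Here the distance to "area" is 9 + 1 + 4 + 1 = 15 and the distance to "capital" is 1 + 0 + 0 + 1 = 2, so "capital" is chosen as the similar keyword of "former capital".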

(Method using co-occurrence vectors (2))
The similar keyword determination unit 41 first performs a full-text search of the knowledge DB 25 using the fourth keywords and extracts the words that appear together with each fourth keyword (co-occurring words). Then, for each fourth keyword, it obtains as its co-occurrence vector a vector whose elements are the numbers of times that fourth keyword appears in the knowledge DB 25 together with each extracted co-occurring word.

For example, if fourth keyword (1) appears twice together with co-occurring word $w_1$, 0 times together with co-occurring word $w_2$, once together with co-occurring word $w_3$, ..., and once together with co-occurring word $w_n$, its co-occurrence vector is $(2, 0, 1, \ldots, 1)$. Co-occurrence vectors for the other fourth keywords (fourth keyword (2), fourth keyword (3), ...) are obtained in the same way.

In the embodiments of the present invention, the similar keyword determination unit 41 may also obtain these co-occurrence vectors from document data other than the knowledge DB 25, for example from a large-scale corpus (not shown) that stores a large amount of document data.
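Method (2) can be sketched over any tokenized document collection, whether the knowledge DB 25 or a larger corpus. The text does not define the co-occurrence window; this sketch assumes that two words co-occur when they appear in the same sentence, and the function name and sentences are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def cooccurrence_vectors(sentences: List[List[str]],
                         keywords: List[str]) -> Dict[str, List[int]]:
    """Build one co-occurrence vector per keyword.

    Assumption: two words co-occur when they appear in the same tokenized
    sentence; each sentence counts at most once per word pair.
    """
    counts = {k: Counter() for k in keywords}
    for sentence in sentences:
        tokens = set(sentence)
        for k in keywords:
            if k in tokens:
                counts[k].update(tokens - {k})
    # Vector dimensions: every co-occurring word seen for any keyword, sorted.
    vocab = sorted(set().union(*counts.values()))
    return {k: [counts[k][w] for w in vocab] for k in keywords}


sentences = [
    ["Japan", "capital", "Tokyo"],
    ["Japan", "former capital", "Kyoto"],
    ["USA", "capital", "Washington"],
    ["Japan", "area", "km2"],
]
vecs = cooccurrence_vectors(sentences, ["capital", "former capital", "area"])
print(vecs)
```

All vectors share one dimension order, so they can be compared directly with the squared distance used in method (1).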

(Method using thesaurus data)
A thesaurus database (not shown in FIG. 17) that stores thesaurus data in the format of a classified vocabulary table (Bunrui Goi Hyo) is prepared. The similar keyword determination unit 41 computes the similarity between a fourth keyword and a second keyword input to the keyword input unit 11 (which also appears among the fourth keywords) from the proportion of matching digits in the 10-digit classification numbers assigned to the words in the thesaurus data stored in the thesaurus database.

That is, for example, the proportion of matching digits is calculated between the classification number assigned, in the classified vocabulary table, to the word identical to the fourth keyword whose similar keyword is being sought and the classification number assigned to the word identical to a second keyword input to the keyword input unit 11, and this value is used as the similarity. The second keyword that yields the largest value is then determined as the similar keyword of that fourth keyword.
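The digit-match similarity can be sketched as below. This assumes each classification number is available as a 10-digit string with separators removed, and that the proportion is taken position by position; the codes are made up for illustration and are not real Bunrui Goi Hyo entries.

```python
from typing import Dict


def digit_match_ratio(code_a: str, code_b: str) -> float:
    """Fraction of digit positions at which two classification numbers agree."""
    if len(code_a) != len(code_b) or not code_a:
        raise ValueError("classification numbers must be non-empty and equal in length")
    matches = sum(1 for x, y in zip(code_a, code_b) if x == y)
    return matches / len(code_a)


def similar_keyword_by_code(target_code: str, input_codes: Dict[str, str]) -> str:
    """Return the input second keyword whose number agrees with target_code on the most digits."""
    return max(input_codes, key=lambda k: digit_match_ratio(target_code, input_codes[k]))


# Hypothetical 10-digit codes (separators removed); not real Bunrui Goi Hyo entries.
input_codes = {"capital": "1253001020", "area": "1195001010"}
print(similar_keyword_by_code("1253001030", input_codes))
```

With these codes, the hypothetical "former capital" number matches "capital" on 9 of 10 digits (0.9) and "area" on 6 of 10 (0.6), so "capital" is selected.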

The question creation unit 42 creates questions, each formed from a combination of a third keyword and a fourth keyword output by the keyword increasing unit 12.

In the embodiments of the present invention, the question creation unit 42 may be omitted from the configuration shown in FIG. 17. In that case, the answer candidate extraction unit 43 extracts and outputs answer candidates for the questions formed from the third and fourth keywords output by the keyword increasing unit 12.

The answer candidate extraction unit 43 searches the knowledge DB 25 for documents containing the third keyword and the fourth keyword that make up each question created by the question creation unit 42. From the linguistic expressions contained in the retrieved documents, it extracts as answer candidates those that match the answer type input to the answer type input unit 31 for the similar keyword of the question's fourth keyword. Alternatively, the answer candidate extraction unit 43 may perform the same search and extract as answer candidates the linguistic expressions that match an answer type associated in advance with the similar keyword of the question's fourth keyword.

When the question answering device 4 omits the question creation unit 42, the answer candidate extraction unit 43 searches the knowledge DB 25 for documents containing a third keyword and a fourth keyword output by the keyword increasing unit 12. From the linguistic expressions in the retrieved documents, it extracts those that match the answer type input to the answer type input unit 31 for the similar keyword of that fourth keyword (or associated in advance with that similar keyword), as answer candidates for the question formed from the third keyword and that fourth keyword.

In other words, the answer candidates for each question are extracted using the answer type associated with the similar keyword of the question's fourth keyword.

For example, answer candidates for the question "What was the former capital of Japan?" are extracted using the answer type "proper noun (place name)", which is associated with "capital", the similar keyword of the fourth keyword "former capital".
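The lookup chain (fourth keyword, then similar keyword, then answer type, then filtering) can be sketched as follows. This assumes the expressions in the retrieved documents have already been tagged with a type by some named-entity tagger, which the text does not specify; all names and inputs are hypothetical.

```python
from typing import Dict, List, Tuple


def extract_candidates(fourth_keyword: str,
                       similar_keyword: Dict[str, str],
                       answer_type_of: Dict[str, str],
                       tagged_expressions: List[Tuple[str, str]]) -> List[str]:
    """Keep expressions whose type equals the answer type of the similar keyword."""
    answer_type = answer_type_of[similar_keyword[fourth_keyword]]
    return [expr for expr, expr_type in tagged_expressions if expr_type == answer_type]


similar_keyword = {"former capital": "capital", "latitude": "area"}
answer_type_of = {"capital": "proper noun (place name)",
                  "area": "proper noun (numerical value)"}
# Expressions from retrieved documents, assumed to be pre-tagged by some
# named-entity tagger (hypothetical input).
tagged = [("Kyoto", "proper noun (place name)"),
          ("35.0", "proper noun (numerical value)"),
          ("Nara", "proper noun (place name)")]
print(extract_candidates("former capital", similar_keyword, answer_type_of, tagged))
```

No answer type needs to be entered for "former capital" or "latitude" themselves; each inherits the type of its similar keyword.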

FIG. 18 shows an example of the question answering flow according to the fourth embodiment of the present invention. First, the first keywords and the second keywords are input to the keyword input unit 11 as input keywords (step S31), for example the first keywords "Japan", "USA", ... and the second keywords "area", "capital", .... The answer types associated with the second keywords are input to the answer type input unit 31 (step S32), for example the answer type "proper noun (numerical value)" for the second keyword "area" and the answer type "proper noun (place name)" for the second keyword "capital".

The pattern extraction unit 121 of the keyword increasing unit 12 performs a full-text search of the keyword extraction DB 16 for the input keywords and extracts, as patterns $c_i$, the patterns that appear around the multiple input keywords (step S33). What counts as a pattern appearing around a keyword may be defined as appropriate. Patterns $c_i$ are extracted separately for the first keywords and the second keywords.

The keyword extraction unit 122 of the keyword increasing unit 12 performs a full-text search of the keyword extraction DB 16 for the patterns $c_i$ extracted by the pattern extraction unit 121, extracts the expressions exp matched by the patterns $c_i$, sorts the extracted expressions in descending order of Score, and outputs them as keywords (step S34).
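Steps S33 and S34 can be sketched as follows. Because the pattern definition is left open, this sketch assumes a pattern is the pair of tokens immediately before and after a keyword. It also substitutes a simple stand-in score (the number of distinct patterns that match an expression) for the Score defined elsewhere in this document. All names and sentences are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Pattern = Tuple[str, str]  # (token before, token after); an assumed pattern shape


def extract_patterns(sentences: List[List[str]], seeds: Set[str]) -> Set[Pattern]:
    """Step S33 sketch: collect the one-token contexts around input keywords."""
    patterns: Set[Pattern] = set()
    for toks in sentences:
        for i in range(1, len(toks) - 1):
            if toks[i] in seeds:
                patterns.add((toks[i - 1], toks[i + 1]))
    return patterns


def extract_keywords(sentences: List[List[str]],
                     patterns: Set[Pattern]) -> List[Tuple[str, int]]:
    """Step S34 sketch: find expressions matched by the patterns and rank them.

    Stand-in score: number of distinct patterns that match the expression.
    """
    matched: Dict[str, Set[Pattern]] = defaultdict(set)
    for toks in sentences:
        for i in range(1, len(toks) - 1):
            context = (toks[i - 1], toks[i + 1])
            if context in patterns:
                matched[toks[i]].add(context)
    ranked = [(exp, len(ps)) for exp, ps in matched.items()]
    ranked.sort(key=lambda es: (-es[1], es[0]))  # descending score, ties alphabetical
    return ranked


sentences = [
    ["capital", "of", "Japan", "is", "Tokyo"],
    ["capital", "of", "Germany", "is", "Berlin"],
    ["area", "of", "Japan", "is", "large"],
    ["people", "in", "Japan", "speak", "Japanese"],
    ["people", "in", "France", "speak", "French"],
    ["area", "of", "France", "is", "large"],
]
pats = extract_patterns(sentences, {"Japan", "USA"})
print(extract_keywords(sentences, pats))
```

Running step S33 on the first keywords and again on the second keywords, as the text describes, would yield the third and fourth keyword lists respectively.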

Through step S34, for example, the first keywords are expanded into many third keywords (e.g., "Japan", "USA", "Germany", "Italy", "France", "UK", ...), and the second keywords are expanded into many fourth keywords (e.g., "area", "population", "latitude", "capital", "former capital", "southernmost city", ...).

The similar keyword determination unit 41 determines the similar keyword of each fourth keyword (step S35). For example, "capital", a second keyword input to the keyword input unit 11 (and also a fourth keyword), is determined as the similar keyword of the fourth keyword "former capital".

The processing of steps S31 to S35 is repeated as long as keywords continue to be input to the keyword input unit 11 (step S36).

When no further keywords are input to the keyword input unit 11 in step S36, the question creation unit 42 creates questions formed from the third and fourth keywords (step S37), such as "What was the former capital of Japan?", "What is the area of the USA?", and "What is the latitude of Germany?". When the question answering device 4 does not include the question creation unit 42, step S37 is omitted.
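Step S37 pairs every third keyword with every fourth keyword, so it yields |third| × |fourth| questions. A minimal sketch, assuming a single fixed English template; the template and function name are hypothetical, and the text does not specify how question sentences are worded.

```python
from itertools import product
from typing import List


def create_questions(third: List[str], fourth: List[str]) -> List[str]:
    """One question per (third, fourth) pair, using a fixed illustrative template."""
    return [f"What is the {k4} of {k3}?" for k3, k4 in product(third, fourth)]


qs = create_questions(["Japan", "USA", "Germany"], ["area", "latitude"])
print(len(qs), qs[0])
```

Because the count grows multiplicatively, the thresholds or top-N limits used during keyword expansion effectively control how many questions are generated.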

The answer candidate extraction unit 43 extracts answer candidates for each created question (step S38). That is, it searches the knowledge DB 25 for documents containing the third and fourth keywords that make up each question created by the question creation unit 42. From the linguistic expressions contained in the retrieved documents, it extracts as answer candidates those that match the answer type associated with the similar keyword of the question's fourth keyword.

When the question answering device 4 omits the question creation unit 42, in step S38 the answer candidate extraction unit 43 searches the knowledge DB 25 for documents containing a third keyword and a fourth keyword output by the keyword increasing unit 12. From the linguistic expressions in the retrieved documents, it extracts those that match the answer type input to the answer type input unit 31 for the similar keyword of that fourth keyword, as answer candidates for the question formed from the third keyword and that fourth keyword.

For example, answer candidates for a question containing the fourth keyword "latitude" are extracted using the answer type "proper noun (numerical value)", which is associated with "area", the similar keyword of "latitude".

Likewise, answer candidates for a question containing the fourth keyword "former capital" are extracted using the answer type "proper noun (place name)", which is associated with the similar keyword of "former capital" (namely "capital").

The answer table output unit 15 then outputs the answer table (step S39), for example the table shown in FIG. 19.

FIG. 20 shows another example configuration of the question answering device according to the fourth embodiment of the present invention. The question answering device 40 expands the input keywords and outputs answers to the questions formed from the expanded keywords. Among the components of the question answering device 40 shown in FIG. 20, those bearing the same reference numerals as components of the question answering device 1 shown in FIG. 1, the question answering device 10 shown in FIG. 6, or the question answering device 4 shown in FIG. 17 have the same functions as those components. In the embodiments of the present invention, the question creation unit 42 may be omitted from the configuration shown in FIG. 20. In that case, the answer candidate extraction unit 43 extracts and outputs answer candidates for the questions formed from the third and fourth keywords output by the keyword increasing unit 60.

When the question answering device 40 described above is used, the question answering flow is the same as the flow shown in FIG. 18 except for the processing in steps S33 and S34. In one example of this flow, steps S33 and S34 of FIG. 18 are replaced as follows. The keyword extraction unit 101 of the keyword increasing unit 60 extracts, from the word data DB 61, words in the same field as the first keyword input to the keyword input unit 11 and outputs them as third keywords. The keyword extraction unit 101 likewise extracts, from the word data DB 61, words in the same field as the second keyword input to the keyword input unit 11 and outputs them as fourth keywords.

In another example of the flow using the question answering device 40, steps S33 and S34 of FIG. 18 are replaced as follows. The similarity calculation unit 100 of the keyword increasing unit 60 calculates the similarity between the first keyword input to the keyword input unit 11 and each word in the thesaurus DB 62, and the keyword extraction unit 101 of the keyword increasing unit 60 outputs, as third keywords, the words whose calculated similarity is equal to or greater than a predetermined threshold. Likewise, the similarity calculation unit 100 calculates the similarity between the second keyword input to the keyword input unit 11 and each word in the thesaurus DB 62, and the keyword extraction unit 101 outputs, as fourth keywords, the words whose calculated similarity is equal to or greater than the predetermined threshold.

Alternatively, the keyword extraction unit 101 may take a predetermined number of words from the thesaurus data in descending order of calculated similarity, and output them as the third keywords and the fourth keywords.
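Selecting a fixed number of words in descending order of similarity can be sketched with `heapq.nlargest`. This sketch assumes that the similarities have already been computed, for example by the similarity calculation unit 100, and are supplied as a dictionary. The function name and the sample scores are hypothetical.

```python
import heapq


def top_k_words(similarities, k):
    """Return the k words with the highest similarity, in descending order.

    `similarities` maps each thesaurus word to its (already computed)
    similarity with the input keyword.
    """
    best = heapq.nlargest(k, similarities.items(), key=lambda item: item[1])
    return [word for word, _ in best]


if __name__ == "__main__":
    # Hypothetical similarity scores for the input keyword "Tokyo".
    scores = {"Osaka": 1.0, "Paris": 0.67, "river": 0.33, "Kyoto": 0.9}
    print(top_k_words(scores, 2))  # ['Osaka', 'Kyoto']
```

The threshold variant returns a number of words that depends on the score distribution. This variant always returns k words, or fewer if the thesaurus holds fewer than k.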

In the fourth embodiment of the present invention, the question answering device 4 shown in FIG. 17 or the question answering device 40 shown in FIG. 20 may instead include two units in place of the answer type input unit 31:

- an interrogative pronoun input unit (not shown) that receives an interrogative pronoun associated with the second keyword input to the keyword input unit 11; and
- an answer type estimation unit (not shown) that uses the interrogative pronoun input to that unit to estimate the answer type. The answer type is the category of linguistic expression of the answer candidates for the questions composed of the output keywords from the keyword increasing unit 12 (or the keyword increasing unit 60).
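The answer type estimation unit can be pictured as a lookup from interrogative pronoun to answer type. This section does not give the actual estimation rules, so the table and function names below are hypothetical. The sketch shows only the shape of the mapping from each second keyword to its pronoun and then to its answer type.

```python
# Hypothetical pronoun-to-answer-type table; the actual rules used by the
# answer type estimation unit are not given in this section.
PRONOUN_TO_ANSWER_TYPE = {
    "who": "PERSON",
    "where": "LOCATION",
    "when": "DATE",
    "how many": "NUMBER",
    "how much": "MONEY",
}


def estimate_answer_type(pronoun, default="OTHER"):
    """Map an interrogative pronoun to an answer type (case-insensitive)."""
    return PRONOUN_TO_ANSWER_TYPE.get(pronoun.strip().lower(), default)


def answer_types_for(second_keyword_pronouns):
    """Estimate an answer type for each second keyword from its pronoun."""
    return {kw: estimate_answer_type(p) for kw, p in second_keyword_pronouns.items()}


if __name__ == "__main__":
    print(answer_types_for({"capital": "Where", "founder": "who"}))
    # {'capital': 'LOCATION', 'founder': 'PERSON'}
```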

In this configuration, the answer candidate extraction unit 43 may search the knowledge DB 25 for documents that contain both the third keyword and the fourth keyword output by the keyword increasing unit 12 (or the keyword increasing unit 60). From the linguistic expressions in the documents found by this search, it then extracts those that match a particular answer type, and uses them as answer candidates for each question composed of the third keyword and that fourth keyword. That answer type is the one the answer type estimation unit estimated from the interrogative pronoun entered in the interrogative pronoun input unit for the similar keyword, that is, the second keyword similar to that fourth keyword. Alternatively, the answer candidate extraction unit 43 may perform the same search and extract the linguistic expressions that match the answer type estimated from an interrogative pronoun predetermined as associated with the similar keyword. Such a pronoun is one predetermined as being associated with the second keyword input to the keyword input unit 11.
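The extraction step described above can be sketched as follows. The sketch makes two simplifying assumptions. First, each document in the knowledge DB is already tagged as a list of `(expression, answer_type)` pairs, which stands in for whatever expression typing the real system performs. Second, the similar-keyword mapping, the pronoun table and the pronoun-to-type table are passed in as plain dictionaries. All names are hypothetical.

```python
def extract_candidates(docs, kw3, kw4, similar_keyword, pronoun_of, pronoun_to_type):
    """Extract answer candidates for the question composed of (kw3, kw4).

    docs: list of documents, each a list of (expression, answer_type) pairs
          (assumed to be pre-tagged).
    similar_keyword: fourth keyword -> second keyword judged similar to it.
    pronoun_of: second keyword -> interrogative pronoun associated with it.
    pronoun_to_type: interrogative pronoun -> answer type.
    """
    # The answer type comes from the pronoun attached to the similar second keyword.
    second_kw = similar_keyword.get(kw4)
    answer_type = pronoun_to_type.get(pronoun_of.get(second_kw))
    if answer_type is None:
        return []
    candidates = []
    for doc in docs:
        expressions = {expr for expr, _ in doc}
        if kw3 not in expressions or kw4 not in expressions:
            continue  # the document must contain both keywords
        for expr, expr_type in doc:
            if expr_type == answer_type and expr not in (kw3, kw4) and expr not in candidates:
                candidates.append(expr)
    return candidates


if __name__ == "__main__":
    docs = [
        [("Japan", "LOCATION"), ("capital", "OTHER"), ("Tokyo", "LOCATION"), ("Abe", "PERSON")],
        [("France", "LOCATION"), ("capital", "OTHER"), ("Paris", "LOCATION")],
        [("Japan", "LOCATION"), ("metropolis", "OTHER"), ("Tokyo", "LOCATION")],
    ]
    similar = {"capital": "capital", "metropolis": "capital"}
    pronouns = {"capital": "where"}
    types = {"where": "LOCATION", "who": "PERSON"}
    print(extract_candidates(docs, "Japan", "metropolis", similar, pronouns, types))  # ['Tokyo']
```

The two alternatives in the paragraph differ only in where `pronoun_of` comes from: pronouns entered by the user, or a predetermined table.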

The present invention can also be implemented as a program that is read and executed by a computer. The program can be stored on any suitable computer-readable recording medium, such as portable media memory, semiconductor memory or a hard disk. It can be provided recorded on such a medium, or transmitted and received over a network via a communication interface.

FIG. 1 shows an example of the configuration of the question answering device in the first embodiment of the present invention.
FIG. 2 is an example of an answer table.
FIG. 3 shows an example of precision and recall for keyword extraction results.
FIG. 4 shows an example of correct-answer data.
FIG. 5 shows an example of the question answering process flow in the first embodiment of the present invention.
FIG. 6 shows another example of the configuration of the question answering device in the first embodiment of the present invention.
FIG. 7 shows an example of the data structure of the word data DB.
FIG. 8 shows an example of the data structure of the thesaurus DB.
FIG. 9 shows an example of the configuration of the question answering device in the second embodiment of the present invention.
FIG. 10 is an example of an answer table.
FIG. 11 shows an example of the question answering process flow in the second embodiment of the present invention.
FIG. 12 shows a configuration example of Modification 1 of the second embodiment of the present invention.
FIG. 13 shows an example of the configuration of the question answering device in the third embodiment of the present invention.
FIG. 14 shows an example of the question answering process flow in the third embodiment of the present invention.
FIG. 15 is an example of an answer table.
FIG. 16 shows a configuration example of Modification 1 of the third embodiment of the present invention.
FIG. 17 shows an example of the configuration of the question answering device in the fourth embodiment of the present invention.
FIG. 18 shows an example of the question answering process flow in the fourth embodiment of the present invention.
FIG. 19 is an example of an answer table.
FIG. 20 shows another example of the configuration of the question answering device in the fourth embodiment of the present invention.
FIG. 21 is an example of a list of answer candidates and their scores.
FIG. 22 is an example of output results obtained by simply adding the scores of answer candidates.
FIG. 23 is an example of output results for a question.
FIG. 24 is an example of output results for a question.
FIG. 25 is an example of output results for a question.
FIG. 26 is an example of output results for a question.
FIG. 27 illustrates the concept of margin maximization in the support vector machine method.

Explanation of symbols

1, 2, 3, 4, 10, 20, 30, 40 Question answering device
11 Keyword input unit
12, 18, 60, 63 Keyword increasing unit
13, 23, 42 Question creation unit
14, 24, 43 Answer candidate extraction unit
15 Answer table output unit
16 Keyword extraction DB
17 Learning DB
21 Interrogative pronoun input unit
22 Answer type estimation unit
25 Knowledge DB
31 Answer type input unit
41 Similar keyword determination unit
61 Word data DB
62 Thesaurus DB
100 Similarity calculation unit
101, 122 Keyword extraction unit
121 Pattern extraction unit

Claims

A question answering device that receives as input a first keyword, itself consisting of a plurality of keywords, and a second keyword, itself consisting of a plurality of keywords. The device outputs answers to question data expressed in natural language composed of the first keyword and the second keyword. It also outputs answers to question data expressed in natural language composed of a third keyword obtained by increasing the first keyword and a fourth keyword obtained by increasing the second keyword. The device comprises:
keyword input means for inputting a plurality of keywords as input keywords;
keyword increasing means for extracting, based on the input keywords, more keywords than the number of input keywords and outputting them as output keywords;
answer candidate extraction means for extracting answer candidates from a pre-stored document data group that is the search target for answer candidates. The answer candidates are for a question composed of:
- the third keyword, which is an output keyword obtained by the keyword increasing means increasing the first keyword as an input keyword; and
- the fourth keyword, which is an output keyword obtained by the keyword increasing means increasing the second keyword as an input keyword; and
answer table output means for outputting, as an answer table, a table in which each extracted answer candidate is associated with its question;
wherein the keyword increasing means comprises:
pattern extraction means for performing a full-text search for the input keywords in a keyword extraction database storing document data for keyword extraction, and extracting as a pattern the character strings that appear before and after the plurality of input keywords in the search results; and
keyword extraction means for performing a full-text search for the pattern extracted by the pattern extraction means in the keyword extraction database, extracting the expressions surrounded by the pattern, and outputting the extracted expressions as output keywords.
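The two-stage keyword increase in claim 1 first learns surrounding-context patterns from the input keywords, then harvests the new expressions that those patterns enclose. It can be sketched as follows. The sketch assumes a fixed-width character window for the context and lazy regex capture of up to `max_len` characters. It also assumes that a pattern is kept only if it surrounds at least `min_support` distinct input keywords, which is one possible reading of "before and after the plurality of keywords". All function and parameter names are hypothetical.

```python
import re
from collections import defaultdict


def extract_patterns(corpus, keywords, width=5, min_support=2):
    """Stage 1: collect (left, right) context strings around the input keywords.

    A pattern is kept only if it surrounds at least `min_support` distinct
    input keywords (an assumed reading of the claim).
    """
    support = defaultdict(set)
    for doc in corpus:
        for kw in keywords:
            for m in re.finditer(re.escape(kw), doc):
                left = doc[max(0, m.start() - width):m.start()]
                right = doc[m.end():m.end() + width]
                if len(left) == width and len(right) == width:
                    support[(left, right)].add(kw)
    return [pattern for pattern, kws in support.items() if len(kws) >= min_support]


def extract_keywords(corpus, patterns, max_len=10):
    """Stage 2: return the expressions enclosed by any (left, right) pattern."""
    found = []
    for left, right in patterns:
        regex = re.compile(re.escape(left) + "(.{1,%d}?)" % max_len + re.escape(right))
        for doc in corpus:
            for m in regex.finditer(doc):
                if m.group(1) not in found:
                    found.append(m.group(1))
    return found


if __name__ == "__main__":
    corpus = [
        "cities like Tokyo, and more",
        "cities like Osaka, and more",
        "cities like Kyoto, and more",
    ]
    patterns = extract_patterns(corpus, ["Tokyo", "Osaka"])
    print(patterns)                            # [('like ', ', and')]
    print(extract_keywords(corpus, patterns))  # ['Tokyo', 'Osaka', 'Kyoto']
```

The output includes the seed keywords as well as the newly found ones. A caller that wants only new keywords can filter the seeds out.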

The question answering device according to claim 1,
wherein the keyword increasing means:
outputs a third keyword as an output keyword based on the input first keyword, and outputs a fourth keyword as an output keyword based on the input second keyword; and
the answer candidate extraction means:
- learns what kind of answer a question has from a large number of pre-prepared sets of questions and their answers; and
- extracts, based on the learning result, answer candidates for a question composed of the output third keyword and fourth keyword.

The question answering device according to claim 1,
wherein the keyword increasing means:
outputs a third keyword as an output keyword based on the input first keyword, and outputs a fourth keyword as an output keyword based on the input second keyword; and
the answer candidate extraction means:
- extracts, from a large document data group stored in advance in storage means, the document data containing the output third keyword and fourth keyword; and
- selects answer candidates for a question composed of the output third keyword and fourth keyword from among the linguistic expressions in the extracted document data, using the frequency with which each expression appears in the large document data group.
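A frequency-based ranking of this kind can be sketched as below. The claim does not fix the tokenization or the exact frequency measure. The sketch therefore assumes whitespace tokens, treats every non-keyword token as a candidate expression, and scores each candidate by the number of keyword-bearing documents it appears in. The function name and the toy corpus are hypothetical.

```python
from collections import Counter


def rank_candidates(corpus, kw3, kw4, top_n=5):
    """Rank expressions by how many documents containing both keywords they appear in."""
    counts = Counter()
    for doc in corpus:
        tokens = set(doc.split())
        if kw3 in tokens and kw4 in tokens:
            counts.update(tokens - {kw3, kw4})  # each document counts once per expression
    # Highest frequency first; ties broken alphabetically for stable output.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


if __name__ == "__main__":
    corpus = [
        "parkinson brain nigra cells",
        "parkinson brain nigra dopamine",
        "parkinson brain cortex",
        "weather today sunny",
    ]
    print(rank_candidates(corpus, "parkinson", "brain"))
    # [('nigra', 2), ('cells', 1), ('cortex', 1), ('dopamine', 1)]
```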

The question answering device according to claim 1, further comprising:
interrogative pronoun input means for inputting an interrogative pronoun associated with the second keyword; and
answer type estimation means for estimating an answer type based on the interrogative pronoun input by the interrogative pronoun input means. The answer type is the category of linguistic expression of the answer candidates for a question composed of the output keywords output by the keyword increasing means;
wherein the keyword increasing means outputs a third keyword as an output keyword based on the input first keyword, and outputs the input second keyword as an output keyword; and
the answer candidate extraction means:
- searches the document data group that is the search target for answer candidates for document data containing the third keyword and the second keyword output by the keyword increasing means; and
- extracts, from the retrieved document data, the linguistic expressions that match the answer type estimated by the answer type estimation means, as answer candidates for a question composed of the third keyword and the second keyword.

The question answering device according to claim 1, further comprising:
answer type estimation means for estimating an answer type based on a predetermined interrogative pronoun associated with the second keyword. The answer type is the category of linguistic expression of the answer candidates for a question composed of the output keywords output by the keyword increasing means;
wherein the keyword increasing means outputs a third keyword as an output keyword based on the input first keyword, and outputs the input second keyword as an output keyword; and
the answer candidate extraction means:
- searches the document data group that is the search target for answer candidates for document data containing the third keyword and the second keyword output by the keyword increasing means; and
- extracts, from the retrieved document data, the linguistic expressions that match the answer type estimated by the answer type estimation means, as answer candidates for a question composed of the third keyword and the second keyword.

The question answering device according to claim 1,
wherein the keyword increasing means:
outputs a third keyword as an output keyword based on the input first keyword, and outputs a fourth keyword as an output keyword based on the input second keyword; and
the answer candidate extraction means:
- searches the document data group that is the search target for answer candidates for document data containing the third keyword and the fourth keyword output by the keyword increasing means; and
- extracts, from the retrieved document data, the linguistic expressions that match a predetermined answer type, as answer candidates for a question composed of the output third keyword and fourth keyword.

The question answering device according to claim 1,
further comprising answer type input means for inputting an answer type associated with the second keyword input to the keyword input means. The answer type is the category of linguistic expression of the answer candidates for a question composed of the output keywords output by the keyword increasing means;
wherein the keyword increasing means outputs a third keyword as an output keyword based on the input first keyword, and outputs a fourth keyword as an output keyword based on the input second keyword;
the device further comprises similar keyword determination means for determining, for each output fourth keyword, the second keyword that is similar to that fourth keyword as its similar keyword; and
the answer candidate extraction means:
- searches the document data group that is the search target for answer candidates for document data containing the third keyword and the fourth keyword output by the keyword increasing means; and
- extracts, from the linguistic expressions in the retrieved document data, those that match the answer type entered in the answer type input means for the similar keyword of the output fourth keyword, as answer candidates for a question composed of the output third keyword and that fourth keyword.

The question answering device according to claim 7,
wherein the similar keyword determination means:
- extracts, from a large document data group stored in advance in storage means, co-occurrence words, that is, words that co-occur with the fourth keywords output by the keyword extraction means;
- obtains, for each fourth keyword, a co-occurrence vector whose elements are the numbers of times that fourth keyword co-occurs with each extracted co-occurrence word in the document data group;
- obtains the similarity between the co-occurrence vector of each fourth keyword and the co-occurrence vector of each fourth keyword that is identical to a second keyword input to the keyword input means; and
- based on the obtained similarity, determines as the similar keyword of each fourth keyword the fourth keyword that is identical to the second keyword similar to it.
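The co-occurrence comparison in claim 8 can be sketched as follows. The claim does not name a similarity measure or a co-occurrence window. This sketch assumes sentence-level co-occurrence and cosine similarity, and picks the single most similar second keyword for each fourth keyword. All three choices are assumptions, and the function names are hypothetical.

```python
import math
from collections import Counter


def cooccurrence_vectors(sentences, keywords):
    """Count, for each keyword, how often it co-occurs with every other word in a sentence."""
    vectors = {kw: Counter() for kw in keywords}
    for sentence in sentences:
        tokens = set(sentence.split())
        for kw in keywords:
            if kw in tokens:
                vectors[kw].update(tokens - {kw})
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * v[word] for word, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similar_keywords(sentences, fourth_keywords, second_keywords):
    """Map each fourth keyword to the second keyword whose vector is most similar."""
    vectors = cooccurrence_vectors(sentences, set(fourth_keywords) | set(second_keywords))
    return {
        kw4: max(second_keywords, key=lambda kw2: cosine(vectors[kw4], vectors[kw2]))
        for kw4 in fourth_keywords
    }


if __name__ == "__main__":
    sentences = [
        "tokyo capital city",
        "osaka city port",
        "kyoto city temple",
        "einstein physicist born",
        "newton physicist gravity",
    ]
    print(similar_keywords(sentences, ["osaka", "newton"], ["tokyo", "einstein"]))
    # {'osaka': 'tokyo', 'newton': 'einstein'}
```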

A question answering method that receives as input a first keyword, itself consisting of a plurality of keywords, and a second keyword, itself consisting of a plurality of keywords. The method outputs answers to question data expressed in natural language composed of the first keyword and the second keyword. It also outputs answers to question data expressed in natural language composed of a third keyword obtained by increasing the first keyword and a fourth keyword obtained by increasing the second keyword. The method comprises:
a step of inputting a first keyword consisting of a plurality of keywords and a second keyword consisting of a plurality of keywords;
a step of extracting and outputting the third keyword and the fourth keyword by increasing, based on the input first keyword and second keyword, the number of keywords constituting the first keyword and the second keyword, respectively;
a step of extracting, from a pre-stored document data group that is the search target for answer candidates, answer candidates for questions composed of the plurality of increased output keywords, including the third keyword and the fourth keyword; and
a step of outputting, as an answer table, a table in which each extracted answer candidate is associated with its question;
wherein the step of extracting more keywords than the number of input keywords comprises:
a step of performing a full-text search for the input keywords in a keyword extraction database storing document data for keyword extraction, and extracting as a pattern the character strings that appear before and after the plurality of input keywords in the search results; and
a step of performing a full-text search for the extracted pattern in the keyword extraction database, extracting the expressions surrounded by the pattern, and outputting the extracted expressions as output keywords.
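The final steps of the method combine the increased keywords into questions and collect their answer candidates in a table. They can be sketched as below. The sketch assumes that each question pairs one third keyword with one fourth keyword, and it treats the answer candidate extractor as a pluggable function. `build_answer_table`, `find_candidates` and the lookup data are hypothetical.

```python
from itertools import product


def build_answer_table(third_keywords, fourth_keywords, find_candidates):
    """Pair every third keyword with every fourth keyword and collect answers.

    `find_candidates(kw3, kw4)` is any answer candidate extractor; questions
    with no candidates are left out of the table (a design choice).
    """
    table = []
    for kw3, kw4 in product(third_keywords, fourth_keywords):
        candidates = find_candidates(kw3, kw4)
        if candidates:
            table.append(((kw3, kw4), candidates))
    return table


if __name__ == "__main__":
    # Hypothetical extractor backed by a fixed lookup table.
    known = {("Japan", "capital"): ["Tokyo"], ("France", "capital"): ["Paris"]}
    table = build_answer_table(
        ["Japan", "France", "Italy"], ["capital"], lambda a, b: known.get((a, b), [])
    )
    for (kw3, kw4), answers in table:
        print(kw3, kw4, ", ".join(answers), sep="\t")
```

The number of questions is the product of the two expanded keyword lists. The cost of the extraction step therefore grows quickly as the keyword increase widens.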

A question answering program for a computer provided in a question answering device. The device receives as input a first keyword, itself consisting of a plurality of keywords, and a second keyword, itself consisting of a plurality of keywords. The program causes the computer to output answers to question data expressed in natural language composed of the first keyword and the second keyword. It also causes the computer to output answers to question data expressed in natural language composed of a third keyword obtained by increasing the first keyword and a fourth keyword obtained by increasing the second keyword. The program causes the computer to execute:
a process of inputting a first keyword consisting of a plurality of keywords and a second keyword consisting of a plurality of keywords;
a process of extracting and outputting the third keyword and the fourth keyword by increasing, based on the input first keyword and second keyword, the number of keywords constituting the first keyword and the second keyword, respectively;
a process of extracting, from a pre-stored document data group that is the search target for answer candidates, answer candidates for questions composed of the plurality of increased output keywords, including the third keyword and the fourth keyword; and
a process of outputting, as an answer table, a table in which each extracted answer candidate is associated with its question;
wherein the process of extracting more keywords than the number of input keywords comprises:
a process of performing a full-text search for the input keywords in a keyword extraction database storing document data for keyword extraction, and extracting as a pattern the character strings that appear before and after the plurality of input keywords in the search results; and
a process of performing a full-text search for the extracted pattern in the keyword extraction database, extracting the expressions surrounded by the pattern, and outputting the extracted expressions as output keywords.