JP4925293B2

JP4925293B2 - Certainty assigning device, method and program

Info

Publication number: JP4925293B2
Application number: JP2006354123A
Authority: JP
Inventors: 真樹村田
Original assignee: National Institute of Information and Communications Technology
Current assignee: National Institute of Information and Communications Technology
Priority date: 2006-12-28
Filing date: 2006-12-28
Publication date: 2012-04-25
Anticipated expiration: 2026-12-28
Also published as: JP2008165480A

Description

The present invention assigns a certainty factor (precision, recall, F value, correct answer rate, etc.) to each document when the documents returned by a search engine such as Google (registered trademark) are presented in ranked order from the top result. The certainty factor (correct answer rate) is a value that indicates how likely the document is to be a correct search result: 50% means the document is correct with probability one half, and 100% means it is almost certainly correct. The invention relates to a device, a method, and a program that automatically assign such certainty factors to search results.

Conventionally, there have been systems that search for documents by keyword and output the search results ranked by, for example, the occurrence probability of the keywords (see Patent Document 1).
Patent Document 1: Japanese Patent No. 3799447

The conventional systems that output ranked search results had no technique for assigning a certainty factor (correct answer rate, etc.) to each document in an effective manner.

An object of the present invention is to solve the above problem by automatically assigning a certainty factor (correct answer rate, etc.) to each document when search-result documents are presented in ranked order from the top result.

FIG. 7 is an explanatory diagram of the certainty assigning device. In FIG. 7, 1 is an input unit (input means), 6 is an output unit (output means), 10 is a document classification device (problem solving means), 11 is a correspondence table creation unit (correspondence creation means), 12 is a certainty assigning unit (certainty assigning means), and 13 is a storage unit (correspondence table).

To solve the conventional problem, the present invention has the following means.

(1): The device comprises: input means 1 for inputting a problem; problem solving means 10 that solves the input problem, extracts a plurality of answers, and outputs the extracted answers together with a predetermined value; correspondence creation means 11 that takes a plurality of problems prepared in advance with their correct answers, inputs each of them to the problem solving means 10, obtains the predetermined value, the answers, and the certainty factor of the answers when the answers are output, and creates a correspondence between the predetermined value and the certainty factor (the certainty factor is computed according to its definition by checking the output answers against the answers assigned in advance to see how many are correct); and certainty assigning means 12 that, when a new problem is input through the input means 1 and the problem solving means 10 outputs its answers in ranked order, obtains the predetermined value at which a given answer is output and assigns to that answer the certainty factor given by the correspondence. A certainty factor can therefore be attached to each output answer, making it easy to judge which answers are reliable.

(2): In the certainty assigning device of (1), the certainty factor is the precision, that is, the proportion of correct outputs among all outputs. This makes it easy to judge the precision of the output down to a given classification or document.

(3): In the certainty assigning device of (1), the certainty factor is the recall, that is, the proportion of all correct answers that are actually output. The recall makes it easy to judge how many correct answers have been missed.

(4): In the certainty assigning device of (1), the certainty factor is the F value, that is, the reciprocal of the average of the reciprocal of the recall and the reciprocal of the precision. The F value gives a certainty factor that takes both the precision and the number of missed answers into account.
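The three certainty measures defined in (2) to (4) can be sketched as follows. This is a minimal illustration, not the patented implementation; the function name `precision_recall_f` and the sample F-terms are assumptions made for the example.

```python
def precision_recall_f(output, correct):
    """Return (precision, recall, F value) of the output answers against the correct answers."""
    output, correct = set(output), set(correct)
    hits = len(output & correct)
    precision = hits / len(output) if output else 0.0   # correct outputs / all outputs
    recall = hits / len(correct) if correct else 0.0    # correct outputs / all correct answers
    if precision == 0.0 or recall == 0.0:
        return precision, recall, 0.0                   # F is taken as 0 when undefined
    # F value: reciprocal of the average of the reciprocals of recall and precision
    f_value = 1.0 / ((1.0 / recall + 1.0 / precision) / 2.0)
    return precision, recall, f_value


# Hypothetical example: three F-terms output, four F-terms are correct, two overlap.
p, r, f = precision_recall_f(
    ["F-term1", "F-term2", "F-term3"],
    ["F-term1", "F-term2", "F-term4", "F-term5"],
)
```

With these inputs the precision is 2/3, the recall is 1/2, and the F value is 4/7.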

(5): In the certainty assigning devices of (1) to (4), the number of answers that the certainty assigning means 12 outputs with a certainty factor is the number that maximizes the F value. Only classifications or documents with a high certainty factor are therefore output.
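One way to find the number of outputs that maximizes the F value is to scan a ranked answer list for which the correct answers are known. The sketch below assumes this setting; `best_cutoff` and the sample data are hypothetical names and values.

```python
def best_cutoff(ranked, correct):
    """Return (n, F) such that outputting the top n ranked answers maximizes the F value."""
    correct = set(correct)
    best_n, best_f, hits = 0, 0.0, 0
    for n, answer in enumerate(ranked, start=1):
        if answer in correct:
            hits += 1
        if hits == 0:
            continue                      # F is 0 until the first correct answer appears
        precision = hits / n
        recall = hits / len(correct)
        f_value = 2 * precision * recall / (precision + recall)
        if f_value > best_f:
            best_n, best_f = n, f_value
    return best_n, best_f


# Hypothetical ranked output: "x" and "y" are wrong answers.
n, f = best_cutoff(["a", "b", "x", "c", "y"], {"a", "b", "c"})
```

Here the F value peaks when the top four answers are output (precision 0.75, recall 1.0).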

(6): In the certainty assigning device of (1), the certainty factor is the correct answer rate of each individual answer. A correct answer rate (certainty factor) can therefore be attached to each output answer, making it easy to judge which answers are reliable.

(7): The certainty assigning device of (6) further comprises machine learning means. A plurality of problems with answers assigned in advance are prepared and each is input to the problem solving means 10; when each answer is output, the predetermined value at which that answer is only just output is obtained, the answer is checked for correctness, and the correct answer rate at that predetermined value is obtained. The machine learning means learns from these cases which predetermined values lead to correct or incorrect answers and stores the learning result, and the certainty assigning means 12 uses the learning result as the correspondence. Machine learning thus makes it easy to assign a correct answer rate to each output answer.
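The learning step in (7) can be illustrated with any learner that maps a predetermined value to a probability of correctness. The sketch below uses a one-feature logistic regression trained by gradient descent; this choice of learner is an assumption made for illustration and is not named in the text above, and the training cases are invented.

```python
import math


def train_logistic(values, labels, lr=0.5, epochs=2000):
    """Fit P(correct | value) = sigmoid(w * value + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(values)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(values, labels):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            grad_w += (p - y) * x
            grad_b += p - y
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def correct_rate(model, value):
    """Estimated correct answer rate for an answer output at the given predetermined value."""
    w, b = model
    return 1.0 / (1.0 + math.exp(-(w * value + b)))


# Hypothetical cases: (predetermined value at which the answer was just output, 1 = correct).
cases = [(0.95, 1), (0.9, 1), (0.8, 1), (0.7, 0), (0.6, 1),
         (0.5, 0), (0.4, 1), (0.3, 0), (0.2, 0), (0.1, 0)]
model = train_logistic([v for v, _ in cases], [c for _, c in cases])
```

The stored `model` plays the role of the correspondence: `correct_rate(model, 0.9)` gives a higher estimate than `correct_rate(model, 0.2)` for this data.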

(8): In the certainty assigning device of (7), predetermined values from a plurality of viewpoints are used as the predetermined value, and the machine learning means learns the cases of correct and incorrect answers for those values. Using predetermined values from several viewpoints allows more accurate machine learning.

(9): In the certainty assigning devices of (1) to (8), the problem solving means 10 is a document classification device, the problem is a document to be classified, and the answer is a classification of that document. A certainty factor can therefore be assigned to each output classification.

(10): In the certainty assigning devices of (1) to (8), the problem solving means 10 is an information retrieval device, the problem is a query document, and the answer is a document retrieved using the query document. A certainty factor can therefore be assigned to each output answer document.

(11): In the certainty assigning devices of (1) to (10), when the problem solving means 10 computes scores to output its answers, the predetermined value is the score of a given answer divided by the score of the first answer (kp). A certainty factor can therefore be assigned to an answer from its kp.

(12): In the certainty assigning devices of (1) to (10), the predetermined value used when the problem solving means 10 outputs an answer is the output rank (kj). A certainty factor can therefore be assigned to an answer from its kj.

(13): In the certainty assigning devices of (1) to (10), when the problem solving means 10 computes scores to output its answers, the predetermined value is the score itself (kl). A certainty factor can therefore be assigned to an answer from its kl.
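The three kinds of predetermined value in (11) to (13) can be computed from a ranked, scored answer list as sketched below. The function name `predetermined_values` and the dictionary layout are assumptions for the example; the scores are borrowed from the Method 1 example later in the description.

```python
def predetermined_values(scored_answers):
    """scored_answers: list of (answer, score) pairs sorted by descending positive score."""
    if not scored_answers:
        return []
    top_score = scored_answers[0][1]
    rows = []
    for rank, (answer, score) in enumerate(scored_answers, start=1):
        rows.append({
            "answer": answer,
            "kp": score / top_score,  # score relative to the first answer's score
            "kj": rank,               # output rank
            "kl": score,              # raw score
        })
    return rows


rows = predetermined_values([("F-term1", 267.5), ("F-term2", 89.1), ("F-term3", 67.9)])
```

Any one of `kp`, `kj`, or `kl` can then be used as the key into the correspondence between predetermined value and certainty factor.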

The present invention has the following effects.

(1): The correspondence creation means creates a correspondence between the predetermined value and the certainty factor from problems with answers assigned in advance. When a new problem is input and the problem solving means outputs its answers in ranked order, the predetermined value at which a given answer is output is obtained and the certainty factor given by the correspondence is assigned to that answer. A certainty factor can therefore be attached to each output answer, making it easy to judge how far down the list the answers can be trusted.

(2): Because the certainty factor is the precision, that is, the proportion of correct outputs among all outputs, the precision of the output down to a given classification or document is easy to judge.

(3): Because the certainty factor is the recall, that is, the proportion of all correct answers that are actually output, the number of missed correct answers is easy to judge from the recall.

(4): Because the certainty factor is the F value, that is, the reciprocal of the average of the reciprocal of the recall and the reciprocal of the precision, a certainty factor that takes both the precision and the number of missed answers into account can be assigned.

(5): Because the number of answers output with a certainty factor by the certainty assigning means is the number that maximizes the F value, only classifications or documents with a high certainty factor are output.

(6): Because the certainty factor is the correct answer rate of each individual answer, a correct answer rate (certainty factor) can be attached to each output answer, making it easy to judge which answers are reliable.

(7): Because machine learning means is provided and the certainty assigning means uses the learning result as the correspondence, machine learning makes it easy to assign a correct answer rate to each output answer.

(8): Because predetermined values from a plurality of viewpoints are used as the predetermined value and the machine learning means learns the cases of correct and incorrect answers for those values, more accurate machine learning is possible.

(9): Because the problem solving means is a document classification device, the problem is a document to be classified, and the answer is a classification of that document, a certainty factor can be assigned to each output classification.

(10): Because the problem solving means is an information retrieval device, the problem is a query document, and the answer is a document retrieved using the query document, a certainty factor can be assigned to each output answer document.

(11): Because, when the problem solving means computes scores to output its answers, the predetermined value is the score of a given answer divided by the score of the first answer (kp), a certainty factor can be assigned to an answer from its kp.

(12): Because the predetermined value used when the problem solving means outputs an answer is the output rank (kj), a certainty factor can be assigned to an answer from its kj.

(13): Because, when the problem solving means computes scores to output its answers, the predetermined value is the score itself (kl), a certainty factor can be assigned to an answer from its kl.

The present invention automatically assigns a certainty factor (precision, recall, F value, correct answer rate, etc.) to each document when the documents of an information retrieval result are presented in ranked order from the top. To do so, a set of correct answers is prepared in advance and used to build a correspondence table recording what level of accuracy is obtained under what conditions. When a new document arrives, the condition that applies to it is determined and its certainty factor is read from the table. Equivalent methods other than a table can also be used. Besides document retrieval, the invention can handle any task whose output is a ranked list.
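The two phases described here, building the table from labeled results and then reading certainty factors from it, might look like the following sketch. It assumes the output rank is used as the condition and the per-rank correct answer rate as the certainty factor; the function names and the labeled runs are invented for illustration.

```python
from collections import defaultdict


def build_table(labeled_runs):
    """labeled_runs: ranked result lists of booleans (True = answer at that rank was correct)."""
    hits, totals = defaultdict(int), defaultdict(int)
    for run in labeled_runs:
        for rank, is_correct in enumerate(run, start=1):
            totals[rank] += 1
            hits[rank] += int(is_correct)
    # Correspondence table: rank -> observed correct answer rate
    return {rank: hits[rank] / totals[rank] for rank in totals}


def assign_certainty(ranked_answers, table):
    """Attach the table's certainty factor to each answer (None if the rank was never observed)."""
    return [(answer, table.get(rank)) for rank, answer in enumerate(ranked_answers, start=1)]


# Hypothetical labeled search results for four queries.
labeled_runs = [[True, True, False], [True, False, False], [True, True, True], [False, True, False]]
table = build_table(labeled_runs)
result = assign_certainty(["doc A", "doc B", "doc C"], table)
```

For a new query, each returned document simply inherits the rate observed at its rank in the labeled data.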

§1: Certainty assignment based on a table
The present invention assigns certainty factors in a setting where documents similar to the document to be classified are collected using BM25 or SMART, methods known for high retrieval accuracy, and the document is assigned to the classifications that occur most frequently in the collected documents. In particular, it handles the Multi-class classification problem in which several classifications are assigned to a single document, and the certainty factor can be used to decide how far down the list of frequent classifications to go when assigning classifications to the document.

(1): Document classification device
FIG. 1 is an explanatory diagram of the document classification device. As shown in FIG. 1, the document classification device comprises an input unit (input means) 1, a document extraction unit (document extraction means) 2, a document similarity calculation unit (document similarity calculation means) 3, a score calculation unit (score calculation means) 4, a classification set extraction unit (classification set extraction means) 5, and an output unit (output means) 6.

The input unit 1 is input means for inputting a document such as a patent document. The document extraction unit 2 is document extraction means that extracts k documents similar to the document to be classified. The document similarity calculation unit 3 is document similarity calculation means that computes the similarity between documents. The score calculation unit 4 is score calculation means that computes a score for each classification. The classification set extraction unit 5 is classification set extraction means that extracts, on the basis of the classification scores, the set of classifications for the document to be classified (those whose score is at least a specified value). The output unit 6 is output means that outputs the classifications of the document (by screen display or printing). Output by the output unit 6 also covers cases with no screen display, such as passing the result to another program or computing it as the value of a variable inside the program.

(2): Patent document classification device
Patent documents are classified by IPC, FI, F-term, and so on. F-terms in particular subdivide a given technical field (theme) from multiple technical viewpoints, and are based on term lists that distinguish technologies by viewpoints such as purpose, use, structure, material, manufacturing method, processing or operating method, and control means. For this reason, a single patent document is usually assigned several F-terms (patent classifications). The following description uses patent documents as the documents.

FIG. 2 is an explanatory diagram of the patent document classification device. As shown in FIG. 2, the patent document classification device comprises an input unit (input means) 1, a KDOC extraction unit (KDOC extraction means) 2, a document similarity calculation unit (document similarity calculation means) 3, a score (Score_M1(x)) calculation unit (score calculation means) 4, an F-term x set extraction unit (F-term x set extraction means) 5, and an output unit (output means) 6.

The input unit 1 is input means for inputting a patent document. The KDOC extraction unit 2 is KDOC extraction means that extracts k patent documents similar to the patent document to be classified; KDOC denotes these k extracted patent documents. The document similarity calculation unit 3 is document similarity calculation means that computes the similarity between patent documents. The score (Score_M1(x)) calculation unit 4 is score calculation means that computes the score Score_M1(x) of each patent classification. The F-term x set extraction unit 5 is classification set extraction means that extracts, on the basis of the patent classification scores, the set of F-terms x for the patent document to be classified. The output unit 6 is output means that outputs the set of F-terms x for the patent document to be classified.

(3): Patent document classification processing
FIG. 3 is a flowchart of the patent document classification processing. The processing is described below following steps S1 to S5 of FIG. 3.

S1: The patent document to be classified is input to the input unit 1.

S2: The KDOC extraction unit 2 extracts k patent documents (the KDOC) similar to the input patent document. To do this, the document similarity calculation unit 3 computes the similarity between the input patent document and each document in the patent document set given as training data (held in storage means such as a database). The training patent document set is a set of documents to which correct F-term classifications have been assigned. The ruby-ir toolkit was used to retrieve the k patent documents. k is a value determined by experiment.

S3: The score (Score_M1(x)) calculation unit 4 computes the score Score_M1(x) of each patent classification.

S4: The F-term x set extraction unit 5 extracts, on the basis of the patent classification scores, the set of F-terms x for the patent document to be classified (those whose score is at least a specified value).

S5: The output unit 6 outputs the set of F-terms x for the patent document to be classified.

FIG. 4 is a flowchart of the processing that computes the similarity between the input patent document and the selected patent documents. The processing is described below following steps S11 and S12 of FIG. 4.

S11: The document similarity calculation unit 3 extracts keywords from the input patent document. Nouns extracted by morphological analysis were used as the keywords.

S12: The document similarity calculation unit 3 then retrieves, from all patent documents in the training data that have the theme given with the input (a theme need not be given), those that contain at least one of the keywords, and computes Sim_SMART for each retrieved document. Sim_SMART is used as the similarity between the input document and each patent document in the training data.

(4): Extracting the set of F-terms x
There are four methods for extracting the set of F-terms x, as follows.

a) Method 1
The patent classification device (KDOC extraction unit 2) first retrieves the k patent documents most similar to the input from the patent document set given as training data (a document set with correct F-term classifications assigned). These k patent documents are called the KDOC. The ruby-ir toolkit was used to retrieve the documents. k is a value determined by experiment.

(Reference for the ruby-ir toolkit)
ruby-ir-eng: Masao Utiyama, "Information Retrieval Module for Ruby", 2005
("www2.nict.go.jp/jt/a132/members/mutiyama/software")
Next, the patent classification device (score calculation unit 4) computes the score Score_M1(x) of each F-term x from the KDOC, sorted by similarity, according to equation (1) below.

Score_M1(x) = Σ_{i=1..k} k_r^(i-1) × role(x, i) × score_doc(i) ・・・・(1)
(The equation itself is missing from this text; the form above is inferred from the worked example of Method 1, in which the i-th most similar document is weighted by k_r^(i-1).)
Here,
role(x, i) = 1 (if the i-th document has the F-term x classification)
= 0 (otherwise)
score_doc(i) is the similarity value of the document ranked i-th by similarity between the input document and the selected documents, and k_r is a constant determined by experiment. score_doc(i) can also be simplified as follows.

score_doc(i) = 1001 - i
Finally, the patent classification device (classification set extraction unit 5) extracts the set of F-terms x that satisfy expression (2) below.

{ x | Score_M1(x) ≧ k_p × max_y Score_M1(y) } ・・・・(2)
Here k_p (also written kp) is a constant determined by experiment. The extracted set of F-terms x is the desired classification.

Example of Method 1
(F-term1, F-term2, and so on below are the F-terms assigned to each document.)
Document A: similarity to the input document 100, F-term1
Document B: similarity to the input document 90, F-term1 F-term2
Document C: similarity to the input document 80, F-term1
Document D: similarity to the input document 70, F-term3
With kr = 0.99, the scores are:
F-term1: 100 + 90*0.99 + 80*0.99^2 = 267.5
F-term2: 90*0.99 = 89.1
F-term3: 70*0.99^3 = 67.9

With kp = 0.9, the classifications whose score is at least 240.8, which is 0.9 times the top score of 267.5, are extracted. Only F-term1 satisfies this condition, so only F-term1 is output as the answer.
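The worked example can be reproduced with a short sketch. It assumes that the i-th most similar document is weighted by k_r^(i-1), which is what the arithmetic in the example implies; the function names `score_m1` and `select_f_terms` are illustrative.

```python
def score_m1(kdoc, k_r):
    """kdoc: list of (similarity, f_terms) sorted by descending similarity."""
    scores = {}
    for i, (similarity, f_terms) in enumerate(kdoc, start=1):
        weight = k_r ** (i - 1)           # assumed rank discount, inferred from the example
        for x in f_terms:                 # role(x, i) = 1 only for F-terms the document has
            scores[x] = scores.get(x, 0.0) + weight * similarity
    return scores


def select_f_terms(scores, k_p):
    """Keep F-terms whose score is at least k_p times the top score (expression (2))."""
    threshold = k_p * max(scores.values())
    return sorted(x for x, s in scores.items() if s >= threshold)


kdoc = [
    (100, ["F-term1"]),             # Document A
    (90, ["F-term1", "F-term2"]),   # Document B
    (80, ["F-term1"]),              # Document C
    (70, ["F-term3"]),              # Document D
]
scores = score_m1(kdoc, k_r=0.99)
selected = select_f_terms(scores, k_p=0.9)
```

Rounded to one decimal place, `scores` gives 267.5, 89.1, and 67.9, and `selected` contains only F-term1, matching the example.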

b) Method 2
The document classification device first extracts the KDOC as in Method 1. It then counts the number of documents in the KDOC in which each F-term x appears. Writing this count as F_KDOC(x), the device finally extracts the set of F-terms x that satisfy the following expression.

{ x | F_KDOC(x) ≧ k_u × k },
Here k_u is a constant determined by experiment. When k_u = 0.5, this method is identical to the original k-nearest-neighbor method.
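Method 2 reduces to a document-frequency count followed by a threshold. A minimal sketch, with hypothetical function names and sample data:

```python
from collections import Counter


def f_kdoc(kdoc_f_terms):
    """Count, for each F-term, the number of KDOC documents in which it appears."""
    counts = Counter()
    for f_terms in kdoc_f_terms:
        counts.update(set(f_terms))   # set() so a document counts at most once per F-term
    return counts


def method2(kdoc_f_terms, k_u):
    """Return the F-terms x with F_KDOC(x) >= k_u * k, where k is the size of the KDOC."""
    k = len(kdoc_f_terms)
    counts = f_kdoc(kdoc_f_terms)
    return sorted(x for x, c in counts.items() if c >= k_u * k)


# Hypothetical KDOC of k = 4 documents, each given as its list of F-terms.
kdoc_f_terms = [["F-term1"], ["F-term1", "F-term2"], ["F-term1"], ["F-term3"]]
majority = method2(kdoc_f_terms, k_u=0.5)
```

With k_u = 0.5 only F-term1, which appears in three of the four documents, passes the threshold of two documents.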

c) Method 3
The document classification device first extracts the KDOC as in Method 1. It then computes F_KDOC(x). Finally, it takes the k_f F-terms with the largest values of F_KDOC(x) as the desired classification. Here k_f is a constant determined by experiment.
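Method 3 keeps a fixed number of F-terms instead of applying a threshold. A minimal sketch using the standard library's `Counter.most_common`; the function name `method3` and the sample KDOC are assumptions for the example.

```python
from collections import Counter


def method3(kdoc_f_terms, k_f):
    """Return the k_f F-terms that appear in the most KDOC documents."""
    counts = Counter()
    for f_terms in kdoc_f_terms:
        counts.update(set(f_terms))   # F_KDOC(x): number of documents containing x
    return [x for x, _ in counts.most_common(k_f)]


# Hypothetical KDOC: F-term1 appears in 3 documents, F-term2 in 2, F-term3 in 1.
kdoc_f_terms = [["F-term1"], ["F-term1", "F-term2"], ["F-term1"], ["F-term2", "F-term3"]]
top_two = method3(kdoc_f_terms, k_f=2)
```

Unlike Methods 1 and 2, the number of F-terms returned here does not depend on how the counts are distributed.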

(5): Correspondence table
Changing kp, k_u, or k_f in Methods 1 to 3 changes the number of F-terms that are extracted. If correct data is available for the input documents (that is, the correct F-terms have been assigned), a correspondence table between each value of kp, k_u, or k_f and the certainty factor (precision, recall, F value) can be created.

For example, the correspondence between kp in Method 1 and the precision of the F-terms of a patent document may look like this:
Precision of the F-terms selected at kp = 0.9 → 95%
Precision of the F-terms selected at kp = 0.8 → 85%
Precision of the F-terms selected at kp = 0.7 → 80%
Precision of the F-terms selected at kp = 0.6 → 75%
Precision of the F-terms selected at kp = 0.5 → 65%
Precision of the F-terms selected at kp = 0.4 → 50%
Precision of the F-terms selected at kp = 0.3 → 45%
Precision of the F-terms selected at kp = 0.2 → 20%
Precision of the F-terms selected at kp = 0.1 → 10%
A correspondence of this kind is obtained for each input document (a patent document with correct F-terms assigned). Because the precision is computed per patent document input to the patent document classification device and can differ from one document to another, the precision values of the individual patent documents are averaged; for example, the precision at kp = 0.9 is the average over all patent documents at kp = 0.9. For recall and the F value, the correspondence table is likewise created by averaging over the patent documents.
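The averaging step can be sketched as follows: for each kp, the precision is computed per labeled document and then averaged over documents. The function names, the two labeled documents, and their scores are hypothetical.

```python
def precision_at_kp(scores, correct, k_p):
    """Precision of the F-terms selected for one document at a given kp (0 < kp <= 1)."""
    threshold = k_p * max(scores.values())
    selected = {x for x, s in scores.items() if s >= threshold}
    return len(selected & set(correct)) / len(selected)


def build_kp_table(labeled_docs, kp_values):
    """labeled_docs: list of (scores, correct_f_terms). Returns kp -> precision averaged over documents."""
    table = {}
    for k_p in kp_values:
        per_doc = [precision_at_kp(scores, correct, k_p) for scores, correct in labeled_docs]
        table[k_p] = sum(per_doc) / len(per_doc)
    return table


# Two hypothetical labeled documents: F-term scores and the correct F-terms.
labeled_docs = [
    ({"F-term1": 267.5, "F-term2": 89.1, "F-term3": 67.9}, {"F-term1", "F-term2"}),
    ({"F-term4": 150.0, "F-term5": 140.0, "F-term6": 20.0}, {"F-term4"}),
]
table = build_kp_table(labeled_docs, [0.9, 0.2])
```

For this toy data the averaged precision is 0.75 at kp = 0.9 and drops to 7/12 at kp = 0.2, since a lower kp admits more F-terms.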

FIG. 5 is an explanatory diagram of the correspondence between kp and the F value. In FIG. 5, the F value (F-measure) rises as kp goes from 0.1 to 0.3 and falls as kp goes from 0.4 to 0.9, reaching its maximum at kp = 0.3. The Dry run data was used to determine the parameters of each method, so the experimental results on the Formal run data indicate the performance of the method.

FIG. 6 is an explanatory diagram of the correspondence between kp, recall, and precision. In FIG. 6, the horizontal axis is recall (Recall), the vertical axis is precision (Precision), and the number beside each black point on the graph is the value of kp. In this figure, precision falls as recall rises. In other words, as kp decreases (and more F-terms are selected), precision falls and recall rises.

(6): Description of calculation of similarity between documents
The following four methods can be used to calculate the similarity between each patent document in the learning data and the input patent document.

a) Description of SMART
The document classification device first extracts keywords from the input patent document. Nouns are extracted as keywords using morphological analysis. Next, from all patent documents in the learning data that have the given input theme, it retrieves the documents that contain at least one of these keywords. The document classification device (document similarity calculation unit 3) uses the following equation (3) to calculate Sim_SMART for each retrieved document. Sim_SMART is used as the similarity between the input document and each patent document in the learning data.

In this equation, T is the set of keywords that appear in both the input patent document and the retrieved patent document, tf is the number of times keyword t appears in the retrieved document, avtf is the average number of occurrences of each keyword in the retrieved document, qtf is the number of times keyword t appears in the input document, utf is the number of distinct keywords in the retrieved document, pivot is the average number of distinct keywords per document over all documents in the learning data, N is the total number of patent documents in the learning data that have the given input theme classification, and n is the number of documents in which keyword t appears.

SMART は、情報検索のキーワードの重み付け法のひとつである（引用文献；Singhal et al.,1996;Singhal,1997）。 SMART is one of the keyword weighting methods for information retrieval (cited reference; Singhal et al., 1996; Singhal, 1997).

b) Description of BM25
The document classification device first extracts keywords from the input patent document. Nouns are extracted as keywords using morphological analysis. Next, from all patent documents in the learning data that have the given input theme classification, it retrieves the documents that contain at least one of these keywords. The document classification device (document similarity calculation unit 3) uses the following equation (6) to calculate Sim_BM25 for each retrieved document. Sim_BM25 is used as the similarity between the input document and each patent document in the learning data.

In this equation, T, tf, qtf, N, and n are the same as in SMART. dl is the length of the retrieved document, avdl is the average document length over all documents, and k₁, k₃, and b are constants determined by experiment. The default values of the ruby-ir toolkit, k₁ = 1, k₃ = 1000, and b = 1, were used. In place of log{(N - n + 0.5)/(n + 0.5)} in the original BM25 formula, log(N/n) was used, because the original formula can output a negative score. Experiments confirmed that the modified formula gives higher accuracy.
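Equation (6) itself is not reproduced in this text, so the following is only a minimal sketch that assumes the standard Okapi BM25 form, with the idf term replaced by log(N/n) and the constants k₁ = 1, k₃ = 1000, b = 1 as stated above. The function name `sim_bm25` and the dictionary-based inputs are hypothetical choices made for illustration.

```python
import math

def sim_bm25(query_tf, doc_tf, dl, avdl, n_docs, doc_freq,
             k1=1.0, k3=1000.0, b=1.0):
    """Similarity between an input document and one retrieved document.

    query_tf / doc_tf: {keyword: count} for the input / retrieved document.
    doc_freq: {keyword: n}, the number of documents containing the keyword.
    Assumes the standard BM25 shape; the patent's equation (6) may differ.
    """
    # Document-length normalization term of standard BM25.
    norm = k1 * ((1.0 - b) + b * dl / avdl)
    score = 0.0
    for t in query_tf.keys() & doc_tf.keys():      # T: keywords in both documents
        tf, qtf, n = doc_tf[t], query_tf[t], doc_freq[t]
        idf = math.log(n_docs / n)                 # log(N/n), never negative
        score += idf * ((k1 + 1) * tf / (norm + tf)) * ((k3 + 1) * qtf / (k3 + qtf))
    return score
```

Because n never exceeds N, log(N/n) is zero or positive, which is the property the text gives as the reason for the substitution.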

BM25 is one of the keyword weighting methods for information retrieval (cited reference; Robertson et al., 1994).

c) Description of Tfidf
The document classification device first extracts keywords from the input patent document. Nouns are extracted as keywords using morphological analysis. Next, from all documents in the learning data that have the given input theme classification, it retrieves the documents that contain at least one of these keywords. The document classification device (document similarity calculation unit 3) uses the following equation (9) to calculate Sim_Tfidf for each retrieved document. Sim_Tfidf is used as the similarity between the input document and each document in the learning data.

In this equation, T, tf, N, and n are the same as in SMART.

d) Description of Overlap
The document classification device first extracts keywords from the input patent document. Nouns are extracted as keywords using morphological analysis. Next, from all documents in the learning data that have the given input theme classification, it retrieves the documents that contain at least one of these keywords. The document classification device (document similarity calculation unit 3) uses the following equation (10) to calculate Sim_Overlap for each retrieved document. Sim_Overlap is used as the similarity between the input document and each document in the learning data.

この式で、T は、 SMARTのものと同一である。 In this equation, T is the same as that of SMART.

(7): Description of evaluation of document retrieval results
Given the theme classification of a patent document, the F-term classifications of an input Japanese patent document are determined. The F-measure (F value), as plotted in FIG. 5, can be used for this evaluation. The F-measure is the reciprocal of the average of the reciprocal of recall (Recall) and the reciprocal of precision (Precision). Recall is the proportion of the correct classifications that are output (the higher the recall, the fewer correct classifications are missed), and precision is the proportion of all outputs that are correct. Expressed as formulas:
Recall = (number of correct outputs) / (number of correct classifications)
Precision = (number of correct outputs) / (number of outputs)
F-measure = 1 / { (1/Recall + 1/Precision) / 2 }
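The definitions of recall, precision, and F-measure above can be illustrated with a short sketch. The function name `evaluate` and the convention of returning 0 when a denominator would be zero are assumptions made for illustration.

```python
def evaluate(output, correct):
    """Return (precision, recall, f_measure) for one document.

    output: F-terms produced by the system; correct: the correct F-terms.
    """
    output, correct = set(output), set(correct)
    hits = len(output & correct)                   # correct outputs
    precision = hits / len(output) if output else 0.0
    recall = hits / len(correct) if correct else 0.0
    # Reciprocal of the average of 1/recall and 1/precision, i.e. 2PR/(P+R).
    f_measure = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f_measure
```

For instance, if four F-terms are output and two of them are among three correct F-terms, precision is 1/2, recall is 2/3, and the F-measure is 4/7.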

(8): Description of word recognition
a) Description of morphological analysis system
To divide Japanese text into words, the word extraction unit needs a morphological analysis system. ChaSen is described here (the morphological analysis system ChaSen, developed at Nara Institute of Science and Technology and published at http://chasen.aist-nara.ac.jp/index.html.jp).

ChaSen splits a Japanese sentence into words and also estimates the part of speech of each word. For example, entering 「学校へ行く」 ("go to school") gives the following result.

学校    ガッコウ    学校    名詞−一般
へ      ヘ          へ      助詞−格助詞−一般
行く    イク        行く    動詞−自立    五段・カ行促音便    基本型
ＥＯＳ
In this way, the text is split so that each line holds one word, and each word is annotated with its reading and part-of-speech information.
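As a rough illustration of how nouns could be taken as keywords from such output, the sketch below parses text in this layout. It assumes that the columns are tab-separated in the order surface form, reading, base form, part of speech; the function name `extract_nouns` and the sample string are hypothetical.

```python
def extract_nouns(analysis_text):
    """Collect the surface forms of words whose part of speech starts with 名詞 (noun)."""
    nouns = []
    for line in analysis_text.splitlines():
        fields = line.split("\t")
        # Lines such as the end-of-sentence marker have fewer than four fields.
        if len(fields) < 4:
            continue
        surface, part_of_speech = fields[0], fields[3]
        if part_of_speech.startswith("名詞"):
            nouns.append(surface)
    return nouns

# Sample modeled on the example above (tab-separated columns are an assumption).
sample = (
    "学校\tガッコウ\t学校\t名詞−一般\n"
    "へ\tヘ\tへ\t助詞−格助詞−一般\n"
    "行く\tイク\t行く\t動詞−自立\t五段・カ行促音便\t基本型\n"
    "ＥＯＳ\n"
)
keywords = extract_nouns(sample)
```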

b) Description of English part-of-speech tagging
A well-known English part-of-speech tagging system is that of Brill, described in the following reference.

Eric Brill, Transformation-Based Error-Driven Learning and
Natural Language Processing: A Case Study in Part-of-Speech Tagging,
Computational Linguistics, Vol. 21, No. 4, p.543-565, 1995.
This estimates the part of speech of each word in an English sentence.

(9): Description of certainty factor assignment based on a table
a) Description of the case using kp
A large number of problem and answer pairs are collected in advance. Each problem is a patent to which F-terms should be assigned, and the answer is the F-terms of that patent. This is called evaluation data. For each of several values of kp, the document classification device outputs F-terms for the evaluation data, the output is evaluated, and certainty factors such as precision, recall, and F value are obtained. The precision, recall, and F value are then averaged over all patents for the same kp. This completes a correspondence table between kp and precision, recall, and F value.
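The table construction described above can be sketched as follows. This is a minimal illustration that assumes an F-term is output when its score is at least kp times the top score for that document; the function names and the layout of the evaluation data are hypothetical.

```python
def select(scored, kp):
    """F-terms whose score is at least kp times the highest score (assumed rule)."""
    top = max(score for _, score in scored)
    return {fterm for fterm, score in scored if score >= kp * top}

def prf(output, correct):
    """Precision, recall, and F value for one document."""
    hits = len(output & correct)
    p = hits / len(output) if output else 0.0
    r = hits / len(correct) if correct else 0.0
    f = 2 * p * r / (p + r) if hits else 0.0
    return p, r, f

def build_table(eval_data, kps):
    """eval_data: list of (scored F-terms, set of correct F-terms) per patent.

    Returns {kp: (average precision, average recall, average F value)}.
    """
    table = {}
    for kp in kps:
        rows = [prf(select(scored, kp), correct) for scored, correct in eval_data]
        table[kp] = tuple(sum(col) / len(rows) for col in zip(*rows))
    return table
```

Averaging per document, as done here, matches the statement that the values are averaged over all patents for the same kp.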

Next, when a new patent is input to the document classification device, F-terms are output. For each F-term, the kp at which that F-term is only just output is determined, as follows.

The score (Score) of an F-term divided by the score of the first F-term (the F-term with the highest score) is the kp at which that F-term is only just output. (This follows from the definition of kp; see equation (2).) The scores are obtained using equation (1) or the like.

Once the kp of each F-term has been obtained, the precision, recall, and F value corresponding to each F-term are attached and displayed based on the correspondence table above. These are the precision, recall, and F value for the group of F-terms down to and including that F-term (not those of the individual F-term). Values for individual F-terms are described later.
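A minimal sketch of this lookup is given below. The borderline kp is the score divided by the top score, as stated above. How a borderline kp that falls between two table entries is mapped to an entry is not specified in the text; the sketch assumes the largest tabulated kp that does not exceed the borderline kp, and returns None when there is no such entry. The function name and the sample table values are illustrative only.

```python
def assign_certainty(scored, table):
    """Attach a certainty factor to each F-term of a new patent.

    scored: list of (F-term, score); table: {kp: certainty factor}.
    Returns a list of (F-term, borderline kp, certainty factor or None).
    """
    top = max(score for _, score in scored)
    result = []
    for fterm, score in sorted(scored, key=lambda fs: fs[1], reverse=True):
        kp = score / top                           # kp at which this F-term is only just output
        usable = [grid_kp for grid_kp in table if grid_kp <= kp]
        certainty = table[max(usable)] if usable else None
        result.append((fterm, kp, certainty))
    return result

# Illustrative table of kp versus precision.
example_table = {0.3: 0.45, 0.2: 0.20, 0.1: 0.10}
example = assign_certainty([("A", 8.0), ("B", 2.0), ("C", 0.4)], example_table)
```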

以下、図面に基づいて説明する。図７は確信度付与装置の説明図である。図７において、確信度付与装置には、入力部１、出力部６、文書分類装置（問題解決手段）１０、対応表作成部（対応関係作成手段）１１、確信度付与部１２、格納部（対応表）１３が設けてある。 Hereinafter, description will be given based on the drawings. FIG. 7 is an explanatory diagram of the certainty degree assigning device. In FIG. 7, the certainty degree assigning device includes an input unit 1, an output unit 6, a document classification device (problem solving unit) 10, a correspondence table creating unit (corresponding relationship creating unit) 11, a certainty degree assigning unit 12, and a storage unit ( Correspondence table) 13 is provided.

The input unit 1 is input means for inputting information. The output unit 6 is output means for outputting information. The document classification device 10 is the document classification means (problem solving means) that classifies documents as described earlier (see FIGS. 1 and 2). The correspondence table creating unit (correspondence relationship creating means) 11 is means for creating the correspondence (table) between kp and precision, recall, and F value. The certainty factor assigning unit 12 is certainty factor assigning means that assigns certainty factors such as precision, recall, F value, and correct answer rate to the classifications assigned by the document classification device 10. The storage unit (correspondence table) 13 is storage means for storing the correspondence table created by the correspondence table creating unit 11.

図８は対応表作成処理フローチャートである。以下、図８の処理Ｓ２１〜Ｓ２５にしたがって説明する。 FIG. 8 is a flowchart of the correspondence table creation process. Hereinafter, a description will be given according to processing S21 to S25 of FIG.

Ｓ２１：入力部１より、予め問題と解答の組（ここでは特許文書とそのF-term）を大量に入力し、文書分類装置１０の格納手段に格納する。 S 21: A large number of pairs of questions and answers (here, patent documents and their F-terms) are input in advance from the input unit 1 and stored in the storage means of the document classification device 10.

Ｓ２２：文書分類装置１０は、前記入力された１つの特許文書と類似する他の特許文書を検索して分類を求める（F-termを求める）。 S22: The document classification device 10 searches for another patent document similar to the one input patent document and determines the classification (F-term is determined).

Ｓ２３：文書分類装置１０は、前記類似する他の特許文書の分類（F-term）が何個の特許文書に現れたか等により、前記求めた分類（F-term）のスコアを算出する。 S23: The document classification device 10 calculates the score of the obtained classification (F-term) according to how many patent documents the classification (F-term) of the other similar patent document appears.

Ｓ２４：対応表作成部１１は、kpを変化させた時に文書分類装置１０より出力されるそれぞれの分類（F-term）の確信度を求める。 S24: The correspondence table creation unit 11 obtains the certainty of each classification (F-term) output from the document classification device 10 when kp is changed.

S25: For all the patent documents input in S21, the correspondence table creation unit 11 has the document classification device 10 assign classifications (obtain F-terms), obtains the certainty factors while varying kp, then averages the certainty factors of all patent documents for the same kp and creates the correspondence table.

図９は確信度付与処理フローチャートである。以下、図９の処理Ｓ３１〜Ｓ３５にしたがって説明する。 FIG. 9 is a flowchart of the certainty factor giving process. Hereinafter, a description will be given according to processing S31 to S35 of FIG.

Ｓ３１：入力部１より、新たな文書（F-termが付与されていない特許文書）を入力する。 S31: A new document (patent document to which no F-term is assigned) is input from the input unit 1.

Ｓ３２：文書分類装置１０は、前記入力された特許文書と類似する特許文書（前記処理Ｓ２１で入力された特許文書）を検索して分類を求める（F-termを求める）。 S32: The document classification device 10 searches for a patent document similar to the input patent document (patent document input in the process S21) to determine the classification (determines F-term).

Ｓ３３：文書分類装置１０は、前記類似する特許文書の分類（F-term）が何個の特許文書に現れたか等により、前記付与した分類（F-term）のスコアを算出する。 S33: The document classification apparatus 10 calculates the score of the assigned classification (F-term) based on how many patent documents the classification (F-term) of the similar patent document appears in.

Ｓ３４：確信度付与部１２は、各分類（F-term）がぎりぎり出力されるkpを求める。 S34: The certainty factor assigning unit 12 obtains kp at which each classification (F-term) is barely output.

Ｓ３５：確信度付与部１２は、格納部１３の対応表から前記求めたkpに対応する確信度を各F-termに付与して出力部より出力する。 S35: The certainty factor assigning unit 12 assigns the certainty factor corresponding to the obtained kp from the correspondence table of the storage unit 13 to each F-term and outputs it from the output unit.

Thus, the present invention relates to document classification. Documents similar to the document to be classified are collected using methods such as BM25 and SMART, which are known for high accuracy in retrieval, and the document is assigned to the classifications that occur frequently in that group of documents. In particular, the invention handles the Multi-class classification problem, in which multiple classifications are assigned to one document, and the certainty factor makes it easy to decide how far down the list of frequently occurring classifications to go when assigning classifications to the document.

Patent documents are assigned multiple codes for classifying patents. These codes are generally assigned manually, but with the present invention they can be assigned automatically to some extent, which reduces the manual work.

The number of classifications (F-terms) that the certainty factor assigning unit 12 outputs with certainty factors can be limited, for example by outputting up to the point where the F value is highest, up to the point where precision is still at or above a specified value, or up to the point where recall is still at or below a specified value. This reduces unnecessary output.
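Two of these cut-off rules can be sketched as follows, assuming the F-terms are already listed in output order together with their attached precision, recall, and F value. The function names and the sample values are hypothetical and serve only to show the idea.

```python
def cut_at_max_f(rows):
    """Keep rows up to and including the one with the highest F value.

    rows: list of (F-term, precision, recall, F value) in output order.
    """
    if not rows:
        return []
    best = max(range(len(rows)), key=lambda i: rows[i][3])
    return rows[:best + 1]

def cut_by_min_precision(rows, min_precision):
    """Keep rows from the top while precision stays at or above min_precision."""
    kept = []
    for row in rows:
        if row[1] < min_precision:
            break
        kept.append(row)
    return kept

# Illustrative values only.
rows = [("A", 0.90, 0.20, 0.33), ("B", 0.70, 0.50, 0.58),
        ("C", 0.45, 0.60, 0.51), ("D", 0.20, 0.80, 0.32)]
```

The recall-based rule would follow the same pattern as `cut_by_min_precision`, with the comparison applied to the recall column.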

b) Description of the case using output rank
In the method using output rank, the system outputs the classifications (F-terms) produced by the document classification device down to rank kj. The evaluation data problems are solved with this system for several values of kj, and the precision, recall, and F value are obtained. This completes a correspondence table between kj (rank) and precision, recall, and F value.

When a new patent is input, the document classification device outputs F-terms by the method above. For each F-term, the kj at which that F-term is only just output is determined. This is simply the rank at which the F-term is output (this follows from the definition of kj; with other methods this step is done differently).

Once the kj of each F-term has been obtained, the precision, recall, and F value corresponding to each F-term are attached and displayed based on the correspondence table above. Note that these are the precision, recall, and F value for the group up to and including that F-term. (They are not the precision, recall, and F value of the individual F-term. For individual F-terms, see the case of calculating individual values below.)

図１０は対応表作成処理フローチャートである。以下、図１０の処理Ｓ４１〜Ｓ４５にしたがって説明する（確信度付与装置は図７参照、但し、ここでは図７で説明したkpの代わりにkjを用いるものである）。 FIG. 10 is a correspondence table creation process flowchart. Hereinafter, description will be made in accordance with steps S41 to S45 of FIG. 10 (see FIG. 7 for the certainty factor imparting apparatus, where kj is used instead of kp described in FIG. 7).

Ｓ４１：入力部１より、予め問題と解答の組（ここでは特許文書とそのF-term）を大量に入力し、文書分類装置１０の格納手段に格納する。 S41: A large number of pairs of questions and answers (here, patent documents and their F-terms) are input in advance from the input unit 1 and stored in the storage means of the document classification device 10.

Ｓ４２：文書分類装置１０は、前記入力された１つの特許文書と類似する他の特許文書を検索して分類を求める（F-termを求める）。 S42: The document classification device 10 searches for another patent document similar to the one input patent document and determines the classification (F-term is determined).

Ｓ４３：文書分類装置１０は、前記類似する他の特許文書の分類（F-term）が何個の特許文書に現れたか等により、前記求めた分類（F-term）の順位kjを算出する。 S43: The document classification apparatus 10 calculates the rank kj of the obtained classification (F-term) based on how many patent documents the classification (F-term) of the other similar patent document appears in.

Ｓ４４：対応表作成部１１は、kjを変化させた時に文書分類装置１０より出力されるそれぞれの分類（F-term）の確信度を求める。 S44: The correspondence table creation unit 11 obtains the certainty of each classification (F-term) output from the document classification device 10 when kj is changed.

Ｓ４５：対応表作成部１１は、前記Ｓ４１で入力した特許文書全てについて、文書分類装置１０で分類を付与（F-termを求め）し、kjを変化させて確信度を求め、更に同じkjに対応する全ての特許文書の確信度の平均値を求め、対応表を作成する。 S45: The correspondence table creation unit 11 assigns a classification (obtains F-term) to all the patent documents input in S41 by using the document classification device 10, obtains a certainty factor by changing kj, and sets the same kj. An average value of certainty factors of all corresponding patent documents is obtained, and a correspondence table is created.

図１１は確信度付与処理フローチャートである。以下、図１１の処理Ｓ５１〜Ｓ５５にしたがって説明する（確信度付与装置は図７参照、但し、ここでは図７で説明したkpの代わりにkjを用いるものである）。 FIG. 11 is a flowchart of the certainty factor giving process. Hereinafter, description will be made in accordance with steps S51 to S55 in FIG. 11 (see FIG. 7 for the certainty factor imparting apparatus, but here, kj is used instead of kp described in FIG. 7).

Ｓ５１：入力部１より、新たな文書（F-termが付与されていない特許文書）を入力する。 S51: A new document (patent document to which no F-term is assigned) is input from the input unit 1.

Ｓ５２：文書分類装置１０は、前記入力された特許文書と類似する特許文書（前記処理Ｓ４１で入力された特許文書）を検索して分類を求める（F-termを求める）。 S52: The document classification device 10 searches for a patent document similar to the input patent document (the patent document input in the process S41) to determine the classification (F-term is determined).

Ｓ５３：文書分類装置１０は、前記類似する特許文書の分類（F-term）が何個の特許文書に現れたか等により、前記求めた分類（F-term）の順位kjを算出する。 S53: The document classification apparatus 10 calculates the rank kj of the obtained classification (F-term) based on how many patent documents the classification (F-term) of the similar patent document appears in.

Ｓ５４：確信度付与部１２は、各F-termがぎりぎり出力されるkjを求める。 S54: The certainty factor assigning unit 12 obtains kj at which each F-term is barely output.

Ｓ５５：確信度付与部１２は、格納部１３の対応表から前記求めたkjに対応する確信度を各F-termに付与して出力部より出力する。 S55: The certainty factor assigning unit 12 assigns the certainty factor corresponding to the obtained kj from the correspondence table of the storage unit 13 to each F-term and outputs it from the output unit.

c) Description of the case using the score (Score)
In the method using the score, the system outputs the classifications (F-terms) produced by the document classification device whose score is kl or higher. The evaluation data problems are solved with this system for several values of kl, and the precision, recall, and F value are obtained. This completes a correspondence table between kl (score) and precision, recall, and F value.

When a new patent is input, the document classification device outputs F-terms by the method above. For each F-term, the kl at which that F-term is only just output is determined. This is simply the score of the F-term (this follows from the definition of kl; with other methods this step is done differently).

Once the kl of each F-term has been obtained, the precision, recall, and F value corresponding to each F-term are attached and displayed based on the correspondence table above. (Note that these are the precision, recall, and F value for the group up to and including that F-term, not those of the individual F-term. For individual F-terms, see the case of calculating individual values below.)

図１２は対応表作成処理フローチャートである。以下、図１２の処理Ｓ６１〜Ｓ６５にしたがって説明する（確信度付与装置は図７参照、但し、ここでは図７で説明したkpの代わりにklを用いるものである）。 FIG. 12 is a flowchart of the correspondence table creation process. Hereinafter, description will be made in accordance with steps S61 to S65 of FIG. 12 (see FIG. 7 for the certainty factor imparting device, where kl is used instead of kp described in FIG. 7).

Ｓ６１：入力部１より、予め問題と解答の組（ここでは特許文書とそのF-term）を大量に入力し、文書分類装置１０の格納手段に格納する。 S61: A large number of pairs of questions and answers (here, patent documents and their F-terms) are input in advance from the input unit 1 and stored in the storage means of the document classification device 10.

Ｓ６２：文書分類装置１０は、前記入力された１つの特許文書と類似する他の特許文書を検索して分類を求める（F-termを求める）。 S62: The document classification device 10 searches for another patent document similar to the one input patent document and determines the classification (F-term is determined).

Ｓ６３：文書分類装置１０は、前記類似する他の特許文書の分類（F-term）が何個の特許文書に現れたか等により、前記求めた分類（F-term）のスコア（kl）を算出する。 S63: The document classification device 10 calculates the score (kl) of the obtained classification (F-term) based on how many patent documents the classification (F-term) of the other similar patent document appears in. To do.

Ｓ６４：対応表作成部１１は、klを変化させた時に文書分類装置１０より出力されるそれぞれの分類（F-term）の確信度を求める。 S64: The correspondence table creation unit 11 obtains the certainty of each classification (F-term) output from the document classification device 10 when kl is changed.

S65: For all the patent documents input in S61, the correspondence table creation unit 11 has the document classification device 10 assign classifications (obtain F-terms), obtains the certainty factors while varying kl, then averages the certainty factors of all patent documents for the same kl and creates the correspondence table.

図１３は確信度付与処理フローチャートである。以下、図１３の処理Ｓ７１〜Ｓ７５にしたがって説明する（確信度付与装置は図７参照、但し、ここでは図７で説明したkpの代わりにklを用いるものである）。 FIG. 13 is a flowchart of the certainty factor giving process. Hereinafter, description will be made in accordance with steps S71 to S75 in FIG. 13 (see FIG. 7 for the certainty factor imparting device, where kl is used instead of kp described in FIG. 7).

Ｓ７１：入力部１より、新たな文書（F-termが付与されていない特許文書）を入力する。 S71: A new document (patent document to which no F-term is assigned) is input from the input unit 1.

Ｓ７２：文書分類装置１０は、前記入力された特許文書と類似する特許文書（前記処理Ｓ６１で入力された特許文書）を検索して分類を求める（F-termを求める）。 S72: The document classification device 10 searches for a patent document similar to the input patent document (the patent document input in the process S61) to determine the classification (determines F-term).

Ｓ７３：文書分類装置１０は、前記類似する特許文書の分類（F-term）が何個の特許文書に現れたか等により、前記求めた分類（F-term）のスコアklを算出する。 S73: The document classification device 10 calculates a score kl of the obtained classification (F-term) according to how many patent documents the classification (F-term) of the similar patent document appears.

Ｓ７４：確信度付与部１２は、各F-termがぎりぎり出力されるklを求める。 S74: The certainty factor assigning unit 12 obtains kl at which each F-term is output at the last minute.

Ｓ７５：確信度付与部１２は、格納部１３の対応表から前記求めたklに対応する確信度を各F-termに付与して出力部より出力する。 S75: The certainty factor assigning unit 12 assigns the certainty factor corresponding to the obtained kl from the correspondence table of the storage unit 13 to each F-term and outputs it from the output unit.

The methods using kp, rank, and score have been described above, but any other quantity can also be used as long as the system outputs its results in an ordered form.

§2: Description of the case of information retrieval
(1): Description of information retrieval system (information retrieval apparatus)
Techniques for retrieving documents from keywords (document retrieval techniques) include, for example, the following.

(Description of the method for extracting articles that contain more of word group A)
The following formulas are basic knowledge in information retrieval. Documents with a large Score(D) are retrieved.

(1) Description of the basic method (tf·idf method)
score(D) = Σ ( tf(w,D) * log(N/df(w)) )
summed over w ∈ W
W is the set of keywords entered by the user
tf(w,D) is the number of occurrences of w in document D
df(w) is the number of documents, among all documents, in which w appears
N is the total number of documents
Documents with a high score(D) are output as the search results.
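The tf·idf score above can be written as a short sketch. It assumes each document is already split into a list of words; the function name `tfidf_scores` and the sample documents are hypothetical.

```python
import math
from collections import Counter

def tfidf_scores(keywords, docs):
    """Compute score(D) = sum over w in W of tf(w, D) * log(N / df(w)).

    keywords: the keyword set W; docs: {document id: list of words}.
    """
    n_docs = len(docs)                                       # N
    counts = {d: Counter(words) for d, words in docs.items()}
    wanted = set(keywords)
    df = {w: sum(1 for c in counts.values() if w in c) for w in wanted}
    scores = {}
    for d, c in counts.items():
        # Keywords that appear in no document are skipped to avoid dividing by zero.
        scores[d] = sum(c[w] * math.log(n_docs / df[w]) for w in wanted if df[w] > 0)
    return scores

docs = {
    "d1": ["merger", "company", "merger"],
    "d2": ["company", "profit"],
    "d3": ["weather"],
}
scores = tfidf_scores(["merger", "company"], docs)
```

Sorting the documents by descending score then gives the order in which the search results would be output.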

(2) Description of Okapi weighting by Robertson et al.
(Reference)
Masaki Murata, Qing Ma, Kiyotaka Uchimoto, Hiromi Ozaku, Masao Utiyama, Hitoshi Isahara, "Information Retrieval Using Location and Category Information", Journal of Natural Language Processing (The Association for Natural Language Processing), Vol. 7, No. 2, pp. 141-160, April 2000
Equation (1) of this reference is known to perform well. The product of the tf term and the idf term inside the Σ of that equation (1) is the Okapi weighting, and this value is used as the weight of a word.

With the Okapi formula:
Score(D) = Σ ( tf(w,D)/(tf(w,D) + length/delta) * log(N/df(w)) )
summed over w ∈ W
length is the length of article D, and delta is the average article length.
The length of an article is measured as the number of bytes in the article, the number of words it contains, or the like.
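The Okapi-style score as written above can be sketched as follows. The sketch measures article length in words, which is one of the options mentioned in the text; the function name `okapi_scores` and the sample data are hypothetical.

```python
import math
from collections import Counter

def okapi_scores(keywords, docs):
    """Score(D) = sum over w in W of tf/(tf + length/delta) * log(N/df(w)).

    docs: {document id: list of words}; length is counted in words (an assumption).
    """
    n_docs = len(docs)
    counts = {d: Counter(words) for d, words in docs.items()}
    delta = sum(len(words) for words in docs.values()) / n_docs   # average length
    wanted = set(keywords)
    df = {w: sum(1 for c in counts.values() if w in c) for w in wanted}
    scores = {}
    for d, words in docs.items():
        length = len(words)
        total = 0.0
        for w in wanted:
            tf = counts[d][w]
            if tf == 0:
                continue                                          # contributes nothing
            total += tf / (tf + length / delta) * math.log(n_docs / df[w])
        scores[d] = total
    return scores

articles = {
    "d1": ["merger", "merger"],
    "d2": ["merger", "x", "y", "z"],
    "d3": ["a", "b", "c"],
}
okapi = okapi_scores(["merger"], articles)
```

Compared with plain tf·idf, the tf term saturates as tf grows and is reduced for articles longer than average.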

さらに、以下の情報検索を行うこともできる。 Further, the following information search can be performed.

(Okapi reference)
S. E. Robertson, S. Walker, S. Jones, M. M. Hancock-Beaulieu, and M. Gatford, Okapi at TREC-3, TREC-3, 1994
(SMART reference)
Amit Singhal, AT&T at TREC-6, TREC-6, 1997
As a more advanced information retrieval method, the Okapi or SMART formulas may be used instead of a formula that uses only tf·idf.

これらの方法では、tf・idf だけでなく、記事の長さなども利用して、より高精度な情報検索を行うことができる。 In these methods, more accurate information retrieval can be performed using not only tf / idf but also the length of the article.

In this method of extracting articles that contain more of word group A, Rocchio's formula can additionally be used.

(Reference)
J. J. Rocchio, "Relevance feedback in information retrieval", in The SMART Retrieval System, edited by G. Salton, Prentice Hall, Inc., pp. 313-323, 1971
This method uses
{E(t) + k_af * (RatioC(t) - RatioD(t))} * log(N/df(w))
in place of log(N/df(w)).

E(t) = 1 (keyword in the original search)
     = 0 (otherwise)
RatioC(t) is the occurrence rate of t in article group B
RatioD(t) is the occurrence rate of t in article group C
Score(D) is computed with log(N/df(w)) replaced by the expression above, and articles with larger values are extracted as articles that contain more of word group A.
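The replacement weight can be sketched as a single function. The value of k_af is not given in this part of the text, so it is a required argument here; the function name `rocchio_weight` and the dictionary inputs are hypothetical.

```python
import math

def rocchio_weight(term, original_keywords, ratio_c, ratio_d, n_docs, df, k_af):
    """Weight {E(t) + k_af * (RatioC(t) - RatioD(t))} * log(N / df(t)) for one term.

    ratio_c / ratio_d: {term: occurrence rate} in the two article groups.
    df: {term: number of documents containing the term}.
    """
    e = 1.0 if term in original_keywords else 0.0       # E(t)
    shift = ratio_c.get(term, 0.0) - ratio_d.get(term, 0.0)
    return (e + k_af * shift) * math.log(n_docs / df[term])
```

A term that was not in the original search still receives a positive weight when its occurrence rate is higher in the first article group than in the second.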

The set W of words w summed over in the Σ of Score(D) consists of both the original keywords and word group A. The original keywords and word group A must not overlap.

As another method, the set W of words w summed over in the Σ of Score(D) may consist of word group A only. Again, the original keywords and word group A must not overlap.

A relatively complex method based on Rocchio's formula was used here. More simply, articles with a larger total number of occurrences of the words in word group A may be extracted as articles that contain more of word group A. Alternatively, articles in which a larger number of distinct words of word group A appear may be extracted as such.
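The two simpler alternatives can be sketched as follows, assuming each article is already tokenized into a list of words. The function names are hypothetical.

```python
from collections import Counter

def total_occurrences(tokens, word_group_a):
    """Sum of the occurrence counts of the words of word group A in one article."""
    counts = Counter(tokens)
    return sum(counts[w] for w in word_group_a)

def distinct_occurrences(tokens, word_group_a):
    """Number of distinct words of word group A that appear in one article."""
    return len(set(tokens) & set(word_group_a))
```

Articles would then be sorted in descending order of either value.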

(2): Explanation of certainty factor assignment
A large number of question-answer pairs are collected in advance. A question is an information retrieval query (for example, "retrieve articles about corporate mergers"), and the answer is the group of articles that corresponds to the query. These pairs are called evaluation data. One information retrieval system (information retrieval apparatus) as described in (1) above is then prepared.

The question is morphologically analyzed, nouns are extracted as keywords, and the information retrieval system retrieves articles using those keywords. Each article then has a value of Score(D) (when the Okapi formula is used), and articles with larger values are output.

a) Explanation of the case where the value of kp is used
In the method that uses the value of kp, let Score_max be the maximum value of Score(D) for a given question. Documents whose score is at least Score_max * kp are output. The evaluation data questions are solved with this system for several values of kp, and certainty factor values such as precision, recall, and F value are obtained for each. This yields a correspondence table between kp and precision, recall, and F value.

Next, a new information retrieval question arrives. Documents are output by the method above (the information retrieval system), and for each document the kp at which it is only just output is obtained.

This is obtained as follows.

The score of a document divided by the score of the first document (the document with the largest score) is the kp at which that document is only just output. (This follows from the definition of kp. The rank-based method and the other methods handle this step differently.)

Once the kp of each document is obtained, the precision, recall, and F value corresponding to that kp in the correspondence table are attached to the document and displayed. (Note that these are the precision, recall, and F value of the group of documents up to and including that document, not of the individual document. For values of individual documents, see the section on the calculation of individual values.)

FIG. 14 is a flowchart of the correspondence table creation process. The process is described below according to steps S81 to S85 of FIG. 14 (see FIG. 7 for the certainty factor assigning device; here, however, an information retrieval system (information retrieval apparatus) is used in place of the document classification device of FIG. 7).

Ｓ８１：入力部１より、予め質問（問題）と記事（解答）の組を大量に入力し、情報検索システムの格納手段に格納する。 S81: A large number of sets of questions (questions) and articles (answers) are input in advance from the input unit 1 and stored in the storage means of the information search system.

Ｓ８２：情報検索システムは、前記入力されたある１つの質問から、形態素解析して、名詞をキーワードとして取り出して、そのキーワードを利用して、前記入力された記事（解答）の情報検索を行って記事を取り出す。 S82: The information search system performs a morphological analysis from one input question, extracts a noun as a keyword, and performs information search of the input article (answer) using the keyword. Take out the article.

Ｓ８３：情報検索システムは、Score ＿max * kpの文書（記事）まで出力する。 S83: The information retrieval system outputs up to a document (article) of Score_max * kp.

Ｓ８４：対応表作成部１１は、kpを変化させた時に情報検索システムより出力されるそれぞれの記事の確信度を求める。 S84: The correspondence table creation unit 11 obtains the certainty factor of each article output from the information search system when kp is changed.

Ｓ８５：対応表作成部１１は、前記Ｓ８１で入力した質問全てについて、情報検索システムで記事を出力し、kpを変化させて確信度を求め、更に同じkpに対応する全ての記事の確信度の平均値を求め、対応表を作成する（対応表は格納部１３に格納する）。 S85: The correspondence table creation unit 11 outputs articles by the information search system for all the questions input in S81, obtains the certainty by changing kp, and further determines the certainty of all articles corresponding to the same kp. An average value is obtained and a correspondence table is created (the correspondence table is stored in the storage unit 13).
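Steps S81 to S85 can be sketched roughly as follows. This is an illustration under stated assumptions: the retrieval system is replaced by precomputed `(doc_id, score)` lists, the data layout `runs` and the function names are hypothetical, and scores are assumed to be positive.

```python
def prf(output, relevant):
    """Precision, recall, and F value of an output document set."""
    hits = len(set(output) & set(relevant))
    precision = hits / len(output) if output else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f

def build_kp_table(runs, kp_values):
    """runs: list of (scored, relevant) pairs, one per evaluation question.

    scored is a list of (doc_id, score); relevant is the set of correct doc ids.
    Returns {kp: (mean precision, mean recall, mean F value)}.
    """
    table = {}
    for kp in kp_values:
        rows = []
        for scored, relevant in runs:
            score_max = max(s for _, s in scored)
            # Output every document whose score is at least Score_max * kp.
            output = [d for d, s in scored if s >= score_max * kp]
            rows.append(prf(output, relevant))
        # Average each measure over all questions for this kp.
        table[kp] = tuple(sum(col) / len(rows) for col in zip(*rows))
    return table
```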

FIG. 15 is a flowchart of the certainty factor assignment process. The process is described below according to steps S91 to S94 of FIG. 15.

Ｓ９１：入力部１より、新たな情報検索の質問を入力する。 S91: A new information search question is input from the input unit 1.

Ｓ９２：情報検索システムは、前記入力された質問から、形態素解析して、名詞をキーワードとして取り出して、そのキーワードを利用して、情報検索を行って記事を取り出す。 S92: The information search system performs morphological analysis from the input question, extracts a noun as a keyword, performs an information search using the keyword, and extracts an article.

Ｓ９３：確信度付与部１２は、情報検索システムにより、各記事がぎりぎり出力されるkpを求める。 S93: The certainty factor assigning unit 12 obtains kp at which each article is output by the information search system.

Ｓ９４：確信度付与部１２は、格納部１３の対応表から前記求めたkpに対応する確信度を記事に付与して出力部より出力する。 S94: The certainty factor assigning unit 12 assigns the certainty factor corresponding to the obtained kp from the correspondence table of the storage unit 13 to the article and outputs it from the output unit.
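Steps S93 and S94 can be sketched as follows. The lookup of the nearest kp in the table is an assumption made for this sketch (the text only says that the certainty factor corresponding to kp is taken from the table), and the function name is hypothetical.

```python
def assign_confidence_by_kp(scored, table):
    """scored: list of (doc_id, score); table: {kp: certainty factor}.

    Returns (doc_id, kp, certainty factor) in descending order of score.
    """
    score_max = max(s for _, s in scored)
    result = []
    for doc, s in sorted(scored, key=lambda x: -x[1]):
        kp = s / score_max  # kp at which this document is only just output
        nearest = min(table, key=lambda k: abs(k - kp))  # assumed lookup rule
        result.append((doc, kp, table[nearest]))
    return result
```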

b) Explanation of the case where the output rank is used
In the method that uses the output rank, the system outputs the documents (articles) up to rank kj. The evaluation data questions are solved with this information retrieval system for several values of kj, and the precision, recall, and F value are obtained for each. This yields a correspondence table between kj and precision, recall, and F value.

Next, when a new information retrieval question arrives, documents are output by the same method as when the correspondence table was created. The kj at which each document is only just output is then obtained; this is simply the rank at which the document is output. (This follows from the definition of kj. The other methods handle this step differently.) Once the kj of each document is obtained, the precision, recall, and F value corresponding to that kj in the correspondence table are attached to the document and displayed. (Note that these are the precision, recall, and F value of the group of documents up to and including that document, not of the individual document. For values of individual documents, see the calculation of individual values below.)

FIG. 16 is a flowchart of the correspondence table creation process. The process is described below according to steps S101 to S105 of FIG. 16 (see FIG. 7 for the certainty factor assigning device; here, however, an information retrieval system is used in place of the document classification device of FIG. 7, and kj is used in place of kp).

Ｓ１０１：入力部１より、予め質問（問題）と記事（解答）の組を大量に入力し、情報検索システムの格納手段に格納する。 S101: A large number of sets of questions (questions) and articles (answers) are input in advance from the input unit 1 and stored in the storage means of the information search system.

Ｓ１０２：情報検索システムは、前記入力されたある１つの質問から、形態素解析して、名詞をキーワードとして取り出して、そのキーワードを利用して、記事（解答）の情報検索を行って記事を取り出す。 S102: The information search system performs a morphological analysis from one inputted question, extracts a noun as a keyword, performs an information search of an article (answer) using the keyword, and extracts an article.

Ｓ１０３：情報検索システムは、kj位までの文書（記事）を出力する。 S103: The information retrieval system outputs documents (articles) up to kj.

Ｓ１０４：対応表作成部１１は、kjを変化させた時に情報検索システムより出力されるそれぞれの文書（記事）の確信度を求める。 S104: The correspondence table creation unit 11 obtains the certainty factor of each document (article) output from the information search system when kj is changed.

Ｓ１０５：対応表作成部１１は、前記Ｓ１０１で入力した質問全てについて、情報検索システムで文書（記事）を出力し、kjを変化させて確信度を求め、更に同じkjに対応する全ての文書（記事）の確信度の平均値を求め、対応表を作成する（対応表は格納部１３に格納する）。 S105: The correspondence table creation unit 11 outputs a document (article) with the information search system for all the questions input in S101, obtains a certainty factor by changing kj, and all documents corresponding to the same kj ( The average value of the certainty of the article is obtained and a correspondence table is created (the correspondence table is stored in the storage unit 13).

FIG. 17 is a flowchart of the certainty factor assignment process. The process is described below according to steps S111 to S114 of FIG. 17 (see FIG. 7 for the certainty factor assigning device; here, however, an information retrieval system is used in place of the document classification device of FIG. 7, and kj is used in place of kp).

Ｓ１１１：入力部１より、新たな情報検索の質問を入力する。 S111: A new information search question is input from the input unit 1.

Ｓ１１２：情報検索システムは、前記入力された質問から、形態素解析して、名詞をキーワードとして取り出して、そのキーワードを利用して、情報検索を行って記事を取り出す。 S112: The information search system performs morphological analysis from the input question, extracts a noun as a keyword, performs an information search using the keyword, and extracts an article.

Ｓ１１３：確信度付与部１２は、情報検索システムにより、各記事がぎりぎり出力されるkjを求める。 S113: The certainty factor assigning unit 12 obtains kj at which each article is barely output by the information search system.

Ｓ１１４：確信度付与部１２は、格納部１３の対応表から前記求めたkjに対応する確信度を記事に付与して出力部より出力する。 S114: The certainty factor assigning unit 12 assigns the certainty factor corresponding to the obtained kj from the correspondence table of the storage unit 13 to the article and outputs the article from the output unit.
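The rank-based variant (steps S101 to S105 and S111 to S114) can be sketched in the same way. To keep the sketch short, only precision is tabulated; recall and F value would be handled identically. The data layout and function names are assumptions, and each ranked list is assumed to be non-empty.

```python
def build_kj_table(runs, max_rank):
    """runs: list of (ranked_doc_ids, relevant_set), one per evaluation question.

    Returns {kj: mean precision of the documents up to rank kj}.
    """
    table = {}
    for kj in range(1, max_rank + 1):
        precisions = []
        for ranked, relevant in runs:
            top = ranked[:kj]  # documents up to rank kj
            precisions.append(sum(d in relevant for d in top) / len(top))
        table[kj] = sum(precisions) / len(precisions)
    return table

def assign_confidence_by_rank(ranked, table):
    """Attach the table value for rank kj to the kj-th document (None if absent)."""
    return [(doc, table.get(rank)) for rank, doc in enumerate(ranked, start=1)]
```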

c) Explanation of the case where the score (Score) is used
In the method that uses Score, the system outputs the documents whose Score is kl or more. The evaluation data questions are solved with this information retrieval system for several values of kl, and the precision, recall, and F value are obtained for each. This yields a correspondence table between the Score threshold kl and precision, recall, and F value.

Next, a new information retrieval question arrives. Documents are output by the same method as when the correspondence table was created, and the kl at which each document is only just output is obtained. This is simply the Score of the document. (This follows from the definition of kl. The other methods handle this step differently.)

Once the kl of each document is obtained, the precision, recall, and F value corresponding to that kl in the correspondence table are attached to the document and displayed. (Note that these are the precision, recall, and F value of the group of documents (articles) up to and including that document, not of the individual document. For values of individual documents, see the calculation of individual values below.)

FIG. 18 is a flowchart of the correspondence table creation process. The process is described below according to steps S121 to S125 of FIG. 18 (see FIG. 7 for the certainty factor assigning device; here, however, an information retrieval system is used in place of the document classification device of FIG. 7, and kl is used in place of kp).

Ｓ１２１：入力部１より、予め質問（問題）と記事（解答）の組を大量に入力し、情報検索システムの格納手段に格納する。 S121: A large number of sets of questions (questions) and articles (answers) are input in advance from the input unit 1 and stored in the storage means of the information search system.

Ｓ１２２：情報検索システムは、前記入力されたある１つの質問から、形態素解析して、名詞をキーワードとして取り出して、そのキーワードを利用して、記事（解答）の情報検索を行って記事を取り出す。 S122: The information search system performs a morphological analysis from one input question, extracts a noun as a keyword, performs an information search for an article (answer) using the keyword, and extracts an article.

Ｓ１２３：情報検索システムは、Score がkl以上の文書（記事）までを出力する。 S123: The information search system outputs up to documents (articles) whose Score is kl or more.

Ｓ１２４：対応表作成部１１は、klを変化させた時に情報検索システムより出力されるそれぞれの文書（記事）の確信度を求める。 S124: The correspondence table creation unit 11 obtains the certainty factor of each document (article) output from the information search system when kl is changed.

Ｓ１２５：対応表作成部１１は、前記Ｓ１２１で入力した質問全てについて、情報検索システムで文書（記事）を出力し、klを変化させて確信度を求め、更に同じklに対応する全ての文書（記事）の確信度の平均値を求め、対応表を作成する（対応表は格納部１３に格納する）。 S125: The correspondence table creation unit 11 outputs a document (article) by the information search system for all the questions input in S121, obtains a certainty factor by changing kl, and further calculates all documents corresponding to the same kl ( The average value of the certainty of the article is obtained and a correspondence table is created (the correspondence table is stored in the storage unit 13).

FIG. 19 is a flowchart of the certainty factor assignment process. The process is described below according to steps S131 to S134 of FIG. 19 (see FIG. 7 for the certainty factor assigning device; here, however, an information retrieval system is used in place of the document classification device of FIG. 7, and kl is used in place of kp).

Ｓ１３１：入力部１より、新たな情報検索の質問を入力する。 S131: A new information search question is input from the input unit 1.

Ｓ１３２：情報検索システムは、前記入力された質問から、形態素解析して、名詞をキーワードとして取り出して、そのキーワードを利用して、情報検索を行って記事を取り出す。 S132: The information search system performs morphological analysis from the input question, extracts a noun as a keyword, performs an information search using the keyword, and extracts an article.

Ｓ１３３：確信度付与部１２は、情報検索システムにより、各記事がぎりぎり出力されるklを求める。 S133: The certainty factor assigning unit 12 obtains kl at which each article is output with the information search system.

Ｓ１３４：確信度付与部１２は、格納部１３の対応表から前記求めたklに対応する確信度を記事に付与して出力部より出力する。 S134: The certainty factor assigning unit 12 assigns the certainty factor corresponding to the obtained kl from the correspondence table of the storage unit 13 to the article and outputs it from the output unit.

d) Explanation of score normalization for kl
A normalized score may be used as kl. There are several ways to normalize the score. In the Okapi formula described in (1) (explanation of the information retrieval system), the Σ adds one term per word; the simplest normalization is to divide the original score by the number of those words.

Another method is to assume an article in which each word of W appears exactly once and to divide the original score by the score of that article.

Alternatively, the method itself may be changed: instead of summing with Σ, vectors are formed and the score is computed from them, which has the same effect as normalization.

For example, a vector is created with one element for every kind of word, and the value of each element is obtained using the expression inside the Σ of the Okapi formula. One such vector is created for the input keywords and one for each document to be searched. The angle between these vectors is used as the score. Using the angle has the same effect as normalization.
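The vector-based scoring can be sketched as follows. The sketch returns the cosine of the angle rather than the angle itself, which is a substitution made here for convenience: a larger cosine means a smaller angle. The sparse dict representation and the function name are assumptions, and the element values are taken as given (in the text they come from the expression inside the Σ of the Okapi formula).

```python
import math

def cosine(u, v):
    """Cosine of the angle between two sparse vectors given as {word: value}."""
    dot = sum(value * v.get(word, 0.0) for word, value in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # no shared direction can be defined for a zero vector
    return dot / (norm_u * norm_v)
```

Because the result does not change when either vector is scaled, documents of different lengths or queries with different numbers of keywords become comparable, which is the normalization effect the text refers to.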

In the case of F-terms, BM25 and Okapi are almost the same formula. The Σ in BM25 equation (6) adds one term per word, and the simplest normalization is to divide the original score by the number of those words.

Another method is to assume an article in which each word of W appears exactly once and to divide the original score by the score of that article.

Alternatively, the method itself may be changed: instead of summing with Σ, vectors are formed and the score is computed from them, which has the same effect as normalization. For example, a vector is created with one element for every kind of word; the element values are obtained with equation (8) for the words of the input and with equation (7) for the words of a document. One such vector is created for the input keywords and one for each document to be searched. The angle between these vectors is used as the score. Using the angle has the same effect as normalization.

e) Explanation of the case where two predetermined values of different kinds are used
For example, consider using the two values kp and kj, with kp = 0, 0.1, 0.2, ..., 1.0 and kj = 1, 2, 3, ..., 1000. The average certainty factor is obtained for every combination of these values, and a correspondence table is created.

Certainty factor for kp = 0, kj = 1 ...
Certainty factor for kp = 0.1, kj = 1 ...
...
Certainty factor for kp = 0, kj = 2 ...
Certainty factor for kp = 0.1, kj = 2 ...
...
...
Certainty factor for kp = 0, kj = 1000 ...
Certainty factor for kp = 0.1, kj = 1000 ...
...
The correspondence table is obtained as above.
A new question is then input and answers are output. The kp and kj at the time each answer is output are obtained in the same way as when a single value is used. Once kp and kj are known, the correspondence table is consulted and the certainty factor for that case is obtained and output. If the table has no entry that exactly matches the kp and kj at the time the answer is output, interpolation is performed.

For example, suppose that for a new input kp is kp1 and kj is kj1, and that the table has no entry for (kp1, kj1). Some kind of interpolation is then required. One option is to use the table entry for the combination of the kp closest to kp1 and the kj closest to kj1. Another is to take the two table values of kp that bracket kp1 and the two table values of kj that bracket kj1, and to use the average of the four entries given by all combinations of one kp and one kj from these.

Alternatively, let kp2 and kp3 (kp2 > kp1 > kp3) be the two table values of kp that bracket kp1, and let kj2 and kj3 (kj2 > kj1 > kj3) be the two table values of kj that bracket kj1. Let p(2,2) be the certainty factor for (kp2, kj2), p(3,2) that for (kp3, kj2), p(2,3) that for (kp2, kj3), and p(3,3) that for (kp3, kj3), and let
r(2,2) = sqrt( (kp1-kp2)^2 + a*(kj1-kj2)^2 )
r(3,2) = sqrt( (kp1-kp3)^2 + a*(kj1-kj2)^2 )
r(2,3) = sqrt( (kp1-kp2)^2 + a*(kj1-kj3)^2 )
r(3,3) = sqrt( (kp1-kp3)^2 + a*(kj1-kj3)^2 )
Then
p(2,2)/r(2,2) + p(3,2)/r(3,2) + p(2,3)/r(2,3) + p(3,3)/r(3,3)
divided by
1/r(2,2) + 1/r(3,2) + 1/r(2,3) + 1/r(3,3)
may be used as the certainty factor.

Here, a is a constant that is either determined in advance by experiment or given in advance by the system user. ^ denotes exponentiation and sqrt denotes the square root. A similar method or another interpolation method may also be used. The same applies when three or more values, such as kp, kj, and kl, are used.
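The weighted average above is an inverse-distance weighting of the four surrounding table entries. A minimal sketch follows; the function name and the `corners` dict layout are assumptions, and the early return for an exact table hit is added here to avoid division by zero (the text only applies the formula when no exact entry exists).

```python
import math

def idw_confidence(kp1, kj1, corners, a=1.0):
    """Inverse-distance-weighted certainty factor at (kp1, kj1).

    corners: {(kp, kj): certainty factor} for the four bracketing table entries.
    a: constant that scales the kj axis relative to the kp axis.
    """
    num = den = 0.0
    for (kp, kj), p in corners.items():
        r = math.sqrt((kp1 - kp) ** 2 + a * (kj1 - kj) ** 2)
        if r == 0.0:
            return p  # exact table entry; no interpolation needed
        num += p / r
        den += 1.0 / r
    return num / den
```

Since kj ranges over 1 to 1000 while kp ranges over 0 to 1, the constant `a` in practice decides how much a difference in rank counts relative to a difference in kp.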

§3: Explanation of the calculation of individual values
(1): Explanation of the case where a document classification device is used
a) Explanation of the case where the value of kp is used
To calculate individual values, a large number of question-answer pairs are collected in advance. A question is a patent to which F-terms are to be assigned, and the answer is the set of F-terms of that patent. These pairs are called evaluation data.

The document classification device (patent document classification device) outputs F-terms for the evaluation data. For each F-term, the kp at which it is only just output is obtained as follows.

The score (Score) of an F-term divided by the score of the first F-term (the F-term with the largest score) is the kp at which that F-term is only just output. (This follows from the definition of kp; see equation (2). Note that the rank-based method and the other methods handle this step differently.) The score is obtained using equation (1) or the like.

For each F-term output for the evaluation data, whether it is correct is checked, and the accuracy rate at each kp is obtained. The accuracy rates of the F-terms are then averaged over all the evaluation data (patent documents) for the same kp. This yields a correspondence table between kp and accuracy rate.

When a new patent (one to which no classification has been assigned) arrives, the document classification device outputs F-terms for it, and the kp at which each F-term is only just output is obtained. This is done in the same way as when the correspondence table was created: the score of an F-term divided by the score of the first F-term (the F-term with the largest score) is the kp at which that F-term is only just output. (This follows from the definition of kp; see equation (2). Note that the rank-based method and the other methods handle this step differently.) The score is obtained using equation (1) or the like.

Once the kp of each F-term is obtained, the accuracy rate corresponding to that kp in the correspondence table is attached to the F-term and displayed. (Note that this is the accuracy rate of the individual F-term. It differs from the precision, recall, F value, and so on of the group of F-terms up to that F-term.)

FIG. 20 is a flowchart of the correspondence table creation process. The process is described below according to steps S141 to S145 of FIG. 20 (see FIG. 7 for the certainty factor assigning device).

Ｓ１４１：入力部１より、予め問題と解答の組（ここでは特許文書とそのF-term）を大量に入力し、文書分類装置１０の格納手段に格納する。 S141: A large number of pairs of questions and answers (here, patent documents and their F-terms) are input in advance from the input unit 1 and stored in the storage means of the document classification device 10.

Ｓ１４２：文書分類装置１０は、前記入力された１つの特許文書と類似する他の特許文書を検索して分類を求める（F-termを求める）。 S142: The document classification device 10 searches for another patent document similar to the one input patent document and determines the classification (determines F-term).

Ｓ１４３：文書分類装置１０は、前記類似する他の特許文書の分類（F-term）が何個の特許文書に現れたか等により、前記求めた分類（F-term）のスコアを算出する。 S143: The document classification apparatus 10 calculates the score of the obtained classification (F-term) based on how many patent documents the classification (F-term) of the other similar patent document appears.

S144: The correspondence table creation unit 11 obtains the kp at which each classification (F-term) is only just output, checks whether each classification (F-term) is correct, and obtains the accuracy rate at each kp.

S145: For all the patent documents input in S141, the correspondence table creation unit 11 has the document classification device 10 assign classifications (obtain F-terms), obtains the kp at which each classification (F-term) is only just output, obtains the average of the accuracy rates of all patent documents for the same kp, and creates the correspondence table (the correspondence table is stored in the storage means 13).
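Steps S141 to S145 can be sketched as follows. Two simplifications are assumptions of this sketch, not statements of the patent: kp values are rounded to a grid of width `step` so that F-terms with nearly equal kp share a table row, and correct and incorrect F-terms are pooled over all patents rather than averaged per patent. The data layout and function name are hypothetical.

```python
from collections import defaultdict

def build_individual_kp_table(patents, step=0.1):
    """patents: list of (scored_fterms, correct_fterms), one per evaluation patent.

    scored_fterms is a list of (fterm, score) with positive scores.
    Returns {kp bucket: accuracy rate of the F-terms that fall in that bucket}.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for scored, correct in patents:
        score_max = max(s for _, s in scored)
        for fterm, s in scored:
            kp = s / score_max  # kp at which this F-term is only just output
            bucket = round(round(kp / step) * step, 10)
            totals[bucket] += 1
            hits[bucket] += fterm in correct
    return {b: hits[b] / totals[b] for b in totals}
```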

FIG. 21 is a flowchart of the certainty factor assignment process. The process is described below according to steps S151 to S155 of FIG. 21.

Ｓ１５１：入力部１より、新たな文書（F-termが付与されていない特許文書）を入力する。 S151: A new document (patent document to which no F-term is assigned) is input from the input unit 1.

Ｓ１５２：文書分類装置１０は、前記入力された特許文書と類似する特許文書（前記処理Ｓ１４１で入力されたの特許文書）を検索して分類を求める（F-termを求める）。 S152: The document classification device 10 searches for a patent document similar to the input patent document (the patent document input in the process S141) to determine the classification (F-term is determined).

Ｓ１５３：文書分類装置１０は、前記類似する特許文書の分類（F-term）が何個の特許文書に現れたか等により、前記付与した分類（F-term）のスコアを算出する。 S153: The document classification apparatus 10 calculates the score of the assigned classification (F-term) according to the number of patent documents in which the classification (F-term) of the similar patent document appears.

Ｓ１５４：確信度付与部１２は、各分類（F-term）がぎりぎり出力されるkpを求める。 S154: The certainty factor assigning unit 12 obtains kp at which each classification (F-term) is barely output.

Ｓ１５５：確信度付与部１２は、格納部１３の対応表から前記求めたkpに対応する確信度を各F-termに付与して出力部より出力する。 S155: The certainty factor assigning unit 12 assigns the certainty factor corresponding to the kp obtained from the correspondence table of the storage unit 13 to each F-term and outputs it from the output unit.

b) Explanation of the case where the output rank is used
In the method that uses the output rank, a system that outputs F-terms (the patent document classification device) is used. The evaluation data questions are solved with this system, whether the kj-th output F-term is correct or incorrect is checked, and the accuracy rate of the kj-th output is obtained. This yields a correspondence table between kj and accuracy rate.

When a new patent arrives, F-terms are output by the method above (the patent document classification device), and the kj at which each F-term is only just output is obtained; this is simply the rank at which the F-term is output. (This follows from the definition of kj. The other methods handle this step differently.)

Once the kj of each F-term is obtained, the accuracy rate corresponding to that kj in the correspondence table is attached to the F-term and displayed.

図２２は対応表作成処理フローチャートである。以下、図２２の処理Ｓ１６１〜Ｓ１６５にしたがって説明する（確信度付与装置は図７参照）。 FIG. 22 is a flowchart of the correspondence table creation process. In the following, description will be given in accordance with steps S161 to S165 of FIG. 22 (see FIG. 7 for the certainty factor imparting apparatus).

Ｓ１６１：入力部１より、予め問題と解答の組（ここでは特許文書とそのF-term）を大量に入力し、文書分類装置１０の格納手段に格納する。 S161: A large number of sets of questions and answers (here, patent documents and their F-terms) are input in advance from the input unit 1 and stored in the storage means of the document classification device 10.

Ｓ１６２：文書分類装置１０は、前記入力された１つの特許文書と類似する他の特許文書を検索して分類を求める（F-termを求める）。 S162: The document classification device 10 searches for another patent document similar to the one input patent document to determine the classification (F-term is determined).

Ｓ１６３：文書分類装置１０は、前記類似する他の特許文書の分類（F-term）が何個の特許文書に現れたか等により、前記求めた分類（F-term）の順位kjを算出する。 S163: The document classification device 10 calculates the rank kj of the obtained classification (F-term) based on the number of patent documents in which the classification (F-term) of the other similar patent document appears.

Ｓ１６４：対応表作成部１１は、kj個目の分類（F-term）の出力があっているか間違っているかを調べて、kj個目の出力の正解率を求める。 S164: The correspondence table creation unit 11 checks whether the kj-th classification (F-term) is output or is incorrect, and obtains the accuracy rate of the kj-th output.

Ｓ１６５：対応表作成部１１は、前記Ｓ１６１で入力した特許文書全てについて、kj個目の出力の正解率を求め、更に同じkjに対応する全ての特許文書の正解率の平均値を求め、対応表を作成する（対応表は格納手段１３に格納される）。 S165: The correspondence table creation unit 11 obtains the accuracy rate of the kj-th output for all the patent documents input in S161, and further obtains the average value of the accuracy rates of all patent documents corresponding to the same kj A table is created (the correspondence table is stored in the storage means 13).
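Steps S161 to S165 can be sketched as follows, assuming the classification device's output for each evaluation patent is available as a ranked list of F-terms. The data layout and function name are hypothetical.

```python
def build_individual_kj_table(patents, max_rank):
    """patents: list of (ranked_fterms, correct_fterms), one per evaluation patent.

    Returns {kj: fraction of patents whose kj-th output F-term is correct}.
    Patents with fewer than kj outputs are skipped for that kj.
    """
    table = {}
    for kj in range(1, max_rank + 1):
        outcomes = [ranked[kj - 1] in correct
                    for ranked, correct in patents if len(ranked) >= kj]
        if outcomes:
            table[kj] = sum(outcomes) / len(outcomes)
    return table
```

Unlike the tables of section (2), each row here describes only the kj-th output itself, not the group of outputs up to rank kj.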

図２３は確信度付与処理フローチャートである。以下、図２３の処理Ｓ１７１〜Ｓ１７５にしたがって説明する（確信度付与装置は図７参照）。 FIG. 23 is a flowchart of certainty factor assignment processing. In the following, description will be made according to processing S171 to S175 in FIG. 23 (see FIG. 7 for the certainty factor imparting apparatus).

Ｓ１７１：入力部１より、新たな文書（F-termが付与されていない特許文書）を入力する。 S171: A new document (patent document to which no F-term is assigned) is input from the input unit 1.

Ｓ１７２：文書分類装置１０は、前記入力された特許文書と類似する特許文書（前記処理Ｓ１６１で入力されたの特許文書）を検索して分類を求める（F-termを求める）。 S172: The document classification device 10 searches for a patent document similar to the input patent document (the patent document input in the process S161) to determine the classification (F-term is determined).

Ｓ１７３：文書分類装置１０は、前記類似する特許文書の分類（F-term）が何個の特許文書に現れたか等により、前記求めた分類（F-term）の順位kjを算出する。 S173: The document classification device 10 calculates the rank kj of the obtained classification (F-term) based on how many patent documents the classification (F-term) of the similar patent document appears in.

Ｓ１７４：確信度付与部１２は、各F-termがぎりぎり出力されるkjを求める。 S174: The certainty factor assigning unit 12 obtains kj at which each F-term is barely output.

Ｓ１７５：確信度付与部１２は、格納部１３の対応表から前記求めたkjに対応する確信度を各F-termに付与して出力部より出力する。 S175: The certainty factor assigning unit 12 assigns the certainty factor corresponding to the obtained kj from the correspondence table of the storage unit 13 to each F-term and outputs it from the output unit.
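The lookup in S174 and S175 might look like the sketch below. It assumes the correspondence table is a plain dictionary from rank kj to accuracy rate, and that the kj at which an F-term is barely output is simply its output rank. The function name and the sample values are hypothetical.

```python
def attach_rank_confidence(ranked_fterms, rank_table):
    """Pair each output F-term with the accuracy rate stored for its rank kj."""
    labelled = []
    for kj, fterm in enumerate(ranked_fterms, start=1):
        # None marks a rank that never occurred in the evaluation data.
        labelled.append((fterm, kj, rank_table.get(kj)))
    return labelled

# Made-up table and F-term codes, for illustration only.
rank_table = {1: 0.9, 2: 0.6}
labelled = attach_rank_confidence(["A01", "B02", "C03"], rank_table)
```

A rank missing from the table is returned as `None` here rather than guessed. Filling such gaps is a separate design choice.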

ｃ）スコア（score ）を利用する場合の説明 c) Explanation of the case where the score is used
スコアを利用する方法の場合は、F-termを出力システム（前記特許文書分類装置）を使用する。このシステムで評価データの問題を解き、出力される各F-termを評価する。F-termのスコアが kl のものについて、そのF-termがあっているかどうかを調べて、klの場合の正解率を求める。これをあらゆるklについて求める。そうすると、klと正解率の対応表が完成する。 In the method that uses the score, a system that outputs F-terms (the patent document classification device described above) is used. This system solves the problems in the evaluation data, and each output F-term is evaluated. For F-terms whose score is kl, whether each F-term is correct is checked, and the accuracy rate for kl is obtained. This is done for every kl, which completes the correspondence table between kl and the accuracy rate.

新しい特許（F-termが付与されていない）が入ってくると、特許文書分類装置でF-termを出力する。ここで各F-termがぎりぎり出力されるklを求める。すると各F-termのスコアが kl となる。（ kl の定義によりこうなる。他の方法ではこの部分は異なった方法になる） When a new patent (to which no F-term has been assigned) comes in, the patent document classification device outputs F-terms. Here, the kl at which each F-term is barely output is obtained; this is simply the score of that F-term. (This follows from the definition of kl. With other methods, this part is done differently.)

各Fterm のklが求まれば、先の対応表に基づいて、各Fterm に対応する正解率をくっつけて表示する。 Once the kl of each F-term is obtained, the correct answer rate corresponding to each F-term is attached and displayed based on the correspondence table above.

図２４は対応表作成処理フローチャートである。以下、図２４の処理Ｓ１８１〜Ｓ１８５にしたがって説明する（確信度付与装置は図７参照）。 FIG. 24 is a flowchart of the correspondence table creation process. Hereinafter, description will be made according to processing S181 to S185 in FIG. 24 (see FIG. 7 for the certainty factor imparting apparatus).

Ｓ１８１：入力部１より、予め問題と解答の組（ここでは特許文書とそのF-term）を大量に入力し、文書分類装置１０の格納手段に格納する。 S181: A large number of sets of questions and answers (here, patent documents and their F-terms) are input in advance from the input unit 1 and stored in the storage means of the document classification device 10.

Ｓ１８２：文書分類装置１０は、前記入力された１つの特許文書と類似する他の特許文書を検索して分類を求める（F-termを求める）。 S182: The document classification device 10 searches for another patent document similar to the inputted one patent document to determine the classification (determines F-term).

Ｓ１８３：文書分類装置１０は、前記類似する他の特許文書の分類（F-term）が何個の特許文書に現れたか等により、前記求めた分類（F-term）のスコア（kl）を算出する。 S183: The document classification device 10 calculates the score (kl) of the obtained classification (F-term) based on, for example, how many patent documents the classification (F-term) of the other similar patent documents appears in.

Ｓ１８４：対応表作成部１１は、F-termのスコアが kl のものについて、そのF-termがあっているかどうかを調べて、klの場合の正解率を求める。 S184: For F-terms whose score is kl, the correspondence table creation unit 11 checks whether each F-term is correct and obtains the correct answer rate for kl.

Ｓ１８５：対応表作成部１１は、これを前記Ｓ１８１で入力した特許文書の分類（F-term）のあらゆるklについてその正解率を求める。そうすると、klと正解率の対応表が完成する（対応表は格納手段１３に格納される）。 S185: The correspondence table creation unit 11 obtains the correct answer rate for every kl of the classification (F-term) of the patent document input in S181. Then, the correspondence table between kl and accuracy rate is completed (the correspondence table is stored in the storage means 13).
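A score-based correspondence table (S181 to S185) could be built roughly as shown below. The text tabulates the accuracy for each score kl. Grouping scores by rounding to a fixed number of digits is an assumption added here so that near-identical scores share one table row, and the function name and input layout are likewise hypothetical.

```python
from collections import defaultdict

def build_score_table(scored_outputs, ndigits=1):
    """Map each (rounded) score kl to the fraction of correct outputs.

    scored_outputs: iterable of (score, is_correct) pairs, one per output
    F-term across all evaluation documents (hypothetical input layout).
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for score, is_correct in scored_outputs:
        kl = round(score, ndigits)  # assumption: bucket scores by rounding
        totals[kl] += 1
        hits[kl] += int(is_correct)
    return {kl: hits[kl] / totals[kl] for kl in sorted(totals)}

# Made-up scores, for illustration only.
outputs = [(0.91, True), (0.88, True), (0.93, False), (0.52, False), (0.49, True)]
score_table = build_score_table(outputs)
```

If the scores are integer counts (for example the number of similar documents carrying the F-term), `ndigits=0` or no rounding at all may be the more natural choice.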

図２５は確信度付与処理フローチャートである。以下、図２５の処理Ｓ１９１〜Ｓ１９５にしたがって説明する（確信度付与装置は図７参照）。 FIG. 25 is a flowchart of the certainty degree giving process. In the following, description will be made in accordance with steps S191 to S195 of FIG. 25 (see FIG. 7 for the certainty factor imparting apparatus).

Ｓ１９１：入力部１より、新たな文書（F-termが付与されていない特許文書）を入力する。 S191: A new document (patent document to which no F-term is assigned) is input from the input unit 1.

Ｓ１９２：文書分類装置１０は、前記入力された特許文書と類似する特許文書（前記処理Ｓ１８１で入力されたの特許文書）を検索して分類を求める（F-termを求める）。 S192: The document classification device 10 searches for a patent document similar to the input patent document (the patent document input in the process S181) to determine the classification (determines an F-term).

Ｓ１９３：文書分類装置１０は、前記類似する特許文書の分類（F-term）が何個の特許文書に現れたか等により、前記求めた分類（F-term）のスコア（kl）を算出する。 S193: The document classification device 10 calculates the score (kl) of the obtained classification (F-term) based on how many patent documents the classification (F-term) of the similar patent document appears in.

Ｓ１９４：確信度付与部１２は、各F-termがぎりぎり出力されるklを求める。 S194: The certainty factor assigning unit 12 obtains the kl at which each F-term is barely output.

Ｓ１９５：確信度付与部１２は、格納部１３の対応表から前記求めたklに対応する正解率を各F-termに付与して出力部より出力する。 S195: The certainty factor assigning unit 12 assigns the correct answer rate corresponding to the obtained kl from the correspondence table of the storage unit 13 to each F-term and outputs it from the output unit.

（２）：情報検索装置を用いる場合の説明 (2): Explanation of the case where an information retrieval apparatus is used
予め問題と解答の組を大量に集める。問題は、情報検索の質問（例えば、企業合併に関する記事を取り出すこと) であり、解答は、その質問に対応する記事（文書）群である。これを評価データと呼ぶ、ここで前に説明したような情報検索システム（情報検索装置）を一つ用意する。 A large number of question-answer pairs are collected in advance. Each question is an information retrieval query (for example, retrieving articles about corporate mergers), and each answer is the group of articles (documents) corresponding to that query. This collection is called the evaluation data. One information retrieval system (information retrieval apparatus) as described earlier is then prepared.

質問から、形態素解析して、名詞をキーワードとして取り出して、そのキーワードを利用して上記情報検索システムで記事を取り出す。そうすると、各記事はOkapi の式ならScore(D)の値を持ち、この値の大きいものが出力される。 The question is morphologically analyzed, nouns are extracted as keywords, and the information retrieval system uses those keywords to retrieve articles. Each article then has a value, Score(D) in the case of the Okapi formula, and articles with large values are output.

ａ）kpの値を利用する場合の説明 a) Explanation of the case where the value of kp is used
kpの値を利用する方法の場合は、ある質問の場合のScore(D)の最大値を Score＿max とする。そして、Score ＿max * kpの記事（文書）まで出力する。 In the method that uses the value of kp, let Score_max be the maximum value of Score(D) for a given question. Articles (documents) are then output down to a score of Score_max * kp.

前記情報検索システムで上記評価データで記事（文書）群を出力する。ここで各記事（文書）がぎりぎり出力されるkpを求める。この求め方は、以下のようにする。 The information retrieval system outputs an article (document) group with the evaluation data. Here, kp at which each article (document) is barely output is obtained. This method is as follows.

ある記事のスコア（Score ）を最初の記事（最もスコアの大きい記事）のスコアで割った値がその記事がぎりぎり出力されるkpとなる。（kpの定義によりこうなる、式（２）を参照のこと、順位による方法や他の方法ではこの部分は異なった方法になることに注意) スコアは式（１）等を利用して求める。 The score of an article divided by the score of the first article (the article with the highest score) is the kp at which that article is barely output. (This follows from the definition of kp; see equation (2). Note that with the rank-based method and other methods this part is done differently.) The score is obtained using equation (1) or the like.

前記出力された上記評価データの各記事ごとにそれが正解しているかを調べて、各kpの時の正解率を求める。更に同じkpに対応する全ての上記評価データ（質問）の記事の正解率の平均値を求める。そうすると、kpと正解率の対応表が完成する。 For each article of the output evaluation data, it is checked whether it is correct, and the correct answer rate at each kp is obtained. Furthermore, the average value of the correct answer rates of the articles of all the evaluation data (questions) corresponding to the same kp is obtained. Then, the correspondence table between kp and accuracy rate is completed.
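The kp computation and the resulting table can be sketched as follows. Only the ratio score / Score_max comes from the text. Rounding kp to one decimal so that articles from different questions fall into the same row is an assumption, as are the function names and the input layout, and the sketch further assumes that every question has a positive maximum score.

```python
from collections import defaultdict

def kp_values(scores):
    """Return score / Score_max for each article retrieved for one question."""
    score_max = max(scores)  # assumed to be positive
    return [s / score_max for s in scores]

def build_kp_table(questions, ndigits=1):
    """questions: list of [(score, is_correct), ...], one list per question."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for articles in questions:
        kps = kp_values([score for score, _ in articles])
        for kp, (_, is_correct) in zip(kps, articles):
            key = round(kp, ndigits)  # assumption: bucket kp by rounding
            totals[key] += 1
            hits[key] += int(is_correct)
    return {k: hits[k] / totals[k] for k in sorted(totals, reverse=True)}

# Made-up scores, for illustration only.
questions = [
    [(10.0, True), (8.0, True), (5.0, False)],
    [(4.0, True), (2.0, False)],
]
kp_table = build_kp_table(questions)
```

Because kp is normalized by the top score of each question, rows from questions with very different raw score ranges become comparable.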

新しい情報検索の質問が入ってくると、前記情報検索システムで記事を出力する。各記事がぎりぎり出力されるkpを求める。この求め方は、上記対応表作成の場合と同様であり、ある記事のスコアを最初の記事（最もScore の大きい記事) のスコアで割った値がその記事がぎりぎり出力されるkpとなる。（kpの定義によりこうなる、式（２）を参照のこと、順位による方法や他の方法ではこの部分は異なった方法になることに注意) スコアは式（１）等を利用して求める。 When a new information retrieval question comes in, the information retrieval system outputs articles, and the kp at which each article is barely output is obtained. This is done in the same way as when creating the correspondence table: the score of an article divided by the score of the first article (the article with the highest score) is the kp at which that article is barely output. (This follows from the definition of kp; see equation (2). Note that with the rank-based method and other methods this part is done differently.) The score is obtained using equation (1) or the like.

各記事のkpが求まれば、先の対応表に基づいて、各記事に対応する正解率をくっつけて表示する。（この正解率は、個々のF-termの正解率であることに注意。そのF-termまでのF-term群に対する精度（適合率) 、再現率、Ｆ値などとは異なるものである。） Once the kp of each article is obtained, the correct answer rate corresponding to each article is attached and displayed based on the correspondence table above. (Note that this correct answer rate is that of each individual F-term. It differs from the precision, recall, F value, and so on computed over the group of F-terms up to that F-term.)

図２６は対応表作成処理フローチャートである。以下、図２６の処理Ｓ２０１〜Ｓ２０５にしたがって説明する（確信度付与装置は図７参照、但し、図７の文書分類装置の代わりに情報検索システム（装置）を用いる）。 FIG. 26 is a flowchart of the correspondence table creation process. The following description follows steps S201 to S205 in FIG. 26 (see FIG. 7 for the certainty factor assigning apparatus, except that an information retrieval system (apparatus) is used instead of the document classification apparatus in FIG. 7).

Ｓ２０１：入力部１より、予め問題と解答の組（ここでは情報検索の質問とその質問に対応する記事群）を大量に入力し、情報検索システムの格納手段に格納する。 S201: A large number of sets of questions and answers (here, information search questions and articles corresponding to the questions) are input in advance from the input unit 1 and stored in the storage means of the information search system.

Ｓ２０２：情報検索システムは、前記入力されたある１つの質問から、形態素解析して、名詞をキーワードとして取り出して、そのキーワードを利用して、前記入力された記事群の情報検索を行って記事を取り出す。 S202: The information search system morphologically analyzes one of the input questions, extracts nouns as keywords, and uses those keywords to search the input article group and retrieve articles.

Ｓ２０３：情報検索システムは、Score ＿max * kpの文書（記事）まで出力する。 S203: The information search system outputs up to Score_max * kp document (article).

Ｓ２０４：対応表作成部１１は、各記事がきりぎり出力されるkpを求め、各記事ごとにそれが正解しているかを調べて、各kpのときの正解率を求める。 S204: The correspondence table creation unit 11 obtains kp at which each article is output at the limit, checks whether it is correct for each article, and obtains the correct rate at each kp.

Ｓ２０５：対応表作成部１１は、前記Ｓ２０１で入力した質問全てについて、情報検索システムで記事を検索し、各記事がきりぎり出力されるkpを求め、更に該同じkpに対応する全ての特許文書の正解率の平均値を求め、対応表を作成する（対応表は格納手段１３に格納される）。 S205: For all the questions input in S201, the correspondence table creation unit 11 retrieves articles with the information search system, obtains the kp at which each article is barely output, further obtains the average of the correct answer rates of all patent documents corresponding to the same kp, and creates the correspondence table (the correspondence table is stored in the storage means 13).

図２７は確信度付与処理フローチャートである。以下、図２７の処理Ｓ２１１〜Ｓ２１４にしたがって説明する。 FIG. 27 is a flowchart of the certainty factor assignment process. The following description follows steps S211 to S214 in FIG. 27.

Ｓ２１１：入力部１より、新たな情報検索の質問を入力する。 S211: A new information search question is input from the input unit 1.

Ｓ２１２：情報検索システムは、前記入力された質問から、形態素解析して、名詞をキーワードとして取り出して、そのキーワードを利用して、情報検索を行って記事を取り出す。 S212: The information search system performs morphological analysis from the input question, extracts a noun as a keyword, performs an information search using the keyword, and extracts an article.

Ｓ２１３：確信度付与部１２は、各記事がぎりぎり出力されるkpを求める。 S213: The certainty factor assigning unit 12 obtains kp at which each article is output at the last minute.

Ｓ２１４：確信度付与部１２は、格納部１３の対応表から前記求めたkpに対応する確信度である正解率を各記事に付与して出力部より出力する。 S214: The certainty factor assigning unit 12 assigns each article a correct answer rate, which is the certainty factor corresponding to the kp obtained from the correspondence table of the storage unit 13, and outputs it from the output unit.

ｂ）出力順位を利用する場合の説明 b) Explanation of the case where the output rank is used
出力順位を利用する方法の場合は、情報検索システムを用いる。このシステムで評価データの問題を解き、kj個目の出力の記事があっているかまちがっているかを調べて、kj個目の出力の正解率を求める。そうすると、kjと正解率の対応表が完成する。 In the method that uses the output rank, an information retrieval system is used. This system solves the problems in the evaluation data, whether the kj-th output article is correct or incorrect is checked, and the correct answer rate of the kj-th output is obtained. This completes the correspondence table between kj and the accuracy rate.

新しい特許が入ってくると、先の方法（特許情報検索システム）で記事を出力する。そして、各記事がぎりぎり出力されるkjを求める。そのF-termが出力される順位がkjとなる。（ kj の定義によりこうなる。他の方法ではこの部分は異なった方法になる）。 When a new patent comes in, the article is output by the previous method (patent information retrieval system). Then, kj from which each article is output is obtained. The order in which the F-term is output is kj. (This is due to the definition of kj. In other ways this part is different).

図２８は対応表作成処理フローチャートである。以下、図２８の処理Ｓ２２１〜Ｓ２２５にしたがって説明する（確信度付与装置は図７参照、但し、図７の文書分類装置の代わりに情報検索システム（装置）を用いる）。 FIG. 28 is a correspondence table creation process flowchart. In the following, description will be made in accordance with steps S221 to S225 in FIG. 28 (see FIG. 7 for the certainty factor assigning apparatus, but an information retrieval system (apparatus) is used instead of the document classification apparatus in FIG. 7).

Ｓ２２１：入力部１より、予め問題と解答の組（ここでは情報検索の質問とその質問に対応する記事群）を大量に入力し、情報検索システムの格納手段に格納する。 S221: A large number of sets of questions and answers (here, information search questions and articles corresponding to the questions) are input in advance from the input unit 1 and stored in the storage means of the information search system.

Ｓ２２２：情報検索システムは、前記入力されたある１つの質問から、形態素解析して、名詞をキーワードとして取り出して、そのキーワードを利用して、前記入力された記事群の情報検索を行って記事を取り出す。 S222: The information search system morphologically analyzes one of the input questions, extracts nouns as keywords, and uses those keywords to search the input article group and retrieve articles.

Ｓ２２３：情報検索システムは、前記取り出した記事の順位kjをまで出力する。 S223: The information retrieval system outputs up to the rank kj of the extracted articles.

Ｓ２２４：対応表作成部１１は、kj個目の記事の出力があっているか間違っているかを調べて、kj個目の出力の正解率を求める。 S224: The correspondence table creation unit 11 checks whether the kj-th output article is correct or incorrect, and obtains the accuracy rate of the kj-th output.

Ｓ２２５：対応表作成部１１は、前記Ｓ２２１で入力した質問全てについて、kj個目の出力の正解率を求め、更に同じkjに対応する全ての記事の正解率の平均値を求め、対応表を作成する（対応表は格納手段１３に格納される）。 S225: The correspondence table creation unit 11 obtains the accuracy rate of the kj-th output for all the questions input in S221, further obtains the average of the accuracy rates of all articles corresponding to the same kj, and creates the correspondence table (the correspondence table is stored in the storage means 13).

図２９は確信度付与処理フローチャートである。以下、図２９の処理Ｓ２３１〜Ｓ２３４にしたがって説明する（確信度付与装置は図７参照、但し、図７の文書分類装置の代わりに情報検索システム（装置）を用いる）。 FIG. 29 is a flowchart of certainty factor assignment processing. In the following, description will be made in accordance with steps S231 to S234 in FIG. 29 (see FIG. 7 for the certainty factor assigning apparatus, but an information retrieval system (apparatus) is used instead of the document classification apparatus in FIG. 7).

Ｓ２３１：入力部１より、新たな情報検索の質問を入力する。 S231: A new information search question is input from the input unit 1.

Ｓ２３２：情報検索システムは、前記入力された質問から、形態素解析して、名詞をキーワードとして取り出して、そのキーワードを利用して、情報検索を行って記事を取り出す。 S232: The information retrieval system performs morphological analysis from the input question, extracts a noun as a keyword, performs an information search using the keyword, and extracts an article.

Ｓ２３３：確信度付与部１２は、各記事がぎりぎり出力されるkjを求める。 S233: The certainty factor assigning unit 12 obtains kj at which each article is barely output.

Ｓ２３４：確信度付与部１２は、格納部１３の対応表から前記求めたkjに対応する確信度である正解率を各記事に付与して出力部より出力する。 S234: The certainty factor assigning unit 12 assigns to each article a correct answer rate that is a certainty factor corresponding to the kj obtained from the correspondence table of the storage unit 13, and outputs it from the output unit.

ｃ）スコア（Score ）を利用する場合の説明 c) Explanation of the case where the score (Score) is used
スコアを利用する方法の場合は、前記情報検索システムを使用する。このシステムで評価データの問題を解き、出力される各記事を評価する。記事のスコアが kl のものについて、その記事があっているかどうかを調べて、klの場合の正解率を求める。これをあらゆるklについて求める。そうすると、klと正解率の対応表が完成する。 In the method that uses the score, the information retrieval system described above is used. This system solves the problems in the evaluation data, and each output article is evaluated. For articles whose score is kl, whether each article is correct is checked, and the correct answer rate for kl is obtained. This is done for every kl, which completes the correspondence table between kl and the accuracy rate.

新しい情報検索の質問が入ってくると、情報検索システムで記事を出力する。ここで各記事がぎりぎり出力されるklを求める。すると各記事のスコアが kl となる。（ kl の定義によりこうなる。他の方法ではこの部分は異なった方法になる） When a new information retrieval question comes in, the information retrieval system outputs articles. Here, the kl at which each article is barely output is obtained; this is simply the score of that article. (This follows from the definition of kl. With other methods, this part is done differently.)

各記事のklが求まれば、先の対応表に基づいて、各記事に対応する正解率をくっつけて表示する。 Once the kl of each article is obtained, the correct answer rate corresponding to each article is attached and displayed based on the correspondence table above.

図３０は対応表作成処理フローチャートである。以下、図３０の処理Ｓ２４１〜Ｓ２４５にしたがって説明する（確信度付与装置は図７参照、但し、図７の文書分類装置の代わりに情報検索システム（装置）を用いる）。 FIG. 30 is a flowchart of the correspondence table creation process. Hereinafter, description will be made in accordance with steps S241 to S245 in FIG. 30 (see FIG. 7 for the certainty factor imparting apparatus, except that an information retrieval system (apparatus) is used instead of the document classification apparatus in FIG. 7).

Ｓ２４１：入力部１より、予め問題と解答の組（ここでは情報検索の質問とその質問に対応する記事群）を大量に入力し、情報検索システムの格納手段に格納する。 S241: A large number of sets of questions and answers (here, information search questions and articles corresponding to the questions) are input from the input unit 1 and stored in storage means of the information search system.

Ｓ２４２：情報検索システムは、前記入力されたある１つの質問から、形態素解析して、名詞をキーワードとして取り出して、そのキーワードを利用して、前記入力された記事群の情報検索を行って記事を取り出す。 S242: The information search system morphologically analyzes one of the input questions, extracts nouns as keywords, and uses those keywords to search the input article group and retrieve articles.

Ｓ２４３：情報検索システムは、スコアがkl以上の記事を出力する。 S243: The information search system outputs an article having a score of kl or higher.

Ｓ２４４：対応表作成部１１は、記事のスコアが kl のものについて、その記事があっているかどうかを調べて、klの場合の正解率を求める。 S244: For articles whose score is kl, the correspondence table creation unit 11 checks whether each article is correct and obtains the correct answer rate for kl.

Ｓ２４５：対応表作成部１１は、これを前記Ｓ２４１で入力した質問全てについて、記事を出力し、同じklに対応する正解率の平均値を求め、klと正解率の対応表が完成する（対応表は格納手段１３に格納される）。 S245: The correspondence table creation unit 11 does this for all the questions input in S241: it outputs articles and obtains the average of the correct answer rates corresponding to the same kl, which completes the correspondence table between kl and the correct answer rate (the correspondence table is stored in the storage means 13).

図３１は確信度付与処理フローチャートである。以下、図３１の処理Ｓ２５１〜Ｓ２５４にしたがって説明する（確信度付与装置は図７参照、但し、図７の文書分類装置の代わりに情報検索システム（装置）を用いる）。 FIG. 31 is a flowchart showing the certainty factor giving process. In the following, description will be made in accordance with steps S251 to S254 in FIG. 31 (see FIG. 7 for the certainty factor assigning apparatus, but an information retrieval system (apparatus) is used instead of the document classification apparatus in FIG. 7).

Ｓ２５１：入力部１より、新たな情報検索の質問を入力する。 S251: A new information search question is input from the input unit 1.

Ｓ２５２：情報検索システムは、前記入力された質問から、形態素解析して、名詞をキーワードとして取り出して、そのキーワードを利用して、情報検索を行って記事を取り出す。 S252: The information search system performs morphological analysis from the input question, extracts a noun as a keyword, performs an information search using the keyword, and extracts an article.

Ｓ２５３：確信度付与部１２は、各記事がぎりぎり出力されるklを求める。 S253: The certainty factor assigning unit 12 obtains kl at which each article is output at the last minute.

Ｓ２５４：確信度付与部１２は、格納部１３の対応表から前記求めたklに対応する確信度である正解率を各記事に付与して出力部より出力する。 S254: The certainty factor assigning unit 12 assigns each article a correct answer rate that is the certainty factor corresponding to the obtained kl from the correspondence table of the storage unit 13, and outputs it from the output unit.

以上 kp 、順位、スコアを利用する方法を示したが、順序化して出力する他のものを利用することができる。 The method of using kp, ranking, and score has been described above, but other methods that output in order can be used.

（３）：データの補間、補正の説明 (3): Explanation of data interpolation and correction
表（対応表）に基づく方法で、例えば、kpと正解率の対応表が作成できたとする。このあと、新しい入力で kp が kp1の場合の正解率が表から必要になったが、kp1 の値が表にのっていないとする。そうすると、ある種の補間処理が必要になる。その場合は、表にのっている、kp1 と最も近い値の kp の部分を kp1の代りにつかってもいいし、表にのっている、kp1 をはさむ二つの kp の２行のデータを用い、その２行のデータの正解率の平均を kp1の正解率としてもよい。 Suppose that, with the table-based method, a correspondence table between kp and the accuracy rate has been created. Suppose further that a new input requires the accuracy rate for kp = kp1, but the value kp1 does not appear in the table. Some kind of interpolation is then needed. In that case, the row whose kp is closest to kp1 may be used in place of kp1, or the two rows whose kp values bracket kp1 may be used, taking the average of their accuracy rates as the accuracy rate for kp1.
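The two fallbacks just described, nearest row and average of the two bracketing rows, can be sketched as follows. The table is assumed to be a dictionary from kp to accuracy rate, and the function names and sample values are hypothetical.

```python
def nearest_lookup(table, kp1):
    """Use the accuracy rate of the row whose kp is closest to kp1."""
    nearest = min(table, key=lambda kp: abs(kp - kp1))
    return table[nearest]

def neighbour_average(table, kp1):
    """Average the accuracy rates of the two rows that bracket kp1.

    Raises ValueError when kp1 lies outside the tabulated range, because
    one of the two neighbours does not exist.
    """
    lower = max(kp for kp in table if kp < kp1)
    upper = min(kp for kp in table if kp > kp1)
    return (table[lower] + table[upper]) / 2

# Made-up table, for illustration only.
table = {1.0: 0.75, 0.75: 0.5, 0.5: 0.25}
by_nearest = nearest_lookup(table, 0.7)      # uses the kp = 0.75 row
by_average = neighbour_average(table, 0.7)   # averages the 0.5 and 0.75 rows
```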

また、表にのっている、kp1 をはさむ二つの kp の２行のデータ kp2、kp3 (kp2>kp>kp3)を用い、その２行のデータの正解率 pr2、pr3 を利用して
〔(kp - kp3) pr2 + (kp2 - kp) pr3〕／〔(kp2 - kp) + (kp - kp3)〕
を正解率としてもよい。その他の補完処理により kp に対応する正解率を求めてもよい。 Alternatively, using the two table rows kp2 and kp3 that bracket kp1 (kp2 > kp > kp3, where kp is the new value kp1) and their accuracy rates pr2 and pr3, the value
[(kp - kp3) pr2 + (kp2 - kp) pr3] / [(kp2 - kp) + (kp - kp3)]
may be used as the accuracy rate. The accuracy rate corresponding to kp may also be obtained by other interpolation processing.
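The weighted formula above is ordinary linear interpolation between the two bracketing rows. A direct transcription into code, with a hypothetical function name and made-up sample values, might read:

```python
def interpolate_accuracy(kp, kp2, pr2, kp3, pr3):
    """Linear interpolation between rows (kp2, pr2) and (kp3, pr3).

    Expects kp2 > kp > kp3. The closer kp is to kp2, the more weight pr2 gets.
    """
    numerator = (kp - kp3) * pr2 + (kp2 - kp) * pr3
    denominator = (kp2 - kp) + (kp - kp3)  # equals kp2 - kp3
    return numerator / denominator

# Midway between the two rows the result is the plain average of pr2 and pr3.
midpoint = interpolate_accuracy(0.75, 1.0, 0.8, 0.5, 0.4)
```

At kp = kp2 the expression reduces to pr2 and at kp = kp3 to pr3, so the interpolated values join the table rows without jumps.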

また、kpと正解率の対に対して、単回帰式近似、又は、多項式近似、又は、対数近似、又は、指数近似などをして求めた近似式により kp に対応する正解率を求めるようにしてもよい（例えば、「Excel で学ぶ時系列分析と予測」（オーム社）２章の“単回帰分析”３章の“重回帰分析”参照）。また、上記回帰分析的な近似以外の補正処理を行ってもよい。なお、データの補間、補正は、kl等の他のデータについても同様である。 In addition, the correct answer rate corresponding to kp is obtained by an approximate expression obtained by single regression equation approximation, polynomial approximation, logarithmic approximation, exponential approximation or the like for a pair of kp and accuracy rate. (See, for example, “Time Series Analysis and Forecasting with Excel” (Ohm) Chapter 2 “Single Regression Analysis” Chapter 3 “Multiple Regression Analysis”). Further, correction processing other than the regression analysis approximation may be performed. Data interpolation and correction are the same for other data such as kl.
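As one concrete instance of the approximations listed above, the sketch below fits a straight line to (kp, accuracy rate) pairs by ordinary least squares. It covers only the simple regression case. The function names, the clamping of predictions to the range 0 to 1, and the sample data are assumptions added for the example.

```python
def fit_line(pairs):
    """Least-squares fit of accuracy = slope * kp + intercept.

    pairs: list of (kp, accuracy) rows with at least two distinct kp values.
    """
    n = len(pairs)
    mean_x = sum(kp for kp, _ in pairs) / n
    mean_y = sum(acc for _, acc in pairs) / n
    sxx = sum((kp - mean_x) ** 2 for kp, _ in pairs)
    sxy = sum((kp - mean_x) * (acc - mean_y) for kp, acc in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict_accuracy(kp, slope, intercept):
    # Clamp to [0, 1]; the clamp is an added safeguard, not from the text.
    return min(1.0, max(0.0, slope * kp + intercept))

# Made-up table rows, for illustration only.
slope, intercept = fit_line([(0.5, 0.3), (0.75, 0.5), (1.0, 0.7)])
estimate = predict_accuracy(0.6, slope, intercept)
```

A fitted curve smooths out noisy rows, at the cost of imposing a functional form that the data may not follow.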

§４：機械学習を用いる場合の説明 §4: Explanation of the case where machine learning is used
ａ）機械学習法の詳細な説明 a) Detailed explanation of the machine learning method
図３２は機械学習法の説明図である。図３２において、機械学習法には、教師データ記憶手段２１、解−素性対抽出手段２２、機械学習手段２３、学習結果記憶手段２４、表現対抽出手段２５、素性抽出手段２６、解推定手段２７、出力手段２８を備える。 FIG. 32 is an explanatory diagram of the machine learning method. In FIG. 32, the machine learning method comprises teacher data storage means 21, solution-feature pair extraction means 22, machine learning means 23, learning result storage means 24, expression pair extraction means 25, feature extraction means 26, solution estimation means 27, and output means 28.

ここで、機械学習手段２３による機械学習の手法について説明する。機械学習の手法は、問題−解の組のセットを多く用意し、それで学習を行ない、どういう問題のときにどういう解になるかを学習し、その学習結果を利用して、新しい問題のときも解を推測できるようにする方法である（例えば、下記の参考文献（１）〜参考文献（３）参照）。 Here, the machine learning technique used by the machine learning means 23 is described. A machine learning technique prepares many problem-solution pairs, trains on them to learn which kind of problem leads to which solution, and uses the learned result so that a solution can also be inferred for a new problem (see, for example, references (1) to (3) below).

参考文献（１）：村田真樹，機械学習に基づく言語処理，龍谷大学理工学部．招待講演．2004. http://www2.nict.go.jp/jt/a132/members/murata/ps/rk1-siryou.pdf Reference (1): Masaki Murata, Language Processing Based on Machine Learning, Faculty of Science and Engineering, Ryukoku University, invited lecture, 2004.
参考文献（２）：サポートベクトルマシンを用いたテンス・アスペクト・モダリティの日英翻訳，村田真樹，馬青，内元清貴，井佐原均，電子情報通信学会言語理解とコミュニケーション研究会 NLC2000-78 ，2001年． Reference (2): Japanese-English Translation of Tense, Aspect, and Modality Using Support Vector Machines, Masaki Murata, Qing Ma, Kiyotaka Uchimoto, Hitoshi Isahara, IEICE Language Understanding and Communication Study Group, NLC2000-78, 2001.
参考文献（３）：SENSEVAL2J辞書タスクでのＣＲＬの取り組み，村田真樹，内山将夫，内元清貴，馬青，井佐原均，電子情報通信学会言語理解とコミュニケーション研究会 NLC2001-40 ，2001年． Reference (3): CRL's Approach to the SENSEVAL2J Dictionary Task, Masaki Murata, Masao Uchiyama, Kiyotaka Uchimoto, Qing Ma, Hitoshi Isahara, IEICE Language Understanding and Communication Study Group, NLC2001-40, 2001.

どういう問題のときに、という、問題の状況を機械に伝える際に、素性（解析に用いる情報で問題を構成する各要素）というものが必要になる。問題を素性によって表現するのである。例えば、日本語文末表現の時制の推定の問題において、問題：「彼が話す。」−−−解「現在」が与えられた場合に、素性の一例は、「彼が話す。」「が話す。」「話す。」「す」「。」となる。 To tell the machine what kind of problem it is facing, features are needed (the pieces of information used for analysis, each an element that makes up the problem). The problem is expressed by its features. For example, in the problem of estimating the tense of a Japanese sentence-final expression, given the problem 「彼が話す。」 ("He speaks.") with the solution "present", one example of the features is 「彼が話す。」, 「が話す。」, 「話す。」, 「す」, and 「。」.

すなわち、機械学習の手法は、素性の集合−解の組のセットを多く用意し、それで学習を行ない、どういう素性の集合のときにどういう解になるかを学習し、その学習結果を利用して、新しい問題のときもその問題から素性の集合を取り出し、その素性の場合の解を推測する方法である。 In other words, the machine learning method prepares many sets of feature set-solution pairs, performs learning, learns what kind of solution the feature set becomes, and uses the learning result. This is a method of extracting a set of features from a new problem and inferring a solution in the case of the feature.

機械学習手段２３は、機械学習の手法として、例えば、ｋ近傍法、シンプルベイズ法、決定リスト法、最大エントロピー法、サポートベクトルマシン法などの手法を用いる。 The machine learning means 23 uses techniques such as a k-nearest neighbor method, a simple Bayes method, a decision list method, a maximum entropy method, and a support vector machine method as a machine learning method.

ｋ近傍法は、最も類似する一つの事例のかわりに、最も類似するｋ個の事例を用いて、このｋ個の事例での多数決によって分類先（解）を求める手法である。ｋは、あらかじめ定める整数の数字であって、一般的に、１から９の間の奇数を用いる。 The k-nearest neighbor method is a method for obtaining a classification destination (solution) by using the k most similar cases instead of the most similar case, and by majority decision of the k cases. k is a predetermined integer number, and generally an odd number between 1 and 9 is used.
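A minimal k-nearest-neighbor sketch follows. The text does not fix a similarity measure, so counting shared features between two feature sets is an assumption made here, and the function name and toy data are hypothetical.

```python
from collections import Counter

def knn_classify(query, examples, k=3):
    """Majority vote among the k examples most similar to the query.

    query: set of features. examples: list of (feature_set, label).
    Similarity is the number of shared features (an assumed measure).
    """
    ranked = sorted(examples, key=lambda ex: len(query & ex[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy data, for illustration only.
examples = [
    ({"a", "b"}, "X"),
    ({"a", "c"}, "X"),
    ({"d"}, "Y"),
    ({"a", "b", "d"}, "Y"),
]
label = knn_classify({"a", "b"}, examples, k=3)
```

Choosing an odd k, as the text suggests, avoids tied votes when there are two classes.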

シンプルベイズ法は、ベイズの定理にもとづいて各分類になる確率を推定し、その確率値が最も大きい分類を求める分類先とする方法である。 The Simple Bayes method is a method of estimating the probability of each classification based on Bayes' theorem and determining the classification having the highest probability value as a classification destination.

シンプルベイズ法において、文脈ｂで分類ａを出力する確率は、以下の式（１１）で与えられる。 In the simple Bayes method, the probability of outputting the classification a in the context b is given by the following equation (11).

ただし、ここで文脈ｂは、あらかじめ設定しておいた素性ｆ_j（∈Ｆ，１≦ｊ≦ｋ）の集合である。ｐ（ｂ）は、文脈ｂの出現確率である。ここで、分類ａに非依存であって定数のために計算しない。Ｐ（ａ）（ここでＰはｐの上部にチルダ）とＰ（ｆ_i｜ａ）は、それぞれ教師データから推定された確率であって、分類ａの出現確率、分類ａのときに素性ｆ_iを持つ確率を意味する。Ｐ（ｆ_i｜ａ）として最尤推定を行って求めた値を用いると、しばしば値がゼロとなり、式（１２）の値がゼロで分類先を決定することが困難な場合が生じる。そのため、スームージングを行う。ここでは、以下の式（１３）を用いてスームージングを行ったものを用いる。 Here, the context b is the set of preset features f_j (∈ F, 1 ≦ j ≦ k). p(b) is the occurrence probability of the context b; it does not depend on the classification a and is a constant, so it is not calculated. P(a) (where P denotes p with a tilde above it) and P(f_i | a) are probabilities estimated from the teacher data: the occurrence probability of classification a, and the probability of having feature f_i given classification a, respectively. If maximum likelihood estimates are used for P(f_i | a), the value is often zero, so that the value of equation (12) becomes zero and the classification is difficult to determine. Smoothing is therefore performed. Here, values smoothed with the following equation (13) are used.

ただし、ｆｒｅｑ（ｆ_i，ａ）は、素性ｆ_iを持ちかつ分類がａである事例の個数、ｆｒｅｑ（ａ）は、分類がａである事例の個数を意味する。 Here, freq(f_i, a) is the number of cases that have feature f_i and classification a, and freq(a) is the number of cases whose classification is a.
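The simple Bayes classifier described above can be sketched with the two counts freq(f_i, a) and freq(a). The smoothing equation itself is not reproduced in this text, so the sketch substitutes add-one smoothing as a clearly labeled stand-in. The function names and the toy training data are likewise assumptions.

```python
import math
from collections import Counter, defaultdict

def train_simple_bayes(examples):
    """examples: list of (feature_set, label). Returns freq(a) and freq(f, a)."""
    class_freq = Counter(label for _, label in examples)
    feat_freq = defaultdict(Counter)  # label -> feature -> count
    for features, label in examples:
        feat_freq[label].update(features)
    return class_freq, feat_freq

def classify(features, class_freq, feat_freq):
    """Return the label a that maximizes P(a) * product of P(f_i | a)."""
    total = sum(class_freq.values())
    best_label, best_score = None, -math.inf
    for label, freq_a in class_freq.items():
        # Work in log space; p(b) is constant across labels and is omitted.
        score = math.log(freq_a / total)
        for f in features:
            # Add-one smoothing: an assumed stand-in, not the text's equation.
            score += math.log((feat_freq[label][f] + 1) / (freq_a + 2))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy tense data loosely modeled on the sentence-ending example.
data = [({"す", "。"}, "present"), ({"く", "。"}, "present"), ({"た", "。"}, "past")]
class_freq, feat_freq = train_simple_bayes(data)
predicted = classify({"た", "。"}, class_freq, feat_freq)
```

Without smoothing, a feature never seen with a class would drive that class's product to zero, which is the difficulty the text points out.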

決定リスト法は、素性と分類先の組とを規則とし、それらをあらかじめ定めた優先順序でリストに蓄えおき、検出する対象となる入力が与えられたときに、リストで優先順位の高いところから入力のデータと規則の素性とを比較し、素性が一致した規則の分類先をその入力の分類先とする方法である。 The decision list method uses features and combinations of classification destinations as rules, stores them in the list in a predetermined priority order, and when input to be detected is given, from the highest priority in the list This is a method in which input data is compared with the feature of the rule, and the classification destination of the rule having the same feature is set as the classification destination of the input.

決定リスト方法では、あらかじめ設定しておいた素性ｆ_j( ∈Ｆ，１≦ｊ≦ｋ）のうち、いずれか一つの素性のみを文脈として各分類の確率値を求める。ある文脈ｂで分類ａを出力する確率は以下の式によって与えられる。 In the decision list method, the probability of each classification is obtained using only one of the preset features f_j (∈ F, 1 ≦ j ≦ k) as the context. The probability of outputting classification a in a context b is given by the following equation.

ｐ（ａ｜ｂ）＝ｐ（ａ｜ｆmax ）（１４） p(a | b) = p(a | fmax) (14)
ただし、ｆmax は以下の式によって与えられる。 Here, fmax is given by the following equation.

また、Ｐ（ａ_i｜ｆ_j）（ここでＰはｐの上部にチルダ）は、素性ｆ_jを文脈に持つ場合の分類ａ_iの出現の割合である。 P (a _i | f _j ) (where P is a tilde at the top of p) is the rate of appearance of the classification a _i when the feature f _j is in the context.
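A decision list along these lines can be sketched as follows. The equation that defines fmax is not reproduced in this text, so ordering the rules by the estimated proportion P(a_i | f_j), highest first, and applying the first rule whose feature appears in the input is an assumption consistent with the description. The function names and toy data are hypothetical.

```python
from collections import Counter, defaultdict

def build_decision_list(examples):
    """Return rules (proportion, feature, label), highest proportion first.

    examples: list of (feature_set, label). For each feature, the rule keeps
    the label that most often co-occurs with it.
    """
    by_feature = defaultdict(Counter)  # feature -> label -> count
    for features, label in examples:
        for f in features:
            by_feature[f][label] += 1
    rules = []
    for f, labels in by_feature.items():
        label, count = labels.most_common(1)[0]
        rules.append((count / sum(labels.values()), f, label))
    rules.sort(key=lambda rule: rule[0], reverse=True)
    return rules

def apply_decision_list(features, rules, default=None):
    """Apply the highest-priority rule whose feature is present in the input."""
    for _proportion, f, label in rules:
        if f in features:
            return label
    return default

# Toy data, for illustration only.
examples = [({"x", "y"}, "A"), ({"x"}, "B"), ({"y"}, "A"), ({"z"}, "B")]
rules = build_decision_list(examples)
decided = apply_decision_list({"x", "z"}, rules)
```

Only a single feature decides each input, which makes the result easy to trace back to one rule.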

最大エントロピー法は、あらかじめ設定しておいた素性ｆ_j（１≦ｊ≦ｋ）の集合をＦとするとき、以下所定の条件式（式（１６））を満足しながらエントロピーを意味する式（１７）を最大にするときの確率分布ｐ（ａ，ｂ）を求め、その確率分布にしたがって求まる各分類の確率のうち、最も大きい確率値を持つ分類を求める分類先とする方法である。 In the maximum entropy method, letting F be the set of preset features f_j (1 ≦ j ≦ k), the probability distribution p(a, b) that maximizes equation (17), which expresses the entropy, while satisfying the predetermined constraint (equation (16)) is obtained, and among the probabilities of the classifications derived from that distribution, the classification with the largest probability is taken as the classification result.

ただし、Ａ、Ｂは分類と文脈の集合を意味し、ｇ_j（ａ，ｂ）は文脈ｂに素性ｆ_jがあって、なおかつ分類がａの場合１となり、それ以外で０となる関数を意味する。また、Ｐ（ａ_i｜ｆ_j）（ここでＰはｐの上部にチルダ）は、既知データでの（ａ，ｂ）の出現の割合を意味する。 However, A and B mean a set of classifications and contexts, and g _j (a, b) is a function that is 1 if the context b has a feature f _j and the classification is a, and is 0 otherwise. means. Further, P (a _i | f _j ) (where P is a tilde at the top of p) means the rate of appearance of (a, b) in the known data.

式（１６）は、確率ｐと出力と素性の組の出現を意味する関数ｇをかけることで出力と素性の組の頻度の期待値を求めることになっており、右辺の既知データにおける期待値と、左辺の求める確率分布に基づいて計算される期待値が等しいことを制約として、エントロピー最大化( 確率分布の平滑化) を行なって、出力と文脈の確率分布を求めるものとなっている。最大エントロピー法の詳細については、以下の参考文献（４）および参考文献（５）に記載されている。 Equation (16) obtains the expected frequency of an output-feature pair by multiplying the probability p by the function g, which indicates the occurrence of that pair. Under the constraint that the expected value in the known data (right-hand side) equals the expected value computed from the probability distribution being sought (left-hand side), entropy maximization (smoothing of the probability distribution) is performed to obtain the probability distribution over outputs and contexts. Details of the maximum entropy method are described in references (4) and (5) below.

Reference (4): Eric Sven Ristad, Maximum Entropy Modeling for Natural Language, (ACL/EACL Tutorial Program, Madrid, 1997)
Reference (5): Eric Sven Ristad, Maximum Entropy Modeling Toolkit, Release 1.6beta, (http://www.mnemonic.com/software/memt, 1998)
The support vector machine method classifies data consisting of two classifications by dividing the space with a hyperplane.

FIG. 33 is a conceptual diagram of margin maximization in the support vector machine method. In FIG. 33, white circles denote positive examples, black circles denote negative examples, the solid line denotes the hyperplane dividing the space, and the broken lines denote the surfaces bounding the margin region. FIG. 33(A) is a conceptual diagram of the case where the gap between the positive and negative examples is narrow (small margin), and FIG. 33(B) is a conceptual diagram of the case where the gap is wide (large margin).

このとき、二つの分類が正例と負例からなるものとすると、学習データにおける正例と負例の間隔（マージン) が大きいものほどオープンデータで誤った分類をする可能性が低いと考えられ、図３３（Ｂ）に示すように、このマージンを最大にする超平面を求めそれを用いて分類を行なう。 At this time, if the two classifications consist of positive and negative examples, the larger the interval (margin) between the positive and negative examples in the learning data, the less likely it is to make an incorrect classification with open data. As shown in FIG. 33B, a hyperplane that maximizes this margin is obtained, and classification is performed using the hyperplane.

The basic method is as described above. In practice, however, extended versions are normally used: one extension allows a small number of training cases to fall inside the margin region, and another makes the linear part of the hyperplane nonlinear (by introducing a kernel function).

この拡張された方法は、以下の識別関数を用いて分類することと等価であり、その識別関数の出力値が正か負かによって二つの分類を判別することができる。 This extended method is equivalent to classification using the following discriminant function, and the two classes can be discriminated depending on whether the output value of the discriminant function is positive or negative.

Here, x is the context (set of features) of the case to be classified, x_i and y_j (i = 1, ..., l, y_j ∈ {1, -1}) are the contexts and classification destinations of the training data, and the function sgn is
sgn(x) = 1 (x ≥ 0)
-1 (otherwise)
Each α_i is the value that maximizes expression (19) under the constraints of expressions (20) and (21).
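Expressions (18) to (21) do not appear in this text. As a hedged reconstruction, assuming the standard soft-margin support vector machine formulation that matches the surrounding description (a sign function applied to a sum over the training cases, with the α_i maximizing (19) under the constraints (20) and (21)), they would read roughly as follows. The bias term b and the exact notation are assumptions.

```latex
f(x) = \operatorname{sgn}\!\left( \sum_{i=1}^{l} \alpha_i y_i K(x_i, x) + b \right) \tag{18}

L(\alpha) = \sum_{i=1}^{l} \alpha_i
  - \frac{1}{2} \sum_{i,j=1}^{l} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \tag{19}

0 \le \alpha_i \le C \quad (i = 1, \dots, l) \tag{20}

\sum_{i=1}^{l} \alpha_i y_i = 0 \tag{21}
```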

また、関数Ｋはカーネル関数と呼ばれ、様々なものが用いられるが、本形態では以下の多項式のものを用いる。 The function K is called a kernel function, and various functions are used. In this embodiment, the following polynomial is used.

$K(x, y) = (x \cdot y + 1)^d$  (22)
C and d are constants set experimentally. For example, C was fixed at 1 throughout all processing, and two values, 1 and 2, were tried for d. Here, an x_i with α_i > 0 is called a support vector, and the summation in expression (18) is normally computed using only these cases. That is, of the training data, only the cases called support vectors are used in the actual analysis.
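As a minimal sketch of how the polynomial kernel of expression (22) and the sign-based decision could be computed, assume that a context is represented as a set of features (so that x · y is the number of shared features) and that the α_i have already been obtained by training. The names `poly_kernel`, `decision_value`, and `classify`, and the bias term `b`, are illustrative and are not taken from the source.

```python
def poly_kernel(x, y, d=2):
    """Polynomial kernel (x . y + 1)^d.

    x and y are sets of features, so the dot product of their binary
    vectors equals the number of features they share.
    """
    return (len(x & y) + 1) ** d


def decision_value(x, support_vectors, b=0.0, d=2):
    """Sum over support vectors only (training cases with alpha > 0).

    support_vectors: list of (features, y, alpha) with y in {1, -1}.
    b: bias term (an assumption of this sketch).
    """
    return sum(alpha * y * poly_kernel(sv, x, d)
               for sv, y, alpha in support_vectors) + b


def classify(x, support_vectors, b=0.0, d=2):
    """sgn of the decision value: 1 if it is >= 0, otherwise -1."""
    return 1 if decision_value(x, support_vectors, b, d) >= 0 else -1
```

The magnitude of `decision_value` can also serve as a rough stand-in for how far a case lies from the separating plane.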

なお、拡張されたサポートベクトルマシン法の詳細については、以下の参考文献（６）および参考文献（７）に記載されている。 Details of the extended support vector machine method are described in the following references (6) and (7).

Reference (6): Nello Cristianini and John Shawe-Taylor, An Introduction to Support Vector Machines and other kernel-based learning methods, (Cambridge University Press, 2000)
Reference (7): Taku Kudoh, Tinysvm: Support Vector machines, (http://cl.aist-nara.ac.jp/taku-ku//software/Tiny SVM/index.html, 2000)
The support vector machine method handles data with two classifications. Therefore, to handle cases with three or more classifications, it is normally combined with a technique such as the pairwise method or the one-vs-rest method.

In the pairwise method, for data with n classifications, every pair of two different classification destinations (n(n-1)/2 pairs) is generated. For each pair, a binary classifier, namely the support vector machine processing module, determines which of the two is better. Finally, the classification destination is determined by majority vote over the n(n-1)/2 binary classification results.
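The pairwise voting described above can be sketched as follows. This is only an illustration: `binary_classifiers` is a hypothetical mapping from each pair of labels to a function that returns the winning label, standing in for the trained support vector machines.

```python
from collections import Counter
from itertools import combinations


def pairwise_classify(x, labels, binary_classifiers):
    """Majority vote over all n(n-1)/2 pairwise binary decisions.

    binary_classifiers[(a, b)] is a function that takes x and returns
    either a or b (a hypothetical stand-in for a trained binary SVM).
    """
    votes = Counter()
    for a, b in combinations(labels, 2):
        votes[binary_classifiers[(a, b)](x)] += 1
    winner = votes.most_common(1)[0][0]
    return winner, votes
```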

In the one-vs-rest method, when there are, for example, three classification destinations a, b, and c, three sets are generated: classification destination a versus the rest, b versus the rest, and c versus the rest. Each set is trained with the support vector machine method. In the estimation process, the training results of these three support vector machines are used. The candidate to be estimated is examined to see how each of the three support vector machines classifies it, and among the machines that assign it to the side that is not "the rest", the classification destination of the machine in which the candidate lies farthest from the separating plane is taken as the solution. For example, if a candidate lies farthest from the separating plane in the support vector machine created by training on the set "classification destination a versus the rest", the classification destination of that candidate is estimated to be a.
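A minimal sketch of the one-vs-rest decision follows, assuming each trained machine is available as a function returning a signed distance from its separating plane (positive on the side of its own classification destination). The name `scorers` and the choice to return `None` when every machine answers "the rest" are assumptions of this sketch; the source does not say how that case is handled.

```python
def one_vs_rest_classify(x, scorers):
    """Pick the label whose machine places x farthest on its own side.

    scorers: dict mapping label -> function(x) returning a signed
    distance from the separating plane (hypothetical stand-ins for
    the trained "label versus the rest" machines).
    """
    scores = {label: scorer(x) for label, scorer in scorers.items()}
    # Keep only machines that put x on the non-"rest" side.
    positive = {label: s for label, s in scores.items() if s > 0}
    if not positive:
        return None  # assumption: no decision when all machines say "rest"
    return max(positive, key=positive.get)
```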

The way of obtaining the degree to which each expression pair is likely to fall under a given solution (classification destination), as estimated by the solution estimation means 27, differs depending on which machine learning technique the machine learning means 23 uses.

For example, in the embodiment of the present invention, when the machine learning means 23 uses the k-nearest neighbor method as the machine learning technique, the machine learning means 23 defines, between cases in the teacher data, a similarity based on the proportion of overlapping features in the feature sets extracted from the cases (the proportion of features they have in common), and stores the defined similarity and the cases in the learning result storage means 24 as learning result information.

Then, when a new expression pair (candidate) is extracted by the expression pair extraction means 25, the solution estimation means 27 refers to the similarity and the cases defined in the learning result storage means 24, selects from those cases the k cases most similar to the candidate, in descending order of similarity, and estimates the classification destination decided by majority vote among the selected k cases as the classification destination (solution) of the candidate. That is, the solution estimation means 27 takes as the degree to which each expression pair is likely to fall under a given solution (classification destination) the number of votes in the majority vote among the selected k cases, here the number of votes won by the classification "should be extracted".
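The k-nearest neighbor estimation above can be sketched as follows. The source only says the similarity is based on the proportion of overlapping features; this sketch assumes the ratio of shared features to all features of the two cases, and the label `"extract"` and the function names are illustrative.

```python
from collections import Counter


def overlap_similarity(f1, f2):
    """Proportion of shared features (assumed here: |f1 & f2| / |f1 | f2|)."""
    union = f1 | f2
    return len(f1 & f2) / len(union) if union else 0.0


def knn_estimate(candidate, cases, k=3, target="extract"):
    """Return (majority label, votes won by `target`) among the k nearest cases.

    cases: list of (feature_set, label) taken from the teacher data.
    """
    ranked = sorted(cases,
                    key=lambda c: overlap_similarity(candidate, c[0]),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0], votes[target]
```

The second return value corresponds to the "number of votes" used as the degree in the text.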

When the simple Bayes method is used as the machine learning technique, the machine learning means 23 stores, for each case of the teacher data, the pair of the case's solution and its feature set in the learning result storage means 24 as learning result information. Then, when a new expression pair (candidate) is extracted by the expression pair extraction means 25, the solution estimation means 27 uses the pairs of solution and feature set in the learning result information of the learning result storage means 24 to calculate, based on Bayes' theorem, the probability of each classification given the feature set of the candidate obtained by the feature extraction means 26, and estimates the classification with the largest probability as the classification (solution) of the candidate. That is, the solution estimation means 27 takes as the degree to which the candidate's feature set is likely to yield a given solution the probability of each classification, here the probability of the classification "should be extracted".

When the decision list method is used as the machine learning technique, the machine learning means 23 stores in the learning result storage means 24, for the cases of the teacher data, a list in which rules pairing a feature with a classification destination are arranged in a prescribed priority order. Then, when a new expression pair (candidate) is extracted by the expression pair extraction means 25, the solution estimation means 27 compares the features of the candidate with the feature of each rule in descending order of priority in the list of the learning result storage means 24, and estimates the classification destination of the rule whose feature matches as the classification destination (solution) of the candidate. That is, the solution estimation means 27 takes as the degree to which the candidate's feature set is likely to yield a given solution the prescribed priority or a corresponding numerical value or measure, here the priority in the list of the probability of the classification "should be extracted".
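A minimal sketch of the decision list lookup follows, assuming the rules are already sorted by priority and each rule pairs one feature with one classification destination. The function name and the `(None, None)` result for the no-match case are assumptions of this sketch.

```python
def decision_list_estimate(candidate_features, rules):
    """Return (label, priority) of the first rule whose feature matches.

    rules: list of (feature, label), ordered from highest to lowest
    priority. Priority 1 is the top of the list.
    """
    for priority, (feature, label) in enumerate(rules, start=1):
        if feature in candidate_features:
            return label, priority
    return None, None  # assumption: no rule matched
```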

When the maximum entropy method is used as the machine learning technique, the machine learning means 23 identifies the classifications that can be solutions from the cases of the teacher data, obtains the probability distribution over pairs of a feature set and a candidate classification that satisfies the prescribed constraint and maximizes the expression representing entropy, and stores it in the learning result storage means 24. Then, when a new expression pair (candidate) is extracted by the expression pair extraction means 25, the solution estimation means 27 uses the probability distribution in the learning result storage means 24 to obtain, for the feature set of the candidate, the probability of each classification that can be a solution, identifies the classification with the largest probability, and estimates that classification as the solution of the candidate. That is, the solution estimation means 27 takes as the degree to which the candidate's feature set is likely to yield a given solution the probability of each classification, here the probability of the classification "should be extracted".

When the support vector machine method is used as the machine learning technique, the machine learning means 23 identifies the classifications that can be solutions from the cases of the teacher data, divides the classifications into positive and negative examples, and, in a space whose dimensions are the features of the cases, obtains according to a prescribed function using a kernel function the hyperplane that separates the positive and negative examples while maximizing the gap between them, and stores it in the learning result storage means 24. Then, when a new expression pair (candidate) is extracted by the expression pair extraction means 25, the solution estimation means 27 uses the hyperplane in the learning result storage means 24 to determine whether the feature set of the candidate lies on the positive side or the negative side of the space divided by the hyperplane, and estimates the classification determined from that result as the solution of the candidate. That is, the solution estimation means 27 takes as the degree to which the candidate's feature set is likely to yield a given solution the distance from the separating plane into the space of the positive examples (expression pairs that should be extracted). More specifically, when expression pairs that should be extracted are positive examples and expression pairs that should not be extracted are negative examples, a case located in the space on the positive side of the separating plane is judged to be a "case that should be extracted", and its distance from the separating plane is taken as the degree for that case.

b) Explanation of the case where machine learning is used (when the document classification device is used)
When the certainty factor assigning device uses the machine learning method, a large number of question-answer pairs are collected in advance. Each question is a patent to which F-terms should be assigned, and the answer is the F-terms of that patent. These are called evaluation data.

Here, the document classification device is used to output F-terms for the evaluation data. Then, for each F-term, the kp at which that F-term is just barely output is obtained, as follows.

The score of an F-term divided by the score of the first F-term (the F-term with the largest score) is the kp at which that F-term is just barely output. The rank kj of the F-term is also obtained, as is its score, kl. The score is computed by the aforementioned expression (1) or the like.

Each F-term is checked to see whether it is correct. If it is correct, the resulting case is "correct for kp, kj, kl"; if it is not, the resulting case is "incorrect for kp, kj, kl".

Such a case is created for every output F-term. Next, machine learning (machine learning means 23) is used. Machine learning is performed with cases such as "correct for kp, kj, kl" and "incorrect for kp, kj, kl" as the training data (solution-feature pair extraction means 22). Here, kp, kj, and kl are each features, and "correct" and "incorrect" are the classification destinations to be obtained.
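The construction of training cases described above can be sketched as follows, assuming the classifier output is a list of (F-term, score) pairs with a positive top score and the evaluation data supply the set of correct F-terms. The function name and the dictionary keys are illustrative.

```python
def build_cases(scored_terms, gold_terms):
    """Turn one ranked output into (kp, kj, kl, label) training cases.

    scored_terms: list of (term, score) from the classifier.
    gold_terms: set of correct terms from the evaluation data.
    kp = score / top score, kj = rank (1-based), kl = raw score.
    """
    ranked = sorted(scored_terms, key=lambda t: t[1], reverse=True)
    top_score = ranked[0][1]  # assumed to be positive
    cases = []
    for kj, (term, score) in enumerate(ranked, start=1):
        cases.append({
            "kp": score / top_score,
            "kj": kj,
            "kl": score,
            "label": "correct" if term in gold_terms else "incorrect",
        })
    return cases
```

The same function applies unchanged to retrieved articles in place of F-terms.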

機械学習により、どういうkp、kj、klなら、正解に、どういうkp、kj、klなら、不正解になりやすいかを学習し、それを学習結果（学習結果記憶手段２４）に蓄える。 By machine learning, it learns what kind of kp, kj, kl is correct, what kind of kp, kj, kl is likely to be incorrect, and stores it in the learning result (learning result storage means 24).

Now suppose a new patent (to which no F-terms have yet been assigned) arrives. The document classification device is used to output F-terms. Then, for each F-term, the kp at which that F-term is just barely output is obtained, as follows.

The score of an F-term divided by the score of the first F-term (the F-term with the largest score) is the kp at which that F-term is just barely output. The rank kj of the F-term is also obtained, as is its score, kl. The score is computed by the aforementioned expression (1) or the like.

先の学習結果により、このときのkp、kj、klの場合に正解になりやすい確信度を求める（解推定手段２７）。ここでは、確信度も出力できる機械学習（機械学習手段）を用いる。 Based on the previous learning result, a certainty factor that is likely to be correct in the case of kp, kj, and kl at this time is obtained (solution estimation means 27). Here, machine learning (machine learning means) capable of outputting a certainty factor is used.

This certainty factor is attached to each F-term and displayed as its correct answer rate. (Note that this correct answer rate is that of the individual F-term. It differs from the precision, recall, F-measure, and so on of the group of F-terms up to and including that F-term.)

ここで、機械学習の素性をkp、kj、klとしたが、これの一部のみを素性としてもよいし、逆に他のものもこの素性に加えても良いし、これらの一部と他のものの組み合わせを素性としてもよい。 Here, kp, kj, and kl are used as machine learning features, but only some of them may be used as features, and conversely, other features may be added to this feature. A combination of items may be used as a feature.

例えば、特許文書群に含まれる単語や文字列を利用して、その単語が該当特許に含まれるかいなかという素性や、その文字列が該当特許に含まれるかいなかという素性を利用してもよい。 For example, using a word or character string included in a patent document group, a feature as to whether the word is included in the corresponding patent or a feature as to whether the character string is included in the corresponding patent may be used. .

c) Explanation of the case where machine learning is used (when the information retrieval system is used)
When the certainty factor assigning device uses the machine learning method, a large number of question-answer pairs are collected in advance. Each question is an information retrieval query, and the answer is the group of articles corresponding to that query. These are called evaluation data.

Here, the information retrieval system is used to output articles for the evaluation data. Then, for each article, the kp at which that article is just barely output is obtained, as follows.

The score of an article divided by the score of the first article (the article with the largest score) is the kp at which that article is just barely output. The rank kj of the article is also obtained, as is its score, kl. The score is computed by the aforementioned expression (1) or the like.

Each article is checked to see whether it is correct. If it is correct, the resulting case is "correct for kp, kj, kl"; if it is not, the resulting case is "incorrect for kp, kj, kl".

Such a case is created for every output article. Next, machine learning (machine learning means 23) is used. Machine learning is performed with cases such as "correct for kp, kj, kl" and "incorrect for kp, kj, kl" as the training data (solution-feature pair extraction means 22). Here, kp, kj, and kl are each features, and "correct" and "incorrect" are the classification destinations to be obtained.

Now suppose a new information retrieval query arrives. The information retrieval system is used to output articles. Then, for each article, the kp at which that article is just barely output is obtained, as follows.

The score of an article divided by the score of the first article (the article with the largest score) is the kp at which that article is just barely output. The rank kj of the article is also obtained, as is its score, kl. The score is computed by the aforementioned expression (1) or the like.

This certainty factor is attached to each article and displayed as its correct answer rate. (Note that this correct answer rate is that of the individual article. It differs from the precision, recall, F-measure, and so on of the group of articles up to and including that article.)

以上、分類を付与する場合と情報検索の場合に機械学習により確信度（正解率）を出力する説明をしたが、この機械学習法としては、ニューラルネットワークや重回帰分析を用いてもよい。重回帰分析の説明は、「Excel で学ぶ時系列分析と予測」（オーム社）３章の“重回帰分析”で求めてもよい。重回帰分析の場合は、「正解」を値１「不正解」を値０として求めればよい。 As described above, the reliability (accuracy rate) is output by machine learning in the case of assigning the classification and in the case of information retrieval. However, as this machine learning method, a neural network or multiple regression analysis may be used. An explanation of multiple regression analysis may be found in “Multiple regression analysis” in Chapter 3 of “Time Series Analysis and Forecasting with Excel” (Ohm). In the case of multiple regression analysis, “correct answer” may be obtained with value 1 and “incorrect answer” with value 0.

すなわち、求める分類が２種類ならば、重回帰分析が利用できる。重回帰分析の場合は、素性の数だけ説明変数x を用意し、素性のありなしを、その説明変数x の値を１、０で表現する。目的変数（被説明変数）は、ある分類の場合を値１、他の分類の場合を値０として求めればよい。 That is, if there are two types of classification to be obtained, multiple regression analysis can be used. In the case of multiple regression analysis, as many explanatory variables x as the number of features are prepared, and the presence or absence of the features is expressed by 1 and 0 as the value of the explanatory variable x. The objective variable (explained variable) may be obtained with a value of 1 for a certain classification and a value of 0 for another classification.

(Explanation of the use of multiple regression analysis)
In multiple regression analysis, given data consisting of sets of x1, x2, x3, ..., y, y is obtained from x1, x2, x3, ....

y = a0 + a1 * x1 + a2 * x2 + a3 * x3 + ...
The coefficients a0, a1, ... of this equation can be determined appropriately from the data.

(1) Prediction from kp - y pairs
y is the certainty factor.
With x1 = kp, let
y = a0 + a1 * kp
and obtain a0 and a1 from the kp - y pair data by regression analysis. This gives an equation for obtaining y from kp.

(2) Prediction from kp - y pairs (using second-order terms)
y is the certainty factor.
With x1 = kp and x2 = kp^2, let
y = a0 + a1 * kp + a2 * kp^2
and obtain a0, a1, and a2 from the kp - y pair data by multiple regression analysis. This gives an equation for obtaining y from kp.

(3) Prediction from kp, kj - y sets
y is the certainty factor.
With x1 = kp and x2 = kj, let
y = a0 + a1 * kp + a2 * kj
and obtain a0, a1, and a2 from the kp, kj - y data by multiple regression analysis. This gives an equation for obtaining y from kp and kj.

(4) Prediction from kp, kj - y sets (using second-order terms)
y is the certainty factor.
With x1 = kp and x2 = kj, let
y = a0 + a1 * kp + a2 * kj + a3 * kp^2 + a4 * kp * kj + a5 * kj^2
and obtain a0, a1, a2, ..., a5 from the kp, kj - y data by multiple regression analysis. This gives an equation for obtaining y from kp and kj.
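The coefficient fitting in cases (1) to (4) can be sketched with ordinary least squares. The sketch below solves the normal equations in plain Python and builds the second-order features of case (4). The function names are illustrative, and least squares is assumed as the fitting method since the source does not name one.

```python
def fit_least_squares(rows, ys):
    """Solve the normal equations (X^T X) a = X^T y by Gauss-Jordan elimination.

    rows: list of feature lists (each starting with 1.0 for a0).
    ys: list of observed certainty factors.
    Raises ZeroDivisionError if the features are linearly dependent.
    """
    n = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    m = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * y for r, y in zip(rows, ys))]
        for i in range(n)
    ]
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [m[i][n] for i in range(n)]


def features_case4(kp, kj):
    """Features for y = a0 + a1*kp + a2*kj + a3*kp^2 + a4*kp*kj + a5*kj^2."""
    return [1.0, kp, kj, kp * kp, kp * kj, kj * kj]


def predict(coeffs, feats):
    """Evaluate the fitted equation for one feature list."""
    return sum(a * x for a, x in zip(coeffs, feats))
```

Cases (1) to (3) use the same solver with shorter feature lists, for example `[1.0, kp]` for case (1).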

これらの処理は，重回帰分析を機械学習手法として用いている方法ともとらえられるし、また、重回帰分析を補間手法として利用しているともとらえられる。 These processes can be regarded as a method using multiple regression analysis as a machine learning method, and can also be regarded as using multiple regression analysis as an interpolation method.

(Explanation of the use of machine learning and multiple regression analysis)
When machine learning or multiple regression analysis is used, data consisting of kp - y pairs, of kp, kj - y sets, or the like are used.

In this case, the average of y may be computed for each identical kp, producing data with exactly one y value per kp, and this may be used as the kp - y pair data. The data then consist of pairs of kp and the average of y.

Alternatively, the data for identical kp need not be aggregated; all of the original data may be used as they are as the kp - y pair data. That is, without averaging the certainty factors, there may be as many kp - y pairs as there are original questions.
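The first option, averaging y for each identical kp, can be sketched as follows. The function name is illustrative, and grouping on exact kp equality is an assumption (in practice kp might first be rounded to the table's grid).

```python
from collections import defaultdict


def average_by_kp(pairs):
    """Collapse (kp, y) pairs so that each kp has exactly one averaged y."""
    groups = defaultdict(list)
    for kp, y in pairs:
        groups[kp].append(y)
    return sorted((kp, sum(v) / len(v)) for kp, v in groups.items())
```

The second option in the text simply passes `pairs` to the learner unchanged.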

(Explanation of the certainty factor)
As the certainty factor used in the above description, the deviation value (standard score) of the precision, of the recall, of the F-measure, or of the correct answer rate may be used. Measures similar to these may also be used, as may any other measure that can be obtained numerically.

Operations such as "taking out those with large values" can be expressed as "take out those whose value is at or above a threshold", "take out, in descending order of value, a prescribed number (or more) of those with large values", "compute the value obtained by multiplying the maximum value among the items by a prescribed ratio, and take out those whose value is at or above the computed value", and so on. These thresholds and prescribed values may be set in advance, or the user may change and set them as appropriate.

Also, when an input question is solved, multiple answers are extracted in ranked order, and the extracted answers are output together with a prescribed value, this prescribed value, like kp, kj, and kl described above, follows the same order as the ranking of the answers (or the reverse order), except when prescribed values from multiple viewpoints are used.
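The three selection rules listed above can be sketched as follows. The function names are illustrative, and the second rule is read here as "take the top n items", which is one interpretation of the wording.

```python
def select_by_threshold(items, threshold):
    """Keep items whose value is at or above a fixed threshold."""
    return [(name, v) for name, v in items if v >= threshold]


def select_top_n(items, n):
    """Keep the n items with the largest values, in descending order."""
    return sorted(items, key=lambda it: it[1], reverse=True)[:n]


def select_by_ratio_of_max(items, ratio):
    """Keep items whose value is at or above (maximum value * ratio)."""
    if not items:
        return []
    cutoff = max(v for _, v in items) * ratio
    return [(name, v) for name, v in items if v >= cutoff]
```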

§5: Description of experimental results
Next, the results of actual experiments are described. The data of the NTCIR-5 Patent classification task were used. There were 1201 patent documents to be classified, and they were divided as follows.

close … 600
open … 601
Using the close data, the correspondence table is obtained and the certainty factor is predicted. Then, using the open data, the validity of the predicted certainty factor is checked. The table of experimental results is shown in FIG. 34.

FIG. 34 is an explanatory diagram of the experimental results. The values in the table are the absolute value of the difference between the true value and the value predicted by the present invention (absolute error). Four methods are tried in FIG. 34. The certainty factors here (recall, precision, F-measure) are those obtained when the F-terms up to the kj-th are output. That is, they are the certainty factors of the F-terms up to the kj-th, not of the kj-th F-term alone.

(1) base0.5 --- A method that sets every certainty factor to 0.5.

(2) kp --- A method that predicts the certainty factor from a correspondence table between kp and the certainty factor (here, the table was obtained for kp = 0, 0.1, 0.2, ..., 1.0).

(3) kj --- A method that predicts the certainty factor from a correspondence table between kj and the certainty factor (here, the table was obtained for kj = 1, 2, 3, ..., 200).

(4) kp, kj --- A method that predicts the certainty factor from a correspondence table between the pair kp, kj and the certainty factor (here, the table was obtained for every combination of kp = 0, 0.1, 0.2, ..., 1.0 and kj = 1, 2, 3, ..., 200).
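One way such correspondence tables could be built is sketched below. It assumes the table stores, for each key, the average certainty factor of the answers observed with that key, and that kp is snapped to the nearest point of the 0.1 grid; the snapping rule, the function names `build_table` and `kp_bin`, and the sample observations are assumptions made for illustration, not details taken from the experiment.

```python
from collections import defaultdict


def kp_bin(kp):
    """Snap kp to the nearest grid point 0, 0.1, ..., 1.0 (assumed rule)."""
    return round(kp * 10) / 10


def build_table(observations, key):
    """Average the observed certainty factor for each key value."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for obs in observations:
        k = key(obs)
        sums[k] += obs["certainty"]
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sums}


# Hypothetical observations from the close data: 1.0 = correct, 0.0 = wrong.
observations = [
    {"kp": 1.0, "kj": 1, "certainty": 1.0},
    {"kp": 0.52, "kj": 2, "certainty": 0.0},
    {"kp": 1.0, "kj": 1, "certainty": 0.0},
    {"kp": 0.48, "kj": 2, "certainty": 1.0},
]

table_kp = build_table(observations, lambda o: kp_bin(o["kp"]))      # method (2)
table_kj = build_table(observations, lambda o: o["kj"])              # method (3)
table_kp_kj = build_table(
    observations, lambda o: (kp_bin(o["kp"]), o["kj"])               # method (4)
)
print(table_kp, table_kj, table_kp_kj)
```

Method (1) needs no table, since it returns 0.5 for every answer.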

No interpolation is needed for kj. Interpolation was performed for kp, using the following formula, which has already been described.

[(kp - kp3) pr2 + (kp2 - kp) pr3] / [(kp2 - kp) + (kp - kp3)]
In the table of FIG. 34, kj indicates which F-term in the system output a result refers to. For example, kj = 1 gives the result for the first F-term output by the system. The values in the table of FIG. 34 were described above as the absolute error between the true value and the value predicted by the present invention. More precisely, for each article, the absolute error between the true certainty factor and the predicted certainty factor at the kj-th F-term was computed; these errors were summed and divided by the total number of articles. The table values are therefore mean absolute errors.
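The interpolation formula and the averaging of absolute errors can be sketched together. The sketch assumes that kp2 and kp3 are the table values on either side of kp (kp3 <= kp <= kp2, kp2 != kp3) and that pr2 and pr3 are the certainty factors stored for them; the function names and the sample numbers are hypothetical.

```python
def interpolate(kp, kp2, pr2, kp3, pr3):
    """Certainty factor at kp, interpolated between (kp3, pr3) and (kp2, pr2).

    Direct transcription of the formula in the text; the denominator
    simplifies to kp2 - kp3, so kp2 and kp3 must differ.
    """
    numerator = (kp - kp3) * pr2 + (kp2 - kp) * pr3
    denominator = (kp2 - kp) + (kp - kp3)
    return numerator / denominator


def mean_absolute_error(true_values, predicted_values):
    """Sum of |true - predicted| over articles, divided by the article count."""
    total = sum(abs(t - p) for t, p in zip(true_values, predicted_values))
    return total / len(true_values)


# Hypothetical table entries: certainty 0.6 at kp = 0.2 and 0.8 at kp = 0.3.
predicted = interpolate(0.25, kp2=0.3, pr2=0.8, kp3=0.2, pr3=0.6)
print(predicted)
print(mean_absolute_error([0.5, 1.0], [0.7, 0.9]))
```

When kp coincides with kp2 or kp3 the formula returns pr2 or pr3 respectively, so values already in the table are reproduced unchanged.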

In the table of FIG. 34, the errors of the other methods are in general considerably smaller than those of base0.5, which shows the effectiveness of the present invention. Using both kp and kj gives a slightly smaller error than using kp or kj alone. For each of the kp, kj, and kp,kj methods, the precision error is somewhat large, around 0.2, for the top-ranked outputs where kj is small. Apart from that, the error is around 0.1, which shows that fairly good prediction is achieved.

§6: Description of program installation
The input unit (input means) 1, document extraction unit (document extraction means) 2, KDOC extraction unit (KDOC extraction means) 2, document similarity calculation unit (document similarity calculation means) 3, score calculation unit (score calculation means) 4, score (score_M1(x)) calculation unit 4, classification set extraction unit (classification set extraction means) 5, F-term x set extraction unit (F-term x set extraction means) 5, output unit (output means) 6, document classification device (document classification means) 10, correspondence table creation unit (correspondence creation means) 11, certainty factor assigning unit (certainty factor assigning means) 12, storage unit (correspondence table) 13, teacher data storage means 21, solution-feature pair extraction means 22, machine learning means 23, learning result storage means 24, expression pair extraction means 25, feature extraction means 26, solution estimation means 27, output means 28, problem solving means, information retrieval system (device), and the like can be implemented as programs. These programs are stored in main memory and executed by the main control unit (CPU). They run on a general-purpose computer (information processing device) whose hardware comprises a main control unit, main memory, a file device, a display device, and an input device such as a keyboard.

The program of the present invention is installed on this computer. For installation, the programs are stored on a portable recording (storage) medium such as a floppy disk or magneto-optical disk, and are installed onto the file device of the computer either through a drive device for accessing the recording medium or over a network such as a LAN. The program steps required for processing are then read from the file device into main memory and executed by the main control unit.

An explanatory diagram of the document classification device of the present invention.
An explanatory diagram of the patent document classification device of the present invention.
A flowchart of the patent document classification process of the present invention.
A flowchart of the process for obtaining the similarity between the input patent document and a selected patent document of the present invention.
An explanatory diagram of the correspondence between kp and the F-value of the present invention.
An explanatory diagram of the correspondence between kp, recall, and precision of the present invention.
An explanatory diagram of the certainty factor assigning device of the present invention.
A flowchart of the correspondence table creation process of the present invention.
A flowchart of the certainty factor assigning process of the present invention.
A flowchart of the correspondence table creation process of the present invention.
A flowchart of the certainty factor assigning process of the present invention.
A flowchart of the correspondence table creation process of the present invention.
A flowchart of the certainty factor assigning process of the present invention.
A flowchart of the correspondence table creation process of the present invention.
A flowchart of the certainty factor assigning process of the present invention.
A flowchart of the correspondence table creation process of the present invention.
A flowchart of the certainty factor assigning process of the present invention.
A flowchart of the correspondence table creation process of the present invention.
A flowchart of the certainty factor assigning process of the present invention.
A flowchart of the correspondence table creation process of the present invention.
A flowchart of the certainty factor assigning process of the present invention.
A flowchart of the correspondence table creation process of the present invention.
A flowchart of the certainty factor assigning process of the present invention.
A flowchart of the correspondence table creation process of the present invention.
A flowchart of the certainty factor assigning process of the present invention.
A flowchart of the correspondence table creation process of the present invention.
A flowchart of the certainty factor assigning process of the present invention.
A flowchart of the correspondence table creation process of the present invention.
A flowchart of the certainty factor assigning process of the present invention.
A flowchart of the correspondence table creation process of the present invention.
A flowchart of the certainty factor assigning process of the present invention.
An explanatory diagram of the machine learning method of the present invention.
A conceptual diagram of margin maximization in the support vector machine method of the present invention.
An explanatory diagram of the experimental results of the present invention.

Explanation of symbols

1 Input unit (input means)
6 Output unit (output means)
10 Document classification device (problem solving means)
11 Correspondence table creation unit (correspondence creation means)
12 Certainty factor assigning unit (certainty factor assigning means)
13 Storage unit (correspondence table)

Claims

A certainty factor assigning device comprising:
input means for inputting a problem;
problem solving means for solving the input problem, extracting a plurality of answers, and outputting the extracted answers together with a predetermined value;
correspondence creation means for preparing a plurality of problems to which answers have been given in advance, inputting each of those problems to the problem solving means so that it outputs answers, obtaining the predetermined value, the answer, and the certainty factor of the answer, and creating a correspondence between the predetermined value and the certainty factor; and
certainty factor assigning means for, when a new problem is input from the input means and the problem solving means outputs ordered answers, obtaining the predetermined value with which a given answer is output, assigning to that answer the certainty factor obtained from the correspondence, and outputting it.

The certainty factor assigning device according to claim 1, wherein precision, which is the ratio of correct outputs among all outputs, is used as the certainty factor.

The certainty factor assigning device according to claim 1, wherein recall, which is the ratio of correctly output answers to the total number of correct answers, is used as the certainty factor.

The certainty factor assigning device according to claim 1, wherein an F-value, which is the inverse of the average of the reciprocal of the recall and the reciprocal of the precision, is used as the certainty factor.

The certainty factor assigning device according to any one of claims 1 to 4, wherein the number of answers to which the certainty factor assigning means assigns a certainty factor and which it outputs is the number that maximizes the F-value.

The certainty factor assigning device according to claim 1, wherein the correct answer rate of each answer is used as the certainty factor.

The certainty factor assigning device according to claim 6, further comprising machine learning means for preparing a plurality of problems to which answers have been assigned in advance, obtaining, when each of the problems is input to the problem solving means and each answer is output, the predetermined value with which the answer is output, checking whether each output answer is correct and obtaining the correct answer rate at that predetermined value, and storing a learning result obtained by machine learning the cases of correct and incorrect answers for each predetermined value,
wherein the certainty factor assigning means uses the learning result as the correspondence.

A certainty factor assigning device wherein predetermined values from a plurality of viewpoints are used as the predetermined value, and the correspondence creation means creates a correspondence between the predetermined values from the plurality of viewpoints and the certainty factor.

The certainty factor assigning device according to claim 1, wherein the problem solving means is a document classification device, the problem is a document to be classified, and the answer is a classification of the document.

The certainty factor assigning device according to claim 1, wherein the problem solving means is an information retrieval device, the problem is a query document, and the answer is a document retrieved on the basis of the query document.

The certainty factor assigning device according to any one of ..., wherein, when the problem solving means obtains a score and outputs an answer, the value (kp) obtained by dividing the score of a given answer by the score of the first-ranked answer is used as the predetermined value.

The certainty factor assigning device according to claim 1, wherein the output rank (kj) of an answer output by the problem solving means is used as the predetermined value.

The certainty factor assigning device according to claim 1, wherein, when the problem solving means obtains a score and outputs an answer, the score (kl) is used as the predetermined value.

A certainty factor assigning method comprising:
inputting a problem from input means;
solving, by problem solving means, the input problem, extracting a plurality of answers, and outputting the extracted answers together with a predetermined value for ordering the answers;
preparing, by correspondence creation means, a plurality of problems to which answers have been assigned in advance, inputting each of the problems to the problem solving means so that it outputs each answer together with the predetermined value, averaging the certainty factors of the answers output with the same predetermined value, and creating a correspondence between the predetermined value and the certainty factor; and
when a new problem is input from the input means and the problem solving means outputs ordered answers, obtaining, by certainty factor assigning means, the predetermined value with which a given answer is output, assigning to that answer the certainty factor obtained from the correspondence, and outputting it.

A program causing a computer to function as:
input means for inputting a problem;
problem solving means for solving the input problem, extracting a plurality of answers, and outputting the extracted answers together with a predetermined value for ordering the answers;
correspondence creation means for preparing a plurality of problems to which answers have been assigned in advance, inputting each of the problems to the problem solving means so that it outputs each answer together with the predetermined value, averaging the certainty factors of the answers output with the same predetermined value, and creating a correspondence between the predetermined value and the certainty factor; and
certainty factor assigning means for, when a new problem is input from the input means and the problem solving means outputs ordered answers, obtaining the predetermined value with which a given answer is output, assigning to that answer the certainty factor obtained from the correspondence, and outputting it.