JP4636734B2

JP4636734B2 - INFORMATION SEARCH SYSTEM, INFORMATION SEARCH METHOD, INFORMATION SEARCH PROGRAM, RECORDING MEDIUM RECORDING INFORMATION SEARCH PROGRAM, OUTPUT INFORMATION SELECTION DEVICE, OUTPUT INFORMATION SELECTION METHOD, OUTPUT INFORMATION SELECTION PROGRAM, AND RECORDING MEDIUM RECORDING OUTPUT INFORMATION SELECTION PROGRAM

Info

Publication number: JP4636734B2
Application number: JP2001168547A
Authority: JP
Inventors: 茂樹村松; 一則松本
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2001-06-04
Filing date: 2001-06-04
Publication date: 2011-02-23
Anticipated expiration: 2021-06-04
Also published as: JP2002366577A

Description

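[0001]
[Technical Field]
The present invention relates to an information search system, an information search method, an information search program, a recording medium on which the information search program is recorded, an output information selection device, an output information selection method, an output information selection program, and a recording medium on which the output information selection program is recorded.
[0002]
[Prior Art]
Information search systems of the following kind are known. A user searches for documents on the Internet, or accesses a document database installed at a specific location. The user enters as a query either text describing the document content he or she needs, or the text of an existing document that serves as a model of the needed document, and issues a search command. The search engine then extracts documents whose text is similar to the text entered as the query and presents them to the user.
[0003]
Such information search systems use, for example, a vector space model typified by tf*idf. In this approach, the texts of a large number of documents are analyzed in advance, and their feature vectors are computed and registered in a database. The feature vector of the text entered by the user is then computed, feature vectors similar to it are extracted from those already registered in the database, and the documents having the extracted feature vectors are presented to the user as search results.
[0004]
tf*idf is a feature quantity in which the frequency of each word in a document's text is weighted according to how often the word appears in the texts of the other documents being searched. The weight w_t of a word t is given by Equation 1 below.
[0005]
[Equation 1]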

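Here, N is the number of documents to be searched, and f_t is the number of documents containing the word t.
[0006]
Each element w_{d,t} of the feature vector of a document's text is calculated as follows.
[0007]
[Equation 2]

Here, f_{d,t} is the frequency of the word t in the text of document d.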
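The images of Equations 1 and 2 are not reproduced in this text, so their exact form cannot be confirmed here. As a minimal sketch only, the following assumes one common tf*idf variant, w_t = log(N / f_t) and w_{d,t} = f_{d,t} · w_t; the function names, the whitespace tokenization, and the sample documents are illustrative assumptions, not the patent's implementation.

```python
import math
from collections import Counter


def idf_weights(docs: list[list[str]]) -> dict[str, float]:
    """Assumed idf variant: w_t = log(N / f_t).

    N is the number of documents searched and f_t the number of
    documents that contain term t.
    """
    n = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n / f_t) for term, f_t in doc_freq.items()}


def tfidf_vector(doc: list[str], idf: dict[str, float]) -> dict[str, float]:
    """Assumed element formula: w_{d,t} = f_{d,t} * w_t, as a sparse dict.

    f_{d,t} is the raw count of term t in document d. Terms whose weight is
    zero (unknown terms, or terms present in every document) are omitted.
    """
    counts = Counter(doc)
    weights = {term: f_dt * idf.get(term, 0.0) for term, f_dt in counts.items()}
    return {term: w for term, w in weights.items() if w > 0.0}


docs = [text.split() for text in ("vector space model", "space search", "decision tree search")]
idf = idf_weights(docs)
vec = tfidf_vector(docs[0], idf)
print(vec)  # "vector" and "model" get log(3); "space" gets log(1.5)
```

Storing vectors as sparse dicts keeps only the terms a document actually contains, which is the usual choice when the vocabulary is large.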
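[0008]
One measure of similarity between documents whose features are represented as vectors is the cosine coefficient. With the cosine coefficient, the similarity sim(x, y) between two vectors x and y is expressed by Equation 3 below.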
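[0009]
[Equation 3]
$$\mathrm{sim}(x, y) = \frac{\sum_{i} x_i \, y_i}{\sqrt{\sum_{i} x_i^{2}}\;\sqrt{\sum_{i} y_i^{2}}}$$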
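A small sketch of the cosine coefficient for sparse vectors stored as Python dicts (term to weight). Returning 0.0 when either vector is all zeros is a convention chosen here, since the cosine is undefined in that case.

```python
import math


def cosine(x: dict[str, float], y: dict[str, float]) -> float:
    """Cosine coefficient sim(x, y) for sparse term-weight vectors."""
    dot = sum(w * y[t] for t, w in x.items() if t in y)
    norm_x = math.sqrt(sum(w * w for w in x.values()))
    norm_y = math.sqrt(sum(w * w for w in y.values()))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0  # undefined for zero vectors; treated here as "not similar"
    return dot / (norm_x * norm_y)


print(cosine({"a": 1.0, "b": 1.0}, {"a": 1.0}))  # about 0.7071
print(cosine({"a": 1.0}, {"b": 2.0}))            # 0.0 (no shared terms)
```

Because the measure depends only on direction, scaling a vector (for example, repeating a query text twice) leaves its similarity to every document unchanged.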

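In conventional information search systems, documents whose feature vectors have a similarity to the feature vector of the user's query text higher than a predetermined threshold are extracted, and the search results are presented in descending order of similarity, as shown in FIG. 12.
[0010]
If the search results ordered by similarity in FIG. 12 are represented schematically in vector space, they appear as in FIG. 13. That is, FIG. 13 shows the spatial relationship between the feature vector of query Q and the feature vectors of documents A to E, all of which have similarities satisfying the threshold. In FIG. 13, the distances to query Q satisfy A < D < B < C < E, so the documents in descending order of similarity are A, D, B, C, E.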
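The conventional presentation described above (keep the documents above a threshold, then list them by descending similarity) can be sketched as follows. The 2-D vectors are hypothetical values chosen only so that the ranking comes out A, D, B, like the first three places of FIG. 12; they are not taken from the patent's figures.

```python
import math

Vector = tuple[float, ...]


def cosine(x: Vector, y: Vector) -> float:
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


def rank_by_similarity(query: Vector, docs: dict[str, Vector],
                       threshold: float) -> list[tuple[str, float]]:
    """Conventional output: documents above the threshold, most similar first."""
    scored = ((name, cosine(query, vec)) for name, vec in docs.items())
    hits = [(name, s) for name, s in scored if s > threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


query = (1.0, 0.0)
docs = {"A": (1.0, 0.1), "B": (1.0, 0.3), "D": (1.0, -0.2), "X": (0.0, 1.0)}
print([name for name, _ in rank_by_similarity(query, docs, threshold=0.5)])  # ['A', 'D', 'B']
```

Note that A and B lie on the same side of the query while D lies on the other side; the ranked list alone does not reveal this, which is the problem described in [0013].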
[0011]
Besides tf*idf, there are various other models for representing documents as feature vectors, such as Term-discrimination Value and Probabilistic Term Weighting, as described for example in Gerard Salton, "Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer".
[0012]
Likewise, besides the cosine coefficient above, there are various measures for computing similarity, such as the inner product, the Dice coefficient, and the Jaccard coefficient, as described in the reference cited above.
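For comparison, the inner product, Dice, and Jaccard measures mentioned above can be written for weighted vectors as below. These are the weighted-vector forms commonly given in Salton's textbook; the patent itself does not spell them out, so treat them as illustrative.

```python
Vector = tuple[float, ...]


def inner(x: Vector, y: Vector) -> float:
    """Inner product: sum of x_i * y_i."""
    return sum(a * b for a, b in zip(x, y))


def dice(x: Vector, y: Vector) -> float:
    """Dice coefficient: 2(x.y) / (x.x + y.y)."""
    denom = inner(x, x) + inner(y, y)
    return 2.0 * inner(x, y) / denom if denom else 0.0


def jaccard(x: Vector, y: Vector) -> float:
    """Jaccard coefficient: (x.y) / (x.x + y.y - x.y)."""
    xy = inner(x, y)
    denom = inner(x, x) + inner(y, y) - xy
    return xy / denom if denom else 0.0


# With binary vectors these reduce to the familiar set-overlap forms.
x, y = (1.0, 1.0, 0.0), (1.0, 0.0, 1.0)
print(inner(x, y), dice(x, y), jaccard(x, y))  # 1.0 0.5 0.333...
```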
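[0013]
[Problem to Be Solved by the Invention]
Such conventional information search systems have the following problem. As shown in FIG. 13, even though documents A to E all have similarities satisfying the threshold, their feature vectors point in various directions relative to query Q. Search results, however, are normally just listed in descending order of similarity, as in FIG. 12. Suppose the user opens document A, the closest to query Q, and finds that it contains what he or she is looking for. The user next opens document D, which has the second-highest similarity but whose direction relative to query Q is opposite to that of document A. After that, the user opens document B, the third most similar, whose direction relative to query Q is nearly opposite to that of document D but whose feature vector is close to that of document A. Next, the user opens document C, which points in a completely different direction from all of these.
[0014]
In practice, however, if the user opens document A, which has the highest similarity, and finds its content highly relevant to what he or she is seeking, it is preferable to open document B next, whose feature vector points in the same direction as that of document A, rather than document D, which has the second-highest similarity but whose feature vector points in the direction opposite to document A relative to query Q.
[0015]
Conventionally, however, documents were simply opened in order of similarity without regard to the direction of their feature vectors. Such user needs could therefore not be met, and it took the user considerable effort to find documents with the content he or she needed.
[0016]
The present invention has been made in view of these conventional problems, and its object is to provide a technique that lets the user browse the information he or she needs from similarity search results with as little wasted effort as possible.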
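[0017]
[Means for Solving the Problem]
The information search system according to the invention of claim 1 comprises: a query input unit that accepts a query entered by a user; a feature vector creation unit that creates a feature vector from the query accepted by the query input unit; an information database in which the indexes, data representing the contents, and feature vector data of a large number of pieces of search target information are registered; a similarity calculation unit that calculates the similarity between the query feature vector created by the feature vector creation unit and the feature vector of each piece of search target information registered in the information database, identifies the search target information whose feature vectors have a similarity higher than a predetermined threshold, and retrieves their indexes, data representing their contents, and feature vector data; a search result holding unit that stores the data retrieved by the similarity calculation unit; a search result output unit that outputs the index and data representing the contents of one piece of the search target information retrieved by the similarity calculation unit; and a feedback processing unit that accepts the user's relevant/not-relevant judgment input for the search result being output by the search result output unit and (1) for a relevant input, searches the search result holding unit for another piece of search target information whose feature vector has the highest similarity to the feature vector of the search target information currently being output and, if such information exists, causes the search result output unit to output its contents as the next candidate, and (2) for a not-relevant input, calculates a virtual vector such that the difference between the virtual vector and a base vector points in the direction opposite to the difference between the feature vector of the search target information currently being output and that base vector (the base vector being the query feature vector for the first-stage search and the feature vector output in the preceding stage for the second and subsequent stages), searches the search target information held in the search result holding unit for another piece of search target information whose feature vector has the highest similarity to this virtual vector, and, if such information exists, causes the search result output unit to output its contents as the next candidate.
[0018]
The information search system according to the invention of claim 2 comprises: a query input unit that accepts a query entered by a user; a feature vector creation unit that creates a feature vector from the query accepted by the query input unit; an information database in which the indexes, data representing the contents, and feature vector data of a large number of pieces of search target information are registered; a similarity calculation unit that calculates the similarity between the query feature vector created by the feature vector creation unit and the feature vector of each piece of search target information registered in the information database, identifies the search target information whose feature vectors have a similarity higher than a predetermined threshold, and retrieves their indexes, data representing their contents, and feature vector data; a search result holding unit that stores the data retrieved by the similarity calculation unit; a decision tree creation unit that creates a decision tree for the group of high-similarity feature vectors obtained by the similarity calculation unit by starting from the feature vector with the highest similarity, branching to relevant where its contents are assumed to be accepted by the user and to not relevant where they are assumed not to be accepted, taking on a relevant branch, as the next candidate, another piece of search target information whose feature vector has the highest similarity to that feature vector, and, on a not-relevant branch from that feature vector, calculating a virtual vector such that the difference between the virtual vector and a base vector points in the direction opposite to the difference between the feature vector of the search target information judged not relevant and that base vector (the base vector being the query feature vector for the first-stage search and the feature vector output in the preceding stage for the second and subsequent stages), searching for another piece of search target information whose feature vector has the highest similarity to this virtual vector, and, if such information exists, taking it as the next candidate, and by repeating this processing until all the search target information held in the search result holding unit has been covered; a search result output unit that outputs the index and data representing the contents of one piece of the search target information retrieved by the similarity calculation unit; and a feedback processing unit that accepts the user's relevant/not-relevant judgment input for the search result being output by the search result output unit, identifies the search target information to be output next based on the decision tree created by the decision tree creation unit, and causes the search result output unit to output the contents of that search target information.
[0019]
The information search system according to the invention of claim 3 comprises: a query input unit that accepts a query entered by a user; a feature vector creation unit that creates a feature vector from the query accepted by the query input unit; an information database in which the indexes, data representing the contents, and feature vector data of a large number of pieces of search target information are registered; a similarity calculation unit that calculates the similarity between the query feature vector created by the feature vector creation unit and the feature vector of each piece of search target information registered in the information database, identifies the search target information whose feature vectors have a similarity higher than a predetermined threshold, and retrieves their indexes, data representing their contents, and feature vector data; a search result holding unit that stores the data retrieved by the similarity calculation unit; a decision tree creation unit that creates a decision tree for the group of high-similarity feature vectors obtained by the similarity calculation unit by starting from the feature vector with the highest similarity, branching to relevant where its contents are assumed to be accepted by the user and to not relevant where they are assumed not to be accepted, taking on a relevant branch, as the next candidate, another piece of search target information whose feature vector has the highest similarity to that feature vector, and, on a not-relevant branch from that feature vector, calculating a virtual vector such that the difference between the virtual vector and a base vector points in the direction opposite to the difference between the feature vector of the search target information judged not relevant and that base vector (the base vector being the query feature vector for the first-stage search and the feature vector output in the preceding stage for the second and subsequent stages), searching for another piece of search target information whose feature vector has the highest similarity to this virtual vector, and, if such information exists, taking it as the next candidate, and by repeating this processing up to a predetermined stage for the group of search target information held in the search result holding unit; a search result output unit that outputs the index and data representing the contents of one piece of the search target information retrieved by the similarity calculation unit; and a feedback processing unit that accepts the user's relevant/not-relevant judgment input for the search result being output by the search result output unit, identifies the search target information to be output next based on the decision tree created by the decision tree creation unit, causes the search result output unit to output the contents of that search target information, and, when a condition for growing the existing decision tree is reached, instructs the decision tree creation unit to grow the decision tree by a predetermined number of stages.
[0020]
The invention of claim 4 is the information search system according to any one of claims 1 to 3, characterized in that the query and the search target information are text data.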
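[0021]
The information search method according to the invention of claim 5 comprises: step 1, in which a query input unit accepts a query entered by a user; step 2, in which a feature vector creation unit creates a feature vector from the accepted query; step 3, in which a similarity calculation unit calculates the similarity between the query feature vector created by the feature vector creation unit and the feature vector of each piece of search target information registered in an information database, identifies the search target information whose feature vectors have a similarity higher than a predetermined threshold, and retrieves their indexes, data representing their contents, and feature vector data; step 4, in which a search result output unit outputs the index and data representing the contents of one piece of the search target information retrieved in step 3; and step 5, in which a feedback processing unit accepts the user's relevant/not-relevant judgment input for the search result being output in step 4 and (1) for a relevant input, searches the search target information retrieved in step 3 for another piece of search target information whose feature vector has the highest similarity to the feature vector of the search target information currently being output and, if such information exists, outputs its contents as the next candidate, and (2) for a not-relevant input, calculates a virtual vector such that the difference between the virtual vector and a base vector points in the direction opposite to the difference between the feature vector of the search target information currently being output and that base vector (the base vector being the query feature vector for the first-stage search and the feature vector output in the preceding stage for the second and subsequent stages), searches the search target information retrieved in step 3 for another piece of search target information whose feature vector has the highest similarity to this virtual vector, and, if such information exists, outputs its contents as the next candidate.
[0022]
The information search method according to the invention of claim 6 comprises: step 1, in which a query input unit accepts a query entered by a user; step 2, in which a feature vector creation unit creates a feature vector from the accepted query; step 3 of calculating the similarity between the query feature vector created by the feature vector creation unit and the feature vector of each piece of search target information registered in an information database, identifying the search target information whose feature vectors have a similarity higher than a predetermined threshold, and retrieving their indexes, data representing their contents, and feature vector data; step 4 of creating a decision tree for the group of high-similarity feature vectors obtained in step 3 by starting from the feature vector with the highest similarity, branching to relevant where its contents are assumed to be accepted by the user and to not relevant where they are assumed not to be accepted, taking on a relevant branch, as the next candidate, another piece of search target information whose feature vector has the highest similarity to that feature vector, and, on a not-relevant branch from that feature vector, calculating a virtual vector such that the difference between the virtual vector and a base vector points in the direction opposite to the difference between the feature vector of the search target information judged not relevant and that base vector (the base vector being the query feature vector for the first-stage search and the feature vector output in the preceding stage for the second and subsequent stages), searching for another piece of search target information whose feature vector has the highest similarity to this virtual vector, and, if such information exists, taking it as the next candidate, and by repeating this processing until all the search target information retrieved in step 3 has been covered; step 5 of outputting the index and data representing the contents of one piece of the search target information retrieved in step 3; and step 6 of accepting the user's relevant/not-relevant judgment input for the search result being output in step 5, identifying the search target information to be output next based on the decision tree created in step 4, and outputting the contents of that search target information.
[0023]
The information search method according to the invention of claim 7 comprises: step 1, in which a query input unit accepts a query entered by a user; step 2, in which a feature vector creation unit creates a feature vector from the accepted query; step 3 of calculating the similarity between the created query feature vector and the feature vector of each piece of search target information registered in an information database, identifying the search target information whose feature vectors have a similarity higher than a predetermined threshold, and retrieving their indexes, data representing their contents, and feature vector data; step 4 of creating a decision tree for the group of high-similarity feature vectors obtained in step 3 by starting from the feature vector with the highest similarity, branching to relevant where its contents are assumed to be accepted by the user and to not relevant where they are assumed not to be accepted, taking on a relevant branch, as the next candidate, another piece of search target information whose feature vector has the highest similarity to that feature vector, and, on a not-relevant branch from that feature vector, calculating a virtual vector such that the difference between the virtual vector and a base vector points in the direction opposite to the difference between the feature vector of the search target information judged not relevant and that base vector (the base vector being the query feature vector for the first-stage search and the feature vector output in the preceding stage for the second and subsequent stages), searching for another piece of search target information whose feature vector has the highest similarity to this virtual vector, and, if such information exists, taking it as the next candidate, and by repeating this processing up to a predetermined stage for the search target information retrieved in step 3; step 5 of outputting the index and data representing the contents of one piece of the search target information retrieved in step 3; and step 6 of accepting the user's relevant/not-relevant judgment input for the search result being output in step 5, identifying the search target information to be output next based on the decision tree created in step 4, outputting the contents of that search target information, and, when a condition for growing the decision tree created in step 4 is reached, growing that decision tree by a predetermined number of stages.
[0024]
The invention of claim 8 is the information search method according to any one of claims 5 to 7, characterized in that the query and the search target information are text data.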
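[0025]
The information search program according to the invention of claim 9 causes a computer to execute: process 1 of accepting a query entered by a user; process 2 of creating a feature vector from the accepted query; process 3 of calculating the similarity between the created query feature vector and the feature vector of each piece of search target information registered in an information database, identifying the search target information whose feature vectors have a similarity higher than a predetermined threshold, and retrieving their indexes, data representing their contents, and feature vector data; process 4 of outputting the index and data representing the contents of one piece of the search target information retrieved in process 3; and process 5 of accepting the user's relevant/not-relevant judgment input for the search result being output in process 4 and (1) for a relevant input, searching the search target information retrieved in process 3 for another piece of search target information whose feature vector has the highest similarity to the feature vector of the search target information currently being output and, if such information exists, outputting its contents as the next candidate, and (2) for a not-relevant input, calculating a virtual vector such that the difference between the virtual vector and a base vector points in the direction opposite to the difference between the feature vector of the search target information currently being output and that base vector (the base vector being the query feature vector for the first-stage search and the feature vector output in the preceding stage for the second and subsequent stages), searching the search target information retrieved in process 3 for another piece of search target information whose feature vector has the highest similarity to this virtual vector, and, if such information exists, outputting its contents as the next candidate.
[0026]
The information search program according to the invention of claim 10 causes a computer to execute: process 1 of accepting a query entered by a user; process 2 of creating a feature vector from the accepted query; process 3 of calculating the similarity between the created query feature vector and the feature vector of each piece of search target information registered in an information database, identifying the search target information whose feature vectors have a similarity higher than a predetermined threshold, and retrieving their indexes, data representing their contents, and feature vector data; process 4 of creating a decision tree for the group of high-similarity feature vectors obtained in process 3 by starting from the feature vector with the highest similarity, branching to relevant where its contents are assumed to be accepted by the user and to not relevant where they are assumed not to be accepted, taking on a relevant branch, as the next candidate, another piece of search target information whose feature vector has the highest similarity to that feature vector, and, on a not-relevant branch from that feature vector, calculating a virtual vector such that the difference between the virtual vector and a base vector points in the direction opposite to the difference between the feature vector of the search target information judged not relevant and that base vector (the base vector being the query feature vector for the first-stage search and the feature vector output in the preceding stage for the second and subsequent stages), searching for another piece of search target information whose feature vector has the highest similarity to this virtual vector, and, if such information exists, taking it as the next candidate, and by repeating this processing until all the search target information retrieved in process 3 has been covered; process 5 of outputting the index and data representing the contents of one piece of the search target information retrieved in process 3; and process 6 of accepting the user's relevant/not-relevant judgment input for the search result being output in process 5, identifying the search target information to be output next based on the decision tree created in process 4, and outputting the contents of that search target information.
[0027]
The information search program according to the invention of claim 11 causes a computer to execute: process 1 of accepting a query entered by a user; process 2 of creating a feature vector from the accepted query; process 3 of calculating the similarity between the created query feature vector and the feature vector of each piece of search target information registered in an information database, identifying the search target information whose feature vectors have a similarity higher than a predetermined threshold, and retrieving their indexes, data representing their contents, and feature vector data; process 4 of creating a decision tree for the group of high-similarity feature vectors obtained in process 3 by starting from the feature vector with the highest similarity, branching to relevant where its contents are assumed to be accepted by the user and to not relevant where they are assumed not to be accepted, taking on a relevant branch, as the next candidate, another piece of search target information whose feature vector has the highest similarity to that feature vector, and, on a not-relevant branch from that feature vector, calculating a virtual vector such that the difference between the virtual vector and a base vector points in the direction opposite to the difference between the feature vector of the search target information judged not relevant and that base vector (the base vector being the query feature vector for the first-stage search and the feature vector output in the preceding stage for the second and subsequent stages), searching for another piece of search target information whose feature vector has the highest similarity to this virtual vector, and, if such information exists, taking it as the next candidate, and by repeating this processing up to a predetermined stage for the search target information retrieved in process 3; process 5 of outputting the index and data representing the contents of one piece of the search target information retrieved in process 3; and process 6 of accepting the user's relevant/not-relevant judgment input for the search result being output in process 5, identifying the search target information to be output next based on the decision tree created in process 4, outputting the contents of that search target information, and, when a condition for growing the decision tree created in process 4 is reached, growing that decision tree by a predetermined number of stages.
[0028]
The invention of claim 12 is the information search program according to any one of claims 9 to 11, characterized in that the query and the search target information are text data.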
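[0029]
The recording medium according to the invention of claim 13 records an information search program that causes a computer to execute: process 1 of accepting a query entered by a user; process 2 of creating a feature vector from the accepted query; process 3 of calculating the similarity between the created query feature vector and the feature vector of each piece of search target information registered in an information database, identifying the search target information whose feature vectors have a similarity higher than a predetermined threshold, and retrieving their indexes, data representing their contents, and feature vector data; process 4 of outputting the index and data representing the contents of one piece of the search target information retrieved in process 3; and process 5 of accepting the user's relevant/not-relevant judgment input for the search result being output in process 4 and (1) for a relevant input, searching the search target information retrieved in process 3 for another piece of search target information whose feature vector has the highest similarity to the feature vector of the search target information currently being output and, if such information exists, outputting its contents as the next candidate, and (2) for a not-relevant input, calculating a virtual vector such that the difference between the virtual vector and a base vector points in the direction opposite to the difference between the feature vector of the search target information currently being output and that base vector (the base vector being the query feature vector for the first-stage search and the feature vector output in the preceding stage for the second and subsequent stages), searching the search target information retrieved in process 3 for another piece of search target information whose feature vector has the highest similarity to this virtual vector, and, if such information exists, outputting its contents as the next candidate.
[0030]
The recording medium according to the invention of claim 14 records an information search program that causes a computer to execute: process 1 of accepting a query entered by a user; process 2 of creating a feature vector from the accepted query; process 3 of calculating the similarity between the created query feature vector and the feature vector of each piece of search target information registered in an information database, identifying the search target information whose feature vectors have a similarity higher than a predetermined threshold, and retrieving their indexes, data representing their contents, and feature vector data; process 4 of creating a decision tree for the group of high-similarity feature vectors obtained in process 3 by starting from the feature vector with the highest similarity, branching to relevant where its contents are assumed to be accepted by the user and to not relevant where they are assumed not to be accepted, taking on a relevant branch, as the next candidate, another piece of search target information whose feature vector has the highest similarity to that feature vector, and, on a not-relevant branch from that feature vector, calculating a virtual vector such that the difference between the virtual vector and a base vector points in the direction opposite to the difference between the feature vector of the search target information judged not relevant and that base vector (the base vector being the query feature vector for the first-stage search and the feature vector output in the preceding stage for the second and subsequent stages), searching for another piece of search target information whose feature vector has the highest similarity to this virtual vector, and, if such information exists, taking it as the next candidate, and by repeating this processing until all the search target information retrieved in process 3 has been covered; process 5 of outputting the index and data representing the contents of one piece of the search target information retrieved in process 3; and process 6 of accepting the user's relevant/not-relevant judgment input for the search result being output in process 5, identifying the search target information to be output next based on the decision tree created in process 4, and outputting the contents of that search target information.
[0031]
The recording medium according to the invention of claim 15 records an information search program that causes a computer to execute: process 1 of accepting a query entered by a user; process 2 of creating a feature vector from the accepted query; process 3 of calculating the similarity between the created query feature vector and the feature vector of each piece of search target information registered in an information database, identifying the search target information whose feature vectors have a similarity higher than a predetermined threshold, and retrieving their indexes, data representing their contents, and feature vector data; process 4 of creating a decision tree for the group of high-similarity feature vectors obtained in process 3 by starting from the feature vector with the highest similarity, branching to relevant where its contents are assumed to be accepted by the user and to not relevant where they are assumed not to be accepted, taking on a relevant branch, as the next candidate, another piece of search target information whose feature vector has the highest similarity to that feature vector, and, on a not-relevant branch from that feature vector, calculating a virtual vector such that the difference between the virtual vector and a base vector points in the direction opposite to the difference between the feature vector of the search target information judged not relevant and that base vector (the base vector being the query feature vector for the first-stage search and the feature vector output in the preceding stage for the second and subsequent stages), searching for another piece of search target information whose feature vector has the highest similarity to this virtual vector, and, if such information exists, taking it as the next candidate, and by repeating this processing up to a predetermined stage for the search target information retrieved in process 3; process 5 of outputting the index and data representing the contents of one piece of the search target information retrieved in process 3; and process 6 of accepting the user's relevant/not-relevant judgment input for the search result being output in process 5, identifying the search target information to be output next based on the decision tree created in process 4, outputting the contents of that search target information, and, when a condition for growing the decision tree created in process 4 is reached, growing that decision tree by a predetermined number of stages.
[0032]
The invention of claim 16 is the recording medium according to any one of claims 13 to 15, characterized in that the query and the search target information are text data.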
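[0033]
The output information selection device according to the invention of claim 17 comprises: a feature data holding unit that holds, for each of a group of pieces of search target information, information expressing its contents together with feature vector data; a display information output unit that outputs information expressing the contents of a piece of search target information designated from the group; a feedback receiving unit that accepts the user's relevant/not-relevant judgment input for the output search target information; and an output information selection unit that (1) when the user's judgment input is relevant, searches the group of search target information for another piece of search target information whose feature vector has the highest similarity to the feature vector of the search target information currently being output by the display information output unit and, if such information exists, causes the display information output unit to output information expressing its contents, and (2) when the user's judgment input is not relevant, calculates a virtual vector such that the difference between the virtual vector and a base vector points in the direction opposite to the difference between the feature vector of the search target information currently being output by the display information output unit and that base vector (the base vector being a reference vector given in advance for the first-stage search and the feature vector output in the preceding stage for the second and subsequent stages), searches the group of search target information for another piece of search target information whose feature vector has the highest similarity to this virtual vector, and, if such information exists, causes the display information output unit to output information expressing its contents.
[0034]
The output information selection device according to the invention of claim 18 comprises: a feature data holding unit that holds, for each of a group of pieces of search target information, information expressing its contents together with feature vector data; a display information output unit that outputs information expressing the contents of a piece of search target information designated from the group; a decision tree creation unit that creates a decision tree for the group of search target information held in the feature data holding unit by starting from the search target information whose feature vector has the highest similarity to a reference vector given in advance, branching to relevant where its contents are assumed to be accepted by the user and to not relevant where they are assumed not to be accepted, taking on a relevant branch, as the next candidate, another piece of search target information whose feature vector has the highest similarity to that feature vector, and, on a not-relevant branch from that feature vector, calculating a virtual vector such that the difference between the virtual vector and a base vector points in the direction opposite to the difference between the feature vector of the search target information judged not relevant and that base vector (the base vector being the reference vector for the first-stage search and the feature vector output in the preceding stage for the second and subsequent stages), searching for another piece of search target information whose feature vector has the highest similarity to this virtual vector, and, if such information exists, taking it as the next candidate, and by repeating this processing until all the search target information held in the feature data holding unit has been covered; and a feedback processing unit that accepts the user's relevant/not-relevant judgment input for the search target information being output by the display information output unit, identifies the search target information to be output next based on the decision tree created by the decision tree creation unit, and causes the display information output unit to output information expressing the contents of that search target information.
[0035]
The output information selection device according to the invention of claim 19 comprises: a feature data holding unit that holds, for each of a group of pieces of search target information, information expressing its contents together with feature vector data; a display information output unit that outputs information expressing the contents of a piece of search target information designated from the group; a decision tree creation unit that creates a decision tree for the group of search target information held in the feature data holding unit by starting from the search target information whose feature vector has the highest similarity to a reference vector given in advance, branching to relevant where its contents are assumed to be accepted by the user and to not relevant where they are assumed not to be accepted, taking on a relevant branch, as the next candidate, another piece of search target information whose feature vector has the highest similarity to that feature vector, and, on a not-relevant branch from that feature vector, calculating a virtual vector such that the difference between the virtual vector and a base vector points in the direction opposite to the difference between the feature vector of the search target information judged not relevant and that base vector (the base vector being the reference vector for the first-stage search and the feature vector output in the preceding stage for the second and subsequent stages), searching for another piece of search target information whose feature vector has the highest similarity to this virtual vector, and, if such information exists, taking it as the next candidate, and by repeating this processing up to a predetermined stage for the group of search target information held in the feature data holding unit; and a feedback processing unit that accepts the user's relevant/not-relevant judgment input for the search target information being output by the display information output unit, identifies the search target information to be output next based on the decision tree created by the decision tree creation unit, causes the display information output unit to output information expressing the contents of that search target information, and, when a condition for growing the existing decision tree is reached, instructs the decision tree creation unit to grow the decision tree by a predetermined number of stages.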
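[0036]
The output information selection method according to the invention of claim 20 comprises: step 1, in which a feature data holding unit holds, for each of a group of pieces of search target information, information expressing its contents together with feature vector data; step 2, in which a display information output unit outputs information expressing the contents of a piece of search target information designated from the group; and step 3, in which a feedback receiving unit accepts the user's relevant/not-relevant judgment input for the search target information currently being output and (1) when the user's judgment input is relevant, searches the group of search target information for another piece of search target information whose feature vector has the highest similarity to the feature vector of the search target information currently being output and, if such information exists, outputs information expressing its contents, and (2) when the user's judgment input is not relevant, calculates a virtual vector such that the difference between the virtual vector and a base vector points in the direction opposite to the difference between the feature vector of the search target information currently being output and that base vector (the base vector being a reference vector given in advance for the first-stage search and the feature vector output in the preceding stage for the second and subsequent stages), searches the group of search target information for another piece of search target information whose feature vector has the highest similarity to this virtual vector, and, if such information exists, outputs information expressing its contents.
[0037]
The output information selection method according to the invention of claim 21 comprises: step 1, in which a feature data holding unit holds, for each of a group of pieces of search target information, information expressing its contents together with feature vector data; step 2, in which a display information output unit outputs information expressing the contents of a piece of search target information designated from the group; step 3, in which a feedback receiving unit creates a decision tree for the group of search target information by starting from the search target information whose feature vector has the highest similarity to a reference vector given in advance, branching to relevant where its contents are assumed to be accepted by the user and to not relevant where they are assumed not to be accepted, taking on a relevant branch, as the next candidate, another piece of search target information whose feature vector has the highest similarity to that feature vector, and, on a not-relevant branch from that feature vector, calculating a virtual vector such that the difference between the virtual vector and a base vector points in the direction opposite to the difference between the feature vector of the search target information judged not relevant and that base vector (the base vector being the reference vector for the first-stage search and the feature vector output in the preceding stage for the second and subsequent stages), searching for another piece of search target information whose feature vector has the highest similarity to this virtual vector, and, if such information exists, taking it as the next candidate, and by repeating this processing until all of the group of search target information has been covered; and step 4 of accepting the user's relevant/not-relevant judgment input for the search target information currently being output, identifying the search target information to be output next based on the decision tree, and outputting information expressing the contents of that search target information.
[0038]
The output information selection method according to the invention of claim 22 comprises: step 1, in which a feature data holding unit holds, for each of a group of pieces of search target information, information expressing its contents together with feature vector data; step 2, in which a display information output unit outputs information expressing the contents of a piece of search target information designated from the group; step 3, in which a feedback receiving unit creates a decision tree for the group of search target information by starting from the search target information whose feature vector has the highest similarity to a reference vector given in advance, branching to relevant where its contents are assumed to be accepted by the user and to not relevant where they are assumed not to be accepted, taking on a relevant branch, as the next candidate, another piece of search target information whose feature vector has the highest similarity to that feature vector, and, on a not-relevant branch from that feature vector, calculating a virtual vector such that the difference between the virtual vector and a base vector points in the direction opposite to the difference between the feature vector of the search target information judged not relevant and that base vector (the base vector being the reference vector for the first-stage search and the feature vector output in the preceding stage for the second and subsequent stages), searching for another piece of search target information whose feature vector has the highest similarity to this virtual vector, and, if such information exists, taking it as the next candidate, and by repeating this processing up to a predetermined stage for the group of search target information; and step 4 of accepting the user's relevant/not-relevant judgment input for the search target information currently being output, identifying the search target information to be output next based on the decision tree, outputting information expressing the contents of that search target information, and, when a condition for growing the existing decision tree is reached, growing the decision tree by a predetermined number of stages.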
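[0039]
The output information selection program according to the invention of claim 23 causes a computer to execute: process 1 of holding, for each of a group of pieces of search target information, information expressing its contents together with feature vector data; process 2 of outputting information expressing the contents of a piece of search target information designated from the group; and process 3 of accepting the user's relevant/not-relevant judgment input for the search target information currently being output and (1) when the user's judgment input is relevant, searching the group of search target information for another piece of search target information whose feature vector has the highest similarity to the feature vector of the search target information currently being output and, if such information exists, outputting information expressing its contents, and (2) when the user's judgment input is not relevant, calculating a virtual vector such that the difference between the virtual vector and a base vector points in the direction opposite to the difference between the feature vector of the search target information currently being output and that base vector (the base vector being a reference vector given in advance for the first-stage search and the feature vector output in the preceding stage for the second and subsequent stages), searching the group of search target information for another piece of search target information whose feature vector has the highest similarity to this virtual vector, and, if such information exists, outputting information expressing its contents.
[0040]
The output information selection program according to the invention of claim 24 causes a computer to execute: process 1 of holding, for each of a group of pieces of search target information, information expressing its contents together with feature vector data; process 2 of outputting information expressing the contents of a piece of search target information designated from the group; process 3 of creating a decision tree for the group of search target information by starting from the search target information whose feature vector has the highest similarity to a reference vector given in advance, branching to relevant where its contents are assumed to be accepted by the user and to not relevant where they are assumed not to be accepted, taking on a relevant branch, as the next candidate, another piece of search target information whose feature vector has the highest similarity to that feature vector, and, on a not-relevant branch from that feature vector, calculating a virtual vector such that the difference between the virtual vector and a base vector points in the direction opposite to the difference between the feature vector of the search target information judged not relevant and that base vector (the base vector being the reference vector for the first-stage search and the feature vector output in the preceding stage for the second and subsequent stages), searching for another piece of search target information whose feature vector has the highest similarity to this virtual vector, and, if such information exists, taking it as the next candidate, and by repeating this processing until all of the group of search target information has been covered; and process 4 of accepting the user's relevant/not-relevant judgment input for the search target information currently being output, identifying the search target information to be output next based on the decision tree, and outputting information expressing the contents of that search target information.
[0041]
The output information selection program according to the invention of claim 25 causes a computer to execute: process 1 of holding, for each of a group of pieces of search target information, information expressing its contents together with feature vector data; process 2 of outputting information expressing the contents of a piece of search target information designated from the group; process 3 of creating a decision tree for the group of search target information by starting from the search target information whose feature vector has the highest similarity to a reference vector given in advance, branching to relevant where its contents are assumed to be accepted by the user and to not relevant where they are assumed not to be accepted, taking on a relevant branch, as the next candidate, another piece of search target information whose feature vector has the highest similarity to that feature vector, and, on a not-relevant branch from that feature vector, calculating a virtual vector such that the difference between the virtual vector and a base vector points in the direction opposite to the difference between the feature vector of the search target information judged not relevant and that base vector (the base vector being the reference vector for the first-stage search and the feature vector output in the preceding stage for the second and subsequent stages), searching for another piece of search target information whose feature vector has the highest similarity to this virtual vector, and, if such information exists, taking it as the next candidate, and by repeating this processing up to a predetermined stage for the group of search target information; and process 4 of accepting the user's relevant/not-relevant judgment input for the search target information currently being output, identifying the search target information to be output next based on the decision tree, outputting information expressing the contents of that search target information, and, when a condition for growing the existing decision tree is reached, growing the decision tree by a predetermined number of stages.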
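[0042]
The recording medium according to the invention of claim 26 records an output information selection program that executes: process 1 of holding, for each of a group of pieces of search target information, information expressing its contents together with feature vector data; process 2 of outputting information expressing the contents of a piece of search target information designated from the group; and process 3 of accepting the user's relevant/not-relevant judgment input for the search target information currently being output and (1) when the user's judgment input is relevant, searching the group of search target information for another piece of search target information whose feature vector has the highest similarity to the feature vector of the search target information currently being output and, if such information exists, outputting information expressing its contents, and (2) when the user's judgment input is not relevant, calculating a virtual vector such that the difference between the virtual vector and a base vector points in the direction opposite to the difference between the feature vector of the search target information currently being output and that base vector (the base vector being a reference vector given in advance for the first-stage search and the feature vector output in the preceding stage for the second and subsequent stages), searching the group of search target information for another piece of search target information whose feature vector has the highest similarity to this virtual vector, and, if such information exists, outputting information expressing its contents.
[0043]
The recording medium according to the invention of claim 27 records an output information selection program that executes: process 1 of holding, for each of a group of pieces of search target information, information expressing its contents together with feature vector data; process 2 of outputting information expressing the contents of a piece of search target information designated from the group; process 3 of creating a decision tree for the group of search target information by starting from the search target information whose feature vector has the highest similarity to a reference vector given in advance, branching to relevant where its contents are assumed to be accepted by the user and to not relevant where they are assumed not to be accepted, taking on a relevant branch, as the next candidate, another piece of search target information whose feature vector has the highest similarity to that feature vector, and, on a not-relevant branch from that feature vector, calculating a virtual vector such that the difference between the virtual vector and a base vector points in the direction opposite to the difference between the feature vector of the search target information judged not relevant and that base vector (the base vector being the reference vector for the first-stage search and the feature vector output in the preceding stage for the second and subsequent stages), searching for another piece of search target information whose feature vector has the highest similarity to this virtual vector, and, if such information exists, taking it as the next candidate, and by repeating this processing until all of the group of search target information has been covered; and process 4 of accepting the user's relevant/not-relevant judgment input for the search target information currently being output, identifying the search target information to be output next based on the decision tree, and outputting information expressing the contents of that search target information.
[0044]
The recording medium according to the invention of claim 28 records an output information selection program that executes: process 1 of holding, for each of a group of pieces of search target information, information expressing its contents together with feature vector data; process 2 of outputting information expressing the contents of a piece of search target information designated from the group; process 3 of creating a decision tree for the group of search target information by starting from the search target information whose feature vector has the highest similarity to a reference vector given in advance, branching to relevant where its contents are assumed to be accepted by the user and to not relevant where they are assumed not to be accepted, taking on a relevant branch, as the next candidate, another piece of search target information whose feature vector has the highest similarity to that feature vector, and, on a not-relevant branch from that feature vector, calculating a virtual vector such that the difference between the virtual vector and a base vector points in the direction opposite to the difference between the feature vector of the search target information judged not relevant and that base vector (the base vector being the reference vector for the first-stage search and the feature vector output in the preceding stage for the second and subsequent stages), searching for another piece of search target information whose feature vector has the highest similarity to this virtual vector, and, if such information exists, taking it as the next candidate, and by repeating this processing up to a predetermined stage for the group of search target information; and process 4 of accepting the user's relevant/not-relevant judgment input for the search target information currently being output, identifying the search target information to be output next based on the decision tree, outputting information expressing the contents of that search target information, and, when a condition for growing the existing decision tree is reached, growing the decision tree by a predetermined number of stages.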
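[0045]
[Embodiments of the Invention]
Embodiments of the present invention are described in detail below with reference to the drawings. The invention works as follows. A group of documents obtained by a similarity search based on feature vectors is classified with the directions of their feature vectors taken into account. When the user opens one document from the group presented as search results, the user is asked to enter YES or NO to indicate his or her satisfaction with its content (that is, whether it is a document with the content the user was looking for), and the computer automatically selects and presents the next document to open according to that answer. In this way, similarity remains the basic criterion, but the direction of the feature vectors is also taken into account, so that search results matching the information the user is seeking can be presented one after another.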
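[0046]
FIG. 1 shows the functional configuration of the information search system of the present invention. The system is implemented on a single computer system, or on a network system in which a plurality of computers are connected by cables. As functional components, the system comprises: a query input unit 1 through which the user enters, as a query, text expressing the content he or she needs or the text of a document containing that content; a feature vector creation unit 2 that creates a feature vector from the text entered through the query input unit 1; a document database 3 that stores the indexes, text data, and per-document feature vector data of a large number of documents; a similarity calculation unit 4 that calculates the similarity between the query feature vector created by the feature vector creation unit 2 and the feature vector of each document registered in the document database 3, identifies the documents whose feature vectors have a similarity higher than a predetermined threshold, and retrieves their indexes, text data, and feature vectors; and a search result holding unit 5 that stores the data retrieved by the similarity calculation unit 4.
[0047]
The system further comprises a search result output unit 7, which displays the indexes and text data of the high-similarity documents retrieved by the similarity calculation unit 4 and prints them out as needed, and a feedback processing unit 8, which accepts the user's feedback operations on the search results and applies the corresponding processing to the search result holding unit 5.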
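[0048]
Next, the information search method performed by the information search system configured as above is described with reference to FIG. 2. When the user enters through the query input unit 1, as a query, text expressing the content he or she needs or the text of a document containing that content (step S1), the feature vector creation unit 2 creates a feature vector from the entered query text (step S3). The feature vector is created by the widely used tf*idf method with Equations 1 and 2. Any of the other methods listed in the description of the prior art may also be used to create the feature vector.
Next, the similarity calculation unit 4 uses Equation 3 to calculate the similarity between the feature vector of each of the many documents stored in the document database 3 and the query feature vector created by the feature vector creation unit 2, identifies the documents whose feature vectors have a similarity higher than the predetermined threshold, retrieves their indexes, text data, and feature vectors (step S5), and temporarily stores them in the search result holding unit 5 as search result data (step S7). Any of the other similarity measures listed in the description of the prior art may also be used.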
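[0049]
For the search results obtained in this way, the search result output unit 7 displays information representing the contents of the document whose feature vector has the highest similarity to the query (step S9). This information includes the document's title, author names, publication date, and abstract. At the user's request, the full text of the document can also be displayed.
[0050]
When the user, in response to this display, uses a UI device (for example, a keyboard or a pointing device such as a mouse) to enter a judgment of whether the document has the content he or she was looking for (YES) or not (NO), the feedback processing unit 8 accepts this input and performs one of the following YES/NO processes (step S11).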
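[0051]
If the user judges that the content of the first presented document is what he or she was looking for and enters YES, the feedback processing unit 8 searches the search result holding unit 5 for the document whose feature vector has the highest similarity to the feature vector of the document currently being output and, if such a document exists, causes the search result output unit 7 to display its contents (steps S13, S19, S21).
[0052]
If the user judges that the content of the first presented document is not what he or she was looking for and enters NO in step S11, the feedback processing unit 8 calculates a virtual vector that points, relative to the query feature vector, in the direction opposite to the feature vector of the document currently being output (step S15), extracts from the documents held in the search result holding unit 5 the document whose feature vector has the highest similarity to this virtual vector, and causes the search result output unit 7 to display its contents (steps S17 to S21).
[0053]
When the user then enters a YES/NO judgment with the UI device for the content of the second document displayed in step S21, the processing of steps S11 to S21 is repeated. The same applies to the third and subsequent documents. When no document with a matching feature vector remains in the search result holding unit 5, output of the search results ends (NO branch at step S19).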
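The YES/NO feedback loop of steps S9 to S21 can be sketched as below. This is a minimal illustration, not the patent's code: the `judge` callback stands in for the user's button press, cosine similarity is used throughout, and the virtual vector is assumed to take the form base − α(current − base), where the base vector is the query at the first stage and the previously output document afterwards, as the claims describe. The 2-D vectors and α = 1 are hypothetical.

```python
import math
from typing import Callable

Vector = tuple[float, ...]


def cosine(x: Vector, y: Vector) -> float:
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


def virtual_vector(current: Vector, base: Vector, alpha: float) -> Vector:
    """Assumed Equation 4 form: a point on the opposite side of `base` from `current`."""
    return tuple(b - alpha * (c - b) for c, b in zip(current, base))


def present(query: Vector, hits: dict[str, Vector],
            judge: Callable[[str], bool], alpha: float = 1.0) -> list[str]:
    """Return the order in which the held search results would be displayed."""
    remaining = dict(hits)
    current = max(remaining, key=lambda n: cosine(query, remaining[n]))  # step S9
    base = query  # first stage: differences are taken relative to the query
    shown: list[str] = []
    while True:
        vec = remaining.pop(current)
        shown.append(current)
        if not remaining:  # step S19, NO branch: nothing left to show
            return shown
        # Step S11: YES -> look near the current document (S13);
        # NO -> look near the virtual vector on the opposite side (S15, S17).
        target = vec if judge(current) else virtual_vector(vec, base, alpha)
        base = vec  # later stages use the previously output vector as the base
        current = max(remaining, key=lambda n: cosine(target, remaining[n]))


query = (1.0, 0.0)
hits = {"A": (1.0, 0.1), "B": (1.0, 0.3), "D": (1.0, -0.2)}
print(present(query, hits, judge=lambda name: True))   # ['A', 'B', 'D']
print(present(query, hits, judge=lambda name: False))  # ['A', 'D', 'B']
```

With these sample vectors, a YES on A leads to B (same side of the query as A) and a NO on A leads to D (opposite side), mirroring the A→B and A→D transitions described for FIG. 4.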
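[0054]
The output form of search results produced by the above information search method is described with reference to FIGS. 3 to 5. Assume that the group of documents obtained from the similarity calculation by the similarity calculation unit 4 is the same as in the conventional example of FIG. 12; that is, the documents whose feature vectors have a similarity to query Q exceeding the threshold are A, D, B, C, E in descending order of similarity. Assume also that the feature vectors of these documents point in various directions, as illustrated.
[0055]
In the first search result display, as shown in FIG. 3, the contents of document A, which has the highest similarity to query Q, are displayed, together with selection buttons 11 and 12 labeled "Relevant (suitable)" and "Not relevant (unsuitable)" so that the user can enter a YES/NO judgment.
[0056]
If the user checks the contents of document A and, finding that it is what he or she was looking for, presses the "Relevant" button 11, the document whose feature vector has the highest similarity to that of document A in FIG. 13 is then searched for among the documents in the search results, and document B is extracted. The contents of the extracted document B are then displayed, as shown in FIG. 4(a).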
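[0057]
Conversely, if the user checks the contents of document A and presses the "Not relevant" button 12, then, as described above, a virtual vector A′ pointing in the direction exactly opposite to the feature vector of document A, relative to query Q (the document used as the basis of similarity), is obtained by Equation 4 below.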
[0058]
[Equation 4]
$$A' = Q - \alpha\,(A - Q)$$
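Here, α is a predetermined constant.
[0059]
As shown in FIG. 5, this virtual vector A′ lies, relative to query Q, in the direction exactly opposite to the feature vector of document A.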
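Assuming Equation 4 has the form A′ = Q − α(A − Q) (a form consistent with the claims, which require A − Q and A′ − Q to point in opposite directions), the construction and that property can be checked numerically. The vectors and α below are hypothetical.

```python
Vector = tuple[float, ...]


def virtual_vector(a: Vector, q: Vector, alpha: float) -> Vector:
    """Assumed form of Equation 4: A' = Q - alpha * (A - Q)."""
    return tuple(qi - alpha * (ai - qi) for ai, qi in zip(a, q))


def difference(u: Vector, v: Vector) -> Vector:
    """Component-wise u - v."""
    return tuple(ui - vi for ui, vi in zip(u, v))


q = (1.0, 0.0)  # query Q (hypothetical)
a = (1.0, 0.1)  # document A (hypothetical)
a_virtual = virtual_vector(a, q, alpha=0.5)
print(a_virtual)                                    # (1.0, -0.05)
print(difference(a, q), difference(a_virtual, q))  # A - Q and A' - Q point opposite ways
```

The constant α only controls how far A′ sits from Q; its direction relative to Q is fixed by A.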
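[0060]
Using this virtual vector A′, the document whose feature vector is closest to it is searched for, document D is extracted, and its contents are displayed as shown in FIG. 4(b).
[0061]
Similarly, if the user presses the "Relevant" button while the contents of document B in FIG. 4(a) are displayed, or while the contents of document D in FIG. 4(b) are displayed, the document whose feature vector has the highest similarity to the feature vector of document B or of document D, respectively, is extracted from the documents in the search results.
[0062]
Conversely, if, for example, the user presses the "Not relevant" button while the contents of document B in FIG. 4(a) are displayed, a virtual vector B′ pointing in the direction opposite to the feature vector of document B, relative to the feature vector of document A from which document B was derived one step earlier, is obtained by Equation 4, and the document whose feature vector has the highest similarity to this virtual vector B′ is searched for. As a result, document D is extracted and its contents are displayed.
[0063]
By repeating the same procedure, the user's judgment inputs are reflected, and documents whose contents are close to what the user needs can be presented one after another, with priority.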
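[0064]
In the system shown in FIG. 1, depending on the performance of the computers used, the query input unit 1 through the similarity calculation unit 4 may be implemented as server-side functions, with the search result holding unit 5, the search result output unit 7, and the feedback processing unit 8 provided on a client connected to the server via a LAN, the Internet, or another network. Alternatively, the query input unit 1 through the search result holding unit 5 may be implemented as server-side functions, with the search result output unit 7 and the feedback processing unit 8 provided on a client connected via a LAN, the Internet, or another network.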
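[0065]
Next, an information search system according to a second embodiment of the present invention is described with reference to FIG. 6. As functional components, the information search system of the second embodiment has the same query input unit 1, feature vector creation unit 2, document database 3, similarity calculation unit 4, search result holding unit 5, search result output unit 7, and feedback processing unit 8 as the system of the first embodiment shown in FIG. 1. The system of this embodiment further comprises a decision tree creation unit 6, which creates a decision tree by the logical operations described below, based on the feature vectors of the documents stored in the search result holding unit 5, and causes the search result holding unit 5 to hold the created decision tree data.
[0066]
Next, the information search method performed by the information search system of the second embodiment is described with reference to the flowchart of FIG. 7. As in the system of the first embodiment shown in FIG. 2, when the user enters as a query text expressing the content he or she needs, or the text of a document containing that content (step S1), a feature vector is created from the entered query text (step S3). Then the similarity between the feature vector of each of the many documents stored in the document database 3 and the query feature vector is calculated, the documents whose feature vectors have a similarity higher than the predetermined threshold are identified, their indexes, text data, and feature vectors are retrieved (step S5), and they are temporarily stored in the search result holding unit 5 as search result data (step S7).
[0067]
For the search results obtained in this way, the decision tree creation unit 6, which characterizes this embodiment, creates a decision tree such as that shown in FIG. 8 by the processing described below and causes the search result holding unit 5 to hold it (step S8). The search result output unit 7 then displays information representing the contents of the document whose feature vector has the highest similarity to the query (step S9).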
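[0068]
When the user, in response to this display, uses the UI device to enter a judgment of whether the document has the content he or she was looking for (YES) or not (NO), the feedback processing unit 8 accepts this input and performs one of the following YES/NO processes (step S11).
[0069]
If the user judges that the content of the first presented document is what he or she was looking for and enters YES, the feedback processing unit 8 refers to the decision tree held in the search result holding unit 5, identifies the next document to move to on YES from the document currently being output, and causes the search result output unit 7 to display its contents (steps S12, S16, S18).
[0070]
If the user judges that the content of the first presented document is not what he or she was looking for and enters NO in step S11, the feedback processing unit 8 refers to the decision tree held in the search result holding unit 5, identifies the next document to move to on NO from the document currently being output, and causes the search result output unit 7 to display its contents (steps S14, S16, S18).
[0071]
When the user then enters a YES/NO judgment with the UI device for the content of the second document displayed in step S18, the processing of steps S11 to S18 is repeated. The same applies to the third and subsequent documents. When no corresponding document remains in the search result holding unit 5, output of the search results ends (NO branch at step S16).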
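[0072]
Next, the decision tree creation processing by the decision tree creation unit 6 is described using the example decision tree of FIG. 8. Assume that a group of feature vectors A to E with similarities within the threshold has been retrieved for the query feature vector Q. The closer a vector is to query Q, the higher its similarity to Q, and vector directions are shown as positions relative to query Q.
[0073]
For the high-similarity feature vectors A to E obtained by the similarity calculation unit 4, the decision tree creation unit 6 starts from feature vector A, which has the highest similarity, and branches to YES for the case in which its contents are assumed to be accepted by the user and to NO for the case in which they are assumed not to be accepted.
[0074]
On the YES branch, the document having feature vector B, which has the highest similarity to feature vector A, is taken as the document to be output after the document having feature vector A. On the NO branch from the document with feature vector A, a virtual vector A′ pointing in the direction opposite to feature vector A, with query Q as the basis, is assumed as described above, and the document with feature vector D, the one closest to this virtual vector A′, is determined to be the document displayed next. In the same way, each feature vector is branched into YES and NO, the vectors to be output are successively extracted from the vectors in the search results, and a decision tree such as that shown in FIG. 8 is created.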
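One way to precompute a FIG. 8 style tree is sketched below, under the same assumptions as the first-embodiment loop (cosine similarity, a virtual vector of the form base − α(current − base), hypothetical 2-D vectors). Each path excludes the documents already placed on it, so different branches may reuse a document. The `Node` class and the optional `max_depth` cap are illustrative choices, not taken from the patent; note that a full tree over n documents can grow exponentially in n.

```python
import math
from dataclasses import dataclass
from typing import Optional

Vector = tuple[float, ...]


def cosine(x: Vector, y: Vector) -> float:
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


def virtual_vector(current: Vector, base: Vector, alpha: float) -> Vector:
    return tuple(b - alpha * (c - b) for c, b in zip(current, base))


@dataclass
class Node:
    doc: str
    yes: Optional["Node"] = None  # next document if the user answers YES
    no: Optional["Node"] = None   # next document if the user answers NO


def grow(name: str, base: Vector, remaining: dict[str, Vector],
         docs: dict[str, Vector], alpha: float, depth_left: Optional[int]) -> Node:
    node = Node(name)
    if not remaining or (depth_left is not None and depth_left <= 1):
        return node  # leaf: nothing left on this path, or depth cap reached
    vec = docs[name]
    for answer in (True, False):
        target = vec if answer else virtual_vector(vec, base, alpha)
        nxt = max(remaining, key=lambda n: cosine(target, remaining[n]))
        rest = {n: v for n, v in remaining.items() if n != nxt}
        child = grow(nxt, vec, rest, docs, alpha,
                     None if depth_left is None else depth_left - 1)
        if answer:
            node.yes = child
        else:
            node.no = child
    return node


def build_tree(query: Vector, docs: dict[str, Vector],
               alpha: float = 1.0, max_depth: Optional[int] = None) -> Node:
    first = max(docs, key=lambda n: cosine(query, docs[n]))
    rest = {n: v for n, v in docs.items() if n != first}
    return grow(first, query, rest, docs, alpha, max_depth)


tree = build_tree((1.0, 0.0), {"A": (1.0, 0.1), "B": (1.0, 0.3), "D": (1.0, -0.2)})
print(tree.doc, tree.yes.doc, tree.no.doc)  # A B D
```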
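[0075]
When the user then checks the contents of the documents retrieved for query Q one after another, the feedback processing unit 8 refers to the decision tree information of FIG. 8, created by the decision tree creation unit 6 and held in the search result holding unit 5, and presents the documents in the search results as follows.
[0076]
First, document A is output as the document whose feature vector is closest to query Q. If the user judges it NO, the feedback processing unit 8 next causes the search result output unit 7 to output the document with feature vector D. If the user also judges the document with feature vector D as NO, the document with feature vector C is output next. If the document with feature vector C is judged YES, the document with feature vector B is output next. Finally, the document with feature vector E is output.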
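Once the tree exists, each answer only selects a child node, so no similarity has to be computed while the user browses. The sketch below follows YES/NO answers through a hand-built tree containing only the path described in paragraph [0076] (A, D, C, B, E). The text does not say which answer at B leads to E, so both branches point to E here as an assumption; the `Node` class and `walk` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Node:
    doc: str
    yes: Optional["Node"] = None
    no: Optional["Node"] = None


def walk(root: Node, answers: Iterable[bool]) -> list[str]:
    """Documents shown when the user gives `answers` (True = YES) in turn."""
    shown = [root.doc]
    node: Optional[Node] = root
    for answer in answers:
        node = node.yes if answer else node.no
        if node is None:  # no further document on this branch
            break
        shown.append(node.doc)
    return shown


# Hypothetical tree holding only the path described in paragraph [0076].
e = Node("E")
b = Node("B", yes=e, no=e)
c = Node("C", yes=b)
d = Node("D", no=c)
a = Node("A", no=d)
print(walk(a, [False, False, True, True]))  # ['A', 'D', 'C', 'B', 'E']
```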
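[0077]
FIG. 9 is a flowchart showing the decision tree creation processing by the decision tree creation unit 6 in general form. The group of documents whose feature vectors have similarities within the threshold for the initially entered query Q is retrieved first, and the decision tree is created in advance by repeating the processing of the flowchart of FIG. 9 within that group.
[0078]
According to this second embodiment, particularly in a system with a server capable of high-speed processing, a user who accesses the server as a client to receive the search service can browse the search results quickly, in order of priority, even if his or her own client machine lacks high-speed processing capability.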
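[0079]
Next, an information search system according to a third embodiment of the present invention is described with reference to the flowchart of FIG. 10 and the operation diagram of FIG. 11. The functional configuration of the information search system of the third embodiment is the same as that of the second embodiment shown in FIG. 6. However, instead of creating a decision tree such as that shown in FIG. 8 for the entire group of documents extracted by the similarity calculation unit 4, the decision tree creation unit 6 grows the decision tree a predetermined number of levels at a time in response to the user's judgment inputs.
[0080]
That is, when the group of documents whose feature vectors are within the threshold for query Q, as obtained by the similarity calculation unit 4, is as shown in FIG. 13, the decision tree creation unit 6 first creates a decision tree down to, for example, three levels, as shown in FIG. 11(a) (step S31). The number of levels is not fixed: it may be two, or four or more if the search results contain many documents.
[0081]
If the user judges the content of document A at the first level to be relevant (YES) (steps S33 to S37), the direction in which documents should be presented is more or less settled, so the decision tree is grown as shown in FIG. 11(b) (steps S39, S41, S43). Then document B, whose feature vector is closest to that of document A, is extracted and output (steps S45, S35).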
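[0082]
One example of the condition for growing the decision tree in step S41 is that presentation has reached a leaf document of the decision tree built so far while unpresented documents still remain in the search results. Alternatively, the decision tree may be grown by one level, or by a suitable number of levels, each time one level, or a predetermined suitable number of levels, of candidate documents has been presented. Such a growth condition can be fixed in advance, for example as the point at which one level or a suitable number of levels of documents has been presented.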
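The level-by-level growth of the third embodiment can be sketched as a tree whose nodes remember their base vector and the documents still unplaced on their path, so that a subtree can be grown later, on demand. The growth condition used here (an answer leads off the current tree while unpresented documents remain) is one of the examples given above. The class and function names, the `levels` parameter, and the sample vectors are hypothetical, and the virtual vector form is the same assumption as in the earlier sketches.

```python
import math
from dataclasses import dataclass
from typing import Optional

Vector = tuple[float, ...]


def cosine(x: Vector, y: Vector) -> float:
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


def virtual_vector(current: Vector, base: Vector, alpha: float) -> Vector:
    return tuple(b - alpha * (c - b) for c, b in zip(current, base))


@dataclass
class LazyNode:
    doc: str
    base: Vector                  # vector this node's NO branch is measured from
    remaining: dict[str, Vector]  # documents not yet placed on the path to here
    yes: Optional["LazyNode"] = None
    no: Optional["LazyNode"] = None


def expand(node: LazyNode, docs: dict[str, Vector], levels: int, alpha: float) -> None:
    """Add up to `levels` levels below `node`, creating only missing children."""
    if levels <= 0 or not node.remaining:
        return
    vec = docs[node.doc]
    for answer in (True, False):
        child = node.yes if answer else node.no
        if child is None:
            target = vec if answer else virtual_vector(vec, node.base, alpha)
            nxt = max(node.remaining, key=lambda n: cosine(target, node.remaining[n]))
            rest = {n: v for n, v in node.remaining.items() if n != nxt}
            child = LazyNode(nxt, vec, rest)
            if answer:
                node.yes = child
            else:
                node.no = child
        expand(child, docs, levels - 1, alpha)


def session(query: Vector, docs: dict[str, Vector], answers: list[bool],
            levels: int = 3, alpha: float = 1.0) -> list[str]:
    first = max(docs, key=lambda n: cosine(query, docs[n]))
    root = LazyNode(first, query, {n: v for n, v in docs.items() if n != first})
    expand(root, docs, levels - 1, alpha)  # initial tree of `levels` levels (cf. step S31)
    node, shown = root, [root.doc]
    for answer in answers:
        child = node.yes if answer else node.no
        if child is None and node.remaining:
            expand(node, docs, levels - 1, alpha)  # grow below the node actually reached
            child = node.yes if answer else node.no
        if child is None:
            break
        node = child
        shown.append(node.doc)
    return shown


docs = {"A": (1.0, 0.1), "B": (1.0, 0.3), "D": (1.0, -0.2)}
print(session((1.0, 0.0), docs, [False, False], levels=2))  # ['A', 'D', 'B']
```

Only the subtree below the node the user actually reaches is extended, which is the source of the computational saving described in [0083].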
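[0083]
According to this third embodiment, the user's judgments are fed back and the decision tree is created only for branches that are likely to be followed, so unnecessary computation, and thus the computational load on the server, can be reduced.
[0084]
In each of the above embodiments, the search service and all computation are executed on the server machine. However, all or some of the processing functions of the search result holding unit 5 through the feedback processing unit 8, including the decision tree creation unit 6, may instead be provided on the client machine, which is connected to the server machine via a network.
[0085]
Also, in the system shown in FIG. 6, depending on the performance of the computers used, the query input unit 1 through the similarity calculation unit 4 may be implemented as server-side functions, with the search result holding unit 5, the decision tree creation unit 6, the search result output unit 7, and the feedback processing unit 8 provided on a client connected to the server via a LAN, the Internet, or another network. Alternatively, the query input unit 1 through the decision tree creation unit 6 may be implemented as server-side functions, with the search result output unit 7 and the feedback processing unit 8 provided on a client connected via a LAN, the Internet, or another network.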
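[0086]
The technical scope of the present invention also includes a program that implements the processing functions of the information search system of each of the above embodiments, a computer-readable recording medium on which the program is recorded, and the information search method executed by a computer system in which the program is installed.
[0087]
Furthermore, although the above embodiments illustrate document search based on text data, the invention is not limited to this and can also be applied to searching music, movies, and image information whose contents are expressed as document information. In addition, for music and image information, the data itself can be converted into feature vectors and registered in a database together with text data such as indexes, titles, authors, publishers, and distributors. If data representing music or image information is then entered directly as a query and converted into a feature vector, the music or image information whose feature vectors are highly similar is extracted, and the corresponding data, feature vectors, indexes, titles, and so on are retrieved from the database, decision-tree-based search similar to that of the above embodiments becomes possible.
[0088]
[Effects of the Invention]
As described above, according to the present invention, a group of information whose feature vectors have a similarity to the user's query higher than a threshold is extracted, and the contents of the information with the most similar feature vector are presented first. If the user judges that information relevant (YES), information whose feature vector is highly similar to that of the presented information is presented next; if the user judges it not relevant (NO), a virtual vector pointing in the direction opposite to that feature vector is used, and information whose feature vector is highly similar to the virtual vector is presented next. This procedure is repeated thereafter, so that the information the user is seeking is predicted and presented preferentially. Thus, even when a large number of pieces of information are highly similar to what the user needs, the information with the content the user needs can be automatically selected and presented in sequence in response to the user's relevant/not-relevant judgment of each presented item.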
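[Brief Description of the Drawings]
[FIG. 1] A block diagram showing the functional configuration of the information search system according to the first embodiment of the present invention.
[FIG. 2] A flowchart of the search processing and search result output processing according to the above embodiment.
[FIG. 3] An explanatory diagram of a display example of search processing results according to the above embodiment.
[FIG. 4] An explanatory diagram showing transitions of the displayed search results according to the above embodiment.
[FIG. 5] An explanatory diagram of the search result output processing according to the above embodiment.
[FIG. 6] A block diagram showing the functional configuration of the information search system according to the second embodiment of the present invention.
[FIG. 7] A flowchart of the search processing and search result output processing according to the above embodiment.
[FIG. 8] An explanatory diagram of decision tree creation processing for search results according to the above embodiment.
[FIG. 9] A flowchart of decision tree creation processing for search results according to the above embodiment.
[FIG. 10] A flowchart of decision tree creation processing for search results by the information search system according to the third embodiment of the present invention.
[FIG. 11] An explanatory diagram of decision tree creation processing for search results according to the above embodiment.
[FIG. 12] An explanatory diagram showing an output example of conventional similarity-based search processing results.
[FIG. 13] An explanatory diagram of conventional similarity-based search processing.
[Explanation of Reference Numerals]
1 Query input unit
2 Feature vector creation unit
3 Document database
4 Similarity calculation unit
5 Search result holding unit
6 Decision tree creation unit
7 Search result output unit
8 Feedback processing unit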
BACKGROUND OF THE INVENTION
The present invention relates to an information search system, an information search method, an information search program, a recording medium on which an information search program is recorded, an output information selection device, an output information selection method, an output information selection program, and a recording medium on which an output information selection program is recorded. .
[0002]
[Prior art]
When a user searches for a document on the Internet or accesses a document database located at a specific location to search for a document, he / she enters text data describing the document contents he / she needs as a creator, or his / her own If you enter text data of an existing document that is a model of the required document and give a search command, the search engine will extract a document with text data similar to the text data of the document entered as a creator. Information retrieval systems to be presented to users are known.
[0003]
In such an information search system, for example, a vector space model represented by tf * idf is used. This is because the text data of a large number of documents is analyzed and the feature vector is obtained and registered in the database, the feature vector of the text data input by the user is obtained, and similar from the feature vector group already registered in the database. Feature vectors are extracted, and documents having these extracted feature vectors are presented to the user as search results.
[0004]
tf*idf is a feature value obtained by weighting the frequency of each word in a document's text data by a factor that takes into account the word's frequency in the text data of the other documents being searched. The weight w_t for word t is given by Equation 1:
[0005]
[Expression 1]

Here, N is the number of documents being searched, and f_t is the number of documents containing word t.
[0006]
Each element w_{d,t} of the feature vector of a document's text data is calculated as follows:
[0007]
[Expression 2]

Here, f_{d,t} is the frequency of word t in the text data of document d.
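The tf*idf weighting described above can be sketched in Python as follows. Because the images of Expressions 1 and 2 are not reproduced here, the sketch assumes the common forms w_t = log(N / f_t) and w_{d,t} = f_{d,t} · w_t; the function name `tfidf_vectors` and the tokenized-list input format are illustrative assumptions, not part of the patent.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: w_dt} dict per tokenized document.

    Assumed forms (the patent's Expressions 1 and 2 are not reproduced here):
    w_t = log(N / f_t) and w_dt = f_dt * w_t.
    """
    n = len(docs)                      # N: number of documents searched
    df = Counter()                     # f_t: number of documents containing term t
    for doc in docs:
        df.update(set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)              # f_dt: occurrences of term t in document d
        vectors.append({t: f * math.log(n / df[t]) for t, f in tf.items()})
    return vectors


docs = [["search", "vector", "query"],
        ["vector", "space", "model"],
        ["query", "query", "log"]]
vecs = tfidf_vectors(docs)
print(vecs[2]["query"])  # equals 2 * log(3 / 2)
```

Under this assumed form, a term that occurs in every document receives weight 0, so it contributes nothing to similarity.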
[0008]
One measure of the similarity between documents whose features are expressed as vectors is the cosine coefficient. It expresses the similarity sim(x, y) between two vectors x and y by Equation 3:
[0009]
[Expression 3]
$$\mathrm{sim}(x, y) = \frac{\sum_{i} x_i y_i}{\sqrt{\sum_{i} x_i^2}\,\sqrt{\sum_{i} y_i^2}}$$
Conventional information search systems extract the documents whose feature vectors have a similarity to the feature vector of the query text data that exceeds a predetermined threshold, and present the search results in descending order of similarity, as shown in FIG. 12.
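A minimal sketch of this conventional ranking, assuming sparse vectors stored as `{term: weight}` dictionaries (an illustrative choice): documents whose cosine coefficient with the query exceeds the threshold are kept and sorted from highest to lowest similarity, as in FIG. 12. The names `cosine` and `rank_by_similarity` and the toy data are hypothetical.

```python
import math


def cosine(x, y):
    """Cosine coefficient between sparse vectors stored as {term: weight} dicts."""
    dot = sum(w * y.get(t, 0.0) for t, w in x.items())
    nx = math.sqrt(sum(w * w for w in x.values()))
    ny = math.sqrt(sum(w * w for w in y.values()))
    return dot / (nx * ny) if nx and ny else 0.0


def rank_by_similarity(query_vec, doc_vecs, threshold):
    """Keep documents whose similarity exceeds the threshold, highest first."""
    scored = ((doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items())
    hits = [(doc_id, s) for doc_id, s in scored if s > threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


query = {"vector": 1.0, "search": 1.0}
docs = {"A": {"vector": 1.0, "search": 1.0},
        "B": {"vector": 1.0},
        "C": {"patent": 1.0}}
result = rank_by_similarity(query, docs, threshold=0.5)
print([doc_id for doc_id, _ in result])  # ['A', 'B']; C falls below the threshold
```

Note that this ordering uses only each document's similarity to the query; it ignores the direction in which each document lies relative to the query.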
[0010]
Represented schematically in a vector space, the similarity-ordered search results of FIG. 12 appear as shown in FIG. 13. That is, FIG. 13 shows the spatial relationship between the feature vector of query Q and the feature vectors of documents A to E, whose similarities are high enough to fall within the threshold. In FIG. 13, the distance measures from query Q satisfy A < D < B < C < E, so the documents rank A, D, B, C, E in descending order of similarity.
[0011]
Besides tf*idf, there are various models for representing documents by feature vectors, such as the Term-discrimination Value and Probabilistic Term Weighting, as described in, for example, Gerard Salton, "Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer".
[0012]
Besides the cosine coefficient, there are various other measures, such as the inner product, the Dice coefficient, and the Jaccard coefficient, as described in the same book.
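For dense vectors, the alternative measures mentioned above are commonly written in the vector forms below. This is a sketch using those textbook forms, not formulas taken from the patent; for binary (0/1) vectors they reduce to the familiar set-based Dice and Jaccard coefficients.

```python
def inner(x, y):
    """Inner product of two equal-length dense vectors."""
    return sum(a * b for a, b in zip(x, y))


def dice(x, y):
    """Dice coefficient: 2 x.y / (|x|^2 + |y|^2). Assumes non-zero vectors."""
    return 2 * inner(x, y) / (inner(x, x) + inner(y, y))


def jaccard(x, y):
    """Jaccard coefficient: x.y / (|x|^2 + |y|^2 - x.y). Assumes non-zero vectors."""
    xy = inner(x, y)
    return xy / (inner(x, x) + inner(y, y) - xy)


x, y = [1, 1, 0], [1, 0, 1]
print(inner(x, y), dice(x, y), jaccard(x, y))  # 1 0.5 0.3333333333333333
```

With the binary vectors above, the shared term count is 1 and the union has 3 terms, which matches the set-based Jaccard value of 1/3.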
[0013]
[Problems to be solved by the invention]
Such a conventional information search system has the following problem. As shown in FIG. 13, even among documents A to E, whose similarities all fall within the threshold, the directions of the feature vectors relative to query Q vary. Search results, however, are usually displayed only in descending order of similarity, as shown in FIG. 12. Consequently, a user who opens document A, the closest to query Q, and finds that its content is what he or she wants will next open document D, the second most similar, which lies in the opposite direction from document A with respect to query Q. The user then opens document B, the third most similar, whose feature vector is close to that of document A and points almost opposite to that of document D with respect to query Q. After that comes document C, whose direction differs completely from all of these.
[0014]
In practice, however, if the user opens document A, the most similar document, and finds its content highly relevant to the desired document, it is preferable to open document B next, because its feature vector points in the same direction as that of document A. Document D has the second-highest similarity, but its feature vector points in the opposite direction from document A with respect to query Q.
[0015]
Conventionally, results were simply presented in order of similarity without considering the directions of the feature vectors. Such a user need could therefore not be met, and the user took a long time to find a document with the needed content.
[0016]
The present invention has been made in view of this conventional problem. Its object is to provide a technique that lets the user browse the information he or she needs from similarity search results with as little wasted effort as possible.
[0017]
[Means for Solving the Problems]
The information search system according to the first aspect of the present invention comprises the following units. A query input unit receives a query input by a user. A feature vector creation unit creates a feature vector from the query received by the query input unit. An information database registers an index, data representing contents, and feature vector data for each of a large number of pieces of search target information. A similarity calculation unit calculates the similarity between the feature vector of the query created by the feature vector creation unit and the feature vector of each piece of search target information registered in the information database, identifies the search target information whose feature vectors show a similarity higher than a predetermined threshold, and retrieves their indexes, content data, and feature vector data. A search result holding unit stores the data retrieved by the similarity calculation unit. A search result output unit outputs the index and content data of one piece of the search target information retrieved by the similarity calculation unit. A feedback processing unit receives the user's appropriate/inappropriate judgment on the search result output by the search result output unit.
(1) For an appropriate input, the feedback processing unit searches the search target information held in the search result holding unit for the other piece whose feature vector is most similar to the feature vector of the search target information currently being output. If such information exists, it causes the search result output unit to output its content as the next candidate.
(2) For an inappropriate input, the feedback processing unit calculates the difference between the feature vector of the search target information currently being output and a reference vector. The reference vector is the feature vector of the query in the first-stage search, or the feature vector output at the previous stage in the second and later stages. The unit then calculates a virtual vector in which this difference is reversed and searches the search target information held in the search result holding unit for the other piece whose feature vector is most similar to the virtual vector. If such information exists, it causes the search result output unit to output its content as the next candidate.
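The feedback step of the first aspect can be sketched as follows. The text says only that the virtual vector reverses the difference between the current feature vector x and the reference vector r. This sketch assumes the concrete form r - (x - r) = 2r - x, which reflects x through r. The function `next_candidate`, its parameters, the use of the cosine coefficient, and the toy 2-D vectors loosely arranged like A, D, B in FIG. 13 are all illustrative assumptions.

```python
import math


def cos(x, y):
    """Cosine coefficient of two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.hypot(*x) * math.hypot(*y)
    return dot / norm if norm else 0.0


def next_candidate(current, reference, remaining, vectors, appropriate):
    """Choose the next item after the user judges `current`.

    reference: query vector at the first stage, otherwise the vector output
               at the previous stage.
    remaining: ids of hits that have not been shown yet.
    """
    if not remaining:
        return None
    x = vectors[current]
    if appropriate:
        target = x                      # stay close to the accepted item
    else:
        # Assumed virtual vector: reverse the difference x - r, i.e. r - (x - r).
        target = [2 * r - xi for xi, r in zip(x, reference)]
    return max(remaining, key=lambda d: cos(vectors[d], target))


# Toy 2-D vectors (hypothetical values) arranged like A, D, B in FIG. 13.
query = [1.0, 0.0]
vectors = {"A": [1.0, 0.3], "D": [1.0, -0.35], "B": [1.0, 0.5]}
print(next_candidate("A", query, ["D", "B"], vectors, appropriate=True))   # B
print(next_candidate("A", query, ["D", "B"], vectors, appropriate=False))  # D
```

With these toy vectors, plain similarity ranking would show A, D, B. If the user accepts A, the sketch moves to B, which lies on the same side of Q. If the user rejects A, it moves to D on the opposite side, which is the behavior motivated in paragraphs [0013] and [0014].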
[0018]
An information search system according to a second aspect of the present invention comprises the following units. A query input unit receives a query input by a user. A feature vector creation unit creates a feature vector from the query received by the query input unit. An information database registers an index, data representing contents, and feature vector data for each of a large number of pieces of search target information. A similarity calculation unit calculates the similarity between the feature vector of the query created by the feature vector creation unit and the feature vector of each piece of search target information registered in the information database, identifies the search target information whose feature vectors show a similarity higher than a predetermined threshold, and retrieves their indexes, content data, and feature vector data. A search result holding unit stores the data retrieved by the similarity calculation unit.
A decision tree creation unit builds a decision tree for the group of high-similarity feature vectors obtained by the similarity calculation unit. It starts from the feature vector with the highest similarity and branches as "appropriate" if the user accepts its content and as "inappropriate" if not. On an appropriate branch, the other piece of search target information whose feature vector is most similar to the current feature vector becomes the next candidate. On an inappropriate branch, the unit calculates the difference between the current feature vector and a reference vector (the feature vector of the query in the first-stage search, or the feature vector output at the previous stage in the second and later stages), then calculates a virtual vector in which this difference is reversed. The other piece of search target information whose feature vector is most similar to the virtual vector, if any, becomes the next candidate. The unit repeats this until all the search target information held in the search result holding unit is covered.
A search result output unit outputs the index and content data of one piece of the search target information retrieved by the similarity calculation unit. A feedback processing unit receives the user's appropriate/inappropriate judgment on the search result output by the search result output unit, identifies the search target information to be output next based on the decision tree created by the decision tree creation unit, and causes the search result output unit to output its content.
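The decision tree of the second aspect can be sketched as a binary tree whose "ok" and "ng" children are the next candidates after an appropriate or inappropriate judgment. The sketch makes three assumptions. First, it uses the same virtual-vector form 2r - x, where r is the query vector at the root and the parent's vector below the root. Second, each root-to-leaf path excludes items already shown on that path. Third, the names `build_tree` and `walk` and the toy vectors are hypothetical.

```python
import math


def cos(x, y):
    """Cosine coefficient of two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.hypot(*x) * math.hypot(*y)
    return dot / norm if norm else 0.0


def build_tree(current, reference, remaining, vectors):
    """Build {"id", "ok", "ng"} nodes until every path has shown all hits."""
    rest = [d for d in remaining if d != current]   # items not yet shown on this path
    node = {"id": current, "ok": None, "ng": None}
    if not rest:
        return node
    x = vectors[current]
    virtual = [2 * r - xi for xi, r in zip(x, reference)]  # assumed form r - (x - r)
    ok_id = max(rest, key=lambda d: cos(vectors[d], x))
    ng_id = max(rest, key=lambda d: cos(vectors[d], virtual))
    # For the next stage, the reference is the vector output at this stage.
    node["ok"] = build_tree(ok_id, x, rest, vectors)
    node["ng"] = build_tree(ng_id, x, rest, vectors)
    return node


def walk(tree, judgments):
    """Follow True (appropriate) / False (inappropriate) judgments; return ids shown."""
    shown, node = [], tree
    for ok in judgments:
        if node is None:
            break
        shown.append(node["id"])
        node = node["ok"] if ok else node["ng"]
    return shown


query = [1.0, 0.0]
vectors = {"A": [1.0, 0.3], "D": [1.0, -0.35], "B": [1.0, 0.5]}
hits = list(vectors)
root = max(hits, key=lambda d: cos(vectors[d], query))  # most similar to the query
tree = build_tree(root, query, hits, vectors)
print(walk(tree, [False, True, True]))  # ['A', 'D', 'B']
```

In this sketch every path eventually covers all hits, so the number of nodes can grow exponentially with the number of hits. That cost is one reason a depth-limited tree, as in the third aspect, may be attractive.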
[0019]
An information search system according to a third aspect of the present invention comprises the following units. A query input unit receives a query input by a user. A feature vector creation unit creates a feature vector from the query received by the query input unit. An information database registers an index, data representing contents, and feature vector data for each of a large number of pieces of search target information. A similarity calculation unit calculates the similarity between the feature vector of the query created by the feature vector creation unit and the feature vector of each piece of search target information registered in the information database, identifies the search target information whose feature vectors show a similarity higher than a predetermined threshold, and retrieves their indexes, content data, and feature vector data. A search result holding unit stores the data retrieved by the similarity calculation unit.
A decision tree creation unit builds a decision tree for the group of high-similarity feature vectors obtained by the similarity calculation unit. It starts from the feature vector with the highest similarity and branches as "appropriate" if the user accepts its content and as "inappropriate" if not. On an appropriate branch, the other piece of search target information whose feature vector is most similar to the current feature vector becomes the next candidate. On an inappropriate branch, the unit calculates the difference between the current feature vector and a reference vector (the feature vector of the query in the first-stage search, or the feature vector output at the previous stage in the second and later stages), then calculates a virtual vector in which this difference is reversed. The other piece of search target information whose feature vector is most similar to the virtual vector, if any, becomes the next candidate. For the group of search target information held in the search result holding unit, the unit repeats this only up to a predetermined stage.
A search result output unit outputs the index and content data of one piece of the search target information retrieved by the similarity calculation unit. A feedback processing unit receives the user's appropriate/inappropriate judgment on the search result output by the search result output unit, identifies the search target information to be output next based on the decision tree created by the decision tree creation unit, and causes the search result output unit to output its content. When a condition for growing the existing decision tree is reached, the feedback processing unit also instructs the decision tree creation unit to grow the decision tree by a predetermined number of stages.
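A sketch of the third aspect's depth-limited decision tree, grown on demand. This passage does not specify the growth condition, so the sketch assumes the tree grows by `levels` stages when the user reaches a node whose children have not yet been created. The `Node` class, `grow`, `step`, the virtual-vector form 2r - x, and the toy vectors are all illustrative assumptions.

```python
import math


def cos(x, y):
    """Cosine coefficient of two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.hypot(*x) * math.hypot(*y)
    return dot / norm if norm else 0.0


class Node:
    """One candidate in the decision tree; children are created lazily."""

    def __init__(self, doc_id, reference, remaining):
        self.doc_id = doc_id
        self.reference = reference              # query vector, or the parent's vector
        self.remaining = [d for d in remaining if d != doc_id]
        self.ok = self.ng = None
        self.expanded = False


def grow(node, vectors, levels):
    """Expand `node` and its descendants by at most `levels` further stages."""
    if levels == 0 or not node.remaining:
        return
    if not node.expanded:
        x = vectors[node.doc_id]
        virtual = [2 * r - xi for xi, r in zip(x, node.reference)]  # assumed form
        ok_id = max(node.remaining, key=lambda d: cos(vectors[d], x))
        ng_id = max(node.remaining, key=lambda d: cos(vectors[d], virtual))
        node.ok = Node(ok_id, x, node.remaining)
        node.ng = Node(ng_id, x, node.remaining)
        node.expanded = True
    grow(node.ok, vectors, levels - 1)
    grow(node.ng, vectors, levels - 1)


def step(node, appropriate, vectors, levels=2):
    """Return the next node, first growing the tree if `node` is on its frontier."""
    if not node.expanded:
        grow(node, vectors, levels)             # assumed growth condition
    return node.ok if appropriate else node.ng


query = [1.0, 0.0]
vectors = {"A": [1.0, 0.3], "D": [1.0, -0.35], "B": [1.0, 0.5]}
root = Node("A", query, list(vectors))      # A is the hit most similar to the query
grow(root, vectors, levels=1)               # initial tree: root plus one stage
nxt = step(root, True, vectors)             # user accepts A, so B comes next
print(nxt.doc_id, nxt.expanded)             # B False
print(step(nxt, False, vectors).doc_id)     # grows below B on demand: D
```

Growing lazily keeps the tree small: only the branches the user actually follows get expanded beyond the initial depth.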
[0020]
According to a fourth aspect of the present invention, in the information search system of any of the first to third aspects, the query and the search target information are text data.
[0021]
The information search method of the invention of claim 5 comprises the following steps. In step 1, the query input unit receives a query input by a user. In step 2, the feature vector creation unit creates a feature vector from the received query. In step 3, the similarity calculation unit calculates the similarity between the query feature vector created by the feature vector creation unit and the feature vector of each piece of search target information registered in the information database, identifies the search target information whose feature vectors show a similarity higher than a predetermined threshold, and retrieves their indexes, content data, and feature vector data. In step 4, the search result output unit outputs the index and content data of one piece of the search target information retrieved in step 3. In step 5, the feedback processing unit receives the user's appropriate/inappropriate judgment on the search result output in step 4.
(1) For an appropriate input, the feedback processing unit searches the search target information retrieved in step 3 for the other piece whose feature vector is most similar to the feature vector of the search target information currently being output. If such information exists, it outputs the content as the next candidate.
(2) For an inappropriate input, the feedback processing unit calculates the difference between the feature vector of the search target information currently being output and a reference vector (the feature vector of the query in the first-stage search, or the feature vector output at the previous stage in the second and later stages). It then calculates a virtual vector in which this difference is reversed and searches the search target information retrieved in step 3 for the other piece whose feature vector is most similar to the virtual vector. If such information exists, it outputs the content as the next candidate.
[0022]
The information search method of the invention of claim 6 comprises the following steps. In step 1, the query input unit receives a query input by a user. In step 2, the feature vector creation unit creates a feature vector from the received query. Step 3 calculates the similarity between the query feature vector created by the feature vector creation unit and the feature vector of each piece of search target information registered in the information database, identifies the search target information whose feature vectors show a similarity higher than a predetermined threshold, and retrieves their indexes, content data, and feature vector data.
Step 4 creates a decision tree for the group of high-similarity feature vectors obtained in step 3. It starts from the feature vector with the highest similarity and branches as appropriate if the user accepts its content and as inappropriate if not. On an appropriate branch, the other piece of search target information whose feature vector is most similar to the current feature vector becomes the next candidate. On an inappropriate branch, the difference between the current feature vector and a reference vector (the feature vector of the query in the first-stage search, or the feature vector output at the previous stage in the second and later stages) is calculated, and a virtual vector in which this difference is reversed is calculated. The other piece of search target information whose feature vector is most similar to the virtual vector, if any, becomes the next candidate. This is repeated until all the search target information retrieved in step 3 is covered.
Step 5 outputs the index and content data of one piece of the search target information retrieved in step 3. Step 6 receives the user's appropriate/inappropriate judgment on the search result output in step 5, identifies the search target information to be output next based on the decision tree created in step 4, and outputs its content.
[0023]
The information search method of the invention of claim 7 comprises the following steps. In step 1, the query input unit receives a query input by a user. In step 2, the feature vector creation unit creates a feature vector from the received query. Step 3 calculates the similarity between the created query feature vector and the feature vector of each piece of search target information registered in the information database, identifies the search target information whose feature vectors show a similarity higher than a predetermined threshold, and retrieves their indexes, content data, and feature vector data.
Step 4 creates a decision tree for the group of high-similarity feature vectors obtained in step 3. It starts from the feature vector with the highest similarity and branches as appropriate if the user accepts its content and as inappropriate if not. On an appropriate branch, the other piece of search target information whose feature vector is most similar to the current feature vector becomes the next candidate. On an inappropriate branch, the difference between the current feature vector and a reference vector (the feature vector of the query in the first-stage search, or the feature vector output at the previous stage in the second and later stages) is calculated, and a virtual vector in which this difference is reversed is calculated. The other piece of search target information whose feature vector is most similar to the virtual vector, if any, becomes the next candidate. For the search target information retrieved in step 3, this is repeated only up to a predetermined stage.
Step 5 outputs the index and content data of one piece of the search target information retrieved in step 3. Step 6 receives the user's appropriate/inappropriate judgment on the search result output in step 5, identifies the search target information to be output next based on the decision tree created in step 4, and outputs its content. When a condition for growing the decision tree created in step 4 is reached, step 6 also grows the decision tree by a predetermined number of stages.
[0024]
The invention according to claim 8 is the information search method according to any of claims 5 to 7, wherein the query and the search target information are text data.
[0025]
The information search program of the invention of claim 9 causes a computer to execute the following processes. Process 1 receives a query input by a user. Process 2 creates a feature vector from the received query. Process 3 calculates the similarity between the feature vector of the created query and the feature vector of each piece of search target information registered in the information database, identifies the search target information whose feature vectors show a similarity higher than a predetermined threshold, and retrieves their indexes, content data, and feature vector data. Process 4 outputs the index and content data of one piece of the search target information retrieved in process 3. Process 5 receives the user's appropriate/inappropriate judgment on the search result output in process 4.
(1) For an appropriate input, process 5 searches the search target information retrieved in process 3 for the other piece whose feature vector is most similar to the feature vector of the search target information currently being output. If such information exists, it outputs the content as the next candidate.
(2) For an inappropriate input, process 5 calculates the difference between the feature vector of the search target information currently being output and a reference vector (the feature vector of the query in the first-stage search, or the feature vector output at the previous stage in the second and later stages). It then calculates a virtual vector in which this difference is reversed and searches the search target information retrieved in process 3 for the other piece whose feature vector is most similar to the virtual vector. If such information exists, it outputs the content as the next candidate.
[0026]
The information search program of the invention of claim 10 causes a computer to execute the following processes. Process 1 receives a query input by a user. Process 2 creates a feature vector from the received query. Process 3 calculates the similarity between the feature vector of the created query and the feature vector of each piece of search target information registered in the information database, identifies the search target information whose feature vectors show a similarity higher than a predetermined threshold, and retrieves their indexes, content data, and feature vector data.
Process 4 creates a decision tree for the group of high-similarity feature vectors obtained in process 3. It starts from the feature vector with the highest similarity and branches as appropriate if the user accepts its content and as inappropriate if not. On an appropriate branch, the other piece of search target information whose feature vector is most similar to the current feature vector becomes the next candidate. On an inappropriate branch, the difference between the current feature vector and a reference vector (the feature vector of the query in the first-stage search, or the feature vector output at the previous stage in the second and later stages) is calculated, and a virtual vector in which this difference is reversed is calculated. The other piece of search target information whose feature vector is most similar to the virtual vector, if any, becomes the next candidate. This is repeated until all the search target information retrieved in process 3 is covered.
Process 5 outputs the index and content data of one piece of the search target information retrieved in process 3. Process 6 receives the user's appropriate/inappropriate judgment on the search result output in process 5, identifies the search target information to be output next based on the decision tree created in process 4, and outputs its content.
[0027]
An information search program according to an eleventh aspect of the present invention causes a computer to execute the following processes. Process 1 receives a query input by a user. Process 2 creates a feature vector from the received query. Process 3 calculates the similarity between the feature vector of the created query and the feature vector of each piece of search target information registered in the information database, identifies the search target information whose feature vectors show a similarity higher than a predetermined threshold, and retrieves their indexes, content data, and feature vector data.
Process 4 creates a decision tree for the group of high-similarity feature vectors obtained in process 3. It starts from the feature vector with the highest similarity and branches as appropriate if the user accepts its content and as inappropriate if not. On an appropriate branch, the other piece of search target information whose feature vector is most similar to the current feature vector becomes the next candidate. On an inappropriate branch, the difference between the current feature vector and a reference vector (the feature vector of the query in the first-stage search, or the feature vector output at the previous stage in the second and later stages) is calculated, and a virtual vector in which this difference is reversed is calculated. The other piece of search target information whose feature vector is most similar to the virtual vector, if any, becomes the next candidate. For the search target information retrieved in process 3, this is repeated only up to a predetermined stage.
Process 5 outputs the index and content data of one piece of the search target information retrieved in process 3. Process 6 receives the user's appropriate/inappropriate judgment on the search result output in process 5, identifies the search target information to be output next based on the decision tree created in process 4, and outputs its content. When a condition for growing the decision tree created in process 4 is reached, process 6 also grows the decision tree by a predetermined number of stages.
[0028]
According to a twelfth aspect of the present invention, in the information search program of any of the ninth to eleventh aspects, the query and the search target information are text data.
[0029]
The recording medium of the invention of claim 13 records an information search program that causes a computer to execute the following processes. Process 1 receives a query input by a user. Process 2 creates a feature vector from the received query. Process 3 calculates the similarity between the feature vector of the created query and the feature vector of each piece of search target information registered in the information database, identifies the search target information whose feature vectors show a similarity higher than a predetermined threshold, and retrieves their indexes, content data, and feature vector data. Process 4 outputs the index and content data of one piece of the search target information retrieved in process 3. Process 5 receives the user's appropriate/inappropriate judgment on the search result output in process 4.
(1) For an appropriate input, process 5 searches the search target information retrieved in process 3 for the other piece whose feature vector is most similar to the feature vector of the search target information currently being output. If such information exists, it outputs the content as the next candidate.
(2) For an inappropriate input, process 5 calculates the difference between the feature vector of the search target information currently being output and a reference vector (the feature vector of the query in the first-stage search, or the feature vector output at the previous stage in the second and later stages). It then calculates a virtual vector in which this difference is reversed and searches the search target information retrieved in process 3 for the other piece whose feature vector is most similar to the virtual vector. If such information exists, it outputs the content as the next candidate.
[0030]
The recording medium according to the fourteenth aspect of the present invention records an information search program that causes a computer to execute the following processes. Process 1 receives a query input by a user. Process 2 creates a feature vector from the received query. Process 3 calculates the similarity between the feature vector of the created query and the feature vector of each piece of search target information registered in the information database, identifies the search target information whose feature vectors show a similarity higher than a predetermined threshold, and retrieves their indexes, content data, and feature vector data.
Process 4 creates a decision tree for the group of high-similarity feature vectors obtained in process 3. It starts from the feature vector with the highest similarity and branches as appropriate if the user accepts its content and as inappropriate if not. On an appropriate branch, the other piece of search target information whose feature vector is most similar to the current feature vector becomes the next candidate. On an inappropriate branch, the difference between the current feature vector and a reference vector (the feature vector of the query in the first-stage search, or the feature vector output at the previous stage in the second and later stages) is calculated, and a virtual vector in which this difference is reversed is calculated. The other piece of search target information whose feature vector is most similar to the virtual vector, if any, becomes the next candidate. This is repeated until all the search target information retrieved in process 3 is covered.
Process 5 outputs the index and content data of one piece of the search target information retrieved in process 3. Process 6 receives the user's appropriate/inappropriate judgment on the search result output in process 5, identifies the search target information to be output next based on the decision tree created in process 4, and outputs its content.
[0031]
The recording medium of claim 15 records an information retrieval program that causes a computer to execute the following processes. Process 1 receives a query input by a user. Process 2 creates a feature vector from the received query. Process 3 calculates the similarity between the feature vector of the created query and the feature vector of each item of search target information registered in an information database, identifies the search target information whose feature vectors show a similarity higher than a predetermined threshold, and extracts data indicating its index, contents, and feature vector. Process 4 creates a decision tree for the group of high-similarity feature vectors obtained in process 3. It starts from the feature vector with the highest similarity and branches to "appropriate" if the user accepts its content and to "inappropriate" if not. An appropriate branch takes as the next candidate the other search target information whose feature vector has the highest similarity to the current feature vector. An inappropriate branch calculates a virtual vector by reversing the difference between the feature vector of the rejected search target information and either the query's feature vector (first-stage search) or the feature vector output at the previous stage (second and later stages), and takes as the next candidate the other search target information, if any, whose feature vector has the highest similarity to that virtual vector. This is repeated up to a predetermined stage for the search target information extracted in process 3. Process 5 outputs data indicating the index and contents of one item of the search target information extracted in process 3. Process 6 accepts the user's appropriate/inappropriate judgment on the search result output in process 5, identifies the search target information to be output next on the basis of the decision tree created in process 4, and outputs the contents of that search target information. When a condition for growing the decision tree created in process 4 is reached, process 6 also grows the decision tree by a predetermined number of stages.
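For the depth-limited variant of claim 15, the tree can be built a few stages at a time and grown on demand. The Python sketch below assumes cosine similarity and a virtual vector of 2 × reference - rejected. It also assumes a hypothetical growth condition, which the source does not specify: the user reaches a node whose children have not been created yet. The names `start`, `expand`, and `follow` are illustrative only.

```python
import math
from dataclasses import dataclass
from typing import AbstractSet, Dict, List, Optional, Sequence

Vector = Sequence[float]


def cosine(x: Vector, y: Vector) -> float:
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0


def most_similar(target: Vector, docs: Dict[str, Vector],
                 exclude: AbstractSet[str]) -> Optional[str]:
    candidates = [d for d in docs if d not in exclude]
    return max(candidates, key=lambda d: cosine(target, docs[d])) if candidates else None


@dataclass
class Node:
    doc_id: str
    ref: List[float]        # vector output at the previous stage (the query for the root)
    shown: frozenset        # items already on this path, including this one
    yes: Optional["Node"] = None
    no: Optional["Node"] = None
    expanded: bool = False  # True once this node's children have been computed


def expand(node: Node, docs: Dict[str, Vector], stages: int) -> None:
    """Add up to `stages` further levels below `node`."""
    if stages <= 0:
        return
    if not node.expanded:
        vec = list(docs[node.doc_id])
        # ASSUMPTION: virtual vector = 2*ref - rejected (reversed difference applied at ref).
        virtual = [2 * r - d for r, d in zip(node.ref, vec)]
        yes_id = most_similar(vec, docs, node.shown)
        no_id = most_similar(virtual, docs, node.shown)
        if yes_id is not None:
            node.yes = Node(yes_id, vec, node.shown | {yes_id})
        if no_id is not None:
            node.no = Node(no_id, vec, node.shown | {no_id})
        node.expanded = True
    for child in (node.yes, node.no):
        if child is not None:
            expand(child, docs, stages - 1)


def start(query: Vector, docs: Dict[str, Vector], stages: int) -> Optional[Node]:
    """Create the root and the first `stages` levels of the tree."""
    root_id = most_similar(query, docs, frozenset())
    if root_id is None:
        return None
    root = Node(root_id, list(query), frozenset({root_id}))
    expand(root, docs, stages)
    return root


def follow(root: Node, docs: Dict[str, Vector], answers: List[bool],
           stages: int = 1) -> List[str]:
    """Walk the tree with YES/NO answers, growing it when an unexpanded node is reached."""
    path, node = [root.doc_id], root
    for ok in answers:
        if not node.expanded:  # HYPOTHETICAL growth condition
            expand(node, docs, stages)
        node = node.yes if ok else node.no
        if node is None:
            break
        path.append(node.doc_id)
    return path


def count_nodes(node: Optional[Node]) -> int:
    return 0 if node is None else 1 + count_nodes(node.yes) + count_nodes(node.no)
```

Starting with one stage keeps only the root and its two children in memory. Further stages are added only along the path the user actually follows.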
[0032]
The invention of claim 16 is the recording medium of any one of claims 13 to 15, wherein the query and the search target information are text data.
[0033]
The output information selection device of claim 17 comprises four units. A feature data holding unit holds feature vector data together with information representing the contents of each item in a group of search target information. A display information output unit outputs information representing the contents of search target information designated from the group. A feedback reception unit receives the user's appropriate/inappropriate judgment input for the output search target information. An output information selection unit acts on that judgment as follows. (1) If the user's judgment is appropriate, it searches the group for other search target information whose feature vector has the highest similarity to the feature vector of the search target information currently being output by the display information output unit. If such information exists, it causes the display information output unit to output information representing its contents. (2) If the user's judgment is inappropriate, it calculates a virtual vector by reversing the difference between the feature vector of the search target information currently being output by the display information output unit and either a reference vector given in advance (first-stage search) or the feature vector output at the previous stage (second and later stages). It then searches the group for other search target information whose feature vector has the highest similarity to the virtual vector. If such information exists, it causes the display information output unit to output information representing its contents.
[0034]
The output information selection device of claim 18 comprises four units. A feature data holding unit holds feature vector data together with information representing the contents of each item in a group of search target information. A display information output unit outputs information representing the contents of search target information designated from the group. A decision tree creation unit creates a decision tree for the group of search target information held in the feature data holding unit. It starts from the search target information whose feature vector has the highest similarity to a reference vector given in advance and branches to "appropriate" if the user accepts its content and to "inappropriate" if not. An appropriate branch takes as the next candidate the other search target information whose feature vector has the highest similarity to the current feature vector. An inappropriate branch calculates a virtual vector by reversing the difference between the feature vector of the rejected search target information and either the reference vector (first-stage search) or the feature vector output at the previous stage (second and later stages), and takes as the next candidate the other search target information, if any, whose feature vector has the highest similarity to that virtual vector. This is repeated until all of the search target information held in the feature data holding unit is covered. A feedback processing unit receives the user's appropriate/inappropriate judgment input for the search target information output by the display information output unit. It then identifies the search target information to be output next on the basis of the decision tree created by the decision tree creation unit and causes the display information output unit to output information representing its contents.
[0035]
The output information selection device of claim 19 comprises four units. A feature data holding unit holds feature vector data together with information representing the contents of each item in a group of search target information. A display information output unit outputs information representing the contents of search target information designated from the group. A decision tree creation unit creates a decision tree for the group of search target information held in the feature data holding unit. It starts from the search target information whose feature vector has the highest similarity to a reference vector given in advance and branches to "appropriate" if the user accepts its content and to "inappropriate" if not. An appropriate branch takes as the next candidate the other search target information whose feature vector has the highest similarity to the current feature vector. An inappropriate branch calculates a virtual vector by reversing the difference between the feature vector of the rejected search target information and either the reference vector (first-stage search) or the feature vector output at the previous stage (second and later stages), and takes as the next candidate the other search target information, if any, whose feature vector has the highest similarity to that virtual vector. This is repeated up to a predetermined stage. A feedback processing unit receives the user's appropriate/inappropriate judgment input for the search target information output by the display information output unit. It identifies the search target information to be output next on the basis of the decision tree created by the decision tree creation unit and causes the display information output unit to output information representing its contents. When a condition for growing the existing decision tree is reached, it also instructs the decision tree creation unit to grow the decision tree by a predetermined number of stages.
[0036]
The output information selection method of claim 20 comprises three steps. In step 1, a feature data holding unit holds feature vector data together with information representing the contents of each item in a group of search target information. In step 2, a display information output unit outputs information representing the contents of search target information designated from the group. In step 3, a feedback reception unit receives the user's appropriate/inappropriate judgment input for the search target information currently being output and acts on it as follows. (1) If the judgment is appropriate, it searches the group for other search target information whose feature vector has the highest similarity to the feature vector of the search target information currently being output. If such information exists, it outputs information representing its contents. (2) If the judgment is inappropriate, it calculates a virtual vector by reversing the difference between the feature vector of the search target information currently being output and either a reference vector given in advance (first-stage search) or the feature vector output at the previous stage (second and later stages). It then searches the group for other search target information whose feature vector has the highest similarity to the virtual vector. If such information exists, it outputs information representing its contents.
[0037]
The output information selection method of claim 21 comprises four steps. In step 1, a feature data holding unit holds feature vector data together with information representing the contents of each item in a group of search target information. In step 2, a display information output unit outputs information representing the contents of search target information designated from the group. In step 3, a feedback reception unit creates a decision tree for the group of search target information. It starts from the search target information whose feature vector has the highest similarity to a reference vector given in advance and branches to "appropriate" if the user accepts its content and to "inappropriate" if not. An appropriate branch takes as the next candidate the other search target information whose feature vector has the highest similarity to the current feature vector. An inappropriate branch calculates a virtual vector by reversing the difference between the feature vector of the rejected search target information and either the reference vector (first-stage search) or the feature vector output at the previous stage (second and later stages), and takes as the next candidate the other search target information, if any, whose feature vector has the highest similarity to that virtual vector. This is repeated until all of the search target information is covered. In step 4, the user's appropriate/inappropriate judgment input for the search target information currently being output is received, the search target information to be output next is identified on the basis of the decision tree, and information representing its contents is output.
[0038]
The output information selection method of claim 22 comprises four steps. In step 1, a feature data holding unit holds feature vector data together with information representing the contents of each item in a group of search target information. In step 2, a display information output unit outputs information representing the contents of search target information designated from the group. In step 3, a feedback reception unit creates a decision tree for the group of search target information. It starts from the search target information whose feature vector has the highest similarity to a reference vector given in advance and branches to "appropriate" if the user accepts its content and to "inappropriate" if not. An appropriate branch takes as the next candidate the other search target information whose feature vector has the highest similarity to the current feature vector. An inappropriate branch calculates a virtual vector by reversing the difference between the feature vector of the rejected search target information and either the reference vector (first-stage search) or the feature vector output at the previous stage (second and later stages), and takes as the next candidate the other search target information, if any, whose feature vector has the highest similarity to that virtual vector. This is repeated up to a predetermined stage for the search target information. In step 4, the user's appropriate/inappropriate judgment input for the search target information currently being output is received, the search target information to be output next is identified on the basis of the decision tree, and information representing its contents is output. When a condition for growing the existing decision tree is reached, step 4 also grows the decision tree by a predetermined number of stages.
[0039]
The output information selection program of claim 23 causes a computer to execute three processes. Process 1 holds feature vector data together with information representing the contents of each item in a group of search target information. Process 2 outputs information representing the contents of search target information designated from the group. Process 3 receives the user's appropriate/inappropriate judgment input for the search target information currently being output and acts on it as follows. (1) If the judgment is appropriate, it searches the group for other search target information whose feature vector has the highest similarity to the feature vector of the search target information currently being output. If such information exists, it outputs information representing its contents. (2) If the judgment is inappropriate, it calculates a virtual vector by reversing the difference between the feature vector of the search target information currently being output and either a reference vector given in advance (first-stage search) or the feature vector output at the previous stage (second and later stages). It then searches the group for other search target information whose feature vector has the highest similarity to the virtual vector. If such information exists, it outputs information representing its contents.
[0040]
The output information selection program of claim 24 causes a computer to execute four processes. Process 1 holds feature vector data together with information representing the contents of each item in a group of search target information. Process 2 outputs information representing the contents of search target information designated from the group. Process 3 creates a decision tree for the group of search target information. It starts from the search target information whose feature vector has the highest similarity to a reference vector given in advance and branches to "appropriate" if the user accepts its content and to "inappropriate" if not. An appropriate branch takes as the next candidate the other search target information whose feature vector has the highest similarity to the current feature vector. An inappropriate branch calculates a virtual vector by reversing the difference between the feature vector of the rejected search target information and either the reference vector (first-stage search) or the feature vector output at the previous stage (second and later stages), and takes as the next candidate the other search target information, if any, whose feature vector has the highest similarity to that virtual vector. This is repeated until all of the search target information is covered. Process 4 receives the user's appropriate/inappropriate judgment input for the search target information currently being output, identifies the search target information to be output next on the basis of the decision tree, and outputs information representing its contents.
[0041]
The output information selection program of claim 25 causes a computer to execute four processes. Process 1 stores feature vector data together with information representing the contents of each item in a group of search target information. Process 2 outputs information representing the contents of search target information designated from the group. Process 3 creates a decision tree for the group of search target information. It starts from the search target information whose feature vector has the highest similarity to a reference vector given in advance and branches to "appropriate" if the user accepts its content and to "inappropriate" if not. An appropriate branch takes as the next candidate the other search target information whose feature vector has the highest similarity to the current feature vector. An inappropriate branch calculates a virtual vector by reversing the difference between the feature vector of the rejected search target information and either the reference vector (first-stage search) or the feature vector output at the previous stage (second and later stages), and takes as the next candidate the other search target information, if any, whose feature vector has the highest similarity to that virtual vector. This is repeated up to a predetermined stage for the search target information. Process 4 receives the user's appropriate/inappropriate judgment input for the search target information currently being output, identifies the search target information to be output next on the basis of the decision tree, and outputs information representing its contents. When a condition for growing the existing decision tree is reached, process 4 also grows the decision tree by a predetermined number of stages.
[0042]
The recording medium of claim 26 records an output information selection program that causes a computer to execute three processes. Process 1 stores feature vector data together with information representing the contents of each item in a group of search target information. Process 2 outputs information representing the contents of search target information designated from the group. Process 3 receives the user's appropriate/inappropriate judgment input for the search target information currently being output and acts on it as follows. (1) If the judgment is appropriate, it searches the group for other search target information whose feature vector has the highest similarity to the feature vector of the search target information currently being output. If such information exists, it outputs information representing its contents. (2) If the judgment is inappropriate, it calculates a virtual vector by reversing the difference between the feature vector of the search target information currently being output and either a reference vector given in advance (first-stage search) or the feature vector output at the previous stage (second and later stages). It then searches the group for other search target information whose feature vector has the highest similarity to the virtual vector. If such information exists, it outputs information representing its contents.
[0043]
The recording medium of claim 27 records an output information selection program that causes a computer to execute four processes. Process 1 stores feature vector data together with information representing the contents of each item in a group of search target information. Process 2 outputs information representing the contents of search target information designated from the group. Process 3 creates a decision tree for the group of search target information. It starts from the search target information whose feature vector has the highest similarity to a reference vector given in advance and branches to "appropriate" if the user accepts its content and to "inappropriate" if not. An appropriate branch takes as the next candidate the other search target information whose feature vector has the highest similarity to the current feature vector. An inappropriate branch calculates a virtual vector by reversing the difference between the feature vector of the rejected search target information and either the reference vector (first-stage search) or the feature vector output at the previous stage (second and later stages), and takes as the next candidate the other search target information, if any, whose feature vector has the highest similarity to that virtual vector. This is repeated until all of the search target information is covered. Process 4 receives the user's appropriate/inappropriate judgment input for the search target information currently being output, identifies the search target information to be output next on the basis of the decision tree, and outputs information representing its contents.
[0044]
The recording medium of claim 28 records an output information selection program that causes a computer to execute four processes. Process 1 stores feature vector data together with information representing the contents of each item in a group of search target information. Process 2 outputs information representing the contents of search target information designated from the group. Process 3 creates a decision tree for the group of search target information. It starts from the search target information whose feature vector has the highest similarity to a reference vector given in advance and branches to "appropriate" if the user accepts its content and to "inappropriate" if not. An appropriate branch takes as the next candidate the other search target information whose feature vector has the highest similarity to the current feature vector. An inappropriate branch calculates a virtual vector by reversing the difference between the feature vector of the rejected search target information and either the reference vector (first-stage search) or the feature vector output at the previous stage (second and later stages), and takes as the next candidate the other search target information, if any, whose feature vector has the highest similarity to that virtual vector. This is repeated up to a predetermined stage for the search target information. Process 4 receives the user's appropriate/inappropriate judgment input for the search target information currently being output, identifies the search target information to be output next on the basis of the decision tree, and outputs information representing its contents. When a condition for growing the existing decision tree is reached, process 4 also grows the decision tree by a predetermined number of stages.
[0045]
DETAILED DESCRIPTION OF THE INVENTION
Embodiments of the present invention are described in detail below with reference to the drawings. The invention first takes a group of documents obtained by a similarity search based on feature vectors and classifies them in consideration of the direction of their feature vectors. When the user opens one document from the group presented as the search result, the user enters his or her satisfaction with it as YES or NO, that is, whether the content is what the user was looking for. The computer then automatically selects and presents the document to be shown next according to that answer. In this way, similarity to the query remains the basis while the direction of the feature vectors is also taken into account. As a result, search results that match the information the user wants can be unfolded one after another.
[0046]
FIG. 1 shows the functional configuration of the information search system of the present invention. The system is implemented on a computer system, or on a network system in which multiple computers are connected by cables. Its functional components are as follows:
- a query input unit 1, through which the user enters as a query either text data describing the content the user needs or the text data of a document containing that content;
- a feature vector creation unit 2, which creates a feature vector from the text data entered through the query input unit 1;
- a document database 3, which stores the index, text data, and feature vector data of each of a large number of stored documents;
- a similarity calculation unit 4, which calculates the similarity between the query's feature vector created by the feature vector creation unit 2 and the feature vector of each document registered in the document database 3, identifies the documents whose feature vectors show a similarity higher than a predetermined threshold, and extracts their indexes, text data, and feature vectors;
- a search result holding unit 5, which stores the data extracted by the similarity calculation unit 4.
[0047]
The system further comprises a search result output unit 7 and a feedback processing unit 8. The search result output unit 7 displays the indexes and text data of the high-similarity documents extracted by the similarity calculation unit 4 and prints them out as needed. The feedback processing unit 8 accepts the user's feedback operations on the search results and applies the corresponding processing to the search result holding unit 5.
[0048]
Next, the information search method performed by the information search system configured as above is described with reference to FIG. First, the user enters a query through the query input unit 1 (step S1). The query is either text data describing the content the user needs or the text data of a document containing that content. The feature vector creation unit 2 then creates a feature vector from the entered query text data (step S3). The feature vector is created with the widely used tf*idf weighting of Formula 1 and Formula 2. Other feature vector creation methods listed in the conventional examples may also be used.
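Formulas 1 and 2 are not reproduced in this text. The following Python sketch assumes their common forms: w_t = log(N / f_t) for the weight of term t, and w_{d,t} = f_{d,t} × w_t for each element of a document's feature vector. Here N, f_t, and f_{d,t} are as defined in the conventional-art section. It also assumes the text has already been split into terms. Japanese text would in practice need morphological analysis first, which is not shown. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def idf_weights(corpus: List[List[str]]) -> Dict[str, float]:
    """Formula 1, assumed form: w_t = log(N / f_t), f_t = number of documents containing t."""
    n = len(corpus)
    df = Counter()
    for tokens in corpus:
        df.update(set(tokens))  # count each term at most once per document
    return {t: math.log(n / f) for t, f in df.items()}


def tfidf_vector(tokens: List[str], idf: Dict[str, float], vocab: List[str]) -> List[float]:
    """Formula 2, assumed form: w_{d,t} = f_{d,t} * w_t over a fixed vocabulary."""
    tf = Counter(tokens)
    # Terms unseen in the corpus get weight 0 because they have no idf entry.
    return [tf[t] * idf.get(t, 0.0) for t in vocab]


corpus = [["search", "engine", "query"], ["search", "tree"], ["decision", "tree", "user"]]
idf = idf_weights(corpus)
vocab = sorted(idf)
print(vocab)  # ['decision', 'engine', 'query', 'search', 'tree', 'user']
print(tfidf_vector(["decision", "tree"], idf, vocab))
```

A term that appears in every document gets weight log(1) = 0, so it contributes nothing to similarity. This is the intended effect of the idf factor.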
Next, the similarity calculation unit 4 uses Formula 3 described above to calculate the similarity between the query's feature vector created by the feature vector creation unit 2 and the feature vector of each of the many documents stored in the document database 3. It identifies the documents whose feature vectors show a similarity higher than a predetermined threshold and extracts their indexes, text data, and feature vectors (step S5). The extracted data is temporarily stored in the search result holding unit 5 as search result data (step S7). The similarity may also be calculated by other methods listed in the conventional examples.
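Step S5 can be sketched in Python as a threshold filter over cosine similarities (Formula 3), sorted from most to least similar. The in-memory dictionary standing in for the document database 3, the function name `search`, and the toy vectors are all assumptions for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(x: Sequence[float], y: Sequence[float]) -> float:
    """Formula 3, cosine coefficient: x.y / (|x| |y|); 0.0 for a zero vector."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0


def search(query_vec: Sequence[float], database: Dict[str, Sequence[float]],
           threshold: float) -> List[Tuple[str, float]]:
    """Keep documents whose similarity exceeds `threshold`, most similar first."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in database.items()]
    hits = [(doc_id, s) for doc_id, s in scored if s > threshold]
    hits.sort(key=lambda h: h[1], reverse=True)
    return hits


database = {"A": [1.0, 0.8, 0.0], "B": [1.0, 0.2, 0.3],
            "C": [0.2, 1.0, 0.1], "D": [0.0, 0.0, 1.0]}
results = search([1.0, 1.0, 0.0], database, threshold=0.5)
print([doc_id for doc_id, _ in results])  # ['A', 'C', 'B']
```

Document D shares no terms with the query, so its similarity is 0 and it is dropped. The remaining hits are what step S7 would place in the search result holding unit 5.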
[0049]
For the search result obtained in this way, the search result output unit 7 first displays information representing the contents of the document whose feature vector has the highest similarity to the query (step S9). This information includes the document's title, author name, date of issue, and summary. At the user's request, the full text data of the document can also be expanded and displayed.
[0050]
When, in response to this display, the user uses a UI device (for example, a keyboard or a pointing device such as a mouse) to input a judgment as to whether the content of the document is what the user requires (YES) or not (NO), the feedback processing unit 8 accepts this input and performs one of the following YES/NO processes (step S11).
[0051]
When the user judges that the content of the first presented document is what the user requires and inputs YES, the feedback processing unit 8 searches the search result holding unit 5 for the document whose feature vector has the highest similarity to the feature vector of the document currently being output, and, if such a document exists, causes the search result output unit 7 to display its contents (steps S13, S19, S21).
[0052]
When the user judges that the content of the first presented document is not what the user requires and inputs NO in step S11, the feedback processing unit 8 calculates, with the feature vector of the query as the reference, a virtual vector pointing in the direction opposite to the feature vector of the document currently being output (step S15), extracts from the document group held in the search result holding unit 5 the document whose feature vector has the highest similarity to this virtual vector, and causes the search result output unit 7 to display its contents (steps S17 to S21).
[0053]
Thereafter, when the user inputs a YES/NO judgment on the content of the second document displayed in step S21, the processing of steps S11 to S21 is repeated, and likewise for the third and subsequent documents. When the search result holding unit 5 contains no further document with a corresponding feature vector, output of the search results ends (the NO branch of step S19).
[0054]
The output form of the search results produced by the above method will now be described with reference to FIGS. Assume that the document group obtained by the similarity calculation unit 4 is as shown in FIG.; that is, the documents whose feature vectors exceed the similarity threshold for query Q are A, D, B, C, and E in descending order of similarity, and their feature vectors point in various directions as shown in the figure.
[0055]
In the first search result display, as shown in FIG. 3, the contents of document A, which has the highest similarity to query Q, are displayed. Selection buttons 11 and 12, labeled "relevant (appropriate)" and "not related (improper)", are also displayed so that the user can input a YES/NO judgment.
[0056]
If the user reviews document A, finds that it is the desired content, and presses the "relevant" button 11, the document whose feature vector has the highest similarity to the feature vector of document A in FIG. is searched for among the search result documents, and document B is extracted. The contents of the extracted document B are then displayed as shown in FIG. 4A.
[0057]
Conversely, if the user reviews document A and presses the "not related" button 12, then, as described above, a virtual vector A′ pointing in the direction opposite to the feature vector of document A, taking as the reference the query Q on which the similarity is based, is obtained by the following Equation 4.
[0058]
[Expression 4]

Here, α is a constant given in advance.
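The image of Expression 4 is missing from this text. One form consistent with the claims, which require the difference between the virtual vector and a reference vector R to point opposite to the difference between the rejected feature vector and R, is shown below. This is a reconstruction for readability, not the published equation; R is the query Q at the first stage (giving A′) and the feature vector output at the previous stage thereafter (for example, R = A when computing B′), and α is assumed to be positive.

```latex
A' = R - \alpha\,(A - R), \qquad \alpha > 0
\quad\Longrightarrow\quad
A' - R = -\alpha\,(A - R)
```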
[0059]
The virtual vector A ′ is in the opposite direction to the feature vector of the document A with respect to the query Q as shown in FIG.
[0060]
The document whose feature vector is closest to this virtual vector A′ is then searched for, document D is extracted, and its contents are displayed as shown in FIG.
[0061]
Similarly, if the user presses the "relevant" button while the contents of document B in FIG. 4A are displayed, or while the contents of document D in FIG. are displayed, the document whose feature vector has the highest similarity to the feature vector of document B or document D, respectively, is extracted from the search result document group.
[0062]
On the other hand, if, for example, the user presses the "not related" button while the contents of document B in FIG. 4A are displayed, a virtual vector B′ pointing in the direction opposite to the feature vector of document B is obtained by Equation 4, taking as the reference the feature vector of document A, the preceding document from which document B was derived, and the document whose feature vector has the highest similarity to B′ is searched for. As a result, document D is extracted and its contents are displayed.
[0063]
By repeating the same procedure, the system can reflect the user's judgment inputs and preferentially present the documents whose contents are closest to what the user requires.
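The first embodiment's feedback loop (steps S9 to S21) can be sketched as below. This is an illustrative sketch, not the patented implementation: it assumes Equation 4 has the form V′ = R - α(V - R), where R is the query vector at the first stage and the previously output vector thereafter (as in the B′ example above and in the claims), uses cosine similarity throughout, and models the user's YES/NO input with a hypothetical callback judge. The names present and virtual_vector are likewise hypothetical.

```python
import math


def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0


def virtual_vector(rejected, reference, alpha):
    """Assumed form of Equation 4: V' = R - alpha * (V - R)."""
    return [r - alpha * (v - r) for v, r in zip(rejected, reference)]


def present(query, results, judge, alpha=1.0):
    """Return document ids in the order they would be shown.

    results: dict doc_id -> feature vector (search result holding unit 5)
    judge:   callable doc_id -> bool, True for YES ("relevant")
    """
    remaining = dict(results)
    # Step S9: start with the document most similar to the query.
    current = max(remaining, key=lambda d: cosine(query, remaining[d]))
    reference = query  # first stage: virtual vectors are built around Q
    shown = []
    while True:
        vec = remaining.pop(current)
        shown.append(current)
        if not remaining:  # no candidate left: output ends (S19, NO)
            break
        if judge(current):  # YES: look near the accepted document (S13)
            target = vec
        else:  # NO: look in the opposite direction (S15, S17)
            target = virtual_vector(vec, reference, alpha)
        reference = vec  # later stages use the previously output vector
        current = max(remaining, key=lambda d: cosine(target, remaining[d]))
    return shown
```

With two-dimensional vectors Q = [1, 0], A = [1, 0.1], B = [1, 0.3], and D = [1, -0.3], answering YES to A leads to B while answering NO leads to D, mirroring the A → B / A → D behavior described above.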
[0064]
Depending on the performance of the computers used, the system shown in FIG. 1 can also be configured with the query input unit 1 through the similarity calculation unit 4 functioning on the server side, and the search result holding unit 5, the search result output unit 7, and the feedback processing unit 8 provided on a client connected via a LAN, the Internet, or another network. Alternatively, the query input unit 1 through the search result holding unit 5 can be server-side functions, with the search result output unit 7 and the feedback processing unit 8 provided on a client connected via a LAN, the Internet, or another network.
[0065]
Next, an information search system according to a second embodiment of the present invention will be described with reference to FIG. Like the system of the first embodiment shown in FIG., the system of the second embodiment includes, as functional components, a query input unit 1, a feature vector creation unit 2, a document database 3, a similarity calculation unit 4, a search result holding unit 5, a search result output unit 7, and a feedback processing unit 8. The system of this embodiment further includes a decision tree creation unit 6, which creates a decision tree by the logical procedure described later from the feature vectors of the documents stored in the search result holding unit 5 and stores the resulting decision tree data in the search result holding unit 5.
[0066]
Next, the information search method performed by the information search system of the second embodiment will be described with reference to the flowchart of FIG. As in the system of the first embodiment shown in FIG. 2, when the user inputs, as a query, text data expressing the content the user requires or the text data of a document containing that content (step S1), a feature vector is created from the input query text data (step S3). The similarity between the query feature vector and the feature vector of each of the many documents stored in the document database 3 is then calculated, the documents whose feature vectors show a similarity higher than a predetermined threshold are identified, their indexes, text data, and feature vectors are extracted (step S5), and these are temporarily stored in the search result holding unit 5 as search result data (step S7).
[0067]
For the search results obtained in this way, the decision tree creation unit 6, which characterizes this embodiment, creates a decision tree such as the one shown in FIG. 8 by the processing described later and stores it in the search result holding unit 5 (step S8). The search result output unit 7 then displays information representing the contents of the document whose feature vector has the highest similarity to the query (step S9).
[0068]
When, in response to this display, the user uses the UI device to input a judgment as to whether the content is what the user requires (YES) or not (NO), the feedback processing unit 8 accepts this input and performs one of the following YES/NO processes (step S11).
[0069]
When the user judges that the content of the first presented document is what the user requires and inputs YES, the feedback processing unit 8 refers to the decision tree held in the search result holding unit 5, identifies the document to move to next on the YES branch from the currently output document, and causes the search result output unit 7 to display its contents (steps S12, S16, S18).
[0070]
When the user judges that the content of the first presented document is not what the user requires and inputs NO in step S11, the feedback processing unit 8 refers to the decision tree held in the search result holding unit 5, identifies the document to move to next on the NO branch from the currently output document, and causes the search result output unit 7 to display its contents (steps S14, S16, S18).
[0071]
Thereafter, when the user inputs a YES/NO judgment on the content of the second document displayed in step S18, the processing of steps S11 to S18 is repeated, and likewise for the third and subsequent documents. When the search result holding unit 5 contains no corresponding document, output of the search results ends (the NO branch of step S16).
[0072]
Next, the decision tree creation processing performed by the decision tree creation unit 6 will be described using the example decision tree of FIG. Assume that feature vectors A to E, whose similarities fall within the threshold, have been retrieved for the query feature vector Q. The closer a vector lies to Q, the higher its similarity to Q, and each vector's direction is shown as a position relative to Q.
[0073]
Among the high-similarity feature vectors A to E obtained by the similarity calculation unit 4, the decision tree creation unit 6 starts from feature vector A, which has the highest similarity, and branches to YES if its content is accepted by the user and to NO if it is not.
[0074]
On the YES branch, the document with feature vector B, which has the highest similarity to feature vector A, is set as the document to be output after the document with feature vector A. On the NO branch from the document with feature vector A, a virtual vector A′ pointing opposite to feature vector A with respect to query Q is assumed as described above, and the document with feature vector D, which lies closest to A′, is set as the document to be displayed next. Branching to YES and NO in the same way for each feature vector, the vectors to be output are extracted in turn from the search result vector group, and a decision tree such as the one shown in FIG. 8 is created.
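The decision tree creation described above, together with the feedback processing unit 8 walking the finished tree, can be sketched as follows. It makes the same assumptions as a direct reading of the text suggests but does not confirm: cosine similarity, Equation 4 in the assumed form V′ = R - α(V - R), and a reference vector that is Q at the root and the parent's vector below it. Node, build_tree, walk, and size are hypothetical names. Because every root-to-leaf path presents each retrieved document once, this full tree has 2^n - 1 nodes for n documents.

```python
import math
from dataclasses import dataclass
from typing import Optional


def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0


def virtual_vector(rejected, reference, alpha):
    """Assumed form of Equation 4: V' = R - alpha * (V - R)."""
    return [r - alpha * (v - r) for v, r in zip(rejected, reference)]


@dataclass
class Node:
    doc: str
    yes: Optional["Node"] = None  # next document if the user answers YES
    no: Optional["Node"] = None   # next document if the user answers NO


def build(current, reference, candidates, vectors, alpha):
    """Create the subtree whose root presents `current`."""
    node = Node(current)
    rest = [d for d in candidates if d != current]  # not yet on this path
    if not rest:
        return node
    vec = vectors[current]

    def nearest(target):
        return max(rest, key=lambda d: cosine(target, vectors[d]))

    node.yes = build(nearest(vec), vec, rest, vectors, alpha)
    no_target = virtual_vector(vec, reference, alpha)
    node.no = build(nearest(no_target), vec, rest, vectors, alpha)
    return node


def build_tree(query, vectors, alpha=1.0):
    """Decision tree creation unit 6: the root is the document closest to Q."""
    root = max(vectors, key=lambda d: cosine(query, vectors[d]))
    return build(root, query, list(vectors), vectors, alpha)


def walk(tree, judge):
    """Feedback processing unit 8: follow the YES/NO branches."""
    shown, node = [], tree
    while node is not None:
        shown.append(node.doc)
        node = node.yes if judge(node.doc) else node.no
    return shown


def size(node):
    return 0 if node is None else 1 + size(node.yes) + size(node.no)
```

Building the whole tree in advance lets the client simply follow pointers at display time, at the cost of the exponential node count noted above.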
[0075]
When the user then checks, one after another, the contents of the documents retrieved for query Q, the feedback processing unit 8 refers to the decision tree information of FIG. 8, created by the decision tree creation unit 6 and held in the search result holding unit 5, and develops the search result documents as follows.
[0076]
First, A is output as the document whose feature vector is closest to query Q. If the user judges it NO, the feedback processing unit 8 next causes the search result output unit 7 to output the document with feature vector D. If the user also judges the document with feature vector D NO, the document with feature vector C is output next. If the user judges the document with feature vector C YES, the document with feature vector B is output next, and finally the document with feature vector E is output.
[0077]
FIG. 9 is a flowchart outlining the decision tree creation processing by the decision tree creation unit 6. For the query Q input first, the document group whose feature vectors have similarities within the threshold is retrieved, and the processing of the flowchart of FIG. 9 is repeated over that document group to create a decision tree, which is then held.
[0078]
According to the second embodiment, particularly in a system that includes a server capable of high-speed processing, even a user whose client machine lacks high-speed processing capability can access the server as a client, receive the search service, and browse the search results quickly, one after another in order of priority.
[0079]
Next, an information search system according to a third embodiment of the present invention will be described with reference to the flowchart of FIG. 10 and the operation diagram of FIG. The functional configuration of the third embodiment is the same as that of the second embodiment shown in FIG. However, instead of creating a decision tree such as the one in FIG. 8 for the entire document group extracted by the similarity calculation unit 4, the decision tree creation unit 6 creates the tree only up to a predetermined stage and grows it in response to the user's judgment inputs.
[0080]
That is, if the document group whose feature vectors fall within the threshold obtained by the similarity calculation unit 4 for query Q is as shown in FIG., the decision tree creation unit 6 first creates, as shown in (a), a decision tree of, for example, up to three stages (step S31). The number of stages is not fixed: it may be two, or four or more if the search result contains many documents.
[0081]
If the user judges the content of document A at the first stage to be relevant (YES) (steps S33 to S37), the direction in which the documents will be developed is largely determined, so the decision tree is grown as shown in FIG. (steps S39, S41, S43). The document B, whose feature vector is closest to that of document A, is then extracted and output (steps S45, S35).
[0082]
In step S41, the condition for growing the decision tree is, for example, that unpresented documents remain in the search result and that presentation has reached the last document of the decision tree built so far. Alternatively, the tree may be grown by one stage, or by a suitable predetermined number of stages, each time candidate documents for one stage or for that number of stages have been presented. This growth condition can be set in advance.
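The third embodiment's staged growth can be sketched by storing in each node what is needed to expand it later, and expanding only when the walk reaches the edge of the built tree. Assumptions: the initial depth of three stages and growth by one stage are the example values from the text; the growth condition modeled is "reached an unexpanded node while unpresented documents remain"; similarity is cosine; Equation 4 is taken in the assumed form V′ = R - α(V - R); and Node, grow, and walk_growing are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0


def virtual_vector(rejected, reference, alpha):
    """Assumed form of Equation 4: V' = R - alpha * (V - R)."""
    return [r - alpha * (v - r) for v, r in zip(rejected, reference)]


@dataclass
class Node:
    doc: str
    reference: List[float]  # vector the NO-branch virtual vector is built around
    remaining: List[str]    # documents not yet presented on this path
    yes: Optional["Node"] = None
    no: Optional["Node"] = None
    expanded: bool = False


def grow(node, vectors, stages, alpha=1.0):
    """Ensure up to `stages` levels exist below `node`, creating only missing ones."""
    if stages == 0 or not node.remaining:
        return
    if not node.expanded:
        vec = vectors[node.doc]

        def nearest(target):
            return max(node.remaining, key=lambda d: cosine(target, vectors[d]))

        yes_doc = nearest(vec)
        no_doc = nearest(virtual_vector(vec, node.reference, alpha))
        node.yes = Node(yes_doc, vec, [d for d in node.remaining if d != yes_doc])
        node.no = Node(no_doc, vec, [d for d in node.remaining if d != no_doc])
        node.expanded = True
    grow(node.yes, vectors, stages - 1, alpha)
    grow(node.no, vectors, stages - 1, alpha)


def walk_growing(query, vectors, judge, initial_stages=3, step=1, alpha=1.0):
    """Build `initial_stages` stages (step S31), then grow on demand (S41)."""
    first = max(vectors, key=lambda d: cosine(query, vectors[d]))
    root = Node(first, list(query), [d for d in vectors if d != first])
    grow(root, vectors, initial_stages - 1, alpha)
    shown, node = [], root
    while node is not None:
        shown.append(node.doc)
        # Assumed growth condition: edge of the built tree reached while
        # unpresented documents remain on this path.
        if not node.expanded and node.remaining:
            grow(node, vectors, step, alpha)
        node = node.yes if judge(node.doc) else node.no
    return shown
```

Only nodes near the path the user actually follows are ever created, which is where the reduced computational load of this embodiment comes from.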
[0083]
According to the third embodiment, the user's judgments are fed back and the decision tree is created only for the branches that are likely to be explored, so the computational load can be reduced.
[0084]
In each of the above embodiments, the search service and all arithmetic processing were described as being executed on the server machine. However, all or part of the processing functions of the search result holding unit 5 through the feedback processing unit 8, including the decision tree creation unit 6, can instead be provided on the client machine and connected to the server machine via a network.
[0085]
Further, depending on the performance of the computers used, the system shown in FIG. 6 can be configured with the query input unit 1 through the similarity calculation unit 4 functioning on the server side, and the search result holding unit 5, the decision tree creation unit 6, the search result output unit 7, and the feedback processing unit 8 provided on a client connected via a LAN, the Internet, or another network. Alternatively, the query input unit 1 through the decision tree creation unit 6 can be server-side functions, with the search result output unit 7 and the feedback processing unit 8 provided on a client connected via a LAN, the Internet, or another network.
[0086]
Note that a program for realizing the processing functions of the information search system of each of the above embodiments, a computer-readable recording medium storing the program, and the information search method executed by a computer system into which the program is loaded are also within the technical scope of the present invention.
[0087]
Furthermore, although each of the above embodiments illustrates document search based on text data, the present invention is not limited to this and can also be applied to searching music, movie, and image information whose contents are expressed as document information. For music and image information, the data can be converted into feature vectors and registered in the database together with text data such as indexes, titles, authors, publishers, and distributors. The data representing the music or image information is then input directly as a query and converted into a feature vector, the music or image information with high feature-vector similarity is extracted, and its data, feature vectors, indexes, titles, and so on are retrieved from the database. In this way, a search using a decision tree similar to the above embodiments can be performed.
[0088]
【The invention's effect】
As described above, according to the present invention, the information whose feature vectors have a similarity to the user's query higher than a threshold is extracted, and the content of the information whose feature vector has the highest similarity is presented first. If the user judges that information appropriate (YES), the information whose feature vector is most similar to that information's feature vector is presented next; conversely, if the user judges it inappropriate (NO), a virtual vector pointing in the direction opposite to its feature vector is used, and the information whose feature vector is most similar to this virtual vector is presented next. Therefore, even when a large amount of information is highly similar to the query, the user need only judge whether each presented item is appropriate, and information whose content matches what the user needs is automatically selected and presented in succession.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a functional configuration of an information search system according to a first embodiment of this invention.
FIG. 2 is a flowchart of search processing and search result output processing according to the embodiment.
FIG. 3 is an explanatory diagram of a display example of search processing results according to the embodiment.
FIG. 4 is an explanatory diagram showing transition of search result display output according to the embodiment.
FIG. 5 is an explanatory diagram of search processing output processing according to the embodiment.
FIG. 6 is a block diagram showing a functional configuration of an information search system according to a second embodiment of this invention.
FIG. 7 is a flowchart of search processing and search result output processing of the embodiment.
FIG. 8 is an explanatory diagram of a decision tree creation process for a search result according to the embodiment.
FIG. 9 is a flowchart of decision tree creation processing for a search result according to the embodiment.
FIG. 10 is a flowchart of decision tree creation processing for a search result by the information search system according to the third embodiment of this invention.
FIG. 11 is an explanatory diagram of a decision tree creation process for a search result according to the embodiment.
FIG. 12 is an explanatory diagram illustrating an output example of a search processing result based on the similarity of a conventional example.
FIG. 13 is an explanatory diagram of search processing based on similarity in a conventional example.
[Explanation of symbols]
1 Query input unit
2 Feature vector creation unit
3 Document database
4 Similarity calculation unit
5 Search result holding unit
6 Decision tree creation unit
7 Search result output unit
8 Feedback processing unit

Claims

A query input unit for receiving a query input by the user;
A feature vector creation unit that creates a feature vector from a query received by the query input unit;
An information database in which a large number of search target information indexes, data representing contents, and feature vector data are registered;
A similarity calculation unit that calculates the similarity between the feature vector of the query created by the feature vector creation unit and the feature vector of each piece of search target information registered in the information database, identifies the search target information having feature vectors that show a similarity higher than a predetermined threshold, and extracts their indexes, data representing contents, and feature vector data;
A search result holding unit for storing data extracted by the similarity calculation unit, a search result output unit for outputting data indicating the index and content of one of the search target information extracted by the similarity calculation unit,
A feedback processing unit that accepts the user's appropriate/inappropriate judgment input for the search result output by the search result output unit and (1) in the case of an appropriate input, searches the search result holding unit for other search target information having the feature vector with the highest similarity to the feature vector of the search target information currently being output and, if such search target information exists, causes the search result output unit to output its content as the next candidate, and (2) in the case of an inappropriate input, calculates a virtual vector such that the difference between the virtual vector and a reference vector (the feature vector of the query in the first-stage search, or the feature vector output in the previous stage in the second and subsequent stages) is opposite in direction to the difference between the feature vector of the search target information currently being output and that reference vector, searches the search target information held in the search result holding unit for other search target information having the feature vector with the highest similarity to this virtual vector and, if such search target information exists, causes the search result output unit to output its content as the next candidate. An information search system comprising the above units.

A query input unit for receiving a query input by the user;
A feature vector creation unit that creates a feature vector from a query received by the query input unit;
An information database in which a large number of search target information indexes, data representing contents, and feature vector data are registered;
A similarity calculation unit that calculates the similarity between the feature vector of the query created by the feature vector creation unit and the feature vector of each piece of search target information registered in the information database, identifies the search target information having feature vectors that show a similarity higher than a predetermined threshold, and extracts their indexes, data representing contents, and feature vector data;
A search result holding unit for storing data extracted by the similarity calculation unit;
A decision tree creation unit that starts from the feature vector with the highest similarity in the feature vector group obtained by the similarity calculation unit and branches to "appropriate" when its content is accepted by the user and to "inappropriate" when it is not; when branching to "appropriate", sets other search target information having the feature vector with the highest similarity to that feature vector as the next candidate search target information; when branching to "inappropriate", calculates a virtual vector such that the difference between the virtual vector and a reference vector (the feature vector of the query in the first-stage search, or the feature vector output in the previous stage in the second and subsequent stages) is opposite in direction to the difference between the feature vector of that search target information and that reference vector, searches the search target information held in the search result holding unit for other search target information having the feature vector with the highest similarity to this virtual vector and, if such search target information exists, sets it as the next candidate search target information; and repeats this process until all the search target information held in the search result holding unit is covered, thereby creating a decision tree;
A search result output unit that outputs data indicating the index and contents of one piece of search target information extracted by the similarity calculation unit, and a feedback processing unit that accepts the user's appropriate/inappropriate judgment input for the search result output by the search result output unit, specifies the search target information to be output next based on the decision tree created by the decision tree creation unit, and causes the search result output unit to output the content of that search target information. An information search system comprising the above units.

A query input unit for receiving a query input by the user;
A feature vector creation unit that creates a feature vector from a query received by the query input unit;
An information database in which a large number of search target information indexes, data representing contents, and feature vector data are registered;
A similarity calculation unit that calculates the similarity between the feature vector of the query created by the feature vector creation unit and the feature vector of each piece of search target information registered in the information database, identifies the search target information having feature vectors that show a similarity higher than a predetermined threshold, and extracts their indexes, data representing contents, and feature vector data;
A search result holding unit for storing data extracted by the similarity calculation unit;
A decision tree creation unit that starts from the feature vector with the highest similarity in the high-similarity feature vector group obtained by the similarity calculation unit and branches to "appropriate" when its content is accepted by the user and to "inappropriate" when it is not; when branching to "appropriate", sets other search target information having the feature vector with the highest similarity to that feature vector as the next candidate search target information; when branching to "inappropriate", calculates a virtual vector such that the difference between the virtual vector and a reference vector (the feature vector of the query in the first-stage search, or the feature vector output in the previous stage in the second and subsequent stages) is opposite in direction to the difference between the feature vector of the inappropriate search target information and that reference vector, searches the search target information held in the search result holding unit for other search target information having the feature vector with the highest similarity to this virtual vector and, if such search target information exists, sets it as the next candidate search target information; and repeats this process up to a predetermined stage to create a decision tree;
A search result output unit that outputs data indicating the index and contents of one piece of search target information extracted by the similarity calculation unit, and a feedback processing unit that accepts the user's appropriate/inappropriate judgment input for the search result output by the search result output unit, specifies the search target information to be output next based on the decision tree created by the decision tree creation unit, causes the search result output unit to output the content of that search target information, and, when a condition for growing the existing decision tree is reached, instructs the decision tree creation unit to grow the decision tree by a predetermined number of stages. An information search system comprising the above units.

The information search system according to claim 1, wherein the query and the search target information are text data.

Step 1 where the query input unit accepts a query input by the user;
Step 2 in which a feature vector creation unit creates a feature vector from the accepted query;
Step 3, in which the similarity calculation unit calculates the similarity between the feature vector of the query created by the feature vector creation unit and the feature vector of each piece of search target information registered in the information database, identifies the search target information having feature vectors that show a similarity higher than a predetermined threshold, and extracts their indexes, data representing contents, and feature vector data;
Step 4 in which the search result output unit outputs data indicating the index and content of one of the search target information extracted in Step 3;
Step 5, in which the feedback processing unit accepts the user's appropriate/inappropriate judgment input for the search result output in Step 4 and (1) in the case of an appropriate input, searches the search target information extracted in Step 3 for other search target information having the feature vector with the highest similarity to the feature vector of the search target information currently being output and, if such search target information exists, outputs its content as the next candidate, and (2) in the case of an inappropriate input, calculates a virtual vector such that the difference between the virtual vector and a reference vector (the feature vector of the query in the first-stage search, or the feature vector output in the previous stage in the second and subsequent stages) is opposite in direction to the difference between the feature vector of the search target information currently being output and that reference vector, searches the search target information extracted in Step 3 for other search target information having the feature vector with the highest similarity to this virtual vector and, if such search target information exists, outputs its content as the next candidate. An information search method comprising the above steps.

Step 1 in which a query input unit accepts a query input by the user;
Step 2 in which a feature vector creation unit creates a feature vector from the accepted query;
Step 3 of calculating the similarity between the feature vector of the query created by the feature vector creation unit and the feature vector of each piece of search target information registered in an information database, identifying the search target information having a feature vector whose similarity is higher than a predetermined threshold, and retrieving its index, data representing its content, and feature vector data;
Step 4 of creating a decision tree for the group of feature vectors found in Step 3 by starting from the feature vector with the highest similarity and taking an appropriate branch when the user accepts the content and an inappropriate branch when the user does not, where on an appropriate branch another piece of search target information whose feature vector has the highest similarity to that feature vector becomes the next candidate, and on an inappropriate branch a virtual vector is calculated whose difference from the query's feature vector (in the first-stage search) or from the feature vector output in the previous stage (in the second and subsequent stages) is the opposite of the difference between the feature vector of the rejected search target information and that same vector, and another piece of search target information whose feature vector has the highest similarity to this virtual vector, if one exists, becomes the next candidate, this process being repeated until all the search target information extracted in Step 3 is covered;
Step 5 of outputting the index and data representing the content of one piece of the search target information extracted in Step 3; and
Step 6 of accepting the user's appropriate/inappropriate judgment of the search result output in Step 5, identifying the search target information to be output next on the basis of the decision tree created in Step 4, and outputting the content of that search target information.
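The decision tree of Step 4 and its traversal in Step 6 can be sketched in Python under the same assumptions as before (cosine similarity; on rejection, v = 2r - x with r the query at the root and the previously output vector further down). This is an illustrative reading, not the patent's code; the names Node, expand, build_tree, walk and size are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Sequence

Vector = Sequence[float]


def cosine(x: Vector, y: Vector) -> float:
    dot = sum(a * b for a, b in zip(x, y))
    nx, ny = math.sqrt(sum(a * a for a in x)), math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0


def best_unseen(target: Vector, candidates: Dict[str, Vector],
                used: FrozenSet[str]) -> Optional[str]:
    """Unseen candidate most similar to `target`, or None when all are used."""
    remaining = [d for d in candidates if d not in used]
    return max(remaining, key=lambda d: cosine(target, candidates[d])) if remaining else None


@dataclass
class Node:
    doc_id: str
    appropriate: Optional["Node"] = None    # shown next if the user accepts doc_id
    inappropriate: Optional["Node"] = None  # shown next if the user rejects doc_id


def expand(doc_id: str, reference: Vector, candidates: Dict[str, Vector],
           used: FrozenSet[str]) -> Node:
    """Attach both branches below doc_id until every path covers all candidates."""
    node = Node(doc_id)
    used = used | {doc_id}
    x = candidates[doc_id]
    ok = best_unseen(x, candidates, used)                  # closest to the accepted item
    if ok is not None:
        node.appropriate = expand(ok, x, candidates, used)
    virtual = [2 * r - xi for xi, r in zip(x, reference)]  # assumed v = 2r - x
    ng = best_unseen(virtual, candidates, used)
    if ng is not None:
        node.inappropriate = expand(ng, x, candidates, used)
    return node


def build_tree(query: Vector, candidates: Dict[str, Vector]) -> Optional[Node]:
    """Step 4: the root is the candidate most similar to the query (r = query there)."""
    root = best_unseen(query, candidates, frozenset())
    return expand(root, query, candidates, frozenset()) if root is not None else None


def walk(tree: Optional[Node], answers: List[bool]) -> List[str]:
    """Step 6: follow the user's judgments and list the items shown, in order."""
    shown: List[str] = []
    node = tree
    for ok in answers:
        if node is None:
            break
        shown.append(node.doc_id)
        node = node.appropriate if ok else node.inappropriate
    if node is not None:
        shown.append(node.doc_id)
    return shown


def size(node: Optional[Node]) -> int:
    """Number of nodes in the tree."""
    return 0 if node is None else 1 + size(node.appropriate) + size(node.inappropriate)


if __name__ == "__main__":
    cands = {"A": [1.0, 0.8], "B": [1.0, 0.3], "C": [0.3, 1.0], "D": [0.0, 1.0]}
    tree = build_tree([1.0, 1.0], cands)
    print(walk(tree, [False, True]))  # ['A', 'C', 'D']
    print(size(tree))                 # 15
```

Because every path runs until all n extracted items have been shown, the full tree has 2^n - 1 nodes (15 for the four candidates here), so precomputing it is only practical for small candidate sets.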

Step 1 in which a query input unit accepts a query input by the user;
Step 2 in which a feature vector creation unit creates a feature vector from the accepted query;
Step 3 of calculating the similarity between the feature vector of the created query and the feature vector of each piece of search target information registered in an information database, and extracting the index, data representing the content, and feature vector data of the search target information having a feature vector whose similarity is higher than a predetermined threshold;
Step 4 of creating a decision tree for the group of feature vectors found in Step 3 by starting from the feature vector with the highest similarity and taking an appropriate branch when the user accepts the content and an inappropriate branch when the user does not, where on an appropriate branch another piece of search target information whose feature vector has the highest similarity to that feature vector becomes the next candidate, and on an inappropriate branch a virtual vector is calculated whose difference from the query's feature vector (in the first-stage search) or from the feature vector output in the previous stage (in the second and subsequent stages) is the opposite of the difference between the feature vector of the rejected search target information and that same vector, and another piece of search target information whose feature vector has the highest similarity to this virtual vector, if one exists, becomes the next candidate, this process being repeated up to a predetermined stage for the search target information extracted in Step 3;
Step 5 of outputting the index and data representing the content of one piece of the search target information extracted in Step 3; and
Step 6 of accepting the user's appropriate/inappropriate judgment of the search result output in Step 5, identifying the search target information to be output next on the basis of the decision tree created in Step 4, outputting the content of that search target information, and, when a condition for growing the decision tree created in Step 4 is reached, growing the decision tree by a predetermined number of stages.
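The depth-limited variant above can be sketched by computing the tree only `levels` stages ahead and growing it by the same amount on demand. The claim leaves the growth condition open at this point, so this sketch assumes it is "the user has reached a node whose branches have not been computed yet"; the cosine measure, v = 2r - x, and all names (Node, grow, build_tree, Session) are likewise assumptions.

```python
import math
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional, Sequence

Vector = Sequence[float]


def cosine(x: Vector, y: Vector) -> float:
    dot = sum(a * b for a, b in zip(x, y))
    nx, ny = math.sqrt(sum(a * a for a in x)), math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0


def best_unseen(target: Vector, candidates: Dict[str, Vector],
                used: FrozenSet[str]) -> Optional[str]:
    remaining = [d for d in candidates if d not in used]
    return max(remaining, key=lambda d: cosine(target, candidates[d])) if remaining else None


@dataclass
class Node:
    doc_id: str
    reference: Vector        # query at the root, else the parent's feature vector
    used: FrozenSet[str]     # items shown on the path to this node, itself included
    appropriate: Optional["Node"] = None
    inappropriate: Optional["Node"] = None
    expanded: bool = False   # True once both branches have been computed


def grow(node: Node, candidates: Dict[str, Vector], levels: int) -> None:
    """Compute up to `levels` more stages below `node` (the predetermined stage count)."""
    if levels <= 0:
        return
    if not node.expanded:
        x = candidates[node.doc_id]
        ok = best_unseen(x, candidates, node.used)
        if ok is not None:
            node.appropriate = Node(ok, x, node.used | {ok})
        virtual = [2 * r - xi for xi, r in zip(x, node.reference)]  # assumed v = 2r - x
        ng = best_unseen(virtual, candidates, node.used)
        if ng is not None:
            node.inappropriate = Node(ng, x, node.used | {ng})
        node.expanded = True
    for child in (node.appropriate, node.inappropriate):
        if child is not None:
            grow(child, candidates, levels - 1)


def build_tree(query: Vector, candidates: Dict[str, Vector], levels: int) -> Optional[Node]:
    root = best_unseen(query, candidates, frozenset())
    if root is None:
        return None
    tree = Node(root, query, frozenset({root}))
    grow(tree, candidates, levels)
    return tree


class Session:
    """Follow the user's judgments through the tree, growing it on demand."""

    def __init__(self, tree: Optional[Node], candidates: Dict[str, Vector], levels: int):
        self.node, self.candidates, self.levels = tree, candidates, levels

    def answer(self, appropriate: bool) -> Optional[str]:
        if self.node is None:
            return None
        if not self.node.expanded:  # assumed growth condition: frontier reached
            grow(self.node, self.candidates, self.levels)
        self.node = self.node.appropriate if appropriate else self.node.inappropriate
        return self.node.doc_id if self.node is not None else None


if __name__ == "__main__":
    cands = {"A": [1.0, 0.8], "B": [1.0, 0.3], "C": [0.3, 1.0], "D": [0.0, 1.0]}
    tree = build_tree([1.0, 1.0], cands, levels=1)  # only A and its two children exist
    session = Session(tree, cands, levels=1)
    print(session.answer(False))  # C
    print(session.answer(True))   # D (C's branches are computed on demand)
```

Growing lazily keeps memory proportional to the paths the user actually follows rather than to the 2^n - 1 nodes of a fully precomputed tree.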

The information search method according to claim 5, wherein the query and the search target information are text data.
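When the query and the search target information are text, Step 2 needs a text-to-vector step. The sketch below uses tf*idf, which the background section names as one example model. The weighting shown (w_t = log(N / f_t) and w_{d,t} = f_{d,t} * w_t) is a common form assumed for illustration and may differ from Equations 1 and 2 of the description; whitespace tokenization is also an assumption (Japanese text would normally need a morphological analyzer). Function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    # Assumed: lower-cased whitespace tokenization.
    return text.lower().split()


def idf_weights(docs: Dict[str, str]) -> Dict[str, float]:
    """w_t = log(N / f_t): N documents, f_t = number of documents containing term t."""
    n = len(docs)
    df: Counter = Counter()
    for text in docs.values():
        df.update(set(tokenize(text)))  # count each term once per document
    return {t: math.log(n / f) for t, f in df.items()}


def tfidf_vector(text: str, weights: Dict[str, float], vocab: List[str]) -> List[float]:
    """w_{d,t} = f_{d,t} * w_t over a fixed vocabulary; unknown terms are ignored."""
    tf = Counter(tokenize(text))
    return [tf[t] * weights.get(t, 0.0) for t in vocab]


if __name__ == "__main__":
    docs = {"A": "mobile phone battery", "B": "mobile network", "C": "battery charger"}
    weights = idf_weights(docs)
    vocab = sorted(weights)
    print(vocab)  # ['battery', 'charger', 'mobile', 'network', 'phone']
    query_vec = tfidf_vector("Phone battery", weights, vocab)
    doc_vecs = {d: tfidf_vector(t, weights, vocab) for d, t in docs.items()}
```

The resulting vectors can then be compared with the cosine coefficient exactly as non-text feature vectors would be.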

Process 1 of accepting a query input by the user;
Process 2 of creating a feature vector from the accepted query;
Process 3 of calculating the similarity between the feature vector of the created query and the feature vector of each piece of search target information registered in an information database, and extracting the index, data representing the content, and feature vector data of the search target information having a feature vector whose similarity is higher than a predetermined threshold;
Process 4 of outputting the index and data representing the content of one piece of the search target information extracted in Process 3; and
Process 5 of accepting the user's appropriate/inappropriate judgment of the search result output in Process 4 and (1) on an appropriate input, searching the search target information extracted in Process 3 for another piece of search target information whose feature vector has the highest similarity to the feature vector of the search target information currently being output and, if one exists, outputting its content as the next candidate, or
(2) on an inappropriate input, calculating a virtual vector whose difference from the query's feature vector (in the first-stage search) or from the feature vector output in the previous stage (in the second and subsequent stages) is the opposite of the difference between the feature vector of the search target information currently being output and that same vector, searching the search target information extracted in Process 3 for another piece of search target information whose feature vector has the highest similarity to this virtual vector and, if one exists, outputting its content as the next candidate; an information search program that causes a computer to execute the above processes.
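Read literally, the inappropriate-input branch defines the virtual vector by a reflection. Writing q for the query's feature vector, x_k for the feature vector output at stage k, y_d for the feature vector of candidate d, S for the set extracted in Process 3 and U_k for the items already output (notation introduced here for illustration, not the patent's), and assuming the cosine coefficient as the similarity measure, the rule can be summarized as:

```latex
\begin{aligned}
r_k &= \begin{cases} q, & k = 1,\\ x_{k-1}, & k \ge 2, \end{cases}\\
v_k - r_k &= -\,(x_k - r_k) \quad\Longrightarrow\quad v_k = 2r_k - x_k,\\
d_{k+1} &= \operatorname*{arg\,max}_{d \in S \setminus U_k} \operatorname{sim}(v_k, y_d),
\qquad \operatorname{sim}(x, y) = \frac{x \cdot y}{\lVert x \rVert \, \lVert y \rVert}.
\end{aligned}
```

For example, with q = (1, 1) and a rejected first output x_1 = (1, 0.8), the virtual vector is v_1 = (1, 1.2), which lies on the opposite side of the query from the rejected item.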

Process 1 of accepting a query input by the user;
Process 2 of creating a feature vector from the accepted query;
Process 3 of calculating the similarity between the feature vector of the created query and the feature vector of each piece of search target information registered in an information database, and extracting the index, data representing the content, and feature vector data of the search target information having a feature vector whose similarity is higher than a predetermined threshold;
Process 4 of creating a decision tree for the group of feature vectors found in Process 3 by starting from the feature vector with the highest similarity and taking an appropriate branch when the user accepts the content and an inappropriate branch when the user does not, where on an appropriate branch another piece of search target information whose feature vector has the highest similarity to that feature vector becomes the next candidate, and on an inappropriate branch a virtual vector is calculated whose difference from the query's feature vector (in the first-stage search) or from the feature vector output in the previous stage (in the second and subsequent stages) is the opposite of the difference between the feature vector of the rejected search target information and that same vector, and another piece of search target information whose feature vector has the highest similarity to this virtual vector, if one exists, becomes the next candidate, this process being repeated until all the search target information extracted in Process 3 is covered;
Process 5 of outputting the index and data representing the content of one piece of the search target information extracted in Process 3; and
Process 6 of accepting the user's appropriate/inappropriate judgment of the search result output in Process 5, identifying the search target information to be output next on the basis of the decision tree created in Process 4, and outputting the content of that search target information; an information search program that causes a computer to execute the above processes.

Process 1 of accepting a query input by the user;
Process 2 of creating a feature vector from the accepted query;
Process 3 of calculating the similarity between the feature vector of the created query and the feature vector of each piece of search target information registered in an information database, and extracting the index, data representing the content, and feature vector data of the search target information having a feature vector whose similarity is higher than a predetermined threshold;
Process 4 of creating a decision tree for the group of feature vectors found in Process 3 by starting from the feature vector with the highest similarity and taking an appropriate branch when the user accepts the content and an inappropriate branch when the user does not, where on an appropriate branch another piece of search target information whose feature vector has the highest similarity to that feature vector becomes the next candidate, and on an inappropriate branch a virtual vector is calculated whose difference from the query's feature vector (in the first-stage search) or from the feature vector output in the previous stage (in the second and subsequent stages) is the opposite of the difference between the feature vector of the rejected search target information and that same vector, and another piece of search target information whose feature vector has the highest similarity to this virtual vector, if one exists, becomes the next candidate, this process being repeated up to a predetermined stage for the search target information extracted in Process 3;
Process 5 of outputting the index and data representing the content of one piece of the search target information extracted in Process 3; and
Process 6 of accepting the user's appropriate/inappropriate judgment of the search result output in Process 5, identifying the search target information to be output next on the basis of the decision tree created in Process 4, outputting the content of that search target information, and, when a condition for growing the decision tree created in Process 4 is reached, growing the decision tree by a predetermined number of stages; an information search program that causes a computer to execute the above processes.

The information search program according to claim 9, wherein the query and the search target information are text data.

Process 1 of accepting a query input by the user;
Process 2 of creating a feature vector from the accepted query;
Process 3 of calculating the similarity between the feature vector of the created query and the feature vector of each piece of search target information registered in an information database, and extracting the index, data representing the content, and feature vector data of the search target information having a feature vector whose similarity is higher than a predetermined threshold;
Process 4 of outputting the index and data representing the content of one piece of the search target information extracted in Process 3; and
Process 5 of accepting the user's appropriate/inappropriate judgment of the search result output in Process 4 and (1) on an appropriate input, searching the search target information extracted in Process 3 for another piece of search target information whose feature vector has the highest similarity to the feature vector of the search target information currently being output and, if one exists, outputting its content as the next candidate, or (2) on an inappropriate input, calculating a virtual vector whose difference from the query's feature vector (in the first-stage search) or from the feature vector output in the previous stage (in the second and subsequent stages) is the opposite of the difference between the feature vector of the search target information currently being output and that same vector, searching the search target information extracted in Process 3 for another piece of search target information whose feature vector has the highest similarity to this virtual vector and, if one exists, outputting its content as the next candidate; a computer-readable recording medium on which is recorded an information search program that causes a computer to execute the above processes.

Process 1 of accepting a query input by the user;
Process 2 of creating a feature vector from the accepted query;
Process 3 of calculating the similarity between the feature vector of the created query and the feature vector of each piece of search target information registered in an information database, and extracting the index, data representing the content, and feature vector data of the search target information having a feature vector whose similarity is higher than a predetermined threshold;
Process 4 of creating a decision tree for the group of feature vectors found in Process 3 by starting from the feature vector with the highest similarity and taking an appropriate branch when the user accepts the content and an inappropriate branch when the user does not, where on an appropriate branch another piece of search target information whose feature vector has the highest similarity to that feature vector becomes the next candidate, and on an inappropriate branch a virtual vector is calculated whose difference from the query's feature vector (in the first-stage search) or from the feature vector output in the previous stage (in the second and subsequent stages) is the opposite of the difference between the feature vector of the rejected search target information and that same vector, and another piece of search target information whose feature vector has the highest similarity to this virtual vector, if one exists, becomes the next candidate, this process being repeated until all the search target information extracted in Process 3 is covered;
Process 5 of outputting the index and data representing the content of one piece of the search target information extracted in Process 3; and
Process 6 of accepting the user's appropriate/inappropriate judgment of the search result output in Process 5, identifying the search target information to be output next on the basis of the decision tree created in Process 4, and outputting the content of that search target information; a computer-readable recording medium on which is recorded an information search program that causes a computer to execute the above processes.

Process 1 of accepting a query input by the user;
Process 2 of creating a feature vector from the accepted query;
Process 3 of calculating the similarity between the feature vector of the created query and the feature vector of each piece of search target information registered in an information database, and extracting the index, data representing the content, and feature vector data of the search target information having a feature vector whose similarity is higher than a predetermined threshold;
Process 4 of creating a decision tree for the group of feature vectors found in Process 3 by starting from the feature vector with the highest similarity and taking an appropriate branch when the user accepts the content and an inappropriate branch when the user does not, where on an appropriate branch another piece of search target information whose feature vector has the highest similarity to that feature vector becomes the next candidate, and on an inappropriate branch a virtual vector is calculated whose difference from the query's feature vector (in the first-stage search) or from the feature vector output in the previous stage (in the second and subsequent stages) is the opposite of the difference between the feature vector of the rejected search target information and that same vector, and another piece of search target information whose feature vector has the highest similarity to this virtual vector, if one exists, becomes the next candidate, this process being repeated up to a predetermined stage for the search target information extracted in Process 3;
Process 5 of outputting the index and data representing the content of one piece of the search target information extracted in Process 3; and
Process 6 of accepting the user's appropriate/inappropriate judgment of the search result output in Process 5, identifying the search target information to be output next on the basis of the decision tree created in Process 4, outputting the content of that search target information, and, when a condition for growing the decision tree created in Process 4 is reached, growing the decision tree by a predetermined number of stages; a computer-readable recording medium on which is recorded an information search program that causes a computer to execute the above processes.

The computer-readable recording medium storing the information search program according to claim 13, wherein the query and search target information are text data.

A feature data holding unit that holds feature vector data together with information representing the content of each piece of a group of search target information;
A display information output unit that outputs information representing the content of search target information designated from the group of search target information;
A feedback accepting unit that accepts the user's appropriate/inappropriate judgment of the output search target information; and
An output information selection unit that (1) when the user's judgment is appropriate, searches the group of search target information for another piece of search target information whose feature vector has the highest similarity to the feature vector of the search target information currently being output by the display information output unit and, if one exists, causes the display information output unit to output information representing its content, and (2) when the user's judgment is inappropriate, calculates a virtual vector whose difference from a reference vector given in advance (in the first-stage search) or from the feature vector output in the previous stage (in the second and subsequent stages) is the opposite of the difference between the feature vector of the search target information currently being output by the display information output unit and that same vector, searches the group of search target information for another piece of search target information whose feature vector has the highest similarity to this virtual vector and, if one exists, causes the display information output unit to output information representing its content; an output information selection device comprising the above units.
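The output information selection device works without a query: a reference vector given in advance plays the query's role at the first stage. A hypothetical single-class sketch mapping the claimed units onto methods might look as follows; it keeps the same assumptions as the earlier sketches (cosine similarity, v = 2r - x on rejection) and tracks the output history itself so that the stage-dependent reference is chosen automatically. The class and method names are illustrative only.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def cosine(x: Vector, y: Vector) -> float:
    dot = sum(a * b for a, b in zip(x, y))
    nx, ny = math.sqrt(sum(a * a for a in x)), math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0


class OutputInformationSelector:
    """Hypothetical mapping of the claimed units onto one class."""

    def __init__(self, items: Dict[str, Tuple[str, Vector]], reference: Vector):
        self.items = items                # feature data holding unit: id -> (content, vector)
        self.reference = list(reference)  # reference vector given in advance
        self.shown: List[str] = []        # ids output so far, in order

    def display(self, doc_id: str) -> str:
        """Display information output unit: record and return the item's content."""
        self.shown.append(doc_id)
        return self.items[doc_id][0]

    def feedback(self, appropriate: bool) -> Optional[str]:
        """Feedback accepting unit plus output information selection unit."""
        if not self.shown:
            raise ValueError("call display() with a designated item first")
        x = self.items[self.shown[-1]][1]
        if appropriate:
            target = list(x)
        else:
            # First stage: the reference vector; later: the vector output at the previous stage.
            r = self.reference if len(self.shown) == 1 else self.items[self.shown[-2]][1]
            target = [2 * ri - xi for xi, ri in zip(x, r)]  # assumed v = 2r - x
        remaining = [d for d in self.items if d not in self.shown]
        if not remaining:
            return None
        best = max(remaining, key=lambda d: cosine(target, self.items[d][1]))
        return self.display(best)


if __name__ == "__main__":
    items = {"A": ("alpha", [1.0, 0.8]), "B": ("beta", [1.0, 0.3]),
             "C": ("gamma", [0.3, 1.0]), "D": ("delta", [0.0, 1.0])}
    selector = OutputInformationSelector(items, reference=[1.0, 1.0])
    print(selector.display("A"))     # alpha (the designated first item)
    print(selector.feedback(False))  # gamma
    print(selector.feedback(True))   # delta
```

Keeping the history in the selector means the caller only reports judgments; in the earlier functional sketch the caller had to pass the correct reference vector at each stage.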

A feature data holding unit that holds feature vector data together with information representing the content of each piece of a group of search target information;
A display information output unit that outputs information representing the content of search target information designated from the group of search target information;
A decision tree creation unit that creates a decision tree for the group of search target information held in the feature data holding unit by starting from the search target information whose feature vector has the highest similarity to a reference vector given in advance and taking an appropriate branch when the user accepts the content and an inappropriate branch when the user does not, where on an appropriate branch another piece of search target information whose feature vector has the highest similarity to that feature vector becomes the next candidate, and on an inappropriate branch a virtual vector is calculated whose difference from the reference vector (in the first-stage search) or from the feature vector output in the previous stage (in the second and subsequent stages) is the opposite of the difference between the feature vector of the rejected search target information and that same vector, and another piece of search target information whose feature vector has the highest similarity to this virtual vector, if one exists, becomes the next candidate, this process being repeated until all the search target information held in the feature data holding unit is covered; and
A feedback processing unit that accepts the user's appropriate/inappropriate judgment of the search target information output by the display information output unit, identifies the search target information to be output next on the basis of the decision tree created by the decision tree creation unit, and causes the display information output unit to output information representing its content; an output information selection device comprising the above units.

A feature data holding unit that holds feature vector data together with information representing the content of each piece of a group of search target information;
A display information output unit that outputs information representing the content of search target information designated from the group of search target information;
A decision tree creation unit that creates a decision tree for the group of search target information held in the feature data holding unit by starting from the search target information whose feature vector has the highest similarity to a reference vector given in advance and taking an appropriate branch when the user accepts the content and an inappropriate branch when the user does not, where on an appropriate branch another piece of search target information whose feature vector has the highest similarity to that feature vector becomes the next candidate, and on an inappropriate branch a virtual vector is calculated whose difference from the reference vector (in the first-stage search) or from the feature vector output in the previous stage (in the second and subsequent stages) is the opposite of the difference between the feature vector of the rejected search target information and that same vector, and another piece of search target information whose feature vector has the highest similarity to this virtual vector, if one exists, becomes the next candidate, this process being repeated up to a predetermined stage for the group of search target information held in the feature data holding unit; and
A feedback processing unit that accepts the user's appropriate/inappropriate judgment of the search target information output by the display information output unit, identifies the search target information to be output next on the basis of the decision tree created by the decision tree creation unit, causes the display information output unit to output information representing its content, and, when a condition for growing the existing decision tree is reached, instructs the decision tree creation unit to grow the decision tree by a predetermined number of stages; an output information selection device comprising the above units.

Step 1 in which a feature data holding unit holds feature vector data together with information representing the content of each piece of a group of search target information;
Step 2 in which a display information output unit outputs information representing the content of search target information designated from the group of search target information; and
Step 3 in which a feedback accepting unit accepts the user's appropriate/inappropriate judgment of the search target information currently being output and (1) when the user's judgment is appropriate, searches the group of search target information for another piece of search target information whose feature vector has the highest similarity to the feature vector of the search target information currently being output and, if one exists, outputs information representing its content, or (2) when the user's judgment is inappropriate, calculates a virtual vector whose difference from a reference vector given in advance (in the first-stage search) or from the feature vector output in the previous stage (in the second and subsequent stages) is the opposite of the difference between the feature vector of the search target information currently being output and that same vector, searches the group of search target information for another piece of search target information whose feature vector has the highest similarity to this virtual vector and, if one exists, outputs information representing its content; an output information selection method comprising the above steps.

Step 1 in which a feature data holding unit holds feature vector data together with information representing the content of each piece of a group of search target information;
Step 2 in which a display information output unit outputs information representing the content of search target information designated from the group of search target information;
Step 3 in which a feedback accepting unit creates a decision tree for the group of search target information by starting from the search target information whose feature vector has the highest similarity to a reference vector given in advance and taking an appropriate branch when the user accepts the content and an inappropriate branch when the user does not, where on an appropriate branch another piece of search target information whose feature vector has the highest similarity to that feature vector becomes the next candidate, and on an inappropriate branch a virtual vector is calculated whose difference from the reference vector (in the first-stage search) or from the feature vector output in the previous stage (in the second and subsequent stages) is the opposite of the difference between the feature vector of the rejected search target information and that same vector, and another piece of search target information whose feature vector has the highest similarity to this virtual vector, if one exists, becomes the next candidate, this process being repeated until all of the group of search target information is covered; and
Step 4 of accepting the user's appropriate/inappropriate judgment of the search target information currently being output, identifying the search target information to be output next on the basis of the decision tree, and outputting information representing its content; an output information selection method comprising the above steps.

Step 1 in which a feature data holding unit holds feature vector data together with information representing the content of each piece of a group of search target information;
Step 2 in which a display information output unit outputs information representing the content of search target information designated from the group of search target information;
Step 3 in which a feedback accepting unit creates a decision tree for the group of search target information by starting from the search target information whose feature vector has the highest similarity to a reference vector given in advance and taking an appropriate branch when the user accepts the content and an inappropriate branch when the user does not, where on an appropriate branch another piece of search target information whose feature vector has the highest similarity to that feature vector becomes the next candidate, and on an inappropriate branch a virtual vector is calculated whose difference from the reference vector (in the first-stage search) or from the feature vector output in the previous stage (in the second and subsequent stages) is the opposite of the difference between the feature vector of the rejected search target information and that same vector, and another piece of search target information whose feature vector has the highest similarity to this virtual vector, if one exists, becomes the next candidate, this process being repeated up to a predetermined stage for the group of search target information; and
Step 4 of accepting the user's appropriate/inappropriate judgment of the search target information currently being output, identifying the search target information to be output next on the basis of the decision tree, outputting information representing its content, and, when a condition for growing the existing decision tree is reached, growing the decision tree by a predetermined number of stages.

Process 1 of holding feature vector data together with information representing the content of each piece of a group of search target information;
Process 2 of outputting information representing the content of search target information designated from the group of search target information; and
Process 3 of accepting the user's appropriate/inappropriate judgment of the search target information currently being output and (1) when the user's judgment is appropriate, searching the group of search target information for another piece of search target information whose feature vector has the highest similarity to the feature vector of the search target information currently being output and, if one exists, outputting information representing its content, or (2) when the user's judgment is inappropriate, calculating a virtual vector whose difference from a reference vector given in advance (in the first-stage search) or from the feature vector output in the previous stage (in the second and subsequent stages) is the opposite of the difference between the feature vector of the search target information currently being output and that same vector, searching the group of search target information for another piece of search target information whose feature vector has the highest similarity to this virtual vector and, if one exists, outputting information representing its content; an output information selection program that causes a computer to execute the above processes.

A process 1 of holding, for each item in a group of search target information, information representing its contents together with its feature vector data;
A process 2 of outputting information representing the contents of search target information designated from the group of search target information;
A process 3 of creating a decision tree by starting from the search target information in the group whose feature vector has the highest similarity to a reference vector given in advance, branching according to whether the user judges its contents appropriate or inappropriate, and repeating the following until it has been performed for all of the group of search target information: on the appropriate branch, the other search target information having the feature vector with the highest similarity to the feature vector of the accepted search target information is set as the next candidate; on the inappropriate branch, the difference is calculated between the feature vector of the inappropriate search target information and either the reference vector (for a first-stage search) or the feature vector output at the preceding stage (for a second-stage or later search), a virtual vector is calculated whose difference from that reference vector or preceding-stage feature vector is opposite to the calculated difference, other search target information having the feature vector with the highest similarity to the virtual vector is searched for, and, if such information exists, it is set as the next candidate; and
A process 4 of accepting the user's appropriate/inappropriate judgment input for the search target information currently being output, specifying the search target information to be output next on the basis of the decision tree, and outputting information expressing its contents;
An output information selection program that causes a computer to execute the above processes 1 to 4.
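The decision-tree variant precomputes the branches instead of searching after every answer. The sketch below is a minimal Python illustration under assumptions, not the patented implementation: it reads "until it has been performed for all of the group" as extending every root-to-leaf path until no unused items remain on that path, which can produce on the order of 2^N nodes for N items. It uses the cosine coefficient (数3), and v = 2a - x as the virtual vector for anchor a and rejected vector x. The names `Node`, `build_tree`, `walk` and the toy vectors are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Sequence

Vector = Sequence[float]


def cosine(x: Vector, y: Vector) -> float:
    """Cosine coefficient; 0.0 if either vector is zero."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0


def most_similar(target: Vector, items: Dict[str, Vector],
                 exclude: FrozenSet[str]) -> Optional[str]:
    candidates = [i for i in items if i not in exclude]
    return max(candidates, key=lambda i: cosine(target, items[i]), default=None)


@dataclass
class Node:
    item: str
    ok: Optional["Node"] = None  # next candidate if `item` is judged appropriate
    ng: Optional["Node"] = None  # next candidate if `item` is judged inappropriate


def _expand(item: str, anchor: Vector, items: Dict[str, Vector],
            used: FrozenSet[str]) -> Node:
    """Build the subtree rooted at `item`; `anchor` is the reference or parent vector."""
    node = Node(item)
    vec = items[item]
    ok_id = most_similar(vec, items, used)  # appropriate: nearest to the item itself
    if ok_id is not None:
        node.ok = _expand(ok_id, vec, items, used | {ok_id})
    virtual = [2 * a - x for x, a in zip(vec, anchor)]  # reflect vec about the anchor
    ng_id = most_similar(virtual, items, used)  # inappropriate: nearest to the reflection
    if ng_id is not None:
        node.ng = _expand(ng_id, vec, items, used | {ng_id})
    return node


def build_tree(items: Dict[str, Vector], reference: Vector) -> Optional[Node]:
    """Process 3: grow every path until no unused items remain on it."""
    root = most_similar(reference, items, frozenset())
    return None if root is None else _expand(root, reference, items, frozenset({root}))


def walk(root: Optional[Node], answers: List[bool]) -> List[str]:
    """Process 4: follow the user's answers and return the items output, in order."""
    shown: List[str] = []
    node = root
    for ok in answers:
        if node is None:
            break
        shown.append(node.item)
        node = node.ok if ok else node.ng
    if node is not None:
        shown.append(node.item)
    return shown


items = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [1.0, -0.5], "D": [0.6, 0.8]}
tree = build_tree(items, [1.0, 0.2])
print(walk(tree, [False, False, False]))  # ['A', 'D', 'C', 'B']
print(walk(tree, [False, True, False]))   # ['A', 'D', 'B', 'C']
```

Because every branch is computed up front, answering the user (process 4) is a constant-time pointer move. The cost is paid once when the tree is built.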

A process 1 of holding, for each item in a group of search target information, information representing its contents together with its feature vector data;
A process 2 of outputting information representing the contents of search target information designated from the group of search target information;
A process 3 of creating a decision tree by starting from the search target information in the group whose feature vector has the highest similarity to a reference vector given in advance, branching according to whether the user judges its contents appropriate or inappropriate, and repeating the following a predetermined number of times for the group of search target information: on the appropriate branch, the other search target information having the feature vector with the highest similarity to the feature vector of the accepted search target information is set as the next candidate; on the inappropriate branch, the difference is calculated between the feature vector of the inappropriate search target information and either the reference vector (for a first-stage search) or the feature vector output at the preceding stage (for a second-stage or later search), a virtual vector is calculated whose difference from that reference vector or preceding-stage feature vector is opposite to the calculated difference, other search target information having the feature vector with the highest similarity to the virtual vector is searched for, and, if such information exists, it is set as the next candidate; and
A process 4 of accepting the user's appropriate/inappropriate judgment input for the search target information currently being output, specifying the search target information to be output next on the basis of the decision tree, outputting information expressing its contents, and, when a condition for growing the existing decision tree is reached, growing the decision tree by a predetermined number of levels;
An output information selection program that causes a computer to execute the above processes 1 to 4.
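This claim bounds the initial tree to a predetermined number of levels and grows it later. The claim leaves the growth condition unspecified. The minimal Python sketch below assumes one possible condition: the walk reaches a node whose children have never been computed, and the tree is then grown by `grow_by` levels from that node. It also assumes cosine similarity (数3), exclusion of items already on the path, and the virtual vector v = 2a - x. `Node`, `expand`, `build_tree`, `Session`, `levels`, `grow_by` and the toy vectors are hypothetical names, not taken from the patent.

```python
import math
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional, Sequence

Vector = Sequence[float]


def cosine(x: Vector, y: Vector) -> float:
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0


def most_similar(target: Vector, items: Dict[str, Vector], exclude) -> Optional[str]:
    candidates = [i for i in items if i not in exclude]
    return max(candidates, key=lambda i: cosine(target, items[i]), default=None)


@dataclass
class Node:
    item: str
    anchor: Vector        # reference vector (root) or the parent's feature vector
    used: FrozenSet[str]  # items already output on the path to this node
    ok: Optional["Node"] = None
    ng: Optional["Node"] = None
    expanded: bool = False  # True once this node's children have been computed


def expand(node: Node, items: Dict[str, Vector], levels: int) -> None:
    """Compute up to `levels` further stages of the tree below `node`."""
    if levels <= 0:
        return
    if not node.expanded:
        vec = items[node.item]
        ok_id = most_similar(vec, items, node.used)
        if ok_id is not None:
            node.ok = Node(ok_id, vec, node.used | {ok_id})
        virtual = [2 * a - x for x, a in zip(vec, node.anchor)]
        ng_id = most_similar(virtual, items, node.used)
        if ng_id is not None:
            node.ng = Node(ng_id, vec, node.used | {ng_id})
        node.expanded = True
    for child in (node.ok, node.ng):
        if child is not None:
            expand(child, items, levels - 1)


def build_tree(items: Dict[str, Vector], reference: Vector, levels: int) -> Optional[Node]:
    """Tree with at most `levels` levels; the root counts as level 1."""
    root_id = most_similar(reference, items, frozenset())
    if root_id is None:
        return None
    root = Node(root_id, list(reference), frozenset({root_id}))
    expand(root, items, levels - 1)
    return root


class Session:
    """Walks the tree and grows it by `grow_by` levels when an unexpanded node is reached."""

    def __init__(self, items: Dict[str, Vector], reference: Vector, levels: int, grow_by: int):
        self.items, self.grow_by = items, grow_by
        self.node = build_tree(items, reference, levels)

    def current(self) -> Optional[str]:
        return None if self.node is None else self.node.item

    def answer(self, ok: bool) -> Optional[str]:
        if self.node is None:
            return None
        if not self.node.expanded:  # assumed growth condition: frontier reached
            expand(self.node, self.items, self.grow_by)
        self.node = self.node.ok if ok else self.node.ng
        return self.current()


items = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [1.0, -0.5], "D": [0.6, 0.8]}
session = Session(items, [1.0, 0.2], levels=2, grow_by=2)
print(session.current())      # A
print(session.answer(False))  # D
print(session.answer(False))  # C  (D had no children yet, so the tree is grown first)
```

Bounding the initial tree to `levels` levels caps it at 2^levels - 1 nodes. Only the subtrees the user actually reaches are extended, which avoids the exponential up-front cost of building the complete tree.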

A process 1 of holding, for each item in a group of search target information, information representing its contents together with its feature vector data;
A process 2 of outputting information representing the contents of search target information designated from the group of search target information; and
A process 3 of accepting the user's appropriate/inappropriate judgment input for the search target information currently being output, wherein (1) if the user judges it appropriate, other search target information having the feature vector with the highest similarity to the feature vector of the search target information currently being output is searched for in the group and, if such information exists, information representing its contents is output, and (2) if the user judges it inappropriate, the difference is calculated between the feature vector of the search target information currently being output and either a reference vector given in advance (for a first-stage search) or the feature vector output at the preceding stage (for a second-stage or later search); a virtual vector is calculated whose difference from that reference vector or preceding-stage feature vector is opposite to the calculated difference; other search target information having the feature vector with the highest similarity to the virtual vector is searched for in the group; and, if such information exists, information representing its contents is output;
A computer-readable recording medium on which is recorded an output information selection program that causes a computer to execute the above processes 1 to 3.
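The virtual-vector step of process 3 can be written out as an equation. This reads "opposite" as equal magnitude with reversed sign, which is an interpretation rather than something the claim states explicitly. Let $\mathbf{x}$ be the feature vector of the item judged inappropriate, and let $\mathbf{a}$ be the anchor: the reference vector at the first stage, or the feature vector output at the preceding stage. Let $G$ be the group of search target information, $S$ the set of items already output (an assumption), and $\mathbf{w}_d$ the feature vector of item $d$. With $\mathrm{sim}$ the cosine coefficient of 数3:

```latex
\begin{aligned}
\mathbf{v} - \mathbf{a} &= -(\mathbf{x} - \mathbf{a})
  \quad\Longrightarrow\quad \mathbf{v} = 2\mathbf{a} - \mathbf{x},\\
d^{*} &= \operatorname*{arg\,max}_{d \in G \setminus S} \mathrm{sim}(\mathbf{v}, \mathbf{w}_d).
\end{aligned}
```

For example, take a hypothetical anchor $\mathbf{a} = (1, 0.2)$ and a rejected item $\mathbf{x} = (1, 0)$. Then $\mathbf{v} = (2 - 1,\ 0.4 - 0) = (1, 0.4)$, so the next search is biased toward items on the opposite side of $\mathbf{a}$ from $\mathbf{x}$.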

A process 1 of holding, for each item in a group of search target information, information representing its contents together with its feature vector data;
A process 2 of outputting information representing the contents of search target information designated from the group of search target information;
A process 3 of creating a decision tree by starting from the search target information in the group whose feature vector has the highest similarity to a reference vector given in advance, branching according to whether the user judges its contents appropriate or inappropriate, and repeating the following until it has been performed for all of the group of search target information: on the appropriate branch, the other search target information having the feature vector with the highest similarity to the feature vector of the accepted search target information is set as the next candidate; on the inappropriate branch, the difference is calculated between the feature vector of the inappropriate search target information and either the reference vector (for a first-stage search) or the feature vector output at the preceding stage (for a second-stage or later search), a virtual vector is calculated whose difference from that reference vector or preceding-stage feature vector is opposite to the calculated difference, other search target information having the feature vector with the highest similarity to the virtual vector is searched for, and, if such information exists, it is set as the next candidate; and
A process 4 of accepting the user's appropriate/inappropriate judgment input for the search target information currently being output, specifying the search target information to be output next on the basis of the decision tree, and outputting information expressing its contents;
A computer-readable recording medium on which is recorded an output information selection program that causes a computer to execute the above processes 1 to 4.

A process 1 of holding, for each item in a group of search target information, information representing its contents together with its feature vector data;
A process 2 of outputting information representing the contents of search target information designated from the group of search target information;
A process 3 of creating a decision tree by starting from the search target information in the group whose feature vector has the highest similarity to a reference vector given in advance, branching according to whether the user judges its contents appropriate or inappropriate, and repeating the following a predetermined number of times for the group of search target information: on the appropriate branch, the other search target information having the feature vector with the highest similarity to the feature vector of the accepted search target information is set as the next candidate; on the inappropriate branch, the difference is calculated between the feature vector of the inappropriate search target information and either the reference vector (for a first-stage search) or the feature vector output at the preceding stage (for a second-stage or later search), a virtual vector is calculated whose difference from that reference vector or preceding-stage feature vector is opposite to the calculated difference, other search target information having the feature vector with the highest similarity to the virtual vector is searched for, and, if such information exists, it is set as the next candidate; and
A process 4 of accepting the user's appropriate/inappropriate judgment input for the search target information currently being output, specifying the search target information to be output next on the basis of the decision tree, outputting information expressing its contents, and, when a condition for growing the existing decision tree is reached, growing the decision tree by a predetermined number of levels;
A computer-readable recording medium on which is recorded an output information selection program that causes a computer to execute the above processes 1 to 4.