JP3315230B2

JP3315230B2 - Similarity search device

Info

Publication number: JP3315230B2
Application number: JP35110693A
Authority: JP
Inventors: 智恵子小林
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1993-12-29
Filing date: 1993-12-29
Publication date: 2002-08-19
Anticipated expiration: 2017-08-19
Also published as: JPH07200614A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to a similarity search device that retrieves similar data, and in particular to a similarity search device that automatically retrieves and extracts, from existing data, similar data that accurately reflects the search intent.

[0002]

[Prior Art] A device that retrieves, from stored data, data similar to a specific item of data is called a similarity search device. Such devices are applied in many fields, including database retrieval in general information processing, CBR (case-based reasoning) in artificial intelligence, image retrieval in image processing, and dictionary lookup in natural language processing.

[0003] FIG. 24 shows an example of the schematic configuration of a conventional similarity search device that searches all of the data. In this device, information given as a problem case is first entered on an input device 171 consisting of a keyboard, a mouse, and the like. The input/output control unit 172 interprets the entered information and captures the result. Based on this data, the inference processing unit 173 instructs the search unit 174 to search. The search unit 174 retrieves all data related to the problem case from the database 175, which stores data expressed as attribute and attribute-value pairs together with similarity data, and performs the similarity search by comparing the similarity between cases, expressed as r = Σ (attribute weight × similarity d between attribute values). The inference processing unit 173 then derives the search result, and the input/output control unit 172 outputs it to an output device 176 consisting of a CRT display or the like.
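
The case similarity r = Σ (attribute weight × similarity d) can be sketched as follows. This is a minimal illustration, not the device's actual implementation: the function name, the dictionary layout, and the convention that identical values score 1.0 and unlisted pairs score 0.0 are assumptions. The weights and the Tokyo/Kanagawa similarity of 0.40 are borrowed from the embodiment described later in this document; the compared case is hypothetical.

```python
def case_similarity(query, case, weights, value_sim):
    """Return r = sum over attributes of (attribute weight * similarity d)."""
    r = 0.0
    for attr, weight in weights.items():
        q, c = query[attr], case[attr]
        # Assumption: identical values have d = 1.0, unlisted pairs d = 0.0.
        d = 1.0 if q == c else value_sim.get(attr, {}).get(frozenset((q, c)), 0.0)
        r += weight * d
    return r


# Weights and the Tokyo/Kanagawa similarity come from the embodiment.
weights = {"region": 0.2, "time": 0.5, "food": 0.3}
value_sim = {"region": {frozenset(("Tokyo", "Kanagawa")): 0.40}}

query = {"region": "Tokyo", "time": "June", "food": "sukiyaki"}
case = {"region": "Kanagawa", "time": "June", "food": "sukiyaki"}  # hypothetical case

r = case_similarity(query, case, weights, value_sim)
print(round(r, 2))
```

Because every stored case must be scored this way, the cost grows with the number of cases, which is the drawback the next paragraph describes.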

[0004] With such a conventional similarity search device, however, the amount of computation and the time required for a similarity search become enormous when the number of data items is large.

[0005] FIG. 25, by contrast, shows the schematic configuration of a conventional similarity search device that first narrows down a large amount of data with a manually created index and then searches.

[0006] The key used when a similarity search device retrieves data is called an "index". To distinguish clearly between the two indexes used in the following description, the first index and the second index, both are defined here. The first index is used to narrow down the large amount of data in the database; it is an index whose conditions must be satisfied strictly. A similarity search is then performed on the group of data that satisfies the first index, and the second index is what is used at that point to evaluate in detail the similarity between the input problem data and the retrieved data. The second index is the weight of each attribute.

[0007] In the similarity search device of FIG. 25, the problem case data and the first index are first entered on the input device 181. The input/output control unit 182 interprets the input data, and the inference processing unit 183 instructs the first search unit 184 to narrow down the data. The first search unit 184 reads the data related to the problem case from the database 185 and retrieves the data that satisfies the first index. The inference processing unit 183 sends this group of data to the second search unit 186, which searches the narrowed-down data by the same means as the search unit 174 of the similarity search device in FIG. 24. The inference processing unit 183 then derives the search result, and the input/output control unit 182 outputs it to the output device 187.
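
The two-stage flow of FIG. 25 can be sketched roughly as below. The representation of the first index as a set of allowed values per attribute, the function names, and the exact-match similarity function are assumptions made for illustration. Case 1 follows the example given later for FIG. 5; cases 2 and 3 are hypothetical.

```python
def first_search(cases, first_index):
    """Strict filter: keep cases whose values lie inside every allowed set."""
    return [case for case in cases
            if all(case[attr] in allowed for attr, allowed in first_index.items())]


def second_search(query, candidates, weights, similarity):
    """Rank the surviving cases by r = sum(weight * d), highest first."""
    def r(case):
        return sum(w * similarity(attr, query[attr], case[attr])
                   for attr, w in weights.items())
    return sorted(candidates, key=r, reverse=True)


def exact_match(attr, a, b):
    # Placeholder similarity (assumption): 1.0 for equal values, otherwise 0.0.
    return 1.0 if a == b else 0.0


cases = [
    {"id": 1, "region": "Kanagawa", "time": "August", "food": "nabe"},
    {"id": 2, "region": "Tokyo", "time": "June", "food": "nabe"},      # hypothetical
    {"id": 3, "region": "Kyoto", "time": "June", "food": "sukiyaki"},  # hypothetical
]
query = {"region": "Tokyo", "time": "June", "food": "sukiyaki"}
weights = {"region": 0.2, "time": 0.5, "food": 0.3}
first_index = {"region": {"Tokyo", "Kanagawa"}}  # user-fixed first index

narrowed = first_search(cases, first_index)
ranked = second_search(query, narrowed, weights, exact_match)
print([case["id"] for case in ranked])
```

Note that case 3 is discarded by the fixed first index even though it matches the query on two of three attributes, which illustrates why the choice of first index matters.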

[0008] Even with a first index set in this way, however, it is difficult to retrieve the similar data one wants. This is because, in the conventional similarity search device, the user fixes the first index, so the breadth of the search cannot be changed.

[0009] One conceivable way of changing the first index is for the user to tighten or relax it, using the given second index (attribute weights) and the number of retrieved data items as criteria. For example, to retrieve more data, the first index is relaxed from its current state so as to loosen the constraint on attributes with a small second-index value, that is, attributes of low importance. Conversely, to retrieve fewer data, the first index is tightened from its current state so as to strengthen the constraint on attributes with a large second-index value, that is, attributes of high importance.

[0010] With this method, however, there is no indicator that can serve as a criterion for deciding which attribute to change, or which attribute to relax or tighten and to what level. Every change to the first index is made by user trial and error, so it is difficult to adjust the first index in a balanced way, and a similar-data search that accurately reflects the search intent cannot be performed.

[0011]

[Problems to Be Solved by the Invention] Of the conventional similarity search devices described above, the former type, which searches all of the data, has the problem that the search takes a long time. The latter type, which narrows down a large amount of data using a first index set by the user through trial and error and then searches, makes it difficult to set the first index in a balanced way. The data therefore cannot be narrowed down properly, and a similar-data search that accurately reflects the search intent cannot be performed.

[0012] The present invention has been made to solve the above problems. Its object is to provide a similarity search device that can perform an effective similarity search accurately reflecting the search intent by setting the first index in a balanced way according to the state of data accumulation.

[0013]

[Means for Solving the Problems] The similarity search device according to the present invention (claim 1) comprises: data storage means storing a plurality of data items, each expressed by a plurality of attribute and attribute-value pairs; similarity evaluation knowledge storage means storing similarity evaluation knowledge that represents the hierarchical structure of the attribute values and the similarity between the attribute values; data classification means for classifying, on the basis of the similarity evaluation knowledge, the plurality of data items stored in the data storage means according to the range of attribute values for each attribute and an externally given second index indicating the importance of each attribute; target similarity generation means for generating a target similarity from the result of the classification by the data classification means; first index generation means for generating, from the generated target similarity, a first index indicating the range of attribute values for each attribute; first search means for searching the plurality of data items stored in the data storage means using the first index to narrow down the amount of data; and second search means for extracting, from the narrowed-down data and using the second index, data similar to data having specific attribute values.

[0014] The similarity search device according to the present invention (claim 2) comprises: data storage means storing a plurality of data items, each expressed by a plurality of attribute and attribute-value pairs; similarity evaluation knowledge storage means storing similarity evaluation knowledge that represents the hierarchical structure of the attribute values and the similarity between the attribute values; data classification means for classifying, on the basis of the similarity evaluation knowledge, the plurality of data items stored in the data storage means according to the range of attribute values for each attribute and an externally given second index indicating the importance of each attribute; classification result presentation means for presenting the data classification result to the user; first index generation means for generating, from a target similarity designated on the basis of the classification result, a first index indicating the range of attribute values for each attribute; first search means for searching the plurality of data items stored in the data storage means using the first index to narrow down the amount of data; and second search means for extracting, from the narrowed-down data and using the second index, data similar to data having specific attribute values.

[0015] Preferably, each of the above configurations further comprises target similarity changing means for interactively changing the target similarity.

[0016] The similarity search device according to the present invention (claim 4) comprises: data storage means storing a plurality of data items, each expressed by a plurality of attribute and attribute-value pairs; similarity evaluation knowledge storage means storing similarity evaluation knowledge that represents the hierarchical structure of the attribute values and the similarity between the attribute values; data classification means for classifying, on the basis of the similarity evaluation knowledge, the plurality of data items stored in the data storage means according to the range of attribute values for each attribute and an externally given second index indicating the importance of each attribute; first index generation means for generating, from the result of the classification by the data classification means, a first index indicating the range of attribute values for each attribute; first search means for searching the plurality of data items stored in the data storage means using the first index to narrow down the amount of data; and second search means for extracting, from the narrowed-down data and using the second index, data similar to data having specific attribute values.

[0017] Preferably, each of the above configurations further comprises first index changing means for interactively changing the first index.

[0018] Preferably, each of the above configurations further comprises means for storing the data classification result.

[0019]

[Operation] In the similarity search device according to the present invention (claim 1), the first index, which the first search means uses in the search that narrows down the plurality of data items stored in the data storage means (the strict search), is generated automatically rather than being determined by the user empirically and through trial and error as in the prior art.

[0020] Specifically, the data classification means first classifies the plurality of data items stored in the data storage means according to the range of attribute values for each attribute and the second index. From this classification result, the target similarity generation means determines the target similarity that data must satisfy in the strict search. The first index generation means then determines, from the generated target similarity, the first index indicating the range of attribute values for each attribute. In other words, the first index, which settles which attributes to target, which attribute values to relax or tighten, and how far to relax or tighten them, is generated automatically and in a balanced way, reflecting the classification of the data accumulated in the database.

[0021] Performing the strict search with a first index generated in this way according to the state of data accumulation narrows down the data in a manner that fully reflects the search intent. The similarity search subsequently performed by the second search means is therefore effective and accurately reflects the search intent.

[0022] In the similarity search device according to the present invention (claim 2), the first index, which the first search means uses in the search that narrows down the plurality of data items stored in the data storage means (the strict search), is generated semi-automatically rather than being determined by the user empirically and through trial and error as in the prior art.

[0023] Specifically, the data classification means first classifies the plurality of data items stored in the data storage means according to the range of attribute values for each attribute and the second index. The classification result presentation means presents this classification result to the user. The first index generation means waits for the user to designate a preferred target similarity with reference to the classification result and, following that designation, generates the first index indicating the range of attribute values for each attribute. In other words, whereas the user previously determined the first index empirically and through trial and error, the classification result presentation means now presents the data classification result, so the user can designate a first index suited to the state of data accumulation.

[0024] Performing the strict search with the first index designated in this way narrows down the data in a manner that fully reflects the search intent. The similarity search subsequently performed by the second search means is therefore effective and accurately reflects the search intent.

[0025] In the similarity search device according to the present invention (claim 4), the first index is generated automatically and directly from the classification result, without first generating a target similarity.

[0026] Performing the strict search with a first index generated in this way according to the state of data accumulation narrows down the data in a manner that fully reflects the search intent. The similarity search subsequently performed by the second search means is therefore effective and accurately reflects the search intent.

[0027]

[Embodiment] An embodiment of the present invention is described below with reference to the drawings.

[0028] FIG. 1 is a block diagram showing the schematic configuration of a similarity search device according to one embodiment of the present invention. The description here covers the case in which the similarity search device of the present invention is applied to the similarity search in a case-based reasoning system. Case-based reasoning is reasoning in which past problem-solving experience is accumulated as cases and a conclusion is reached for a new problem by retrieving and adapting similar cases. Specifically, this embodiment shows how a well-balanced index is generated from hierarchical relation data (a tree structure) so that a similarity search suited to the state of data accumulation can be performed simply and accurately. As a concrete example, the description covers retrieving, from the case data in the database, examples similar to the banquet decision "sukiyaki, in Tokyo, in June".

[0029] As shown in FIG. 1, the similarity search device of this embodiment comprises an input device 101, an input/output control unit 102, an inference processing unit 103, a first search unit 104, a first index generation unit 105, a data classification unit 106, a target similarity generation unit 107, a classification data storage unit 108, a database 109, similarity evaluation knowledge 110, a second search unit 111, and an output device 112.

[0030] In this configuration, data expressing attributes and attribute values in the form "attribute-attribute value" is entered on the input device 101 as the problem case, for example "region-Tokyo", "time-June", "food-sukiyaki". The input/output control unit 102 interprets this input data and sends the result to the inference processing unit 103. In response, the inference processing unit 103 instructs the first search unit 104 to perform a strict search. A strict search here means retrieving the data that strictly satisfies the required conditions, in order to narrow down the large amount of data in the database.

[0031] The first search unit 104, having been instructed to perform a strict search, first instructs the first index generation unit 105 to create the first index, which is the index used for the strict search.

[0032] The first index generation unit 105 generates an appropriate first index suited to the state of data accumulation, based on the case data registered in the database 109, the evaluation knowledge registered in the similarity evaluation knowledge 110, and the second index specified by the user.

[0033] The generation of the first index by the first index generation unit 105 is described below. The flowchart of FIG. 2 shows the procedure by which the first index generation unit 105 generates the first index. The flowchart of FIG. 3 shows the procedure by which the data classification unit 106 classifies the data according to the similarity between attribute values so that a first index suited to the state of data accumulation can be generated. The flowchart of FIG. 4 shows the procedure by which the target similarity generation unit 107 generates the target similarity needed to generate the first index.

[0034] FIG. 5 shows several items of case data. As FIG. 5 shows, each item of case data consists of a plurality of (attribute-attribute value) pairs. Case 1, for example, has the attribute values (region-Kanagawa), (time-August), (food-nabe).

[0035] In this embodiment, the evaluation knowledge consists of the hierarchical relationships among the attribute values of each attribute and the similarity expressing how alike each pair of attribute values is. FIGS. 6, 7, and 8 show the hierarchy of attribute values for each attribute. For the region attribute in FIG. 6, Kanto, Kansai, and so on lie below "all of Japan", and Tokyo, Kanagawa, and so on lie below Kanto, forming a tree-structured hierarchy. FIGS. 9, 10, and 11 show the similarity between attribute values for each attribute. A larger value indicates a stronger degree of similarity; FIG. 9 tabulates the similarities for all regions, for example 0.40 between Tokyo and Kanagawa and 0.01 between Tokyo and Kyoto. In this embodiment the input problem data is assumed to be terminal-node data, so similarities are defined only between terminal nodes. The input problem data need not be a terminal node, however, and similarities can be set to suit the input problem.
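
One possible in-memory form of this evaluation knowledge is sketched below: a child-to-parent map for the hierarchy and a table of similarities keyed by unordered pairs of terminal nodes. The data layout and function names are assumptions. The three similarity values are the ones quoted in this description; placing Kyoto under Kansai is an assumption, since the text does not say where Kyoto sits in FIG. 6.

```python
# Region hierarchy as a child -> parent map (partial, illustrative only).
PARENT = {
    "Kanto": "Japan", "Kansai": "Japan",
    "Tokyo": "Kanto", "Kanagawa": "Kanto", "Tochigi": "Kanto",
    "Kyoto": "Kansai",  # assumption: Kyoto's parent is not stated in the text
}

# Similarities between terminal nodes, keyed by unordered pair.
SIMILARITY = {
    frozenset(("Tokyo", "Kanagawa")): 0.40,
    frozenset(("Tokyo", "Tochigi")): 0.05,
    frozenset(("Tokyo", "Kyoto")): 0.01,
}


def leaves_under(node):
    """Return the terminal nodes in the subtree rooted at `node`."""
    children = [child for child, parent in PARENT.items() if parent == node]
    if not children:
        return [node]
    leaves = []
    for child in children:
        leaves.extend(leaves_under(child))
    return leaves


def similarity(a, b):
    """Look up the similarity of two terminal nodes, in either order."""
    return SIMILARITY[frozenset((a, b))]


print(leaves_under("Kanto"))
print(similarity("Kanagawa", "Tokyo"))
```

Keying the table by `frozenset` keeps the lookup symmetric, so each pair only needs to be stored once.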

[0036] First, in step S23 of FIG. 2, the data classification unit 106 classifies the data in the database with respect to the input problem. The details of this data classification are described with reference to the flowchart of FIG. 3.

[0037] In step S31, the range t of the similarity between attribute values is examined within the tree structure of each attribute. For the region attribute, for example, within Kanto as seen from the problem's attribute value "Tokyo" (excluding Tokyo itself), the similarity is 0.40 for Kanagawa, the most similar, and 0.05 for Tochigi, the least similar, so 0.05 ≦ t ≦ 0.40. The results of this examination for each attribute are shown in FIG. 12 for region, FIG. 13 for time, and FIG. 14 for food.
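
The range t for one level of the hierarchy can be computed as the minimum and maximum similarity from the problem's value to the other values in that level. The sketch below assumes a hypothetical helper name and data layout, and lists only the two Kanto values quoted in the text, so the real FIG. 12 may contain more entries.

```python
def similarity_range(query_value, candidates, sim):
    """Return (min, max) similarity from query_value to the other candidates."""
    values = [sim[frozenset((query_value, v))]
              for v in candidates if v != query_value]
    return min(values), max(values)


sim = {
    frozenset(("Tokyo", "Kanagawa")): 0.40,
    frozenset(("Tokyo", "Tochigi")): 0.05,
}
kanto = ["Tokyo", "Kanagawa", "Tochigi"]  # only the Kanto values named in the text

t_range = similarity_range("Tokyo", kanto, sim)
print(t_range)
```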

[0038] In step S32, the second index w, which is the weight (importance) of each attribute, is entered. Here the weight of the region attribute is 0.2, the weight of the time attribute is 0.5, and the weight of the food attribute is 0.3.

[0039] In step S33, w × t is calculated (w is the second index and t is the range of similarity between attribute values; hereinafter s = w × t), and the range of s is examined.

[0040] For the region attribute, for example, within Kanto as seen from Tokyo the similarity is 0.40 for Kanagawa, the most similar, and 0.05 for Tochigi, the least similar. For Kanagawa, s = second index 0.2 × similarity 0.40 = 0.08; for Tochigi, s = second index 0.2 × similarity 0.05 = 0.01. The range of s is therefore 0.01 ≦ s ≦ 0.08. This is referred to as (a1). The results of this examination for each attribute are summarized in FIG. 15 for region, FIG. 16 for time, and FIG. 17 for food.
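
The arithmetic for (a1) can be checked with a short sketch. The function name is hypothetical, and rounding to ten decimal places is added only to hide binary floating-point noise; it is not part of the described method.

```python
def weighted_range(weight, t_range, ndigits=10):
    """Scale a similarity range t by the attribute weight w, giving s = w * t."""
    low, high = t_range
    return round(weight * low, ndigits), round(weight * high, ndigits)


# Region attribute: w = 0.2 and 0.05 <= t <= 0.40 within Kanto.
s_region = weighted_range(0.2, (0.05, 0.40))
print(s_region)
```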

[0041] In step S34, the data (cases) are grouped by similarity range r. The similarity range r is obtained from the combinations of the s values obtained in step S33. FIG. 18 summarizes the result.
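
The text does not spell out how the s ranges are combined. One reading, consistent with r = Σ (weight × similarity), is that each group picks one hierarchy level per attribute and its r range is the sum of the lower bounds and the sum of the upper bounds. The sketch below follows that assumed reading. Only the region range (0.01, 0.08) comes from the text; the exact-match entries assume a similarity of 1.0 for identical values, the "other months" range is hypothetical, and the food attribute is omitted for brevity.

```python
from itertools import product


def group_ranges(s_ranges, ndigits=10):
    """Combine one s range per attribute into an r range for each group."""
    attrs = list(s_ranges)
    groups = {}
    for combo in product(*(s_ranges[attr].items() for attr in attrs)):
        labels = tuple(label for label, _ in combo)
        low = sum(rng[0] for _, rng in combo)
        high = sum(rng[1] for _, rng in combo)
        groups[labels] = (round(low, ndigits), round(high, ndigits))
    return groups


s_ranges = {
    "region": {"Tokyo": (0.2, 0.2), "Kanto": (0.01, 0.08)},
    "time": {"June": (0.5, 0.5), "other months": (0.05, 0.25)},  # hypothetical
}

groups = group_ranges(s_ranges)
for labels, r_range in groups.items():
    print(labels, r_range)
```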

[0042] If the grouping is judged satisfactory in step S35, the classification of the data is fixed. If it is not satisfactory, the process returns to step S31 or step S32 and the same processing is repeated. Returning to step S31 means starting again by changing the similarities between attribute values registered in the database; returning to step S32 means starting again by changing the second index w.

[0043] Once the data has been classified as shown in FIG. 18, the classification data storage unit 108 stores the result as data information. FIG. 19 shows the classification data that is stored. In the flowchart of FIG. 2, this storage takes place in step S24.

[0044] Storing the classification result makes processing more efficient, because input data that has already been classified does not need to be classified again. New data has no classification information, so it must be classified first. When classification information already exists for the input problem, however, the classification processing can be skipped as shown in the flowchart of FIG. 20. In step S21 the existing classification results are looked up in the classification data, and in step S22 it is decided whether to classify the data. If classification is not selected, the process moves on to step S25 using the classification information stored by the classification data storage unit 108. If classification is selected in step S22, the processing from step S23 onward is the same as that described for FIG. 2.
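
The skip logic of FIG. 20 resembles a cache lookup. The sketch below is one possible arrangement, with a hypothetical class name, key scheme, and classifier callback; the patent does not specify how stored classification results are keyed.

```python
class ClassificationStore:
    """Keeps classification results so repeated problems skip re-classification."""

    def __init__(self):
        self._results = {}

    @staticmethod
    def _key(query, weights):
        # Assumption: a result is reusable when the problem and weights match.
        return (tuple(sorted(query.items())), tuple(sorted(weights.items())))

    def classify(self, query, weights, classifier, reclassify=False):
        key = self._key(query, weights)
        # Steps S21/S22: reuse a stored result unless re-classification is requested.
        if reclassify or key not in self._results:
            self._results[key] = classifier(query, weights)  # steps S23/S24
        return self._results[key]


calls = []


def dummy_classifier(query, weights):
    # Stand-in for the data classification unit; records each invocation.
    calls.append(query)
    return {"groups": "placeholder"}


store = ClassificationStore()
query = {"region": "Tokyo", "time": "June", "food": "sukiyaki"}
weights = {"region": 0.2, "time": 0.5, "food": 0.3}

first = store.classify(query, weights, dummy_classifier)
second = store.classify(query, weights, dummy_classifier)  # served from the store
print(len(calls))
```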

[0045] Next, the target similarity generation unit 107 generates the target similarity. This corresponds to step S25 in FIG. 2. The details of target similarity generation are described with reference to the flowchart of FIG. 4.

[0046] In step S41, the number of cases is counted for each similarity range r, that is, for each r group.

[0047] In step S42, a method of displaying the distribution of this result is selected. The display methods fall into two broad types: display of similarity range information, and display of the number of cases per range. By default, the similarity range display places the range of r on the horizontal axis so that the similarity ranges of the attributes can be compared group by group. By default, the case count display places r on the horizontal axis and the number of cases on the vertical axis, and shows all values of r.

[0048] In step S43, the distribution is displayed by the selected method and presented to the user. Examples of the display format (here, the defaults) are shown in FIG. 21(a) for the similarity range information display and in FIG. 21(b) for the display of the number of cases for each similarity range.

[0049] FIG. 22 summarizes the similarity ranges r and the number of cases belonging to each range. The number of cases is looked up for each group shown in FIG. 18; because this is not a search of the case contents, it can be done at high speed without disk access.

[0050] Knowing the grouping of the similarity ranges and the corresponding number of cases also has a secondary benefit: it shows which cases should be entered in the future. It is also useful information for knowledge acquisition.

[0051] Next, if the user is satisfied with the presented display, the user indicates this from the input device 101 in step S44, and the process moves to the selection of the target similarity in step S45. If the user is not satisfied in step S44, the process returns to step S42, and the display is produced again using a display method that the user finds satisfactory.

[0052] Next, the selection of the target similarity in step S45 is described. The target similarity is selected by designating one of the plurality of similarity ranges r as the target similarity. Two modes are possible: a manual mode, in which the user makes the selection by referring to the presented display, and an automatic mode, in which the target similarity generation unit 107 makes the selection according to a predetermined method. In either mode, the similarity range is basically selected as follows.

[0053] First, it is desirable to select a range that is not biased toward any one attribute. In this embodiment, each attribute is weighted according to its importance so that the attributes can be judged on the same scale, so it suffices to select a range in which the attributes share a large common part. Then, taking the number of cases into account, if several groups satisfy the number of cases required for the similarity search, the one with the wider range is given priority. A different selection method may also be used for each domain.

[0054] The manual mode is described more specifically below.

[0055] (i) First, referring to the similarity range information display illustrated in FIG. 12(a), the user selects, from the similarity ranges r1 to r18, those in which the common part of the similarity ranges that the attribute values can take is large (wide). As described above, the attributes are weighted according to their importance, so they can be judged on the same scale. In this case, r4, r6, r10, r12, r16, r18, and so on are selected as candidates.

[0056] (ii) Next, referring to the display of the number of cases for each similarity range illustrated in FIG. 12(a), the user selects from the above candidates one with an appropriate number of cases. For this selection, it suffices to decide in advance how the acceptable range of the number of cases is determined. For example, the ratio (or range of ratios) of the number of cases to the total number of data items may be fixed in advance, or an acceptable number of cases (or range of numbers) may be fixed regardless of the total number of data items.

[0057] Here, assuming that the target number of cases is about 50, r10 is selected from the above candidates as the target similarity.

[0058] In (i) above, it is also effective to avoid ranges whose non-shared (single-attribute) region is large relative to the common part of the similarity ranges.

[0059] Next, the automatic mode is described using a more specific example.

[0060] (i) First, the range of similarity covered by each r is calculated. The results are shown below.

[0061]

r1    0.075 to 0.08
r2    0.075 to 0.08
r3    0.030 to 0.08
r4    0.025 to 0.08
r5    0.030 to 0.08
r6    0.015 to 0.08
r7    0.075 to 0.08
r8    0.075 to 0.08
r9    0.030 to 0.08
r10   0.025 to 0.08
r11   0.030 to 0.08
r12   0.015 to 0.08
r13   0.075 to 0.08
r14   0.075 to 0.08
r15   0.030 to 0.08
r16   0.025 to 0.08
r17   0.030 to 0.08
r18   0.015 to 0.08

The ranges are then ranked in order of selection priority. The results are shown below.

[0062]

First rank (0.015 to 0.08): r6, r12, r18
Second rank (0.025 to 0.08): r4, r10, r16
Third rank (0.030 to 0.08): r3, r5, r9, r11, r15, r17

(ii) Next, a range with an appropriate number of cases is selected. The appropriate number of cases is determined in the same way as in the manual mode described above.

[0063] This selection is made rank by rank over the similarity groups above. That is, the ranges belonging to the first rank are examined first. If no range in the rank being examined has an appropriate number of cases, the ranges belonging to the next rank are examined.

[0064] For example, r6, r12, and r18 in the first rank relax the attribute value of the time too far and contain too many cases, so the second rank is examined next. There, r4 relaxes the condition on the attribute value of the region too far and contains too many cases, whereas r10 contains 52 cases, an appropriate number, so r10 is finally selected.
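The automatic-mode procedure in (i) and (ii) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the implementation described in the patent: the function name `select_target_similarity`, the acceptable case-count window of 40 to 60, and every case count other than the 52 cases stated for r10 are hypothetical. Ranking by range width is also an assumption, inferred from the rank table above, where the widest range comes first.

```python
def select_target_similarity(ranges, case_counts, min_cases, max_cases):
    """Return the first range, widest first, whose case count is acceptable."""
    # Rank ranges by width (upper bound minus lower bound), widest first.
    # sorted() is stable, so ranges of equal width keep their original order.
    ranked = sorted(
        ranges,
        key=lambda name: ranges[name][1] - ranges[name][0],
        reverse=True,
    )
    for name in ranked:
        if min_cases <= case_counts[name] <= max_cases:
            return name
    return None  # no range has an acceptable number of cases


# Similarity ranges taken from the table in [0061] (candidates only).
ranges = {
    "r4": (0.025, 0.08), "r6": (0.015, 0.08), "r10": (0.025, 0.08),
    "r12": (0.015, 0.08), "r16": (0.025, 0.08), "r18": (0.015, 0.08),
}
# Hypothetical case counts, except r10 = 52, which the text states.
case_counts = {"r4": 120, "r6": 300, "r10": 52, "r12": 250, "r16": 90, "r18": 400}

selected = select_target_similarity(ranges, case_counts, min_cases=40, max_cases=60)
print(selected)
```

With these illustrative counts, the first-rank ranges and r4 are rejected for having too many cases, so the sketch settles on r10, mirroring the flow described in the text.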

[0065] In either mode, various selection methods other than those described above are also conceivable.

[0066] In the automatic mode, the distribution display of step S43 described above may be omitted.

[0067] The target similarity is generated by the processing described above.

[0068] Then, in step S26 of FIG. 2, the first index is determined from the generated target similarity: "Honshu" for the region, "spring, summer" for the time, and "root" for the food.

[0069] Next, the first index generation unit 105 generates the index to be passed to the first search unit 104.

[0070] This index is generated either by expanding the hierarchical data from the similarity evaluation knowledge 110, that is, as a list of every attribute value below Honshu such as "Tohoku, Aomori, Akita, ...", or as an index that contains Honshu itself in the form "IS-A Honshu". Which form is used depends on how the database stores its data; in step S26 of FIG. 2, the index is set in the form that is passed to the first search unit 104.
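The first of the two index forms, the expanded list, can be sketched as a simple tree walk. The hierarchy below is a small hypothetical fragment (the actual region hierarchy is the one in FIG. 6, which is not reproduced here), and the function name `expand_below` is an assumption made for illustration.

```python
def expand_below(hierarchy, value):
    """List every attribute value below `value` in a parent -> children map."""
    result = []
    for child in hierarchy.get(value, []):
        result.append(child)
        result.extend(expand_below(hierarchy, child))  # descend depth-first
    return result


# Hypothetical fragment of a region hierarchy (parent -> children).
region_hierarchy = {
    "root": ["Hokkaido", "Honshu"],
    "Honshu": ["Tohoku", "Kanto"],
    "Tohoku": ["Aomori", "Akita"],
    "Kanto": ["Tokyo", "Kanagawa"],
}

expanded_index = expand_below(region_hierarchy, "Honshu")
print(expanded_index)
```

The alternative "IS-A Honshu" form keeps the index as a single value and leaves the hierarchy lookup to the search step, which may suit a database that stores the hierarchy itself.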

[0071] Here, root denotes the top of the tree-structured hierarchy. In other words, all foods are subject to the search.

[0072] In this way, according to the set target similarity, a range of cases is taken without discarding similar cases, and the most similar case is then searched for within that range. Going all the way back to root increases the number of cases, so the range must be decided according to upper and lower limits. In this embodiment, the target similarity can therefore be set while grasping the distribution and the number of the stored cases.

[0073] Next, if the first index is to be changed, it is changed in step S27. This is done by repeating the above procedure, starting again from the data classification or the generation of the target similarity.

[0074] If no change is needed, the first index is fixed as "Honshu" for the region, "spring, summer" for the time, and "root" for the food.

[0075] As a way of changing the target similarity, a function may be added that lets the user adjust the already generated target similarity as appropriate and enter the resulting value.

[0076] Next, the first search unit 104 performs a strict search using the first index generated by the first index generation unit 105 as described above. That is, the first search unit 104 retrieves the matching cases from the database 109 based on the first index "Honshu", "spring, summer", "root". For example, case 1, case 3, case 5, case 6, case 7, case 100, and others are retrieved from the case data shown in FIG. 5.
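A minimal sketch of this narrowing step, assuming the "IS-A" form of the index, is shown below. Everything in it is hypothetical: the child-to-parent map, the three sample cases, and the names `is_a` and `strict_search` are illustrations only, and the actual case data is that of FIG. 5. For brevity, a single map holds the hierarchies of all three attributes.

```python
# Hypothetical child -> parent map covering region, time, and food values.
PARENT = {
    "Honshu": "root", "Hokkaido": "root",
    "Kanto": "Honshu", "Tohoku": "Honshu",
    "Tokyo": "Kanto", "Aomori": "Tohoku", "Sapporo": "Hokkaido",
    "spring": "root", "summer": "root", "winter": "root",
    "June": "summer", "December": "winter",
    "sukiyaki": "root",
}


def is_a(value, ancestor):
    """True if `value` equals `ancestor` or lies below it in the hierarchy."""
    while value is not None:
        if value == ancestor:
            return True
        value = PARENT.get(value)  # None once we step above the root
    return False


def strict_search(cases, first_index):
    """Keep the cases whose every attribute falls inside the first index."""
    return [
        case_id
        for case_id, attrs in cases.items()
        if all(
            any(is_a(attrs[attr], allowed) for allowed in first_index[attr])
            for attr in first_index
        )
    ]


# Hypothetical cases, not the data of FIG. 5.
cases = {
    "case A": {"region": "Tokyo", "time": "June", "food": "sukiyaki"},
    "case B": {"region": "Aomori", "time": "December", "food": "sukiyaki"},
    "case C": {"region": "Sapporo", "time": "June", "food": "sukiyaki"},
}
first_index = {"region": ["Honshu"], "time": ["spring", "summer"], "food": ["root"]}

hits = strict_search(cases, first_index)
print(hits)
```

Here case B fails on the time attribute and case C fails on the region attribute, so only case A passes to the second search.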

[0077] In the prior art, if the user personally set the first index to, for example, "Kanto", "spring, summer", "root", then case 100 would not be retrieved when searching the case data for cases matching the first index, because it does not satisfy "Kanto", and the search the user intended could not be carried out. In the present invention, the first index is generated so as to reflect the state of the data stored in the database, so data such as case 100, which would very likely have been missed conventionally, can be narrowed down without being missed.

[0078] Furthermore, in the present invention, the manual mode presents indicators to the user, which makes it easy to set the target similarity used to generate or change the first index, and makes it possible to narrow the stored data down to a sufficient number of data items (cases).

[0079] Next, the second search unit 111 calculates the inter-case similarity for these cases and retrieves similar data according to the magnitude of the similarity.

[0080] For example, the similarity r between cases is given by r = Σ(w × d), where w is the second index and d is the similarity between attribute values.

[0081] As an actual calculation, the similarity between the problem case and case 1 is (second index, that is, region weight 0.2 × similarity between Tokyo and Kanagawa 0.40) + (time weight 0.5 × similarity between June and August 0.15) + (food weight 0.3 × similarity between sukiyaki and nabe 0.4), that is, r = 0.2 × 0.40 + 0.5 × 0.15 + 0.3 × 0.4 = 0.275.
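The calculation above is a plain weighted sum and can be reproduced with a few lines of code. The weights and attribute-value similarities are the ones stated in the text; the function name `case_similarity` and the dictionary layout are assumptions made for illustration.

```python
def case_similarity(weights, attribute_similarities):
    """Compute r = sum of (attribute weight x attribute-value similarity)."""
    return sum(weights[attr] * attribute_similarities[attr] for attr in weights)


# Second index (attribute weights) as stated in the text.
weights = {"region": 0.2, "time": 0.5, "food": 0.3}

# Similarities between the attribute values of the problem case and case 1.
d_case1 = {
    "region": 0.40,  # Tokyo vs. Kanagawa
    "time": 0.15,    # June vs. August
    "food": 0.4,     # sukiyaki vs. nabe
}

r_case1 = case_similarity(weights, d_case1)
print(round(r_case1, 3))
```

The three terms are 0.08, 0.075, and 0.12, which sum to 0.275. The result is rounded before printing because binary floating point does not represent these decimals exactly.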

[0082] The similarities between the problem case and case 3, case 5, case 6, case 7, and case 100 are calculated in the same way, and the results are summarized in FIG. 23.

[0083] Sorted in descending order of similarity, the cases are ordered case 100, case 7, case 3, case 1, case 5, case 6.

[0084] If the number of cases required from this search is the top one, case 100 is retrieved as the case most similar to the problem case.
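Taking the top-ranked cases can be sketched as follows. The similarity values here are hypothetical placeholders chosen only to reproduce the ordering stated in the text; the sole value taken from the text is 0.275 for case 1, and the actual values are those of FIG. 23. The function name `top_k` is also an assumption.

```python
import heapq


def top_k(similarities, k):
    """Return the k case names with the highest similarity, best first."""
    return heapq.nlargest(k, similarities, key=similarities.get)


# Hypothetical similarity values (only case 1 = 0.275 comes from the text).
similarities = {
    "case 1": 0.275,
    "case 3": 0.30,
    "case 5": 0.25,
    "case 6": 0.20,
    "case 7": 0.35,
    "case 100": 0.40,
}

best = top_k(similarities, 1)
ranking = top_k(similarities, len(similarities))
print(best, ranking)
```

Requesting only the top k rather than sorting everything matters little at this scale, but it keeps the second search cheap when the first search leaves many candidates.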

[0085] As described above, in the present invention, the first index is generated so as to reflect the state of the stored data, and the target similarity, which previously had to be specified by trial and error, is presented to the user based on the state of the stored data. As a result, a well-balanced index can be set according to how the data has accumulated, and an effective similarity search can be performed.

[0086] This embodiment has described an example in which the first index is generated via the target similarity. When the first index generation unit 105 operates in the automatic mode, however, the first index may be generated directly from the data classification result of step S23, without generating a target similarity.

[0087] The present invention is not limited to the embodiment described above and can be practiced with various modifications without departing from its gist.

【００８８】[0088]

[Effects of the Invention] According to the present invention, the first index is generated so as to reflect the state of the data stored in the database, so a well-balanced index is obtained and an effective similarity search can be performed.

[0089] When the first index is generated in the manual mode, the target similarity, which previously had to be specified by trial and error, can be presented to the user based on how the data has accumulated, so the index can be set efficiently.

[Brief description of the drawings]

FIG. 1 is a diagram showing the schematic configuration of a similarity search device according to an embodiment of the present invention.

FIG. 2 is a flowchart showing the procedure for generating the first index in the embodiment.

FIG. 3 is a flowchart showing the data classification procedure of FIG. 2.

FIG. 4 is a flowchart showing the target similarity generation procedure of FIG. 2.

FIG. 5 is a diagram showing case data in the embodiment.

FIG. 6 is a diagram showing the hierarchical relationship of the region attribute in the embodiment.

FIG. 7 is a diagram showing the hierarchical relationship of the time attribute in the embodiment.

FIG. 8 is a diagram showing the hierarchical relationship of the food attribute in the embodiment.

FIG. 9 is a diagram showing the similarity between attribute values of the region in the embodiment.

FIG. 10 is a diagram showing the similarity between attribute values of the time in the embodiment.

FIG. 11 is a diagram showing the similarity between attribute values of the food in the embodiment.

FIG. 12 is a diagram showing the range of similarity between attribute values of the region in the embodiment.

FIG. 13 is a diagram showing the range of similarity between attribute values of the time in the embodiment.

FIG. 14 is a diagram showing the range of similarity between attribute values of the food in the embodiment.

FIG. 15 is a diagram showing the range of s for the region in the embodiment.

FIG. 16 is a diagram showing the range of s for the time in the embodiment.

FIG. 17 is a diagram showing the range of s for the food in the embodiment.

FIG. 18 is a diagram showing the result of grouping by similarity range r in the embodiment.

FIG. 19 is a diagram showing classification data in the embodiment.

FIG. 20 is a flowchart showing another procedure for generating the first index in the embodiment.

FIG. 21 is a diagram showing the result of the graph display for each similarity range r in the embodiment.

FIG. 22 is a diagram showing the result of counting the cases for each similarity range r in the embodiment.

FIG. 23 is a diagram showing the calculated similarity of each case in the embodiment.

FIG. 24 is a diagram showing the schematic configuration of a conventional similarity search device.

FIG. 25 is a diagram showing the schematic configuration of another conventional similarity search device.

[Explanation of symbols]

101 ... Input device
102 ... Input/output control unit
103 ... Inference processing unit
104 ... First search unit
105 ... First index generation unit
106 ... Data classification unit
107 ... Target similarity generation unit
108 ... Classification data storage unit
109 ... Database
110 ... Similarity evaluation knowledge
111 ... Second search unit

Continued from the front page

(58) Field surveyed (Int.Cl.⁷, DB name): G06F 17/30, G06F 9/44, JICST file (JOIS)

Claims

(57) [Claims]

1. A similarity search device comprising: data storage means for storing a plurality of data items each represented by a plurality of pairs of an attribute and an attribute value; similarity evaluation knowledge storage means for storing similarity evaluation knowledge indicating a hierarchical structure of the attribute values and similarities between the attribute values; data classifying means for classifying, based on the similarity evaluation knowledge, the plurality of data items stored in the data storage means according to the range of attribute values of each attribute and an externally given second index indicating the importance of each attribute; target similarity generating means for generating a target similarity from the data classification result obtained by the data classifying means; first index generating means for generating a first index indicating a range; first search means for searching the plurality of data items stored in the data storage means using the first index to narrow down the amount of data; and second search means for extracting, using the second index, data similar to data having a specific attribute value from the narrowed-down data.

2. A similarity search device comprising: data storage means for storing a plurality of data items each represented by a plurality of pairs of an attribute and an attribute value; similarity evaluation knowledge storage means for storing similarity evaluation knowledge indicating a hierarchical structure of the attribute values and similarities between the attribute values; data classifying means for classifying, based on the similarity evaluation knowledge, the plurality of data items stored in the data storage means according to the range of attribute values of each attribute and an externally given second index indicating the importance of each attribute; classification result presenting means for presenting the data classification result to a user; first index generating means for generating a first index; first search means for searching the plurality of data items stored in the data storage means using the first index to narrow down the amount of data; and second search means for extracting, using the second index, data similar to data having a specific attribute value from the narrowed-down data.

3. The similarity search device according to claim 1, further comprising target similarity changing means for interactively changing the target similarity.

4. A similarity search device comprising: data storage means for storing a plurality of data items each represented by a plurality of pairs of an attribute and an attribute value; similarity evaluation knowledge storage means for storing similarity evaluation knowledge indicating a hierarchical structure of the attribute values and similarities between the attribute values; data classifying means for classifying, based on the similarity evaluation knowledge, the plurality of data items stored in the data storage means according to the range of attribute values of each attribute and an externally given second index indicating the importance of each attribute; first index generating means for generating, from the data classification result obtained by the data classifying means, a first index indicating the range of attribute values of each attribute; first search means for searching the plurality of data items stored in the data storage means using the first index to narrow down the amount of data; and second search means for extracting data similar to data having a specific attribute value from the narrowed-down data.

5. The similarity search apparatus according to claim 1, further comprising a first index changing means for interactively changing the first index.

6. The similarity search apparatus according to claim 1, further comprising means for storing the data classification result.