JP7724506B2

JP7724506B2 - Question and answer search device and question and answer search program

Info

Publication number: JP7724506B2
Application number: JP2024002871A
Authority: JP
Inventors: 利之濱▲崎▼
Original assignee: Ndkcom
Current assignee: Ndkcom
Priority date: 2024-01-11
Filing date: 2024-01-11
Publication date: 2025-08-18
Anticipated expiration: 2044-01-11
Also published as: JP2025109137A

Description

Article 30, Paragraph 2 of the Patent Act applies. Disclosed at "Software Symposium 2023 in Sendai" on June 12, 2023.

The present invention relates to a question and answer search device and a question and answer search program that search for question and answer sentences corresponding to a character string entered to represent a question.

In settings where answers are returned in response to inquiries, such as call center operator work, systems are used that search for and output (for example, display on a monitor) an answer corresponding to the content of the inquiry. A specific example is described in Patent Document 1. The system described in Patent Document 1 has a database that stores FAQs, each consisting of a combination of an anticipated question and the answer to that question.

That system generates a search query by, for example, extracting keywords from the conversation between a questioner and the responder who received the question. Based on the search query, it searches the question and answer sentences (FAQs) stored in the database for the one corresponding to the questioner's question and displays it on a monitor. With this system, the responder can refer to the displayed question and answer sentence to decide what answer to return to the questioner.
Patent Document 1 discloses that the search for the optimal question and answer sentence based on the search query is performed by keyword matching or concept search between the search query and the question and answer sentences.

Here, the search query and the question and answer sentences are all character strings. One conceivable method for comparing the similarity in meaning between different character strings is the cosine similarity described, for example, in Patent Document 2. Cosine similarity represents each character string as a vector in a feature space and expresses the similarity in meaning between different character strings quantitatively, as a number derived from the angle between the vectors. Because the vector represents the meaning of the character string, it is important that the meaning of the character string can be uniquely defined.
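For reference, the cosine similarity of two vectors is their dot product divided by the product of their norms. The following is a minimal Python sketch of that calculation. The function name `cosine_similarity` and the convention of returning 0.0 for a zero vector are illustrative assumptions and are not taken from the patent documents.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Return cos(theta) for the angle theta between vectors u and v."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumption: a zero vector is treated as similar to nothing.
        return 0.0
    return dot / (norm_u * norm_v)
```

The value ranges from -1 (opposite directions) to 1 (same direction) and does not depend on the lengths of the vectors, only on the angle between them.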

Patent Document 1: JP 2022-66489 A
Patent Document 2: JP 2023-084644 A

In this regard, consider call center operator work in which the responder enters keywords, picked up through conversation with the questioner, in order to search the question and answer sentences stored in a database for an appropriate one. In other words, the search is based on a character string that does not form a sentence and whose meaning cannot be uniquely determined (for example, three nouns separated by spaces). In such a case, simply using cosine similarity fails to retrieve the question and answer sentence that corresponds to the question.

The present invention was made in view of these circumstances, and its object is to provide a question and answer search device and a question and answer search program that can stably retrieve question and answer sentences suited to a question even when the character string entered based on the question is not a complete sentence.

A question and answer search device according to a first invention that meets the above object comprises a text storage unit that stores a plurality of question and answer sentences, each consisting of a paired question text sentence and answer text sentence, and search means that, based on an input string consisting of one or more phrases given as a character string representing a question, searches for the question and answer sentence corresponding to the input string. The device further comprises a vector storage unit that stores a plurality of question vectors obtained by converting each of the question text sentences into a vector in a feature space, and vector conversion means that converts the input string into an input vector expressed as a vector in the feature space. As the similarity of each question vector to the input vector, the search means adopts a modified cosine similarity obtained by correcting the cosine similarity, and it selects the question and answer sentence corresponding to a question vector with a larger modified cosine similarity with higher priority as the question and answer sentence corresponding to the input string. The modified cosine similarity becomes larger as the match ratio between the phrases contained in the input string underlying the input vector and the phrases contained in the question text sentence underlying the question vector becomes higher.

A question and answer search device according to a second invention that meets the above object comprises a text storage unit that stores a plurality of question and answer sentences, each consisting of a paired question text sentence and answer text sentence, and search means that, based on an input string consisting of one or more phrases given as a character string representing a question, searches for the question and answer sentence corresponding to the input string. The device further comprises a vector storage unit that stores a plurality of answer vectors obtained by converting each of the answer text sentences into a vector in a feature space, and vector conversion means that converts the input string into an input vector expressed as a vector in the feature space. As the similarity of each answer vector to the input vector, the search means adopts a modified cosine similarity obtained by correcting the cosine similarity, and it selects the question and answer sentence corresponding to an answer vector with a larger modified cosine similarity with higher priority as the question and answer sentence corresponding to the input string. The modified cosine similarity becomes larger as the match ratio between the phrases contained in the input string underlying the input vector and the phrases contained in the answer text sentence underlying the answer vector becomes higher.

A question and answer search device according to a third invention that meets the above object comprises a text storage unit that stores a plurality of question and answer sentences, each consisting of a paired question text sentence and answer text sentence, and search means that, based on an input string consisting of one or more phrases given as a character string representing a question, searches for the question and answer sentence corresponding to the input string. The device further comprises a vector storage unit that stores a plurality of question vectors obtained by converting each of the question text sentences into a vector in a feature space and a plurality of answer vectors obtained by converting each of the answer text sentences into a vector in the same feature space, and vector conversion means that converts the input string into an input vector expressed as a vector in the feature space. As the similarity of each question vector to the input vector and the similarity of each answer vector to the input vector, the search means adopts first and second modified cosine similarities, respectively, each obtained by correcting the cosine similarity. It selects the question and answer sentence corresponding to a question vector with a larger first modified cosine similarity, and the question and answer sentence corresponding to an answer vector with a larger second modified cosine similarity, with higher priority as the question and answer sentence corresponding to the input string. The first modified cosine similarity becomes larger as the match ratio between the phrases contained in the input string underlying the input vector and the phrases contained in the question text sentence underlying the question vector becomes higher. The second modified cosine similarity becomes larger as the match ratio between the phrases contained in the input string underlying the input vector and the phrases contained in the answer text sentence underlying the answer vector becomes higher.

A question and answer search program according to a fourth invention that meets the above object causes a computer to search, based on an input string consisting of one or more phrases given as a character string representing a question, a plurality of question and answer sentences stored in a text storage unit, each consisting of a paired question text sentence and answer text sentence, for the question and answer sentence corresponding to the input string. The program causes the computer to perform processing of converting the input string into an input vector expressed as a vector in a feature space, and processing of adopting a modified cosine similarity, obtained by correcting the cosine similarity, as the similarity to the input vector of each of a plurality of question vectors obtained by converting each of the question text sentences into a vector in the feature space, and selecting the question and answer sentence corresponding to a question vector with a larger modified cosine similarity with higher priority as the question and answer sentence corresponding to the input string. The modified cosine similarity becomes larger as the match ratio between the phrases contained in the input string underlying the input vector and the phrases contained in the question text sentence underlying the question vector becomes higher.

A question and answer search program according to a fifth invention that meets the above object causes a computer to search, based on an input string consisting of one or more phrases given as a character string representing a question, a plurality of question and answer sentences stored in a text storage unit, each consisting of a paired question text sentence and answer text sentence, for the question and answer sentence corresponding to the input string. The program causes the computer to perform processing of converting the input string into an input vector expressed as a vector in a feature space, and processing of adopting a modified cosine similarity, obtained by correcting the cosine similarity, as the similarity to the input vector of each of a plurality of answer vectors obtained by converting each of the answer text sentences into a vector in the feature space, and selecting the question and answer sentence corresponding to an answer vector with a larger modified cosine similarity with higher priority as the question and answer sentence corresponding to the input string. The modified cosine similarity becomes larger as the match ratio between the phrases contained in the input string underlying the input vector and the phrases contained in the answer text sentence underlying the answer vector becomes higher.

A question and answer search program according to a sixth invention that meets the above object causes a computer to search, based on an input string consisting of one or more phrases given as a character string representing a question, a plurality of question and answer sentences stored in a text storage unit, each consisting of a paired question text sentence and answer text sentence, for the question and answer sentence corresponding to the input string. The program causes the computer to perform three kinds of processing. The first converts the input string into an input vector expressed as a vector in a feature space. The second adopts a first modified cosine similarity, obtained by correcting the cosine similarity, as the similarity to the input vector of each of a plurality of question vectors obtained by converting each of the question text sentences into a vector in the feature space, and selects the question and answer sentence corresponding to a question vector with a larger first modified cosine similarity with higher priority as the question and answer sentence corresponding to the input string. The third adopts a second modified cosine similarity, obtained by correcting the cosine similarity, as the similarity to the input vector of each of a plurality of answer vectors obtained by converting each of the answer text sentences into a vector in the feature space, and selects the question and answer sentence corresponding to an answer vector with a larger second modified cosine similarity with higher priority as the question and answer sentence corresponding to the input string. The first modified cosine similarity becomes larger as the match ratio between the phrases contained in the input string underlying the input vector and the phrases contained in the question text sentence underlying the question vector becomes higher. The second modified cosine similarity becomes larger as the match ratio between the phrases contained in the input string underlying the input vector and the phrases contained in the answer text sentence underlying the answer vector becomes higher.

The question and answer search device according to the first invention comprises a vector storage unit that stores a plurality of question vectors obtained by converting each of a plurality of question text sentences into a vector in a feature space, and vector conversion means that converts an input string into an input vector expressed as a vector in the feature space. The search means adopts a modified cosine similarity, obtained by correcting the cosine similarity, as the similarity of each question vector to the input vector, and selects the question and answer sentence corresponding to a question vector with a larger modified cosine similarity with higher priority as the question and answer sentence corresponding to the input string. The modified cosine similarity becomes larger as the match ratio between the phrases contained in the input string underlying the input vector and the phrases contained in the question text sentence underlying the question vector becomes higher. The search therefore takes into account both the similarity in meaning as sentences and the agreement of phrases, so question and answer sentences suited to the question can be retrieved stably even when the character string entered based on the question is not a complete sentence.
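This passage states only that the modified cosine similarity is a corrected cosine similarity that grows with the phrase match ratio; it does not give a formula. The sketch below therefore assumes a hypothetical additive correction, `cosine + weight * ratio`, purely to illustrate the monotonic behavior described. The names `match_ratio`, `modified_cosine_similarity`, and `weight`, and the use of substring containment to decide whether a phrase matches, are all assumptions.

```python
from typing import Sequence


def match_ratio(input_phrases: Sequence[str], text: str) -> float:
    """Fraction of input phrases that occur verbatim in `text` (assumed definition)."""
    if not input_phrases:
        return 0.0
    hits = sum(1 for phrase in input_phrases if phrase in text)
    return hits / len(input_phrases)


def modified_cosine_similarity(cosine: float, ratio: float, weight: float = 0.5) -> float:
    """Hypothetical correction: add a bonus proportional to the match ratio.

    Any correction that is non-decreasing in `ratio` would fit the description;
    the additive form and the default weight are assumptions.
    """
    if weight < 0.0:
        raise ValueError("weight must be non-negative")
    return cosine + weight * ratio


# Two candidates with the same plain cosine similarity are separated by
# how many of the entered phrases they actually contain.
phrases = ["certificate", "renewal", "deadline"]
score_a = modified_cosine_similarity(
    0.6, match_ratio(phrases, "How do I handle certificate renewal?"))
score_b = modified_cosine_similarity(
    0.6, match_ratio(phrases, "How do I reset my password?"))
```

Under these assumptions `score_a` exceeds `score_b`, so the candidate that shares phrases with the input is ranked first even though the plain cosine similarities are equal.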

The question and answer search devices according to the second and third inventions have configurations equivalent to that of the device according to the first invention, and can therefore also stably retrieve question and answer sentences suited to the question even when the character string entered based on the question is not a complete sentence.

Furthermore, the question and answer search programs according to the fourth, fifth, and sixth inventions correspond to the question and answer search devices according to the first, second, and third inventions, respectively, and can therefore also stably retrieve question and answer sentences suited to the question even when the character string entered based on the question is not a complete sentence.

FIG. 1 is an explanatory diagram of a question and answer search device according to a first embodiment of the present invention.
FIG. 2 is an explanatory diagram of the process by which that device searches for question and answer sentences.
FIG. 3 is an explanatory diagram of a question and answer search device according to a second embodiment of the present invention.
FIG. 4 is an explanatory diagram of the process by which that device searches for question and answer sentences.
FIG. 5 is an explanatory diagram of a question and answer search device according to a third embodiment of the present invention.
FIG. 6 is an explanatory diagram of the process by which that device searches for question and answer sentences.

Next, embodiments of the present invention will be described with reference to the accompanying drawings to aid understanding of the invention.
As shown in FIGS. 1 and 2, a question and answer search device 10 according to a first embodiment of the present invention has a text storage unit 14 that stores a plurality of question and answer sentences 13, each consisting of a paired question text sentence 11 and answer text sentence 12, and search means 16 that, based on an input string 15 consisting of one or more phrases given as a character string representing a question, searches for the question and answer sentence 13 corresponding to the input string 15. A detailed description follows.

The question and answer search device 10 is made up of software, such as an operating system and a software program for searching question and answer sentences (hereinafter also called the "question and answer search program"), and hardware, such as a computer on which that software is installed and storage media. Accordingly, the text storage unit 14, the search means 16, and the other means and units of the question and answer search device 10 are each implemented in software, in hardware, or in both.

The term "computer" here means, for example, a tablet, notebook, or desktop personal computer, a mobile terminal, or a server, and there is no limit on the number of computers in the question and answer search device 10. The device may therefore have a single computer or several.
In this embodiment, the question and answer search device 10 is used in a call center that returns answers to questions received from users of products or services, but its use is not limited to this. For example, the device may be used as a system in which users of a product or service enter the matter they want to ask about (the question) themselves and obtain the answer.

As shown in FIGS. 1 and 2, the question and answer search device 10 includes an input unit 17, which receives the input string 15 entered by a user of the device (hereinafter simply the "user") with an input device (not shown), and vector conversion means 19 capable of converting a text sentence into a vector in a feature space (a high-dimensional vector space). A publicly known trained model is used to encode text sentences into vectors.

The input device is, for example, a keyboard or a microphone. With a keyboard, the user enters the input string 15 by typing; with a microphone, the user enters it by speaking. One input string 15 is a text sentence consisting of one or more phrases. A phrase here means either a single word (for example, "renewal" or "certificate") or an expression made up of several words (for example, "exceptional action" or "certificate not submitted"). In this embodiment, when one input string 15 consists of several phrases, it is given to the input unit 17 with a one-character space between adjacent phrases.
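Recovering the individual phrases from such a space-delimited input string can be sketched in Python as follows. The function name `split_phrases` is hypothetical, and the sketch assumes that no phrase contains a space itself, as is the case for the Japanese originals of the examples above.

```python
def split_phrases(input_string: str) -> list[str]:
    """Split a space-delimited input string into its phrases.

    str.split() with no arguments splits on any run of Unicode whitespace,
    so both ASCII spaces and full-width spaces (U+3000) act as delimiters.
    """
    return input_string.split()


# The first delimiter is an ASCII space, the second a full-width space.
phrases = split_phrases("更新 証明書　未提出")
```

Here `phrases` holds the three phrases 更新, 証明書, and 未提出 regardless of which kind of space the user typed.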

The vector conversion means 19 is connected to the input unit 17 and can acquire the input string 15 from it. Based on the acquired input string 15, it generates an input vector 20 that expresses the input string 15 as a vector in the feature space. In other words, the vector conversion means 19 converts the input string 15 into the input vector 20. Vectorizing input strings 15 makes it possible, for example, to handle quantitatively the similarity in meaning, as sentences, among multiple input strings 15.

Each question and answer sentence 13 stored in the text storage unit 14 consists of a question text sentence 11 and an answer text sentence 12, which are a question and the answer to that question, each converted into character data. The question and answer sentences 13 are either prepared in advance as anticipated questions and answers before the question and answer search device 10 is put into use, or obtained in the course of work performed with the device. The text storage unit 14 stores each question and answer sentence 13 with an identification number linked to its question text sentence 11 and answer text sentence 12.

The search means 16 is connected to the input unit 17, the vector conversion means 19, and the text storage unit 14. It can acquire the input string 15 from the input unit 17, the input vector 20 from the vector conversion means 19, and the question text sentences 11 and answer text sentences 12 (question and answer sentences 13) from the text storage unit 14.
The search means 16 is also connected to a vector storage unit 22, which stores a plurality of question vectors 21 corresponding one-to-one to all the question text sentences 11 stored in the text storage unit 14.

Each question vector 21 is obtained by vector conversion means (not shown) converting a question text sentence 11 into a vector in the feature space, and the plurality of question vectors 21 are stored in the vector storage unit 22 in advance. The vector storage unit 22 stores each question vector 21 linked to the same identification number that the text storage unit 14 links to the corresponding question text sentence 11. When a new question text sentence 11 is added to the text storage unit 14 together with an identification number, the question vector 21 corresponding to that question text sentence 11 is added to the vector storage unit 22 with the same identification number.
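The shared identification number can be pictured as a common key into two stores. The Python sketch below keeps the two stores as separate dictionaries inside one hypothetical class, `QAStore`, and adds a pair and its question vector under the same key. The class, its method names, and `toy_encode` are illustrative assumptions; in particular, `toy_encode` is a stand-in and not the trained model the embodiment refers to.

```python
from typing import Callable, Dict, List, Tuple

Vector = List[float]


class QAStore:
    """Keeps Q&A texts and question vectors under shared identification numbers."""

    def __init__(self, encode: Callable[[str], Vector]) -> None:
        self._encode = encode
        self.texts: Dict[int, Tuple[str, str]] = {}     # id -> (question, answer)
        self.question_vectors: Dict[int, Vector] = {}   # id -> question vector

    def add(self, qa_id: int, question: str, answer: str) -> None:
        """Register a new pair and its question vector under the same id."""
        if qa_id in self.texts:
            raise KeyError(f"identification number {qa_id} is already in use")
        self.texts[qa_id] = (question, answer)
        self.question_vectors[qa_id] = self._encode(question)


def toy_encode(text: str) -> Vector:
    """Stand-in encoder (NOT a trained model): vowel count and length."""
    return [float(sum(text.count(c) for c in "aeiou")), float(len(text))]


store = QAStore(toy_encode)
store.add(1, "How do I renew a certificate?", "Submit the renewal form.")
```

Because both dictionaries are always updated together, a vector found by the search can be mapped back to its question and answer texts with a single lookup by identification number.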

In this embodiment, the vector storage unit 22 stores the question vectors 21 in groups, such that mutually similar question vectors 21 fall into the same group. This makes the stored question vectors 21 searchable by the approximate nearest neighbor search described later. If the vector storage unit 22 holds several hundred thousand question vectors 21, each group contains, for example, around one thousand of them. The feature space of the question vectors 21 has the same number of dimensions as that of the input vector 20.

The search means 16 includes a nearest neighbor search unit 23, which acquires the input vector 20 from the vector conversion means 19, and a similarity calculation unit 24, which calculates the similarity of each question vector 21 to that input vector 20.
As shown in FIG. 2, when a new input string 15 is given from the input unit 17 to the vector conversion means 19, the vector conversion means 19 converts the input string 15 into an input vector 20 and passes it to the nearest neighbor search unit 23 (search means 16).

Each time it acquires one input vector 20 from the vector conversion means 19, the nearest neighbor search unit 23 uses approximate nearest neighbor search to select, from all the question vectors 21 stored in groups in the vector storage unit 22, the one group that contains many question vectors 21 with a high cosine similarity (the plain cosine similarity, not the modified one) to the acquired input vector 20. It then acquires all the question vectors 21 contained in that group.
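The effect of this step, narrowing the full set of question vectors to a single group of candidates, can be illustrated with a deliberately simple stand-in. The Python sketch below represents each group by its centroid and picks the group whose centroid has the highest plain cosine similarity to the input vector. This centroid rule is an assumption made for illustration; it is not the approximate nearest neighbor algorithm of the embodiment, and the names `select_group`, `centroid`, and `cosine` are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Plain cosine similarity; returns 0.0 if either vector is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(vectors: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def select_group(input_vector: Vector, groups: List[Dict[int, Vector]]) -> Dict[int, Vector]:
    """Return the whole group whose centroid is most similar to the input vector."""
    return max(groups, key=lambda g: cosine(input_vector, centroid(list(g.values()))))


# Each group maps identification numbers to question vectors.
groups = [
    {1: [1.0, 0.1], 2: [0.9, 0.2]},   # vectors pointing roughly along x
    {3: [0.1, 1.0], 4: [0.2, 0.8]},   # vectors pointing roughly along y
]
candidates = select_group([0.0, 1.0], groups)
```

Only the vectors in the selected group go on to the more expensive similarity calculation, which is what keeps the search fast when the store holds several hundred thousand vectors.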

本実施の形態において、最近傍検索部２３がベクトル記憶部２２から取得する質問ベクトル２１は複数個である（即ち、各グループには２個以上の質問ベクトル２１が含まれている）。また、近似最近傍検索として、本実施の形態では、ランダムプロジェクションツリーを採用しているが、これには限定されず、例えば、ＬＳＨ（ＬｏｃａｌｉｔｙＳｅｎｓｉｔｉｖｅＨａｓｈｉｎｇ）やＫＤｔｒｅｅを採用してもよい。ベクトル記憶部２２は、採用されている近似最近傍検索のアルゴリズムに適した形式で質問ベクトル２１を記憶している。 In this embodiment, the nearest neighbor search unit 23 acquires multiple question vectors 21 from the vector storage unit 22 (i.e., each group contains two or more question vectors 21). A random projection tree is used for the approximate nearest neighbor search in this embodiment, but the search is not limited to this; for example, LSH (Locality-Sensitive Hashing) or a k-d tree may be used instead. The vector storage unit 22 stores the question vectors 21 in a format suited to the approximate nearest neighbor search algorithm in use.
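The grouping and group selection described above can be illustrated with a small random projection tree. The following is only a minimal sketch under stated assumptions, not the implementation of this embodiment: the function names (`build_rp_tree`, `select_group`), the median split rule, the leaf size, and the random test data are all hypothetical. Each leaf plays the role of one group, and a query descends to exactly one leaf and returns every vector ID stored there.

```python
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def build_rp_tree(ids, vectors, max_leaf_size, rng):
    """Split vector IDs with random hyperplanes until each leaf (group) is small."""
    if len(ids) <= max_leaf_size:
        return {"leaf": ids}
    dim = len(vectors[ids[0]])
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    proj = {i: dot(vectors[i], direction) for i in ids}
    threshold = sorted(proj.values())[len(ids) // 2]  # median projection
    left = [i for i in ids if proj[i] < threshold]
    right = [i for i in ids if proj[i] >= threshold]
    if not left or not right:
        # Degenerate split (e.g. duplicate vectors): keep an oversized leaf.
        return {"leaf": ids}
    return {
        "direction": direction,
        "threshold": threshold,
        "left": build_rp_tree(left, vectors, max_leaf_size, rng),
        "right": build_rp_tree(right, vectors, max_leaf_size, rng),
    }


def select_group(tree, query):
    """Descend to one leaf and return all vector IDs in that group."""
    node = tree
    while "leaf" not in node:
        go_left = dot(query, node["direction"]) < node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["leaf"]


# Hypothetical data: 200 random 8-dimensional vectors, groups of at most 25.
rng = random.Random(0)
vectors = {i: [rng.gauss(0.0, 1.0) for _ in range(8)] for i in range(200)}
tree = build_rp_tree(list(vectors), vectors, max_leaf_size=25, rng=rng)
group = select_group(tree, vectors[42])
print(len(group), 42 in group)
```

Only the vectors in the returned group are passed on for exact scoring, which is why the result is approximate: a close neighbor that fell on the other side of a hyperplane is not examined.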

類似度演算部２４は、図１に示すように、入力部１７、ベクトル変換手段１９、最近傍検索部２３及びテキスト記憶部１４に接続されている。類似度演算部２４は、入力部１７に入力文字列１５が与えられる度にその入力文字列１５を入力部１７から直接又はメモリ（記憶媒体）を介して取得する。更に、類似度演算部２４は、ベクトル変換手段１９が入力文字列１５を入力ベクトル２０に変換する度にその入力ベクトル２０をベクトル変換手段１９から直接又はメモリを介して取得する。 As shown in FIG. 1, the similarity calculation unit 24 is connected to the input unit 17, vector conversion means 19, nearest neighbor search unit 23, and text storage unit 14. Each time an input string 15 is provided to the input unit 17, the similarity calculation unit 24 obtains the input string 15 directly from the input unit 17 or via memory (storage medium). Furthermore, each time the vector conversion means 19 converts the input string 15 into an input vector 20, the similarity calculation unit 24 obtains the input vector 20 directly from the vector conversion means 19 or via memory.

また、類似度演算部２４は、最近傍検索部２３が複数（ここでは、Ｔ個とする）の質問ベクトル２１をベクトル記憶部２２から取得するごとに、その複数の質問ベクトル２１を最近傍検索部２３から取得し、更に、取得した複数の質問ベクトル２１に紐づけられている識別番号を基に、取得した各質問ベクトル２１に対応する質問テキスト文１１（即ち、取得した各質問ベクトル２１の基となった質問テキスト文１１）及び回答テキスト文１２（つまり、取得した各質問ベクトル２１に対応する問答文１３）をテキスト記憶部１４から取得する。 Furthermore, each time the nearest neighbor search unit 23 acquires multiple (here, T) question vectors 21 from the vector storage unit 22, the similarity calculation unit 24 acquires those question vectors 21 from the nearest neighbor search unit 23. Then, based on the identification numbers linked to the acquired question vectors 21, it acquires from the text storage unit 14 the question text sentence 11 corresponding to each acquired question vector 21 (i.e., the question text sentence 11 on which that question vector 21 was based) and the associated answer text sentence 12 (that is, the question and answer sentence 13 corresponding to each acquired question vector 21).

類似度演算部２４は、最近傍検索部２３から取得した複数の質問ベクトル２１を対象に、取得した入力ベクトル２０に対する類似度を算出する。よって、ベクトル記憶部２２に記憶されている質問ベクトル２１の数をＮ個とした際、検索手段１６は、近似最近傍検索によって、Ｎ個より少ない数の質問ベクトル２１を、修正コサイン類似度を算出する対象として選択することとなる。 The similarity calculation unit 24 calculates the similarity to the acquired input vector 20 for multiple question vectors 21 acquired from the nearest neighbor search unit 23. Therefore, when the number of question vectors 21 stored in the vector storage unit 22 is N, the search means 16 will select fewer than N question vectors 21 through the approximate nearest neighbor search as targets for calculating the modified cosine similarity.

類似度演算部２４は、入力ベクトル２０に対する質問ベクトル２１の類似度として、コサイン類似度を補正した修正コサイン類似度を採用し、修正コサイン類似度が大きい質問ベクトル２１に対応する問答文１３（該当の質問ベクトル２１を有する問答文１３）ほど、入力文字列１５に対応する問答文１３として優先的に選出する（入力文字列１５に対応する可能性が高い問答文１３として扱う）。類似度演算部２４には図示しないモニタ（出力デバイスの一例）に問答文１３を表示（出力）させることができる出力部２５が接続されている。 The similarity calculation unit 24 uses modified cosine similarity, which is a corrected cosine similarity, as the similarity of the question vector 21 to the input vector 20, and preferentially selects a question-and-answer sentence 13 (a question-and-answer sentence 13 having the relevant question vector 21) corresponding to the question vector 21 with a larger modified cosine similarity as the question-and-answer sentence 13 that corresponds to the input character string 15 (treating it as a question-and-answer sentence 13 that is more likely to correspond to the input character string 15). An output unit 25 that can display (output) the question-and-answer sentence 13 on a monitor (an example of an output device), not shown, is connected to the similarity calculation unit 24.

本実施の形態において、類似度演算部２４は、テキスト記憶部１４からＴ個の問答文１３を取得し、取得したＴ個の問答文１３の中から、修正コサイン類似度が大きい上位Ｐ個（Ｔ＞Ｐ）の質問ベクトル２１にそれぞれ対応するＰ個の問答文１３のみをモニタに表示させる対象として選択する。類似度演算部２４は、更に、選択したＰ個の問答文１３をソートして、問答文１３に対応する修正コサイン類似度が大きいほど、モニタにおいて入力文字列１５に対応する可能性が高い問答文１３として表示（例えば、問答文１３の一覧表の上側に表示）されるようにし、そのＰ個の問答文１３を出力部２５に与える。 In this embodiment, the similarity calculation unit 24 acquires T question-and-answer sentences 13 from the text storage unit 14 and, from among them, selects for display on the monitor only the P question-and-answer sentences 13 that correspond to the P question vectors 21 (T > P) with the largest modified cosine similarities. The similarity calculation unit 24 then sorts the selected P question-and-answer sentences 13 so that a question-and-answer sentence 13 with a larger modified cosine similarity is presented on the monitor as more likely to correspond to the input character string 15 (for example, nearer the top of the list of question-and-answer sentences 13), and provides the P question-and-answer sentences 13 to the output unit 25.
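The selection and sorting step can be sketched as follows. This is an illustrative sketch only: the function name `select_top_p`, the identification numbers, and the similarity values are hypothetical, and the scores are simply assumed to have been computed already for the T candidates.

```python
import heapq


def select_top_p(candidates, p):
    """Return the p candidates with the largest similarity, best first.

    candidates: list of (identification_number, modified_cosine_similarity).
    """
    return heapq.nlargest(p, candidates, key=lambda item: item[1])


# Hypothetical scores for T = 5 candidates; keep P = 3 for display.
scored = [(101, 0.62), (205, 0.91), (317, 0.47), (408, 0.88), (512, 0.75)]
top = select_top_p(scored, p=3)
print(top)  # [(205, 0.91), (408, 0.88), (512, 0.75)]
```

`heapq.nlargest` returns its result in descending order, so selection and sorting happen in one call and the first entry is the one to show at the top of the list.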

ここで、θを特徴空間における入力ベクトル２０と質問ベクトル２１のなす角度として、コサイン類似度は、ｃｏｓθとなり、入力ベクトル２０と質問ベクトル２１のなす角度のみの影響を受ける。これに対し、修正コサイン類似度は、入力ベクトル２０と質問ベクトル２１のなす角度に加えて、入力部１７から取得した入力文字列１５（最近傍検索部２３から取得した入力ベクトル２０の基となった入力文字列１５）に含まれている語句と最近傍検索部２３から取得した質問ベクトル２１の基となった質問テキスト文１１に含まれている語句との一致割合の影響も受ける。 Here, if θ is the angle between the input vector 20 and the question vector 21 in the feature space, the cosine similarity is cosθ, and is affected only by the angle between the input vector 20 and the question vector 21. In contrast, the modified cosine similarity is affected not only by the angle between the input vector 20 and the question vector 21, but also by the degree of match between the words contained in the input string 15 acquired from the input unit 17 (the input string 15 that formed the input vector 20 acquired from the nearest neighbor search unit 23) and the words contained in the question text sentence 11 that formed the question vector 21 acquired from the nearest neighbor search unit 23.
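For reference, the cosine similarity cosθ and the angle θ itself can be obtained from two vectors as sketched below. The function names are hypothetical, and the clamping step is an added numerical safeguard rather than something stated in this description.

```python
import math


def cosine_similarity(u, v):
    """cos(theta) for two non-zero vectors of the same dimension."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def angle_between(u, v):
    """theta in radians, in the range [0, pi]."""
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    c = max(-1.0, min(1.0, cosine_similarity(u, v)))
    return math.acos(c)


print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal vectors
print(angle_between([1.0, 0.0], [1.0, 1.0]))      # about pi / 4
```

The angle is what the modified cosine similarity rescales, so a scoring routine needs θ and not just cosθ.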

具体的には、入力文字列１５に含まれている語句と質問テキスト文１１に含まれている語句との一致割合が高いほど修正コサイン類似度は大きくなり、当該一致割合が低いほど修正コサイン類似度は小さくなる。なお、ここで言う、修正コサイン類似度が大きいや、小さいは、相対的な大小を意味し、例えば、入力文字列１５に含まれている語句のうち質問テキスト文１１に含まれている語句に一致する割合が１００％のときの修正コサイン類似度を最大値とし、同割合が８０％のときの修正コサイン類似度を最大値から所定の値を引いた値とする場合も、入力文字列１５に含まれている語句と質問テキスト文１１に含まれている語句との一致割合が高いほど修正コサイン類似度が大きくなり、当該一致割合が低いほど修正コサイン類似度が小さくなることに該当する（これは他の実施の形態についても同様）。本実施の形態では、以下の式１で演算されるＳを修正コサイン類似度としている。 Specifically, the higher the proportion of matches between the words in the input character string 15 and the words in the question text sentence 11, the larger the modified cosine similarity, and the lower that proportion, the smaller the modified cosine similarity. Here, "larger" and "smaller" refer to relative magnitude. For example, a scheme in which the modified cosine similarity takes its maximum value when 100% of the words in the input character string 15 match words in the question text sentence 11, and takes the maximum value minus a predetermined value when that proportion is 80%, also counts as one in which a higher match rate gives a larger modified cosine similarity and a lower match rate gives a smaller one (the same applies to the other embodiments). In this embodiment, S calculated by Equation 1 below is used as the modified cosine similarity.

Ｓ＝ｃｏｓ（α×θ）　（式１） S = cos(α × θ) (Equation 1)

ここで、入力文字列１５に含まれている語句の総数をＱとし、入力文字列１５に含まれている語句のうち質問テキスト文１１にも含まれている語句の数をｘ、ｋを０＜ｋ＜１の係数とした際、式１中のαは以下の式２で表される。 Here, let Q be the total number of words contained in the input character string 15, let x be the number of those words that are also contained in the question text sentence 11, and let k be a coefficient satisfying 0 < k < 1. Then α in Equation 1 is given by Equation 2 below.

α＝１－ｋ×（ｘ／Ｑ）　（式２） α = 1 − k × (x/Q) (Equation 2)
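A direct transcription of S = cos(α × θ) with α = 1 − k × (x/Q) may help in seeing how the word match rate changes the score. The sketch below is illustrative: the function names are hypothetical, and the values θ = π/3, Q = 4 and k = 0.5 are chosen only to give round numbers and do not come from this description.

```python
import math


def alpha(x, q, k):
    """alpha = 1 - k * (x / Q), with 0 < k < 1 and 0 <= x <= Q."""
    if not 0 < k < 1:
        raise ValueError("k must satisfy 0 < k < 1")
    if q <= 0 or not 0 <= x <= q:
        raise ValueError("need Q > 0 and 0 <= x <= Q")
    return 1 - k * (x / q)


def modified_cosine_similarity(theta, x, q, k):
    """S = cos(alpha * theta), with theta in radians."""
    return math.cos(alpha(x, q, k) * theta)


theta = math.pi / 3  # plain cosine similarity would be 0.5
for x in (0, 2, 4):
    s = modified_cosine_similarity(theta, x, q=4, k=0.5)
    print(x, round(s, 4))
# x = 0 -> 0.5, x = 2 -> 0.7071, x = 4 -> 0.866
```

With no matching words (x = 0), α is 1 and S equals the ordinary cosine similarity. Because 0 < α ≤ 1 and the cosine decreases on [0, π], any word overlap shrinks the effective angle and can only raise S above cosθ.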

なお、修正コサイン類似度として、式１、式２で表現される以外のものを採用してもよく、例えば、αは、α＝１．２－ｋ×（ｘ／Ｑ）、α＝１－（１／２）×ｔａｎｈ｛π（ｘ／Ｑ）｝、α＝（１／２）×［１－ｔａｎｈ｛π×（（ｘ／Ｑ）－１）｝］又はα＝（３／４）－（１／４）×ｔａｎｈ［２π×｛（ｘ／Ｑ）－（１／２）｝］であってもよい。 Modified cosine similarities other than those expressed by Equation 1 and Equation 2 may also be used. For example, α may be α = 1.2 - k × (x/Q), α = 1 - (1/2) × tanh{π(x/Q)}, α = (1/2) × [1 - tanh{π × ((x/Q) - 1)}], or α = (3/4) - (1/4) × tanh[2π × {(x/Q) - (1/2)}].
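The alternative forms of α listed above can be compared numerically as a function of the match rate r = x/Q. This is a sketch for inspection only: the function names are hypothetical and k = 0.4 is an arbitrary illustrative value.

```python
import math


def alpha_linear(r, k):
    """alpha = 1.2 - k * r"""
    return 1.2 - k * r


def alpha_tanh_a(r):
    """alpha = 1 - (1/2) * tanh(pi * r)"""
    return 1 - 0.5 * math.tanh(math.pi * r)


def alpha_tanh_b(r):
    """alpha = (1/2) * [1 - tanh(pi * (r - 1))]"""
    return 0.5 * (1 - math.tanh(math.pi * (r - 1)))


def alpha_tanh_c(r):
    """alpha = (3/4) - (1/4) * tanh(2 * pi * (r - 1/2))"""
    return 0.75 - 0.25 * math.tanh(2 * math.pi * (r - 0.5))


for r in (0.0, 0.5, 1.0):
    print(
        r,
        round(alpha_linear(r, k=0.4), 4),
        round(alpha_tanh_a(r), 4),
        round(alpha_tanh_b(r), 4),
        round(alpha_tanh_c(r), 4),
    )
```

All four decrease as r grows, so each keeps the property that a higher match rate gives a larger S. The linear variant starts above 1 at r = 0, which for moderate angles makes S smaller than the plain cosine similarity when few words match, while the tanh variants saturate near their end values instead of changing at a constant rate.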

入力ベクトル２０に対する質問ベクトル２１の類似度として、修正コサイン類似度を採用することによって、コサイン類似度を採用する場合と比較して、モニタに利用者にとって有益な問答文１３が多く表示されることを実験的検証によって確認した。ここで、入力文字列１５及び質問テキスト文１１の双方に含まれている語句を、入力文字列１５に含まれている語句の文字列と質問テキスト文１１にも含まれている語句の文字列とが完全に一致する場合の語句のみとしてもよいし、文字列が完全に一致する語句に加えて、両者の語句が同じ対象を表すものであれば文字列が異なっている語句も対象にしてもよい。 Experimental verification confirmed that using the modified cosine similarity as the similarity of the question vector 21 to the input vector 20 causes more question and answer sentences 13 that are useful to the user to be displayed on the monitor than using the cosine similarity. Here, the words counted as contained in both the input character string 15 and the question text sentence 11 may be limited to words whose character strings match exactly, or may additionally include words whose character strings differ, provided that both words denote the same thing.
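The two ways of counting matching words can be sketched as below. Several things here are assumptions made for illustration: the text is taken to be already split into words (for Japanese this would need a morphological analyzer, which is outside this sketch), the `synonyms` table mapping a word to a canonical form is hypothetical, and each occurrence of a word in the input is counted toward Q, which the description does not specify.

```python
def match_counts(input_words, question_words, synonyms=None):
    """Return (x, Q): matching input words and total input words."""
    table = synonyms or {}

    def canonical(word):
        # Map a word to its canonical form; unknown words map to themselves.
        return table.get(word, word)

    question_set = {canonical(w) for w in question_words}
    x = sum(1 for w in input_words if canonical(w) in question_set)
    return x, len(input_words)


input_words = ["password", "reset", "procedure"]
question_words = ["password", "change", "steps"]

# Exact string match only.
print(match_counts(input_words, question_words))  # (1, 3)

# Also treat different strings that denote the same thing as matching.
synonyms = {"change": "reset", "steps": "procedure"}
print(match_counts(input_words, question_words, synonyms))  # (3, 3)
```

With exact matching the rate x/Q is 1/3, while with the synonym table it becomes 3/3, so the choice of matching rule directly changes α and therefore the ranking.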

また、検索手段１６が入力ベクトル２０に対する修正コサイン類似度を算出する対象を、ベクトル記憶部２２に記憶されている全ての質問ベクトル２１とはせず、所定数の質問ベクトル２１に限定しているのは、検索手段１６がベクトル変換手段１９から入力ベクトル２０を取得してからＰ個の問答文１３を出力部２５に与えるまでの処理時間を短縮するためである。本実施の形態では、ベクトル記憶部２２に記憶されている質問ベクトル２１が多く（例えば、数十万個）、更に、修正コサイン類似度の算出の際に入力ベクトル２０の基となる入力文字列１５の語句に対する質問ベクトル２１の基となる質問テキスト文１１の語句の一致性を検出することから、当該処理時間の短縮効果は大きい。 Furthermore, the search means 16 does not calculate the modified cosine similarity for the input vector 20 for all question vectors 21 stored in the vector storage unit 22, but limits it to a predetermined number of question vectors 21 in order to shorten the processing time from when the search means 16 acquires the input vector 20 from the vector conversion means 19 until when it provides P question and answer sentences 13 to the output unit 25. In this embodiment, there are a large number of question vectors 21 stored in the vector storage unit 22 (e.g., hundreds of thousands), and furthermore, when calculating the modified cosine similarity, the matching between the words in the input string 15 that is the basis of the input vector 20 and the words in the question text sentence 11 that is the basis of the question vector 21 is detected, so the effect of shortening the processing time is significant.

ここまでの説明から、問答文検索装置１０に利用される問答文検索プログラムは、質問を表す文字列として与えられる入力文字列１５を基に、テキスト記憶部１４に記憶されている複数の問答文１３から、入力文字列１５に対応する問答文１３を検索する処理を、コンピュータに行わせるソフトウェアプログラムであって、入力文字列１５を入力ベクトル２０に変換する処理と、複数の質問ベクトル２１の入力ベクトル２０に対する各類似度として修正コサイン類似度を採用し、修正コサイン類似度が大きい質問ベクトル２１に対応する問答文１３ほど、入力文字列１５に対応する問答文１３として優先的に選出する処理とを、コンピュータに行わせるものと言える。 From the explanation so far, it can be said that the question and answer search program used in the question and answer search device 10 is a software program that causes a computer to perform the process of searching for a question and answer sentence 13 that corresponds to an input string 15 from multiple question and answer sentences 13 stored in the text storage unit 14, based on an input string 15 given as a string representing a question. It causes the computer to perform the process of converting the input string 15 into an input vector 20, and the process of using modified cosine similarity as the similarity of each of the multiple question vectors 21 to the input vector 20, and preferentially selecting a question and answer sentence 13 that corresponds to a question vector 21 with a larger modified cosine similarity as the question and answer sentence 13 that corresponds to the input string 15.

問答文検索装置１０は、入力ベクトル２０と質問ベクトル２１の修正コサイン類似度を算出したが、これには限定されない。次に、図３、図４を参照して、回答テキスト文１２を特徴空間のベクトルに変換した回答ベクトル３１の入力ベクトル２０に対する修正コサイン類似度を算出する問答文検索装置３０について説明する。なお、問答文検索装置３０において、問答文検索装置１０と同様の構成については、同じ符号を付して詳しい説明を省略する。 The question and answer sentence search device 10 calculates the modified cosine similarity between the input vector 20 and the question vector 21, but the present invention is not limited to this. Next, with reference to Figures 3 and 4, a question and answer sentence search device 30 will be described that calculates the modified cosine similarity, with respect to the input vector 20, of an answer vector 31 obtained by converting the answer text sentence 12 into a vector in the feature space. In the question and answer sentence search device 30, components that are the same as those of the question and answer sentence search device 10 are given the same reference numerals, and detailed description is omitted.

本発明の第２の実施の形態に係る問答文検索装置３０は、図３、図４に示すように、問答文検索装置１０が具備しているものと同様の入力部１７、ベクトル変換手段１９、テキスト記憶部１４及び出力部２５を備えている。 As shown in Figures 3 and 4, the question and answer sentence search device 30 according to the second embodiment of the present invention includes an input unit 17, vector conversion means 19, text storage unit 14, and output unit 25, which are similar to those included in the question and answer sentence search device 10.
一方、問答文検索装置３０が有する検索手段３２及びベクトル記憶部３３は、問答文検索装置１０の検索手段１６及びベクトル記憶部２２とはそれぞれ異なっている。 On the other hand, the search means 32 and the vector storage unit 33 of the question and answer sentence search device 30 are different from the search means 16 and the vector storage unit 22 of the question and answer sentence search device 10, respectively.

ベクトル記憶部３３は、質問ベクトル２１ではなく、回答ベクトル３１を記憶している。検索手段３２は、ベクトル変換手段１９から入力ベクトル２０を、ベクトル記憶部３３から回答ベクトル３１をそれぞれ取得する最近傍検索部３４、及び、当該入力ベクトル２０に対する回答ベクトル３１の修正コサイン類似度を算出する類似度演算部３５を具備している。 The vector storage unit 33 stores the answer vector 31, not the question vector 21. The search means 32 includes a nearest neighbor search unit 34 that retrieves the input vector 20 from the vector conversion means 19 and the answer vector 31 from the vector storage unit 33, and a similarity calculation unit 35 that calculates the modified cosine similarity of the answer vector 31 with respect to the input vector 20.

最近傍検索部３４は、ベクトル変換手段１９から１つの入力ベクトル２０を取得するごとに、近似最近傍検索によって、ベクトル記憶部３３にグループ化して記憶されている全ての回答ベクトル３１から、取得した入力ベクトル２０に対するコサイン類似度が高い回答ベクトル３１を多く含む１つのグループを選択し、そのグループに含まれている全ての回答ベクトル３１を取得する。 Each time an input vector 20 is acquired from the vector conversion means 19, the nearest neighbor search unit 34 performs an approximate nearest neighbor search to select, from all answer vectors 31 grouped and stored in the vector storage unit 33, one group that contains many answer vectors 31 with high cosine similarity to the acquired input vector 20, and acquires all answer vectors 31 contained in that group.

類似度演算部３５は、入力文字列１５を入力部１７から直接又はメモリを介して取得し、入力ベクトル２０をベクトル変換手段１９から直接又はメモリを介して取得する。更に、類似度演算部３５は、ベクトル記憶部３３から複数（ここでは、Ｔ個とする）の回答ベクトル３１を取得した最近傍検索部３４からそのＴ個の回答ベクトル３１を取得し、取得した各回答ベクトル３１に対応する質問テキスト文１１及び回答テキスト文１２（即ち、取得した各回答ベクトル３１に対応する問答文１３）をテキスト記憶部１４から取得する。 The similarity calculation unit 35 obtains the input character string 15 directly from the input unit 17 or via memory, and obtains the input vector 20 directly from the vector conversion means 19 or via memory. Furthermore, once the nearest neighbor search unit 34 has obtained multiple (here, T) answer vectors 31 from the vector storage unit 33, the similarity calculation unit 35 obtains those T answer vectors 31 from the nearest neighbor search unit 34, and obtains from the text storage unit 14 the question text sentence 11 and answer text sentence 12 corresponding to each obtained answer vector 31 (i.e., the question and answer sentence 13 corresponding to each obtained answer vector 31).

その後、類似度演算部３５は、最近傍検索部３４から取得したＴ個の回答ベクトル３１を対象に、取得した入力ベクトル２０に対する修正コサイン類似度を算出する。よって、ベクトル記憶部３３に記憶されている回答ベクトル３１の数をＭ個とした際、検索手段３２は、近似最近傍検索によって、Ｍ個より少ない数の回答ベクトル３１を、修正コサイン類似度を算出する対象として選択することとなる。 Then, the similarity calculation unit 35 calculates the modified cosine similarity for the acquired input vector 20 for the T answer vectors 31 acquired from the nearest neighbor search unit 34. Therefore, when the number of answer vectors 31 stored in the vector storage unit 33 is M, the search means 32 will select fewer than M answer vectors 31 through the approximate nearest neighbor search as targets for calculating the modified cosine similarity.

類似度演算部３５が算出する修正コサイン類似度は、上述した式１及び式２で表される修正コサイン類似度と同等であり、式２におけるｘは、入力文字列１５に含まれている語句のうち回答ベクトル３１の基となる回答テキスト文１２に含まれている語句の数となる。即ち、当該修正コサイン類似度は、入力ベクトル２０の基となった入力文字列１５に含まれている語句と回答ベクトル３１の基となった回答テキスト文１２に含まれている語句との一致割合が高いほど、大きくなる。 The modified cosine similarity calculated by the similarity calculation unit 35 is equivalent to the modified cosine similarity expressed by Equations 1 and 2 above, except that x in Equation 2 is the number of words contained in the input character string 15 that are also contained in the answer text sentence 12 on which the answer vector 31 is based. In other words, this modified cosine similarity becomes larger as the proportion of matches increases between the words contained in the input character string 15 on which the input vector 20 is based and the words contained in the answer text sentence 12 on which the answer vector 31 is based.

そして、類似度演算部３５は、修正コサイン類似度が大きい上位Ｐ個（Ｔ＞Ｐ）の回答ベクトル３１にそれぞれ対応するＰ個の問答文１３のみをモニタに表示させる対象として選択し、更に、選択したＰ個の問答文１３について、修正コサイン類似度が大きいほど、モニタにおいて入力文字列１５に対応する可能性が高い問答文１３として表示されるように、Ｐ個の問答文１３をソートし出力部２５に与える。 Then, the similarity calculation unit 35 selects for display on the monitor only the P question-and-answer sentences 13 that correspond to the P answer vectors 31 (T > P) with the largest modified cosine similarities. It further sorts the selected P question-and-answer sentences 13 so that a question-and-answer sentence 13 with a larger modified cosine similarity is presented on the monitor as more likely to correspond to the input character string 15, and provides the sorted P question-and-answer sentences 13 to the output unit 25.

また、問答文検索装置３０に利用される問答文検索プログラムは、質問を表す文字列として与えられる入力文字列１５を基に、テキスト記憶部１４に記憶されている複数の問答文１３から、入力文字列１５に対応する問答文１３を検索する処理を、コンピュータに行わせるソフトウェアプログラムであって、入力文字列１５を入力ベクトル２０に変換する処理と、複数の回答ベクトル３１の入力ベクトル２０に対する各類似度として修正コサイン類似度を採用し、修正コサイン類似度が大きい回答ベクトル３１に対応する問答文１３ほど、入力文字列１５に対応する問答文１３として優先的に選出する処理とを、コンピュータに行わせるものである。 The question and answer sentence search program used by the question and answer sentence search device 30 is a software program that causes a computer to perform the process of searching for a question and answer sentence 13 that corresponds to an input string 15 from multiple question and answer sentences 13 stored in the text storage unit 14, based on an input string 15 given as a string representing a question. The program causes the computer to perform the process of converting the input string 15 into an input vector 20, and the process of using modified cosine similarity as the similarity of each of the multiple answer vectors 31 to the input vector 20, and preferentially selecting a question and answer sentence 13 that corresponds to an answer vector 31 with a larger modified cosine similarity as the question and answer sentence 13 that corresponds to the input string 15.

また、問答文検索装置１０は入力ベクトル２０と質問ベクトル２１の修正コサイン類似度のみを算出し、問答文検索装置３０は入力ベクトル２０と回答ベクトル３１の修正コサイン類似度のみを算出したが、これには限定されない。図５、図６を参照して、入力ベクトル２０と質問ベクトル２１の修正コサイン類似度及び入力ベクトル２０と回答ベクトル３１の修正コサイン類似度の双方を算出する問答文検索装置５０について説明する。なお、問答文検索装置５０において、問答文検索装置１０又は問答文検索装置３０と同様の構成については、同じ符号を付して詳しい説明を省略する。 Furthermore, while the question and answer sentence search device 10 calculates only the modified cosine similarity between the input vector 20 and the question vector 21, and the question and answer sentence search device 30 calculates only the modified cosine similarity between the input vector 20 and the answer vector 31, the present invention is not limited to this. With reference to Figures 5 and 6, a question and answer sentence search device 50 that calculates both the modified cosine similarity between the input vector 20 and the question vector 21 and the modified cosine similarity between the input vector 20 and the answer vector 31 will be described. In the question and answer sentence search device 50, components that are the same as those of the question and answer sentence search device 10 or the question and answer sentence search device 30 are given the same reference numerals, and detailed description is omitted.

本発明の第３の実施の形態に係る問答文検索装置５０は、図５、図６に示すように、問答文検索装置１０や問答文検索装置３０が具備しているものと同様の入力部１７、ベクトル変換手段１９、テキスト記憶部１４及び出力部２５を備えている。 As shown in Figures 5 and 6, the question and answer sentence search device 50 according to the third embodiment of the present invention includes an input unit 17, vector conversion means 19, text storage unit 14, and output unit 25, which are similar to those included in the question and answer sentence search device 10 and the question and answer sentence search device 30.
一方、問答文検索装置５０が有する検索手段５１及びベクトル記憶部５２は、問答文検索装置１０の検索手段１６及びベクトル記憶部２２や、問答文検索装置３０の検索手段３２及びベクトル記憶部３３とはそれぞれ異なっている。 On the other hand, the search means 51 and vector storage unit 52 of the question and answer sentence search device 50 are different from the search means 16 and vector storage unit 22 of the question and answer sentence search device 10 and from the search means 32 and vector storage unit 33 of the question and answer sentence search device 30, respectively.

ベクトル記憶部５２は、質問ベクトル２１及び回答ベクトル３１の双方を識別番号に紐付けて記憶している。検索手段５１は、ベクトル変換手段１９から入力ベクトル２０を取得し、ベクトル記憶部５２から質問ベクトル２１及び回答ベクトル３１を取得する最近傍検索部５３、及び、当該入力ベクトル２０に対する質問ベクトル２１の類似度（以下、当該類似度を「類似度Ｐ」とする）及び当該入力ベクトル２０に対する回答ベクトル３１の類似度（以下、当該類似度を「類似度Ｑ」とする）の双方を算出する類似度演算部５４を具備している。 The vector storage unit 52 stores both the question vector 21 and the answer vector 31, linked to an identification number. The search means 51 includes a nearest neighbor search unit 53 that acquires the input vector 20 from the vector conversion means 19 and the question vector 21 and answer vector 31 from the vector storage unit 52, and a similarity calculation unit 54 that calculates both the similarity of the question vector 21 to the input vector 20 (hereinafter, this similarity will be referred to as "similarity P") and the similarity of the answer vector 31 to the input vector 20 (hereinafter, this similarity will be referred to as "similarity Q").

本実施の形態では、類似度Ｐとしてコサイン類似度を補正した第１の修正コサイン類似度を採用し、類似度Ｑとしてコサイン類似度を補正した第２の修正コサイン類似度を採用している。本実施の形態において、第１の修正コサイン類似度は問答文検索装置１０が利用する修正コサイン類似度と同じで、第２の修正コサイン類似度は問答文検索装置３０が利用する修正コサイン類似度と同じである。 In this embodiment, a first modified cosine similarity, obtained by correcting the cosine similarity, is used as the similarity P, and a second modified cosine similarity, likewise obtained by correcting the cosine similarity, is used as the similarity Q. In this embodiment, the first modified cosine similarity is the same as the modified cosine similarity used by the question and answer sentence search device 10, and the second modified cosine similarity is the same as the modified cosine similarity used by the question and answer sentence search device 30.

ここで、第１の修正コサイン類似度は、入力ベクトル２０の基となった入力文字列１５に含まれている語句と質問ベクトル２１の基となった質問テキスト文１１に含まれている語句との一致割合が高いほど大きくなればよく、第２の修正コサイン類似度は、同入力文字列１５に含まれている語句と回答ベクトル３１の基となった回答テキスト文１２に含まれている語句との一致割合が高いほど大きくなればよい。本実施の形態では、第１の修正コサイン類似度及び第２の修正コサイン類似度は、いずれも上記式１及び式２で表される（即ち、ｘが表す対象を除いて同じ数式により算出される）。なお、第１の修正コサイン類似度を算出する数式と第２の修正コサイン類似度を算出する数式は同じである必要はなく、多少異なっていてもよく、例えば、式２の係数ｋの値が異なっていてもよい。 Here, the first modified cosine similarity should increase as the degree of match between the words contained in the input string 15 on which the input vector 20 is based and the words contained in the question text sentence 11 on which the question vector 21 is based, increases, and the second modified cosine similarity should increase as the degree of match between the words contained in the input string 15 and the words contained in the answer text sentence 12 on which the answer vector 31 is based, increases. In this embodiment, the first modified cosine similarity and the second modified cosine similarity are both expressed by the above Equation 1 and Equation 2 (i.e., they are calculated using the same formula except for the object represented by x). Note that the formula for calculating the first modified cosine similarity and the formula for calculating the second modified cosine similarity do not need to be the same and may be slightly different; for example, the value of the coefficient k in Equation 2 may be different.

質問ベクトル２１はベクトル記憶部５２にグループ化して記憶され、回答ベクトル３１もベクトル記憶部５２にグループ化して記憶されている。 The question vectors 21 are grouped and stored in the vector storage unit 52, and the answer vectors 31 are also grouped and stored in the vector storage unit 52.
最近傍検索部５３は、ベクトル変換手段１９から１つの入力ベクトル２０を取得するごとに、近似最近傍検索によって、ベクトル記憶部５２に記憶されている全ての質問ベクトル２１から、取得した入力ベクトル２０に対する類似度が高い質問ベクトル２１を多く含む１つのグループを選択し、そのグループに含まれている全ての質問ベクトル２１を取得すると共に、近似最近傍検索によって、ベクトル記憶部５２に記憶されている全ての回答ベクトル３１から、取得した入力ベクトル２０に対する類似度が高い回答ベクトル３１を多く含む１つのグループを選択し、そのグループに含まれている全ての回答ベクトル３１を取得する。 Each time an input vector 20 is acquired from the vector conversion means 19, the nearest neighbor search unit 53 performs an approximate nearest neighbor search to select, from all question vectors 21 stored in the vector storage unit 52, one group that contains many question vectors 21 that have a high similarity to the acquired input vector 20, and acquires all question vectors 21 included in that group. At the same time, the nearest neighbor search unit 53 performs an approximate nearest neighbor search to select, from all answer vectors 31 stored in the vector storage unit 52, one group that contains many answer vectors 31 that have a high similarity to the acquired input vector 20, and acquires all answer vectors 31 included in that group.

類似度演算部５４は、入力文字列１５を入力部１７から直接又はメモリを介して取得し、入力ベクトル２０をベクトル変換手段１９から直接又はメモリを介して取得する。更に、類似度演算部５４は、ベクトル記憶部５２から複数（ここでは、Ｔ１個とする）の質問ベクトル２１及び複数（ここでは、Ｔ２個とする）の回答ベクトル３１を取得した最近傍検索部５３からそのＴ１個の質問ベクトル２１及びそのＴ２個の回答ベクトル３１を取得し、取得した各質問ベクトル２１に対応する質問テキスト文１１及び回答テキスト文１２、並びに、取得した各回答ベクトル３１に対応する質問テキスト文１１及び回答テキスト文１２をテキスト記憶部１４から取得する。 The similarity calculation unit 54 obtains the input character string 15 directly from the input unit 17 or via memory, and obtains the input vector 20 directly from the vector conversion means 19 or via memory. Furthermore, once the nearest neighbor search unit 53 has obtained multiple (here, T1) question vectors 21 and multiple (here, T2) answer vectors 31 from the vector storage unit 52, the similarity calculation unit 54 obtains those T1 question vectors 21 and T2 answer vectors 31 from the nearest neighbor search unit 53. It then obtains from the text storage unit 14 the question text sentence 11 and answer text sentence 12 corresponding to each obtained question vector 21, as well as the question text sentence 11 and answer text sentence 12 corresponding to each obtained answer vector 31.

その後、類似度演算部５４は、最近傍検索部５３から取得したＴ１個の質問ベクトル２１を対象に、取得した入力ベクトル２０に対する第１の修正コサイン類似度を算出し、最近傍検索部５３から取得したＴ２個の回答ベクトル３１を対象に、取得した入力ベクトル２０に対する第２の修正コサイン類似度を算出する。 Then, the similarity calculation unit 54 calculates a first modified cosine similarity for the acquired input vector 20 for the T1 question vectors 21 acquired from the nearest neighbor search unit 53, and calculates a second modified cosine similarity for the acquired input vector 20 for the T2 answer vectors 31 acquired from the nearest neighbor search unit 53.

よって、ベクトル記憶部５２に記憶されている質問ベクトル２１の数及び回答ベクトル３１の数をそれぞれＮ個及びＭ個とした際、検索手段５１は、近似最近傍検索によって、Ｎ個より少ない数の質問ベクトル２１を第１の修正コサイン類似度を算出する対象として選択し、Ｍ個より少ない数の回答ベクトル３１を第２の修正コサイン類似度を算出する対象として選択することとなる。 Therefore, when the number of question vectors 21 and the number of answer vectors 31 stored in the vector storage unit 52 are N and M, respectively, the search means 51 will use the approximate nearest neighbor search to select fewer than N question vectors 21 as targets for calculating the first modified cosine similarity, and fewer than M answer vectors 31 as targets for calculating the second modified cosine similarity.

そして、類似度演算部５４は、算出したＴ１個の第１の修正コサイン類似度及びＴ２個の第２の修正コサイン類似度（Ｔ１＋Ｔ２個の修正コサイン類似度）を対象に、値が大きい上位Ｐ個（Ｔ１＋Ｔ２＞Ｐ）に該当する質問ベクトル２１及び回答ベクトル３１にそれぞれ対応するＰ個の問答文１３のみをモニタに表示させる対象として選択し、更に、選択したＰ個の問答文１３について、第１の修正コサイン類似度又は第２の修正コサイン類似度が大きいほど、モニタにおいて入力文字列１５に対応する可能性が高い問答文１３として表示されるように、Ｐ個の問答文１３をソートし出力部２５に与える。 Then, from the calculated T1 first modified cosine similarities and T2 second modified cosine similarities (T1 + T2 modified cosine similarities in total), the similarity calculation unit 54 selects for display on the monitor only the P question-and-answer sentences 13 that correspond to the question vectors 21 and answer vectors 31 having the P largest values (T1 + T2 > P). It further sorts the selected P question-and-answer sentences 13 so that a question-and-answer sentence 13 with a larger first or second modified cosine similarity is presented on the monitor as more likely to correspond to the input character string 15, and provides the sorted P question-and-answer sentences 13 to the output unit 25.
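The pooled selection over both kinds of similarity can be sketched as follows. The function name, the identification numbers, and the scores are hypothetical. The description does not say how to treat a question and answer sentence that is reached through both its question vector and its answer vector, so this sketch simply keeps both entries and leaves any merging to the caller.

```python
import heapq


def select_top_p_combined(question_scores, answer_scores, p):
    """Pool T1 question-side and T2 answer-side scores and keep the top p.

    Each argument maps an identification number to a modified cosine
    similarity. Returns (score, identification_number, source), best first.
    """
    pooled = [(s, qa_id, "question") for qa_id, s in question_scores.items()]
    pooled += [(s, qa_id, "answer") for qa_id, s in answer_scores.items()]
    return heapq.nlargest(p, pooled, key=lambda item: item[0])


first_similarities = {1: 0.90, 2: 0.55, 3: 0.70}  # T1 = 3
second_similarities = {3: 0.85, 4: 0.60}          # T2 = 2
ranked = select_top_p_combined(first_similarities, second_similarities, p=3)
print(ranked)
# [(0.9, 1, 'question'), (0.85, 3, 'answer'), (0.7, 3, 'question')]
```

In this example, identification number 3 appears twice, once for each vector type, which shows the case an implementer would need to decide on.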

従って、類似度演算部５４は、第１の修正コサイン類似度が大きい質問ベクトル２１に対応する問答文１３及び第２の修正コサイン類似度が大きい回答ベクトル３１に対応する問答文１３ほど、入力文字列１５に対応する問答文１３として優先的に選出することとなる。 Therefore, the similarity calculation unit 54 will preferentially select, as the question and answer sentence 13 corresponding to the input character string 15, the question and answer sentence 13 corresponding to the question vector 21 with the larger first modified cosine similarity and the question and answer sentence 13 corresponding to the answer vector 31 with the larger second modified cosine similarity.

また、問答文検索装置５０に利用される問答文検索プログラムは、質問を表す文字列として与えられる入力文字列１５を基に、テキスト記憶部１４に記憶されている複数の問答文１３から、入力文字列１５に対応する問答文１３を検索する処理を、コンピュータに行わせるソフトウェアプログラムであって、入力文字列１５を入力ベクトル２０に変換する処理と、複数の質問ベクトル２１の入力ベクトル２０に対する各類似度として第１の修正コサイン類似度を採用し、第１の修正コサイン類似度が大きい質問ベクトル２１に対応する問答文１３ほど、入力文字列１５に対応する問答文１３として優先的に選出する処理と、複数の回答ベクトル３１の入力ベクトル２０に対する各類似度として第２の修正コサイン類似度を採用し、第２の修正コサイン類似度が大きい回答ベクトル３１に対応する問答文１３ほど、入力文字列１５に対応する問答文１３として優先的に選出する処理とを、コンピュータに行わせるものである。 The question and answer search program used in the question and answer search device 50 is a software program that causes a computer to perform the following processes: search for a question and answer sentence 13 that corresponds to an input string 15 from multiple question and answer sentences 13 stored in the text storage unit 14, based on an input string 15 given as a string representing a question. The program causes the computer to perform the following processes: converting the input string 15 into an input vector 20; using a first modified cosine similarity as the similarity of each of the multiple question vectors 21 to the input vector 20, and preferentially selecting, as the question and answer sentence 13 that corresponds to the input string 15, question and answer sentences 13 that correspond to the question vector 21 with a larger first modified cosine similarity; and using a second modified cosine similarity as the similarity of each of the multiple answer vectors 31 to the input vector 20, and preferentially selecting, as the question and answer sentence 13 that corresponds to the input string 15, question and answer sentences 13 that correspond to the answer vector 31 with a larger second modified cosine similarity.

以上、本発明の実施の形態を説明したが、本発明は、上記した形態に限定されるものでなく、要旨を逸脱しない条件の変更等は全て本発明の適用範囲である。 Although the embodiments of the present invention have been described above, the present invention is not limited to the above-described embodiments, and all changes in conditions that do not depart from the gist of the present invention are within the scope of application of the present invention.
例えば、前記第１の実施の形態において、検索手段は、ベクトル記憶部に記憶されている全ての質問ベクトルを、修正コサイン類似度を算出する対象としてもよく、前記第２の実施の形態において、検索手段は、ベクトル記憶部に記憶されている全ての回答ベクトルを、修正コサイン類似度を算出する対象としてもよい。 For example, in the first embodiment, the search means may calculate modified cosine similarities for all question vectors stored in the vector storage unit, and in the second embodiment, the search means may calculate modified cosine similarities for all answer vectors stored in the vector storage unit.

Furthermore, in the third embodiment, the search means may calculate the first modified cosine similarity for all question vectors stored in the vector storage unit and the second modified cosine similarity for all answer vectors stored in the vector storage unit.
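
The difference between scoring every stored vector and scoring only a preselected subset can be sketched as below. This is an illustrative sketch only: the function name `select_targets` and the parameter `k` are hypothetical, and the preselection uses an exact top-`k` ranking by plain cosine similarity as a stand-in for an approximate nearest neighbor search, which in practice would rely on a dedicated index.

```python
import math


def cosine(u, v):
    """Plain cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def select_targets(input_vec, stored_vectors, k=None):
    """Return the indices of the stored vectors to score.

    k=None : every stored vector is a target (the variation described here).
    k < N  : only the k vectors nearest to the input vector are targets.
             ASSUMPTION: exact top-k stands in for approximate search.
    """
    indices = list(range(len(stored_vectors)))
    if k is None or k >= len(stored_vectors):
        return indices
    indices.sort(key=lambda i: cosine(input_vec, stored_vectors[i]), reverse=True)
    return indices[:k]
```

Scoring all vectors avoids missing a candidate that the preselection would have dropped, at the cost of computing the modified cosine similarity N (or M) times per query.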

Furthermore, both the question vectors and the answer vectors may be stored in the vector storage unit, and a setting may be provided that selects, when choosing the question and answer sentences to prioritize for an input character string, whether the modified cosine similarity (including the first and second modified cosine similarities) to the input vector is calculated for the question vectors only, the answer vectors only, or both the question vectors and the answer vectors.
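
Such a setting could be represented, for example, as a three-valued option that decides which stored vectors are handed to the similarity calculation. The sketch below is an assumption for illustration: the names `Target` and `vectors_to_score` are hypothetical and do not appear in the embodiments.

```python
from enum import Enum


class Target(Enum):
    """Hypothetical setting: which stored vectors are scored."""
    QUESTION = "question"
    ANSWER = "answer"
    BOTH = "both"


def vectors_to_score(mode, question_vectors, answer_vectors):
    """Return (kind, index, vector) tuples for the vectors selected by the setting.

    The index identifies the question and answer sentence that the vector
    belongs to, so a score can be mapped back to its sentence.
    """
    selected = []
    if mode in (Target.QUESTION, Target.BOTH):
        selected += [("question", i, v) for i, v in enumerate(question_vectors)]
    if mode in (Target.ANSWER, Target.BOTH):
        selected += [("answer", i, v) for i, v in enumerate(answer_vectors)]
    return selected
```

Keeping both kinds of vectors in storage and switching by a setting means the same stored data can serve any of the three modes without rebuilding the vector storage unit.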

10: Question and Answer Search Device, 11: Question Text Sentence, 12: Answer Text Sentence, 13: Question and Answer Sentence, 14: Text Storage Unit, 15: Input String, 16: Search Means, 17: Input Unit, 19: Vector Conversion Means, 20: Input Vector, 21: Question Vector, 22: Vector Storage Unit, 23: Nearest Neighbor Search Unit, 24: Similarity Calculation Unit, 25: Output Unit, 30: Question and Answer Search Device, 31: Answer Vector, 32: Search Means, 33: Vector Storage Unit, 34: Nearest Neighbor Search Unit, 35: Similarity Calculation Unit, 50: Question and Answer Search Device, 51: Search Means, 52: Vector Storage Unit, 53: Nearest Neighbor Search Unit, 54: Similarity Calculation Unit

Claims

A question and answer sentence search device having a text storage unit that stores a plurality of question and answer sentences, each of which is a pair of a question text sentence and an answer text sentence, and a search means that searches for the question and answer sentence corresponding to an input character string based on an input character string that is made up of one or more words or phrases given as a character string representing a question,
a vector storage unit that stores a plurality of question vectors obtained by converting the plurality of question text sentences into vectors in a feature space;
vector conversion means for converting the input character string into an input vector expressed as a vector in the feature space;
the search means employs a modified cosine similarity obtained by correcting a cosine similarity as the similarity of the question vector to the input vector, and preferentially selects the question and answer sentence corresponding to the question vector with a larger modified cosine similarity as the question and answer sentence corresponding to the input character string;
the question and answer sentence search device being characterized in that the modified cosine similarity increases as the degree of match between the words contained in the input string that formed the input vector and the words contained in the question text sentence that formed the question vector increases.

The question and answer text search device according to claim 1, wherein when the number of question vectors stored in the vector storage unit is N, the search means selects, by an approximate nearest neighbor search, a number of question vectors less than N as targets for calculating the modified cosine similarity.

A question and answer sentence search device having a text storage unit that stores a plurality of question and answer sentences, each of which is a pair of a question text sentence and an answer text sentence, and a search means that searches for the question and answer sentence corresponding to an input character string based on an input character string that is made up of one or more words or phrases given as a character string representing a question,
a vector storage unit that stores a plurality of answer vectors obtained by converting the plurality of answer text sentences into vectors in a feature space;
vector conversion means for converting the input character string into an input vector expressed as a vector in the feature space;
the search means employs a modified cosine similarity obtained by correcting a cosine similarity as the similarity of the answer vector to the input vector, and preferentially selects the question and answer sentence corresponding to the answer vector with a larger modified cosine similarity as the question and answer sentence corresponding to the input character string;
the question and answer sentence search device being characterized in that the modified cosine similarity increases as the degree of match between the words contained in the input string that formed the input vector and the words contained in the answer text sentence that formed the answer vector increases.

The question and answer text search device according to claim 3, wherein when the number of answer vectors stored in the vector storage unit is M, the search means selects, by an approximate nearest neighbor search, a number of answer vectors less than M as targets for calculating the modified cosine similarity.

A question and answer sentence search device having a text storage unit that stores a plurality of question and answer sentences, each of which is a pair of a question text sentence and an answer text sentence, and a search means that searches for the question and answer sentence corresponding to an input character string based on an input character string that is made up of one or more words or phrases given as a character string representing a question,
a vector storage unit that stores a plurality of question vectors obtained by converting each of the plurality of question text sentences into vectors in a feature space and a plurality of answer vectors obtained by converting each of the plurality of answer text sentences into vectors in the feature space;
vector conversion means for converting the input character string into an input vector expressed as a vector in the feature space;
the search means employs first and second modified cosine similarities, which are corrected cosine similarities, as the similarity of the question vector to the input vector and the similarity of the answer vector to the input vector, respectively, and preferentially selects, as the question and answer sentence corresponding to the input character string, a question and answer sentence corresponding to the question vector with a larger first modified cosine similarity and a question and answer sentence corresponding to the answer vector with a larger second modified cosine similarity;
the first modified cosine similarity increases as the degree of match between the words contained in the input string that formed the input vector and the words contained in the question text sentence that formed the question vector increases, and the second modified cosine similarity increases as the degree of match between the words contained in the input string that formed the input vector and the words contained in the answer text sentence that formed the answer vector increases.

The question and answer text search device according to claim 5, wherein when the number of question vectors and the number of answer vectors stored in the vector storage unit are N and M, respectively, the search means selects, by an approximate nearest neighbor search, a number of question vectors less than N as targets for calculating the first modified cosine similarity and a number of answer vectors less than M as targets for calculating the second modified cosine similarity.

A question and answer search program that causes a computer to perform a process of searching, based on an input character string consisting of one or more words given as a character string representing a question, for a question and answer sentence corresponding to the input character string from among a plurality of question and answer sentences stored in a text storage unit, each consisting of a pair of a question text sentence and an answer text sentence, the process comprising:
a process of converting the input character string into an input vector expressed as a vector in a feature space; and
a process of employing a modified cosine similarity obtained by correcting cosine similarity as each similarity to the input vector of a plurality of question vectors obtained by converting the plurality of question text sentences into vectors in the feature space, and preferentially selecting, as the question and answer sentence corresponding to the input character string, a question and answer sentence corresponding to the question vector with a larger modified cosine similarity,
wherein the modified cosine similarity increases as the degree of match between the words contained in the input string that formed the input vector and the words contained in the question text sentence that formed the question vector increases.

A question and answer search program that causes a computer to perform a process of searching, based on an input character string consisting of one or more words given as a character string representing a question, for a question and answer sentence corresponding to the input character string from among a plurality of question and answer sentences stored in a text storage unit, each consisting of a pair of a question text sentence and an answer text sentence, the process comprising:
a process of converting the input character string into an input vector expressed as a vector in a feature space; and
a process of employing a modified cosine similarity obtained by correcting cosine similarity as each similarity to the input vector of a plurality of answer vectors obtained by converting the plurality of answer text sentences into vectors in the feature space, and preferentially selecting, as the question and answer sentence corresponding to the input character string, a question and answer sentence corresponding to the answer vector with a larger modified cosine similarity,
wherein the modified cosine similarity increases as the degree of match between the words contained in the input string that formed the input vector and the words contained in the answer text sentence that formed the answer vector increases.

A question and answer search program that causes a computer to perform a process of searching, based on an input character string consisting of one or more words given as a character string representing a question, for a question and answer sentence corresponding to the input character string from among a plurality of question and answer sentences stored in a text storage unit, each consisting of a pair of a question text sentence and an answer text sentence, the process comprising:
a process of converting the input character string into an input vector expressed as a vector in a feature space;
a process of adopting a first modified cosine similarity obtained by correcting cosine similarity as each similarity to the input vector of a plurality of question vectors obtained by converting the plurality of question text sentences into vectors in the feature space, and preferentially selecting, as the question and answer sentence corresponding to the input character string, a question and answer sentence corresponding to the question vector with a larger first modified cosine similarity; and
a process of adopting a second modified cosine similarity obtained by correcting cosine similarity as each similarity to the input vector of a plurality of answer vectors obtained by converting the plurality of answer text sentences into vectors in the feature space, and preferentially selecting, as the question and answer sentence corresponding to the input character string, a question and answer sentence corresponding to the answer vector with a larger second modified cosine similarity,
wherein the first modified cosine similarity increases as the degree of match between the words contained in the input string that formed the input vector and the words contained in the question text sentence that formed the question vector increases, and the second modified cosine similarity increases as the degree of match between the words contained in the input string that formed the input vector and the words contained in the answer text sentence that formed the answer vector increases.