JP5590610B2

JP5590610B2 - Synonym determining device, synonym determining method and program

Info

Publication number: JP5590610B2
Application number: JP2010258182A
Authority: JP
Inventors: 健吉村; 隼赤塚
Original assignee: NTT Docomo Inc
Current assignee: NTT Docomo Inc
Priority date: 2010-11-18
Filing date: 2010-11-18
Publication date: 2014-09-17
Anticipated expiration: 2030-11-18
Also published as: JP2012108795A

Description

The present invention relates to a technique for determining synonyms of keywords used in searches.

In a search service that receives a keyword and provides the addresses of Web pages containing that keyword, a wide variety of keywords are searched. A keyword may be entered in different scripts, such as katakana, hiragana, kanji, or romaji, even when the meaning is the same. Spelling can also vary, as with 「サーバ」 and 「サーバー」 (both meaning "server"), and a user may mistype a keyword, for example by getting a single character wrong. Users of a search service should be given the information they want despite differences in script, spelling variants, or input mistakes. To that end, when a keyword is entered in a different script, in a variant spelling, or with a misspelling, it is desirable to also provide the user with search results for synonyms, that is, words considered to have the same meaning as the entered keyword.

Various techniques have been proposed for obtaining synonyms.
For example, Patent Document 1 discloses a technique that acquires a first related text concerning a first character string and a second related text concerning a second character string, the two strings being the candidates for synonymy, and determines whether the first and second character strings are synonymous based on the frequency with which the second character string appears in the first related text and the frequency with which the first character string appears in the second related text.
Patent Document 2 discloses a technique that extracts synonymous expressions from a pair of similar sentences by analyzing dependency relations, common expressions contained in both documents of the pair, and differing expressions contained in only one of them.
There are also techniques that use readings to determine synonymy. For example, Patent Document 3 discloses a technique that extracts synonym candidates from input text, normalizes their notation and readings, generates pairs of synonym candidates, and then determines synonyms.

Patent documents cited:
JP 2008-293231 A
JP 2008-234175 A
JP 2009-223463 A

The techniques disclosed in Patent Documents 1 to 3 make it possible to determine synonyms, but they require documents to be collected for that purpose. Determining many synonyms requires a large number of documents, so the amount of processing and time spent on the determination becomes enormous.

The present invention has been made against this background, and its object is to provide a technique for determining synonyms without collecting documents.

The present invention provides a synonym determination device comprising: acquisition means for acquiring a log of pairs, each pair consisting of a search keyword and the address of information accessed from the search results for that keyword; access probability calculation means for calculating, for a pair, an access probability, namely the probability that the address in the pair was accessed from the search results for the keyword in the pair, from the number of times the keyword appears in the log and the number of times the pair appears in the log; guidance probability calculation means for calculating, for a pair, a guidance probability, namely the probability that access to the address in the pair was guided by the search results for the keyword in the pair, from the number of times the address appears in the log and the number of times the pair appears in the log; similarity calculation means for obtaining from the log a first pair and a second pair that has the same address as the first pair but a different keyword, and calculating the similarity between the keyword of the first pair and the keyword of the second pair from the number of times the first pair appears in the log and the number of times the second pair appears in the log; and determination means for determining, using the similarity calculated by the similarity calculation means, whether the keyword of the first pair and the keyword of the second pair are synonyms, wherein the similarity calculation means calculates the similarity from the access probability for the keyword of the first pair and the guidance probability for the keyword of the second pair.
The present invention also provides a synonym determination device comprising: acquisition means for acquiring a log of pairs, each pair consisting of a search keyword and the address of information accessed from the search results for that keyword; access probability calculation means for calculating, for a pair, the access probability that the address in the pair was accessed from the search results for the keyword in the pair, from the number of times the keyword appears in the log and the number of times the pair appears in the log; similarity calculation means for obtaining from the log a first pair and a second pair that has the same address as the first pair but a different keyword, and calculating the similarity between the keyword of the first pair and the keyword of the second pair based on the number of times the first pair appears in the log, the number of times the second pair appears in the log, and the access probability calculated by the access probability calculation means; and determination means for determining, using the similarity calculated by the similarity calculation means, whether the keyword of the first pair and the keyword of the second pair are synonyms.
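As a rough illustration of the quantities named above, the following Python sketch derives the access probability and the guidance probability from log counts. The toy log, the function names, and in particular the way the two probabilities are combined into a similarity (a sum of products over shared addresses) are assumptions made for this example; the passage above only states which counts and probabilities the similarity is calculated from, not the formula.

```python
from collections import Counter

# Hypothetical log: one (keyword, address) pair per click on a search result.
log = [
    ("server", "http://a.example/"),
    ("server", "http://a.example/"),
    ("server", "http://b.example/"),
    ("sever", "http://a.example/"),
]

keyword_count = Counter(k for k, _ in log)  # appearances of each keyword
address_count = Counter(a for _, a in log)  # appearances of each address
pair_count = Counter(log)                   # appearances of each (keyword, address) pair


def access_probability(keyword, address):
    """Share of the keyword's log entries that led to this address.

    Assumes the keyword appears in the log at least once.
    """
    return pair_count[(keyword, address)] / keyword_count[keyword]


def guidance_probability(keyword, address):
    """Share of the address's log entries that came from this keyword.

    Assumes the address appears in the log at least once.
    """
    return pair_count[(keyword, address)] / address_count[address]


def similarity(first_keyword, second_keyword):
    """Assumed combination: sum over shared addresses of
    access_probability(first) * guidance_probability(second)."""
    first_addresses = {a for (k, a) in pair_count if k == first_keyword}
    second_addresses = {a for (k, a) in pair_count if k == second_keyword}
    return sum(
        access_probability(first_keyword, a) * guidance_probability(second_keyword, a)
        for a in first_addresses & second_addresses
    )
```

With this toy log, `similarity("server", "sever")` evaluates to 2/9 and `similarity("sever", "server")` to 2/3, so under this assumed formula the measure is not symmetric in its two arguments.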

In the present invention, when the determination means determines that the keywords are synonyms, it may treat whichever of the first pair's keyword and the second pair's keyword is stored in the log more often as the canonical word (the standard notation), and determine the keyword stored in the log less often to be a synonym of that canonical word.
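The choice of canonical word can be sketched as a simple comparison of keyword counts. The function name and the example counts are hypothetical, and the tie-breaking rule (the first keyword wins) is an assumption, since the text does not say what happens when both keywords are logged equally often.

```python
from collections import Counter


def pick_canonical(first_keyword, second_keyword, keyword_count):
    """Return (canonical, synonym): the keyword logged more often is canonical.

    Ties are resolved in favor of first_keyword (an assumption of this sketch).
    """
    if keyword_count[second_keyword] > keyword_count[first_keyword]:
        return second_keyword, first_keyword
    return first_keyword, second_keyword


# Hypothetical counts of how often each keyword was stored in the log.
counts = Counter({"server": 120, "sever": 3})
canonical, synonym = pick_canonical("sever", "server", counts)
```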

The present invention may further comprise first distance calculation means for calculating the distance between the reading of the first pair's keyword and the reading of the second pair's keyword, and the determination means may determine whether the keyword of the first pair and the keyword of the second pair are synonyms using the similarity calculated by the similarity calculation means and the distance calculated by the first distance calculation means.

In the present invention, the determination means may also determine whether the keyword of the first pair and the keyword of the second pair are synonyms using the distance calculated by the first distance calculation means and the length of the reading of the first pair's keyword.

The present invention may further comprise second distance calculation means for calculating the distance between the notation of the first pair's keyword and the notation of the second pair's keyword, and the determination means may determine whether the keyword of the first pair and the keyword of the second pair are synonyms using the similarity calculated by the similarity calculation means and the distance calculated by the second distance calculation means.

In the present invention, the determination means may also determine whether the keyword of the first pair and the keyword of the second pair are synonyms using the distance calculated by the second distance calculation means and the number of characters in the notation of the first pair's keyword.
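The following sketch shows one way the reading distance, the notation distance, and the length of the first keyword could enter the decision. The passages above do not name a distance metric or a decision rule, so the Levenshtein edit distance, the distance-to-length ratio, the thresholds, and the function names here are all assumptions chosen for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (assumed metric)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def is_synonym(similarity, first, second, first_reading, second_reading,
               min_similarity=0.5, max_ratio=0.34):
    """Hypothetical rule: the similarity must reach a threshold, and either the
    reading distance or the notation distance must be small relative to the
    length of the first keyword's reading or notation."""
    if similarity < min_similarity:
        return False
    reading_ratio = edit_distance(first_reading, second_reading) / max(len(first_reading), 1)
    notation_ratio = edit_distance(first, second) / max(len(first), 1)
    return reading_ratio <= max_ratio or notation_ratio <= max_ratio
```

Dividing by the length of the first keyword keeps a one-character difference from counting equally for a three-character keyword and a twelve-character one, which is one plausible reason for the text to mention the reading length and the character count.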

The present invention also provides a synonym determination method comprising: an acquisition step in which control means of a computer device acquires a log of pairs, each pair consisting of a search keyword and the address of information accessed from the search results for that keyword; an access probability calculation step in which the control means calculates, for a pair, the access probability that the address in the pair was accessed from the search results for the keyword in the pair, from the number of times the keyword appears in the log and the number of times the pair appears in the log; a guidance probability calculation step in which the control means calculates, for a pair, the guidance probability that access to the address in the pair was guided by the search results for the keyword in the pair, from the number of times the address appears in the log and the number of times the pair appears in the log; a similarity calculation step in which the control means obtains from the log a first pair and a second pair that has the same address as the first pair but a different keyword, and calculates the similarity between the keyword of the first pair and the keyword of the second pair from the number of times the first pair appears in the log and the number of times the second pair appears in the log; and a determination step in which the control means determines, using the similarity calculated in the similarity calculation step, whether the keyword of the first pair and the keyword of the second pair are synonyms, wherein the similarity calculation step calculates the similarity from the access probability for the keyword of the first pair and the guidance probability for the keyword of the second pair.
The present invention also provides a synonym determination method comprising: an acquisition step in which control means of a computer device acquires a log of pairs, each pair consisting of a search keyword and the address of information accessed from the search results for that keyword; an access probability calculation step in which the control means calculates, for a pair, the access probability that the address in the pair was accessed from the search results for the keyword in the pair, from the number of times the keyword appears in the log and the number of times the pair appears in the log; a similarity calculation step in which the control means obtains from the log a first pair and a second pair that has the same address as the first pair but a different keyword, and calculates the similarity between the keyword of the first pair and the keyword of the second pair based on the number of times the first pair appears in the log, the number of times the second pair appears in the log, and the access probability calculated in the access probability calculation step; and a determination step in which the control means determines, using the similarity calculated in the similarity calculation step, whether the keyword of the first pair and the keyword of the second pair are synonyms.

The present invention also provides a program for causing a computer to function as: acquisition means for acquiring a log of pairs, each pair consisting of a search keyword and the address of information accessed from the search results for that keyword; storage unit control means for storing, each time the acquisition means acquires a keyword and an address, the acquired keyword and address pair in a storage unit as a log; access probability calculation means for calculating, for a pair, the access probability that the address in the pair was accessed from the search results for the keyword in the pair, from the number of times the keyword appears in the log and the number of times the pair appears in the log; guidance probability calculation means for calculating, for a pair, the guidance probability that access to the address in the pair was guided by the search results for the keyword in the pair, from the number of times the address appears in the log and the number of times the pair appears in the log; similarity calculation means for obtaining from the log a first pair and a second pair that has the same address as the first pair but a different keyword, and calculating the similarity between the keyword of the first pair and the keyword of the second pair from the number of times the first pair appears in the log and the number of times the second pair appears in the log; and determination means for determining, using the similarity calculated by the similarity calculation means, whether the keyword of the first pair and the keyword of the second pair are synonyms, wherein the similarity calculation means calculates the similarity from the access probability for the keyword of the first pair and the guidance probability for the keyword of the second pair.
The present invention also provides a program for causing a computer to function as: acquisition means for acquiring a log of pairs, each pair consisting of a search keyword and the address of information accessed from the search results for that keyword; storage unit control means for storing, each time the acquisition means acquires a keyword and an address, the acquired keyword and address pair in a storage unit as a log; access probability calculation means for calculating, for a pair, the access probability that the address in the pair was accessed from the search results for the keyword in the pair, from the number of times the keyword appears in the log and the number of times the pair appears in the log; similarity calculation means for obtaining from the log a first pair and a second pair that has the same address as the first pair but a different keyword, and calculating the similarity between the keyword of the first pair and the keyword of the second pair based on the number of times the first pair appears in the log, the number of times the second pair appears in the log, and the access probability calculated by the access probability calculation means; and determination means for determining, using the similarity calculated by the similarity calculation means, whether the keyword of the first pair and the keyword of the second pair are synonyms.

According to the present invention, synonyms can be determined without collecting documents.

FIG. 1 is a diagram showing the overall configuration of a communication system 1 according to an embodiment of the present invention.
FIG. 2 is a block diagram showing the hardware configuration of the synonym determination device 40.
FIG. 3 is a block diagram showing the configuration of the functions realized in the synonym determination device 40.
FIG. 4 is a sequence diagram for explaining the operation of the embodiment.
FIG. 5 is a flowchart showing the flow of processing performed by the CPU 402.
FIG. 6 is a diagram showing the contents of the log.
FIG. 7 is a diagram showing the format of table TB10.
FIG. 8 is a diagram showing the format of table TB11.
FIG. 9 is a flowchart showing the flow of processing by the CPU 402 according to a modification of the first embodiment.
FIG. 10 is a diagram showing the format of table TB12.
FIG. 11 is a flowchart showing the flow of processing by the CPU 402 according to the second embodiment.
FIG. 12 is a diagram showing the format of table TB20.
FIG. 13 is a flowchart showing the flow of processing by the CPU 402 according to the third embodiment.

[First Embodiment]
(Overall configuration)
FIG. 1 is a diagram illustrating the overall configuration of a communication system 1 according to an embodiment of the present invention. In this embodiment, the communication line 20 is the Internet. FIG. 1 shows a single terminal device 10 connected to the communication line 20, but a plurality of terminal devices 10 can be connected to it. The terminal device 10 is a personal computer with a Web browser installed, and performs data communication with other computer devices via the communication line 20. The terminal device 10 is not limited to a personal computer and may be any computer device with a Web browser installed, such as a smartphone, a mobile phone, or a PDA (Personal Digital Assistant).

The search server device 30 is a server device equipped with a crawler-based (robot-type) search engine. It provides the user of the terminal device 10 with a search service that receives a keyword and returns the addresses (locations) of information related to that keyword, for example Web pages containing the keyword. The search server device 30 is connected to the communication line 20 and performs data communication with the terminal device 10 via the communication line 20. It has a function of searching for Web pages related to a keyword sent from the terminal device 10 and providing the terminal device 10 with the addresses of the Web pages found. The search server device 30 may provide the addresses not only of Web pages but also of document files containing information related to the keyword.

The server device 35 is a computer device that is connected to the communication line 20 and performs data communication with the terminal device 10 via the communication line 20. The server device 35 acquires the keyword that the user of the terminal device 10 used with the search service, together with information indicating the address of the Web page the user is about to view, and stores each acquired keyword and address pair as a log. In this embodiment, the URL (Uniform Resource Locator) of the server device 35 is "http://redirect_server".

The synonym determination device 40 is connected to the server device 35 and acquires the log stored by the server device 35 through data communication. Alternatively, the synonym determination device 40 may be connected to the communication line 20 and acquire the log from the server device 35 via the communication line 20. From the acquired log, the synonym determination device 40 extracts the keywords that users of the terminal device 10 used with the search service and the information indicating the addresses of the Web pages they were about to view, and it has a function of extracting synonyms of the extracted keywords.

(Configuration of the synonym determination device 40)
FIG. 2 is a block diagram illustrating the hardware configuration of the synonym determination device 40. As shown in FIG. 2, the units of the synonym determination device 40 are connected to a bus 401 and exchange data with one another via the bus 401.

The communication unit 408 functions as an interface for data communication with the server device 35 and is connected to the server device 35 by a communication cable (not shown). The operation unit 406 has a keyboard (not shown) and a mouse (not shown) operated by the user. When the mouse or keyboard is operated, the operation unit 406 outputs a signal indicating the operated button or key to the CPU (Central Processing Unit) 402. The display unit 407 has a liquid crystal display and, under the control of the CPU 402, displays text, graphic screens, menu screens for operating the synonym determination device 40, and the like.

The storage unit 405 has a device that stores data persistently (for example, a hard disk device) and stores a program that realizes the functions of an operating system on the synonym determination device 40. The storage unit 405 also stores an application program that realizes the function of acquiring the keywords that users of the terminal device 10 used with the search service and extracting synonyms of the acquired keywords.

The ROM 403 stores an IPL (Initial Program Loader). The RAM 404 is used as a work area when the CPU 402 executes programs. The CPU 402 is the control means that controls each unit of the synonym determination device 40. When the synonym determination device 40 is powered on, the CPU 402 reads the IPL from the ROM 403 and executes it. After executing the IPL, the CPU 402 reads the program that realizes the operating system functions from the storage unit 405 and executes it. Once that program is running, the synonym determination device 40 functions as a server in a client-server system and can perform data communication with other computer devices. The CPU 402 also executes the application program stored in the storage unit 405. When the CPU 402 executes the application program, the functions shown in FIG. 3 are realized: the device acquires the keywords that users of the terminal device 10 used with the search service and the addresses of the Web pages they were about to view, and extracts synonyms of the acquired keywords.

(Functional configuration of the synonym determination device 40)
FIG. 3 is a block diagram showing the configuration of the functions realized in the synonym determination device 40 by executing the transmission program.
The acquisition means 450 acquires the information that the communication unit 408 receives from the server device 35. Specifically, it acquires the log generated by the server device 35, that is, the keywords that users of the terminal device 10 used with the search service and the information indicating the addresses of the Web pages those users were about to view. The storage unit control means 451 stores the log acquired by the acquisition means 450 in the storage unit 405. The access probability calculation means 452 calculates the access probability, namely the probability that the address paired with a keyword in the log was accessed from the search results for that keyword, from the number of times the keyword appears in the log and the number of times the keyword and address pair appears in the log. The similarity calculation means 453 obtains two keywords from the keywords stored in the log and calculates the similarity between them from the number of times each of the two keywords appears in the log. The determination means 454 determines whether the two keywords are synonyms using the similarity calculated by the similarity calculation means 453.
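Before any similarity can be calculated, the two keywords to compare have to be picked from the log. The sketch below groups log entries by address and yields every pair of distinct keywords that share an address, matching the summary's description of a first pair and a second pair with the same address but different keywords. The function name and the log layout (a list of keyword and address tuples) are assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations


def candidate_keyword_pairs(log):
    """Yield (address, keyword_a, keyword_b) for distinct keywords that were
    logged with the same address. `log` is an iterable of (keyword, address)."""
    keywords_by_address = defaultdict(set)
    for keyword, address in log:
        keywords_by_address[address].add(keyword)
    for address, keywords in keywords_by_address.items():
        # sorted() makes the output order deterministic
        for a, b in combinations(sorted(keywords), 2):
            yield address, a, b
```

Restricting the comparison to keywords that share at least one address keeps the number of candidate pairs far below the number of all keyword combinations in the log.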

(Operation of the embodiment)
First, the exchange of information among the devices of the communication system 1 will be described. FIG. 4 is a sequence diagram for explaining the operation of the embodiment. On a terminal device 10 running a Web browser, when the user enters the URL of the search server device 30 and performs an operation instructing access to it, an HTTP (HyperText Transfer Protocol) request is sent from the terminal device 10 to the search server device 30 (step S101). On receiving this HTTP request, the search server device 30 sends the terminal device 10 a Web page for entering a search keyword (step S102). When the terminal device 10 receives this Web page, it displays the page for entering a search keyword.

When the user of the terminal device 10 enters a search keyword on this page and performs an operation to send it, the entered keyword is sent to the search server device 30 (step S103). On receiving the keyword, the search server device 30 searches for Web pages related to it. Next, the search server device 30 generates a plurality of URLs, each containing the host name of the server device 35 and taking the received keyword and the address of a Web page found by the search as arguments, and sends the terminal device 10 a Web page that displays the generated URLs as links (step S104). For example, if the keyword is "server", one of the generated URLs would be "http://redirect_server/?q=server&url=xxxxx". Here, "server" between "q=" and "&" is the received keyword, and "xxxxx" after "url=" is the address of a Web page related to the keyword, that is, the URL of that Web page.
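A minimal sketch of how the links of step S104 could be generated is shown below. The host name and the `q` and `url` parameter names come from the example URL in the text; the function name is hypothetical, and percent-encoding the target address with `urlencode` is an addition of this sketch, since the text writes the address as a raw "xxxxx".

```python
from urllib.parse import urlencode

REDIRECT_BASE = "http://redirect_server/"  # host name of the server device 35 in the text


def build_redirect_links(keyword, result_urls):
    """Wrap each search result URL in a link to the logging server,
    passing the keyword as `q` and the target address as `url`."""
    return [
        REDIRECT_BASE + "?" + urlencode({"q": keyword, "url": url})
        for url in result_urls
    ]


links = build_redirect_links("server", ["http://example.com/page"])
```

Encoding the target matters in practice because a result URL that itself contains "&" or "=" would otherwise be split into separate arguments when the logging server parses the query string.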

On receiving the Web page sent from the search server device 30, the terminal device 10 displays it, so that the URLs generated by the search server device 30 appear as links. When the user of the terminal device 10 clicks one of the displayed links, an HTTP request based on the URL of the clicked link is sent to the server device 35 (step S105). For example, when the URL of the clicked link is “http://redirect_server/?q=server&url=xxxxx”, an HTTP request containing “q=server&url=xxxxx” as arguments is sent to the server device 35, whose host name is “redirect_server”.

On receiving this HTTP request, the server device 35 extracts the keyword “server” and the address “xxxxx”, and stores the reception date and time of the HTTP request together with the keyword q and the address s given as arguments as a log, as shown for example in FIG. 6. The server device 35 also generates an HTTP redirect message that contains the address “xxxxx” and redirects to that address, and sends the message to the terminal device 10 (step S106). On receiving this message, the terminal device 10 sends an HTTP request for a Web page to the address contained in the message. When the destination of that request returns a Web page, the page is displayed on the terminal device 10.
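The logging and redirect behavior of the server device 35 (steps S105 to S106) might look like the sketch below. The function name `handle_redirect`, the in-memory list used as the log, and the choice of status code 302 are all assumptions for illustration; the description only says that an HTTP redirect message is returned.

```python
from datetime import datetime
from urllib.parse import parse_qs, urlsplit


def handle_redirect(request_url, log, now=None):
    """Record (timestamp, keyword q, address s) and answer with a redirect.

    Hypothetical sketch: `log` is a plain list, and 302 is an assumed status
    code for the "HTTP redirect message" mentioned in the description.
    """
    params = parse_qs(urlsplit(request_url).query)
    keyword = params["q"][0]
    address = params["url"][0]
    log.append((now or datetime.now(), keyword, address))
    return 302, {"Location": address}


access_log = []
status, headers = handle_redirect(
    "http://redirect_server/?q=server&url=xxxxx", access_log
)
```

Each log entry corresponds to one row of the kind shown in FIG. 6.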

Thus, in the present embodiment, a message redirecting to a Web page related to the keyword is sent from the server device 35 to the terminal device 10, and the terminal device 10 accesses the redirect-destination server device in response to this message. When sending the address of a Web page related to the keyword to the terminal device 10, the search server device 30 need not always send a URL that combines a host name and a path name; it may instead send only a host name.

Next, the operation of the synonym determination device 40 will be described. FIG. 5 is a flowchart showing the flow of processing performed by the CPU 402 of the synonym determination device 40. Before performing the processing of FIG. 5, the synonym determination device 40 acquires the log generated by the server device 35 from the server device 35. The log may be acquired each time a predetermined period elapses, or each time a new keyword/address pair is added to the log.

First, for each pair of a keyword q and an address s in the log acquired from the server device 35, the CPU 402 counts the number of appearances N(q, s) of that pair in the log (step SA1). Because the terminal device 10 requests a Web page from the address paired with the keyword in the log, as described above, the number of appearances N(q, s) can be regarded as the number of times users of the terminal device 10 accessed the address s on the basis of the keyword q. Next, the CPU 402 obtains the number of appearances N(q) of each keyword q in the log and calculates, using Equation 1, the access probability P(s|q) of accessing the address s paired with the keyword q (step SA2).
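Equation 1 is not reproduced in this text. A form consistent with the definitions above and with the claims, which derive the access probability from the number of appearances of the keyword and of the pair, would be the following (a reconstruction under that assumption, not a quotation of the original equation):

```latex
P(s \mid q) = \frac{N(q, s)}{N(q)}
```

Under this form, the access probabilities of all addresses paired with a given keyword q sum to 1.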

The CPU 402 obtains the access probability P(s|q) for each pair of a keyword q and an address s and, as shown in FIG. 7, generates a table TB10 that associates the keyword q, the address s, the number of appearances N(q, s), and the access probability P(s|q) (step SA3). Next, the CPU 402 takes two of the stored keywords and calculates the similarity Sim(q_a, q_b) between one keyword q_a and the other keyword q_b using Equation 2 (step SA4).
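Steps SA1 to SA3 can be sketched as below. This is an illustration only: the function name `build_tb10` and the dictionary keys are hypothetical, the log is assumed to be a list of (keyword, address) tuples, and the access probability is assumed to be N(q, s) / N(q), which is inferred from the surrounding definitions because Equation 1 is not reproduced in this text.

```python
from collections import Counter


def build_tb10(log):
    """Build rows of (q, s, N(q, s), P(s|q)) from (keyword, address) pairs."""
    pair_counts = Counter(log)                    # N(q, s), step SA1
    keyword_counts = Counter(q for q, _ in log)   # N(q), step SA2
    return [
        {
            "q": q,
            "s": s,
            "n_qs": n,
            # Assumed form of Equation 1: P(s|q) = N(q, s) / N(q)
            "p_s_given_q": n / keyword_counts[q],
        }
        for (q, s), n in pair_counts.items()
    ]


sample_log = [("server", "a"), ("server", "a"), ("server", "b"), ("サーバ", "a")]
tb10 = build_tb10(sample_log)
```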

Sim(q_a, q_b) calculated by Equation 2 takes a value between 0 and 1; the closer it is to 1, the higher the similarity between keyword q_a and keyword q_b. The similarity Sim(q_a, q_b) need not be obtained for every combination of keywords stored in the log. For example, one record may be identified in table TB10, together with another record that has a different keyword but the same address, and the similarity Sim(q_a, q_b) between keyword q_a and keyword q_b may be obtained only when the number of appearances N(q_a, s) of one record and the number of appearances N(q_b, s) of the other record are both at least a predetermined number. Alternatively, for two records identified in the same way, the similarity Sim(q_a, q_b) may be obtained only when the access probability P(s|q_a) of one record and the access probability P(s|q_b) of the other record are both at least a predetermined probability.

For example, suppose that in table TB10 of FIG. 7 the record in the first row and the record in the sixth row have different keywords, “server” and “サーバ”, but the same address. If the number of appearances N(q_a, s) of the first-row record is at least the predetermined number and the number of appearances N(q_b, s) of the sixth-row record is also at least the predetermined number, the similarity Sim(q_a, q_b) between the keyword “server” and the keyword “サーバ” is obtained. When the access probability P(s|q) is used instead, the similarity Sim(q_a, q_b) between the keyword q_a “server” and the keyword q_b “サーバ” is obtained if the access probability P(s|q_a) of the first-row record is at least the predetermined probability and the access probability P(s|q_b) of the sixth-row record is also at least the predetermined probability.
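The pruning described above, where similarity is computed only for keyword pairs that share an address and appear often enough, can be sketched as follows. The function name `candidate_pairs`, the row layout, and the default threshold of 2 are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def candidate_pairs(tb10, min_count=2):
    """Return keyword pairs that share an address with N(q, s) >= min_count."""
    by_address = defaultdict(set)
    for row in tb10:
        if row["n_qs"] >= min_count:
            by_address[row["s"]].add(row["q"])
    pairs = set()
    for keywords in by_address.values():
        # Sorting makes each unordered pair appear once, in a stable order.
        for q_a, q_b in combinations(sorted(keywords), 2):
            pairs.add((q_a, q_b))
    return pairs


rows = [
    {"q": "server", "s": "x", "n_qs": 5},
    {"q": "サーバ", "s": "x", "n_qs": 3},
    {"q": "rare", "s": "x", "n_qs": 1},
]
pairs = candidate_pairs(rows)
```

The same structure works with a probability threshold by filtering on P(s|q) instead of N(q, s).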

Next, the CPU 402 determines that a pair of keyword q_a and keyword q_b whose similarity Sim(q_a, q_b) is at least a predetermined threshold (for example, 0.5) are synonyms, and extracts them as a synonym pair (step SA5).
The CPU 402 then compares the number of appearances N(q_a) of keyword q_a in the log with the number of appearances N(q_b) of keyword q_b in the log. Of keyword q_a and keyword q_b, the CPU 402 treats the one that appears more often in the log as the correct notation word Q1 and the one that appears less often as the synonym Q2, and generates a table TB11 that associates the correct notation word Q1 with the synonym Q2, as shown in FIG. 8(a) (step SA6). As shown in FIG. 8(b), table TB11 may also associate the number of appearances N(Q2) of the synonym Q2 in the log, the number of appearances N(Q1) of the correct notation word Q1 in the log, and the similarity Sim(Q2, Q1) between the synonym Q2 and the correct notation word Q1.
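Step SA6 amounts to ordering each synonym pair by frequency. A minimal sketch follows; the function name `assign_roles` is hypothetical, and treating q_a as the correct notation word on a tie is an assumption, since the description does not say how ties are resolved.

```python
def assign_roles(q_a, q_b, keyword_counts):
    """Pick the correct notation word Q1 and the synonym Q2 by log frequency.

    keyword_counts maps each keyword q to N(q). Ties go to q_a, which is an
    assumption of this sketch.
    """
    if keyword_counts[q_a] >= keyword_counts[q_b]:
        return {"Q1": q_a, "Q2": q_b}
    return {"Q1": q_b, "Q2": q_a}


entry = assign_roles("サーバ", "server", {"server": 120, "サーバ": 45})
```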

Table TB11 makes it possible to identify a synonym Q2 that is written differently from the correct notation word Q1 but leads to the same Web page, that is, yields the same address as a search result, and that can be regarded as synonymous with Q1 on the basis of the similarity. If the search server device 30, on acquiring the synonym Q2 from the terminal device 10, also sends the search results for the correct notation word Q1 to the terminal device 10, the user is provided with the search results for the correct notation word Q1 that is considered synonymous with the entered synonym Q2.
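One way the search server could use table TB11 at query time is sketched below. The function name `expand_query` and the representation of TB11 as a dictionary from synonym Q2 to correct notation word Q1 are assumptions for illustration only.

```python
def expand_query(keyword, tb11):
    """Return the keywords to search: the input plus its correct notation word.

    tb11 is assumed to map each synonym Q2 to its correct notation word Q1.
    """
    correct = tb11.get(keyword)
    return [keyword] if correct is None else [keyword, correct]


tb11 = {"サーバ": "server", "srver": "server"}
terms = expand_query("srver", tb11)
```

Keywords that are not registered as synonyms are searched as entered.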

(Modification of the first embodiment)
FIG. 9 is a flowchart showing the flow of processing of the CPU 402 according to a modification of the first embodiment. The synonym determination device 40 may obtain the similarity between two keywords according to the flowchart of FIG. 9. Specifically, the CPU 402 first counts, for each pair of a keyword q and an address s, the number of appearances N(q, s) of that pair in the log (step SB1). The CPU 402 also obtains the number of appearances N(q) of the keyword q in the log and calculates, using Equation 1, the access probability P(s|q) of accessing the address s paired with the keyword (step SB2). The processing of steps SB1 and SB2 is the same as that of steps SA1 and SA2.

Next, on the basis of the addresses s stored in the log, the CPU 402 calculates the guidance probability P(q|s) that a user accessing an address s stored in the log was guided there by the search results for the keyword q (step SB3). The guidance probability P(q|s) is obtained by Equation 3, in which N(s) is the number of appearances of the address s in the log. Having obtained the guidance probability P(q|s), the CPU 402 generates a table TB12 that associates the keyword q, the address s, the number of appearances N(q, s) of the pair of keyword q and address s in the log, and the guidance probability P(q|s), as shown in FIG. 10 (step SB4).
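Equation 3 is not reproduced in this text. Given that it is stated to use N(s), and that the claims derive the guidance probability from the number of appearances of the address and of the pair, a consistent form would be the following (a reconstruction under that assumption):

```latex
P(q \mid s) = \frac{N(q, s)}{N(s)}
```

Under this form, the guidance probabilities of all keywords that led to a given address s sum to 1.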

Next, from the access probability P(s|q) and the guidance probability P(q|s) thus obtained, the CPU 402 calculates the similarity Sim(q_a → q_b) from keyword q_a to keyword q_b using Equation 4 (step SB5).

The similarity Sim(q_a → q_b) takes a value between 0 and 1; the higher the value, the more often the addresses s accessed via keyword q_a are also accessed via keyword q_b. The CPU 402 also calculates the similarity Sim(q_a → q_a) from keyword q_a to itself using Equation 5 (step SB6).

The CPU 402 calculates the similarity Sim(q_a → q_b) and the similarity Sim(q_a → q_a) for various pairs of keyword q_a and keyword q_b. When Sim(q_a → q_b) > α * Sim(q_a → q_a), the CPU 402 determines that keyword q_a and keyword q_b are synonyms and extracts them as a synonym pair (step SB7). Here α is a real number greater than 1 and is 2 in this modification. A similarity Sim(q_a → q_b) that exceeds Sim(q_a → q_a) in this way indicates that keyword q_b gives access to the same addresses as keyword q_a and is the keyword used by more people. The CPU 402 therefore treats keyword q_b as the correct notation word Q1 of keyword q_a and keyword q_a as the synonym Q2, and creates the table TB11 associating the correct notation word Q1 with the synonym Q2 (step SB8).

Even when the condition Sim(q_a → q_b) > α * Sim(q_a → q_a) is satisfied, keyword q_a and keyword q_b are likely to be entirely unrelated words if the similarity Sim(q_a → q_b) is smaller than a predetermined threshold β (for example, 0.01), so in that case the pair may be left unregistered in table TB12.
The similarity Sim(q_a → q_b) also need not be calculated for every pair of keywords. For example, one record may be identified in table TB10, together with another record that has a different keyword but the same address, and the similarity between the keywords of these records may be obtained only when the number of appearances N(q_a, s) of one record and the number of appearances N(q_b, s) of the other record are both at least a predetermined number. The similarity may likewise be obtained only when the access probability or the guidance probability is at least a predetermined probability.
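Equations 4 and 5 are not reproduced in this text, so the following sketch rests on an assumed form. It takes Sim(q_a → q_b) to be the sum over addresses s of P(s|q_a) * P(q_b|s), with P(s|q) = N(q, s) / N(q) and P(q|s) = N(q, s) / N(s), and takes Sim(q_a → q_a) to be the same sum with q_b replaced by q_a. That form is consistent with the stated range of 0 to 1 and with the claim that the similarity is calculated from the access probability of the first keyword and the guidance probability of the second, but it is not confirmed by the source. The function names and sample log are hypothetical.

```python
from collections import Counter


def directed_similarity(log, q_a, q_b):
    """Assumed form: Sim(q_a -> q_b) = sum over s of P(s|q_a) * P(q_b|s)."""
    n_qs = Counter(log)                     # N(q, s)
    n_q = Counter(q for q, _ in log)        # N(q)
    n_s = Counter(s for _, s in log)        # N(s)
    total = 0.0
    for (q, s), n in n_qs.items():
        if q != q_a:
            continue
        p_s_given_qa = n / n_q[q_a]
        p_qb_given_s = n_qs[(q_b, s)] / n_s[s]  # Counter yields 0 if absent
        total += p_s_given_qa * p_qb_given_s
    return total


def is_synonym_of(log, q_a, q_b, alpha=2.0, beta=0.01):
    """Apply Sim(q_a -> q_b) > alpha * Sim(q_a -> q_a) with the floor beta."""
    forward = directed_similarity(log, q_a, q_b)
    self_sim = directed_similarity(log, q_a, q_a)
    return forward > alpha * self_sim and forward >= beta


# One misspelled query and nine correct ones, all leading to the same page.
sample_log = [("srver", "x")] + [("server", "x")] * 9
```

With this sample, the rare spelling is judged a synonym of the common one but not the other way round, which matches the asymmetry the modification relies on to choose the correct notation word.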

[Second Embodiment]
Next, a second embodiment of the present invention will be described. The communication system 1 according to the second embodiment has the same hardware configuration as the first embodiment and differs from it in the processing performed by the synonym determination device 40. Description of the configuration shared with the first embodiment is therefore omitted, and the differences from the first embodiment are described below.

FIG. 11 is a flowchart showing the flow of processing of the CPU 402 according to the second embodiment. First, for each pair of a keyword q and an address s in the log, the CPU 402 counts the number of appearances N(q, s) of that pair in the log (step SC1). Next, the CPU 402 obtains the number of appearances N(q) of the keyword q in the log and calculates, using Equation 1, the access probability P(s|q) of accessing the address s paired with the keyword (step SC2). The processing of steps SC1 and SC2 is the same as that of steps SA1 and SA2.

Next, as shown in FIG. 12, the CPU 402 generates a table TB20 that associates the keyword q, the reading of the keyword q, the address s, the number of appearances N(q, s), and the access probability P(s|q) (step SC3). In the present embodiment, the storage unit 405 stores a dictionary that associates words with their readings, and the CPU 402 uses this dictionary to identify the reading of each keyword and associates the identified reading with the keyword. For example, using this dictionary, the keyword “server” is given the reading “サーバ”.

Next, on the basis of the addresses s stored in the log, the CPU 402 calculates the guidance probability P(q|s) that a user accessing an address s stored in the log was guided there by the keyword q (step SC4). The guidance probability P(q|s) is obtained by Equation 3. Having obtained the guidance probability P(q|s), the CPU 402 generates the table TB12 that associates the keyword q, the address s, the number of appearances N(q, s) of the pair of keyword q and address s in the log, and the guidance probability P(q|s) (step SC5).

Next, from the access probability P(s|q) and the guidance probability P(q|s) thus obtained, the CPU 402 calculates the similarity Sim(q_a → q_b) from keyword q_a to keyword q_b using Equation 4, and calculates the similarity Sim(q_a → q_a) from keyword q_a to itself using Equation 5 (step SC6).

The CPU 402 also calculates the edit distance d(q_a, q_b) between the readings of keyword q_a and keyword q_b (step SC7). For example, the keyword “ノートＰＣ”, whose reading is “ノートピーシー”, and the keyword “ノートＰＥ”, whose reading is “ノートピーイー”, have the same reading once “イ” is replaced with “シ”, so the edit distance d(q_a, q_b) = 1.
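The description does not say which edit distance is used. The sketch below assumes the common Levenshtein distance (insertions, deletions, and substitutions each cost 1), computed with the standard two-row dynamic program; the function name `edit_distance` is hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (assumed metric)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(
                min(
                    prev[j] + 1,         # delete ch_a
                    curr[j - 1] + 1,     # insert ch_b
                    prev[j - 1] + cost,  # substitute or match
                )
            )
        prev = curr
    return prev[-1]


d = edit_distance("ノートピーシー", "ノートピーイー")
```

For the two readings in the example above, the single differing character gives a distance of 1.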

Having calculated the access probability P(s|q), the guidance probability P(q|s), and the edit distance d(q_a, q_b), the CPU 402 determines that a pair of keyword q_a and keyword q_b satisfying both Sim(q_a → q_b) > α * Sim(q_a → q_a) and d(q_a, q_b) ≦ β (where β is a number of 0 or more, for example 1) are synonyms, and extracts the pair (step SC8). On extracting a pair of keyword q_a and keyword q_b that satisfies these conditions, the CPU 402 treats keyword q_b as the correct notation word Q1 of keyword q_a and keyword q_a as the synonym Q2, and generates the table TB11 associating the correct notation word Q1 with the synonym Q2 (step SC9).
According to the present embodiment, the edit distance between readings, that is, how close the keywords are in pronunciation, is added to the criteria for extracting synonyms, so synonyms can be extracted with high accuracy.

(Modification of the second embodiment)
In the second embodiment described above, synonyms are extracted using the edit distance between readings. However, differences such as that between a full-size and a small kana character, for example “ア” and “ァ”, or the presence or absence of the long-vowel mark “ー”, are not large differences in reading, so the edit distance for such differences may be set to 0. In this case, keywords with similar pronunciations are more readily extracted as synonyms.
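One way to give these differences an edit distance of 0 is to normalize each reading before computing the distance. The sketch below is an assumption about how that might be done, not the method stated in the source: it maps small katakana to their full-size forms and drops the long-vowel mark. The name `normalize_reading` and the particular set of small kana covered are hypothetical.

```python
# Small katakana and their full-size counterparts (an assumed, partial set).
SMALL_TO_FULL = str.maketrans("ァィゥェォッャュョヮ", "アイウエオツヤユヨワ")


def normalize_reading(reading: str) -> str:
    """Fold small kana to full size and remove the long-vowel mark."""
    return reading.translate(SMALL_TO_FULL).replace("ー", "")


same = normalize_reading("サーバー") == normalize_reading("サーバ")
```

Two readings that normalize to the same string are then at distance 0 under any edit distance applied afterwards.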

The second embodiment described above uses the edit distance between the readings of the keywords, but synonyms may also be extracted taking the length of the reading into account. For example, the edit distance d(q_a → q_b) of keyword q_a with respect to keyword q_b is defined as d(q_a → q_b) = d(q_a, q_b) / Length(q_a), where Length(q_a) is the length of the reading of keyword q_a. For example, when keyword q_a is “ノートＰＥ”, its reading is “ノートピーイー”, so Length(q_a) = 7. When keyword q_a is “ノートＰＥ” and keyword q_b is “ノートＰＣ”, d(q_a → q_b) is 1/7 ≈ 0.14. When the edit distance d(q_a → q_b) is at most a predetermined threshold γ (a value of 0 or more, for example 0.5), keyword q_a and keyword q_b are extracted as synonyms. Alternatively, keyword q_a and keyword q_b may be determined to be synonyms and extracted when Sim(q_a → q_b) − d(q_a → q_b) > δ, where δ is a predetermined value.
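The length-normalized distance above can be checked with a few lines of code. The function name `relative_edit_distance` is hypothetical, and the raw edit distance is passed in as a number so that the sketch does not depend on any particular distance implementation.

```python
def relative_edit_distance(distance: int, reading_a: str) -> float:
    """d(q_a -> q_b) = d(q_a, q_b) / Length(q_a), per the definition above."""
    if not reading_a:
        raise ValueError("reading of q_a must not be empty")
    return distance / len(reading_a)


# Reading of the keyword q_a in the example, and the raw distance 1.
ratio = relative_edit_distance(1, "ノートピーイー")
is_candidate = ratio <= 0.5  # threshold gamma from the example
```

Dividing by the length of q_a makes a single-character difference count for less in a long reading than in a short one.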

[Third Embodiment]
Next, a third embodiment of the present invention will be described. In the third embodiment as well, the hardware configuration of the communication system 1 is the same as in the first and second embodiments, and the processing performed by the synonym determination device 40 differs from that of the second embodiment. Description of the hardware configuration is therefore omitted, and the differences from the second embodiment are described below.

FIG. 13 is a flowchart showing the flow of processing of the CPU 402 according to the present embodiment. The processing from step SD1 to step SD6 is the same as the processing from step SC1 to step SC6.
Whereas the second embodiment described above extracts synonyms using the edit distance d(q_a, q_b) between the readings of the keywords, the present embodiment extracts synonyms using the edit distance between the written forms of the keywords. For example, when keyword q_a is “ノートＰＥ” and keyword q_b is “ノートＰＣ”, only “Ｅ” and “Ｃ” differ, so the edit distance e(q_a, q_b) between the written forms is 1. In the present embodiment, the CPU 402 calculates the similarity Sim(q_a → q_b), the similarity Sim(q_a → q_a), and the edit distance e(q_a, q_b) (step SD7). The CPU 402 then determines that a pair of keyword q_a and keyword q_b satisfying both Sim(q_a → q_b) > α * Sim(q_a → q_a) and e(q_a, q_b) ≦ β (where β is a number of 0 or more, for example 1) are synonyms, and extracts the pair (step SD8). On extracting a pair of keyword q_a and keyword q_b that satisfies these conditions, the CPU 402 treats keyword q_b as the correct notation word Q1 of keyword q_a and keyword q_a as the synonym Q2, and generates the table TB11 associating the correct notation word Q1 with the synonym Q2 (step SD9).

(Modification of the third embodiment)
The third embodiment uses the edit distance e(q_a, q_b) between the written forms of the keywords, but synonyms may also be extracted taking the number of characters in the written form into account. For example, the edit distance e(q_a → q_b) of keyword q_a with respect to keyword q_b is defined as e(q_a → q_b) = e(q_a, q_b) / Length(q_a), where Length(q_a) is the number of characters in the written form of keyword q_a. For example, when keyword q_a is “ノートＰＥ”, the number of characters is 5, so Length(q_a) = 5. When keyword q_a is “ノートＰＥ” and keyword q_b is “ノートＰＣ”, e(q_a → q_b) = 1/5 = 0.2. When the edit distance e(q_a → q_b) is at most a predetermined threshold γ (a value of 0 or more, for example 0.5), keyword q_a and keyword q_b are extracted as synonyms. Alternatively, keyword q_a and keyword q_b may be determined to be synonyms and extracted when Sim(q_a → q_b) − e(q_a → q_b) > δ, where δ is a predetermined value.

[Modifications]
Embodiments of the present invention have been described above, but the present invention is not limited to these embodiments and can be implemented in various other forms. For example, the embodiments described above may be modified as follows to implement the present invention, and the modifications may be combined.

In the embodiments described above, the synonym determination device 40 acquires the log from the server device 35 and extracts synonyms, but the server device 35 may have the functions of the synonym determination device 40 and extract synonyms itself. Alternatively, the search server device 30 may have the functions of both the synonym determination device 40 and the server device 35 and extract synonyms itself.

A program that implements the function of extracting synonyms may be provided stored on a computer-readable recording medium, such as a magnetic recording medium (magnetic tape, or a magnetic disk such as an HDD (Hard Disk Drive) or FD (Flexible Disk)), an optical recording medium (an optical disc such as a CD (Compact Disc) or DVD (Digital Versatile Disk)), a magneto-optical recording medium, or a semiconductor memory, and installed on the synonym determination device 40 or the search server device 30. The program may also be downloaded over a communication line and installed.

DESCRIPTION OF SYMBOLS: 1 ... communication system, 10 ... terminal device, 20 ... communication line, 30 ... search server device, 35 ... server device, 40 ... synonym determination device, 401 ... bus, 402 ... CPU, 403 ... ROM, 404 ... RAM, 405 ... storage unit, 406 ... operation unit, 407 ... display unit, 408 ... communication unit, 450 ... acquisition means, 451 ... storage unit control means, 452 ... access probability calculation means, 453 ... similarity calculation means, 454 ... determination means

Claims

A synonym determination device comprising:
acquisition means for acquiring a log of sets, each set consisting of a search keyword and an address of information accessed from the search results for that keyword;
access probability calculation means for calculating, from the number of appearances of the keyword in the log and the number of appearances of the set in the log, an access probability that the address paired with the keyword in the set is accessed from the search results for that keyword;
guidance probability calculation means for calculating, from the number of appearances of the address in the log and the number of appearances of the set in the log, a guidance probability that the search results for the keyword lead to the address paired with the keyword in the set;
similarity calculation means for acquiring from the log a first set and a second set that has the same address as the first set and a different keyword, and for calculating the similarity between the keyword of the first set and the keyword of the second set from the number of appearances of the first set in the log and the number of appearances of the second set in the log; and
determination means for determining, using the similarity calculated by the similarity calculation means, whether the keyword of the first set and the keyword of the second set are synonyms,
wherein the similarity calculation means calculates the similarity from the access probability for the keyword of the first set and the guidance probability for the keyword of the second set.
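
As an illustration of the quantities named in the claim above, the following is a minimal Python sketch, not the claimed implementation. It assumes the log is a list of (keyword, address) tuples, that the access probability is the count of the set divided by the count of its keyword, and that the guidance probability is the count of the set divided by the count of its address. Combining the two as a product summed over shared addresses is purely an assumption made here; the claim only states that the similarity is calculated from these two probabilities. The names `probabilities` and `similarity` are hypothetical.

```python
from collections import Counter


def probabilities(log):
    """Return (access, guidance) dicts keyed by (keyword, address).

    Assumed definitions (not given explicitly in the claims):
      access[(k, a)]   = count(k, a) / count(k)
      guidance[(k, a)] = count(k, a) / count(a)
    """
    log = list(log)
    pair_count = Counter(log)
    keyword_count = Counter(k for k, _ in log)
    address_count = Counter(a for _, a in log)
    access = {p: n / keyword_count[p[0]] for p, n in pair_count.items()}
    guidance = {p: n / address_count[p[1]] for p, n in pair_count.items()}
    return access, guidance


def similarity(log, kw1, kw2):
    """Assumed similarity: sum over shared addresses of
    access(kw1, address) * guidance(kw2, address)."""
    access, guidance = probabilities(log)
    addresses1 = {a for (k, a) in access if k == kw1}
    addresses2 = {a for (k, a) in access if k == kw2}
    shared = addresses1 & addresses2  # first and second sets share an address
    return sum(access[(kw1, a)] * guidance[(kw2, a)] for a in shared)
```

With a log in which "server" leads to address "a" twice and to "b" once, and "sever" leads to "a" once, this sketch gives an access probability of 2/3 for ("server", "a") and a guidance probability of 1/3 for ("sever", "a").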

A synonym determination device comprising:
acquisition means for acquiring a log of sets, each set consisting of a search keyword and an address of information accessed from the search results for that keyword;
access probability calculation means for calculating, from the number of appearances of the keyword in the log and the number of appearances of the set in the log, an access probability that the address paired with the keyword in the set is accessed from the search results for that keyword;
similarity calculation means for acquiring from the log a first set and a second set that has the same address as the first set and a different keyword, and for calculating the similarity between the keyword of the first set and the keyword of the second set based on the number of appearances of the first set in the log, the number of appearances of the second set in the log, and the access probability calculated by the access probability calculation means; and
determination means for determining, using the similarity calculated by the similarity calculation means, whether the keyword of the first set and the keyword of the second set are synonyms.

The synonym determination device according to claim 1 or claim 2, wherein, when the determination means determines that the keywords are synonyms, the keyword that is stored in the log the larger number of times, out of the keyword of the first set and the keyword of the second set, is set as the normal notation word, and the keyword that is stored in the log the smaller number of times is determined to be a synonym of the normal notation word.
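
The selection rule in this claim can be sketched in a few lines of Python. This is an illustrative sketch only: the log is assumed to be a list of (keyword, address) tuples, the function name `assign_normal_notation` is hypothetical, and the tie-breaking behavior (the first keyword wins when counts are equal) is an assumption, since the claim does not address ties.

```python
from collections import Counter


def assign_normal_notation(log, kw1, kw2):
    """Return (normal_notation_word, synonym) for two keywords already
    judged to be synonyms, based on how often each is stored in the log."""
    counts = Counter(k for k, _ in log)
    # The more frequently logged keyword becomes the normal notation word.
    # Assumption: on a tie, kw1 is kept as the normal notation word.
    if counts[kw1] >= counts[kw2]:
        return kw1, kw2
    return kw2, kw1
```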

The synonym determination device according to any one of claims 1 to 3, further comprising first distance calculation means for calculating a distance between the reading of the keyword of the first set and the reading of the keyword of the second set,
wherein the determination means determines whether the keyword of the first set and the keyword of the second set are synonyms using the similarity calculated by the similarity calculation means and the distance calculated by the first distance calculation means.

The synonym determination device according to claim 4, wherein the determination means determines whether the keyword of the first set and the keyword of the second set are synonyms using the distance calculated by the first distance calculation means and the length of the reading of the keyword of the first set.
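
One way to picture a determination that combines the similarity, the distance between readings, and the length of the first reading is sketched below in Python. Everything specific here is an assumption for illustration: the claims do not name a distance measure, so Levenshtein (edit) distance is swapped in; dividing the distance by the length of the first reading and comparing against thresholds is a guess at how the length could be used; and the threshold values and the name `is_synonym` are hypothetical. Readings are passed in as strings, since obtaining a reading from a keyword is outside this sketch.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def is_synonym(similarity, reading1, reading2,
               sim_threshold=0.1, ratio_threshold=0.34):
    """Assumed rule: synonyms if the similarity is high enough and the
    reading distance is small relative to the first reading's length."""
    if not reading1:
        return False
    ratio = levenshtein(reading1, reading2) / len(reading1)
    return similarity >= sim_threshold and ratio <= ratio_threshold
```

Normalizing by the reading length means that a one-character difference counts for less in a long keyword than in a short one, which is one plausible motivation for bringing the length into the determination.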

The synonym determination device according to any one of claims 1 to 3, further comprising second distance calculation means for calculating a distance between the keyword of the first set and the keyword of the second set,
wherein the determination means determines whether the keyword of the first set and the keyword of the second set are synonyms using the similarity calculated by the similarity calculation means and the distance calculated by the second distance calculation means.

The synonym determination device according to claim 6, wherein the determination means determines whether the keyword of the first set and the keyword of the second set are synonyms using the distance calculated by the second distance calculation means and the number of characters in the notation of the keyword of the first set.

A synonym determination method comprising:
an acquisition step in which control means of a computer device acquires a log of sets, each set consisting of a search keyword and an address of information accessed from the search results for that keyword;
an access probability calculation step in which the control means calculates, from the number of appearances of the keyword in the log and the number of appearances of the set in the log, an access probability that the address paired with the keyword in the set is accessed from the search results for that keyword;
a guidance probability calculation step in which the control means calculates, from the number of appearances of the address in the log and the number of appearances of the set in the log, a guidance probability that the search results for the keyword lead to the address paired with the keyword in the set;
a similarity calculation step in which the control means acquires from the log a first set and a second set that has the same address as the first set and a different keyword, and calculates the similarity between the keyword of the first set and the keyword of the second set from the number of appearances of the first set in the log and the number of appearances of the second set in the log; and
a determination step in which the control means determines, using the similarity calculated in the similarity calculation step, whether the keyword of the first set and the keyword of the second set are synonyms,
wherein, in the similarity calculation step, the similarity is calculated from the access probability for the keyword of the first set and the guidance probability for the keyword of the second set.

A program for causing a computer to function as:
acquisition means for acquiring a log of sets, each set consisting of a search keyword and an address of information accessed from the search results for that keyword;
storage unit control means for storing, each time a keyword and an address are acquired by the acquisition means, the set of the acquired keyword and address in a storage unit as a log;
access probability calculation means for calculating, from the number of appearances of the keyword in the log and the number of appearances of the set in the log, an access probability that the address paired with the keyword in the set is accessed from the search results for that keyword;
guidance probability calculation means for calculating, from the number of appearances of the address in the log and the number of appearances of the set in the log, a guidance probability that the search results for the keyword lead to the address paired with the keyword in the set;
similarity calculation means for acquiring from the log a first set and a second set that has the same address as the first set and a different keyword, and for calculating the similarity between the keyword of the first set and the keyword of the second set from the number of appearances of the first set in the log and the number of appearances of the second set in the log; and
determination means for determining, using the similarity calculated by the similarity calculation means, whether the keyword of the first set and the keyword of the second set are synonyms,
wherein the similarity calculation means calculates the similarity from the access probability for the keyword of the first set and the guidance probability for the keyword of the second set.

A synonym determination method comprising:
an acquisition step in which control means of a computer device acquires a log of sets, each set consisting of a search keyword and an address of information accessed from the search results for that keyword;
an access probability calculation step in which the control means calculates, from the number of appearances of the keyword in the log and the number of appearances of the set in the log, an access probability that the address paired with the keyword in the set is accessed from the search results for that keyword;
a similarity calculation step in which the control means acquires from the log a first set and a second set that has the same address as the first set and a different keyword, and calculates the similarity between the keyword of the first set and the keyword of the second set based on the number of appearances of the first set in the log, the number of appearances of the second set in the log, and the access probability calculated in the access probability calculation step; and
a determination step in which the control means determines, using the similarity calculated in the similarity calculation step, whether the keyword of the first set and the keyword of the second set are synonyms.

A program for causing a computer to function as:
acquisition means for acquiring a log of sets, each set consisting of a search keyword and an address of information accessed from the search results for that keyword;
storage unit control means for storing, each time a keyword and an address are acquired by the acquisition means, the set of the acquired keyword and address in a storage unit as a log;
access probability calculation means for calculating, from the number of appearances of the keyword in the log and the number of appearances of the set in the log, an access probability that the address paired with the keyword in the set is accessed from the search results for that keyword;
similarity calculation means for acquiring from the log a first set and a second set that has the same address as the first set and a different keyword, and for calculating the similarity between the keyword of the first set and the keyword of the second set based on the number of appearances of the first set in the log, the number of appearances of the second set in the log, and the access probability calculated by the access probability calculation means; and
determination means for determining, using the similarity calculated by the similarity calculation means, whether the keyword of the first set and the keyword of the second set are synonyms.