JP5374938B2

JP5374938B2 - Related information registration apparatus, related information registration method, and related information registration program

Info

Publication number: JP5374938B2
Application number: JP2008169415A
Authority: JP
Inventors: 友哉岩倉; 青史岡本
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2008-06-27
Filing date: 2008-06-27
Publication date: 2013-12-25
Anticipated expiration: 2028-06-27
Also published as: JP2010009414A

Abstract

PROBLEM TO BE SOLVED: To reduce the processing load imposed on a creator when registering words and link destination URLs.

SOLUTION: From an arbitrary document that includes a component for which related information is set, the related information being information that relates a component included in a document to another document, the component and its related information are extracted. The extracted component and related information are registered in association with each other in a related information storage part, which stores related information in association with components. Then, in-range components are extracted from the arbitrary document, taking the position of the extracted component as a base point, and the in-range components are registered in association with each combination of extracted component and related information.

COPYRIGHT: (C)2010, JPO&INPIT

Description

The present invention relates to a related information registration apparatus, a related information registration method, and a related information registration program.

Conventionally, an automatic link generation technique (Auto Link) that sets link tags on words and phrases in a text has been known. Specifically, in the automatic link generation technique, a computer holds in advance a link setting dictionary in which phrases are associated with link destination URLs (Uniform Resource Locators). When a user designates a text, the computer identifies, among the phrases in the designated text, those registered in the link setting dictionary. The computer then sets, on each identified phrase, a link tag pointing to the link destination URL registered in the link setting dictionary (see, for example, Patent Documents 1 and 2).
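The conventional Auto Link behavior described above can be sketched as follows. This is only an illustrative sketch, not the method of the cited patent documents: the names `LINK_DICTIONARY` and `auto_link` are hypothetical, and plain substring matching stands in for whatever phrase identification a real system would use.

```python
import html
import re
from typing import Dict

# Hypothetical link setting dictionary: phrase -> link destination URL.
LINK_DICTIONARY: Dict[str, str] = {"AAA": "aaa.jp", "BBB": "bbb.jp"}


def auto_link(text: str, dictionary: Dict[str, str]) -> str:
    """Wrap every dictionary phrase found in plain text with a link tag."""
    if not dictionary:
        return html.escape(text)
    # Try longer phrases first so a long entry wins over its own prefix.
    phrases = sorted(dictionary, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(p) for p in phrases))
    parts = []
    last = 0
    for match in pattern.finditer(text):
        # Copy the unlinked text before the match, escaped for HTML.
        parts.append(html.escape(text[last:match.start()]))
        phrase = match.group(0)
        url = html.escape(dictionary[phrase], quote=True)
        parts.append(f'<a href="{url}">{html.escape(phrase)}</a>')
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)
```

Because the matching is purely textual, every occurrence of a registered phrase is linked regardless of its surrounding words.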

In the automatic link generation technique, for example, a registration method is used in which a user manually registers phrases and link destination URLs in the link setting dictionary (see, for example, Patent Document 3).

Patent Document 1: Japanese Patent Laid-Open No. 10-334086 (pages 1-5, FIG. 1)
Patent Document 2: Japanese Patent Laid-Open No. 2003-108425 (pages 8-9, FIG. 1)
Patent Document 3: Japanese Patent Laid-Open No. 2006-004308 (pages 1-4, FIG. 1)

In the conventional registration method described above, however, all of the processing for accumulating phrases and link destination URLs in the link setting dictionary is performed by the user, which places a heavy processing load on the user.

The present invention has therefore been made to solve the above-described problem of the prior art, and an object of the present invention is to provide a related information registration apparatus, a related information registration method, and a related information registration program capable of reducing the processing load on the creator.

To solve the above-described problem and achieve the object, an extraction step is provided that extracts, from an arbitrary document including a component for which related information is set, the component for which the related information is set and the related information itself, where related information associates a component included in a document with another document. A registration step is also provided that registers the component and the related information extracted in the extraction step, in association with each other, in a related information storage unit that stores related information in association with components.

The processing load on the creator can be reduced.

Embodiments of the related information registration apparatus, related information registration method, and related information registration program according to the present invention are described below in detail with reference to the accompanying drawings. The following description covers, in order, an outline of the related information registration apparatus according to the first embodiment, its configuration, and its processing flow, followed by other embodiments.

[Outline of related information registration apparatus]
First, an outline of the related information registration apparatus 300 according to the first embodiment is described with reference to FIG. 1. FIG. 1 is a diagram for explaining the outline of the related information registration apparatus according to the first embodiment.

As shown in the figure, the related information registration apparatus 300 according to the first embodiment has a dictionary storage unit 403 (also referred to as a “related information storage unit”) that stores link destination URLs (also referred to as “related information”) in association with phrases (also referred to as “components”). For example, in the related information registration apparatus 300, the dictionary storage unit 403 stores the link destination URL “AAA.jp” in association with the phrase “AAA Company”.

Here, as shown in (1) of FIG. 1, the related information registration apparatus 300 according to the first embodiment extracts, from an arbitrary document (extraction target document) that includes a phrase on which a link destination URL is set, the phrase on which the link destination URL is set and the link destination URL itself. In the example shown in (1) of FIG. 1, the related information registration apparatus 300 extracts from the document the phrase “BBB Company”, on which the link destination URL “BBB.jp” is set, and the link destination URL “BBB.jp”.

Then, as shown in (2) of FIG. 1, the related information registration apparatus 300 according to the first embodiment registers the link destination URL and the phrase in the dictionary storage unit 403 in association with each other. For example, it registers the phrase “BBB Company” in association with the link destination URL “BBB.jp”.

Accordingly, the related information registration apparatus 300 according to the first embodiment can reduce the processing load on the creator when phrases and link destination URLs are registered in the dictionary storage unit 403.

[Configuration of related information registration apparatus]
Next, the configuration of the related information registration apparatus 300 shown in FIG. 1 is described with reference to FIG. 2. FIG. 2 is a block diagram for explaining the configuration of the related information registration apparatus according to the first embodiment. As shown in the figure, the related information registration apparatus 300 has a storage unit 400 and a control unit 500, is connected to the Internet 100, and accepts connections from a client 200.

The client 200 connects to the related information registration apparatus 300, inputs to it a setting target document on which link destination URLs are to be set, and receives from it the document that has been processed by the related information registration apparatus 300.

The storage unit 400 stores data required for the registration processing that registers link destination URLs and for the setting processing that sets link destination URLs. It has a WEB page storage unit 401, a dictionary candidate storage unit 402, and a dictionary storage unit 403.

The WEB page storage unit 401 is connected to the WEB page collection unit 501 and the dictionary candidate extraction unit 502, and stores WEB pages. Each WEB page is uniquely distinguished from other WEB pages by its “URL”. Specifically, as shown in FIG. 3, the WEB page storage unit 401 stores, in association with each “URL”, the “WEB page content” of the WEB page identified by that URL. FIG. 3 is a diagram for explaining an example of the information stored in the WEB page storage unit in the first embodiment. For example, the WEB page storage unit 401 stores the WEB page content “president of <a href='aaa.jp'>AAA</a> Corporation” in association with the URL “a.jp”.

Each association between a URL and WEB page content is collected by the WEB page collection unit 501, which connects to the Internet 100, and is registered in the WEB page storage unit 401. The association is then read out by the dictionary candidate extraction unit 502, and afterwards deleted from the WEB page storage unit 401 by the dictionary candidate extraction unit 502.

The string <a href='aaa.jp'>AAA</a> shown in FIG. 3 is a link tag (anchor tag) that creates an association with the WEB page uniquely identified by the URL “aaa.jp”; that is, it is a hyperlink to that WEB page. Here, a link destination URL is information that, when an association is set on a phrase, uniquely identifies the associated other document. In the tag pair <a href='aaa.jp'> and </a> shown in FIG. 3, for example, “aaa.jp” is the link destination URL. In other words, in the WEB page content “president of <a href='aaa.jp'>AAA</a> Corporation”, a hyperlink to the WEB page uniquely identified by the link destination URL “aaa.jp” is set on the phrase “AAA”.

The dictionary candidate storage unit 402 is connected to the dictionary candidate extraction unit 502 and the dictionary creation unit 503, and stores information that is a candidate for registration in the dictionary storage unit 403. Specifically, as shown in FIG. 4, the dictionary candidate storage unit 402 stores “context information” and a “link source URL” for each association between an “anchor character string” and a “link destination URL”. FIG. 4 is a diagram for explaining an example of the information stored in the dictionary candidate storage unit in the first embodiment.

Here, an anchor character string is a phrase in the extraction target document on which a link tag is set; for example, the phrase “AAA”, on which a hyperlink to the link destination URL “aaa.jp” is set. A link source URL is information that uniquely identifies the WEB page from which the “anchor character string” and the “link destination URL” are extracted. For example, the URL “a.jp” (see FIG. 3), which identifies the WEB page from which the anchor character string “AAA” and the link destination URL “aaa.jp” are extracted, is a link source URL.

Context information (also referred to as “in-range components”) consists of the phrases in the extraction target document that fall within a predetermined range whose base point is the position of the anchor character string. Examples are the phrases located within a predetermined number of characters (for example, 10 characters) before or after the anchor character string, or all the phrases in the single sentence that contains the anchor character string. The predetermined range is set in advance by the user.

For example, in the WEB page content “president of <a href='aaa.jp'>AAA</a> Corporation”, the context information consists of “corporation” and “president”, which are phrases contained in the document that includes the anchor character string “AAA”.

That is, in the example shown in FIG. 4, the dictionary candidate storage unit 402 stores the context information “corporation, president” and the link source URL “a.jp” in association with the anchor character string “AAA” and the link destination URL “aaa.jp”.
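One possible in-memory shape for a record of the dictionary candidate storage unit 402 is sketched below. The class and field names are hypothetical; the patent only specifies which four items are stored together, not how they are represented.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DictionaryCandidate:
    """One row of the candidate table illustrated in FIG. 4 (names are illustrative)."""
    anchor_text: str                # anchor character string, e.g. "AAA"
    link_destination_url: str       # link destination URL, e.g. "aaa.jp"
    context_words: Tuple[str, ...]  # context information extracted around the anchor
    link_source_url: str            # URL of the page the row was extracted from

    @property
    def key(self) -> Tuple[str, str]:
        # Rows are grouped by (anchor character string, link destination URL).
        return (self.anchor_text, self.link_destination_url)


# The example row described in the text.
candidate = DictionaryCandidate("AAA", "aaa.jp", ("corporation", "president"), "a.jp")
```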

Each association among an anchor character string, a link destination URL, context information, and a link source URL is extracted from the WEB page storage unit 401 by the dictionary candidate extraction unit 502 and registered in the dictionary candidate storage unit 402, and is then read from the dictionary candidate storage unit 402 by the dictionary creation unit 503. Afterwards, the association is deleted from the dictionary candidate storage unit 402 by the dictionary creation unit 503.

The dictionary storage unit 403 (also referred to as a “related information storage unit”) is connected to the dictionary creation unit 503 and the related information setting unit 504. As shown in FIG. 5, the dictionary storage unit 403 stores “context information with counts”, a “threshold”, and a “link destination URL” in association with each “dictionary heading”. FIG. 5 is a diagram for explaining an example of the information stored in the dictionary storage unit in the first embodiment.

Here, a “dictionary heading” is a phrase on which the related information setting unit 504 may set a link destination URL; in the example shown in FIG. 5, “AAA” is a dictionary heading. The “context information with counts” associates each piece of context information with the number of times that context information was extracted for a given combination of anchor character string and link destination URL; in the example shown in FIG. 5, “corporation: 1” and “president: 2” are such entries. Here, “corporation: 1” indicates that the dictionary candidate extraction unit 502 extracted the context information “corporation” once in association with the combination of the anchor character string “AAA” and the link destination URL “aaa.jp”. The “threshold” is a value that the related information setting unit 504 uses to decide whether to set the link destination URL on a phrase that appears in a setting target document and is identical to the dictionary heading; in the example shown in FIG. 5, the threshold is “3”. The threshold is calculated by the dictionary creation unit 503.

In the example shown in FIG. 5, the dictionary storage unit 403 stores the context information with counts “corporation: 1, president: 2, department manager: 1”, the threshold “3”, and the link destination URL “aaa.jp” in association with the dictionary heading “AAA”.
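As a rough illustration, an entry of the dictionary storage unit 403 could be modeled as below. The class and field names are assumptions made for this sketch; only the four stored items themselves come from the description of FIG. 5.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class DictionaryEntry:
    """One row of the dictionary table illustrated in FIG. 5 (names are illustrative)."""
    heading: str                    # dictionary heading, e.g. "AAA"
    link_destination_url: str       # link destination URL, e.g. "aaa.jp"
    # Context information with counts: context word -> number of extractions.
    context_counts: Dict[str, int] = field(default_factory=dict)
    threshold: float = 0            # value calculated by the dictionary creation unit


# The example entry described in the text.
entry = DictionaryEntry(
    heading="AAA",
    link_destination_url="aaa.jp",
    context_counts={"corporation": 1, "president": 2, "department manager": 1},
    threshold=3,
)
```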

Each association among a “dictionary heading”, “context information with counts”, a “threshold”, and a “link destination URL” is registered in the dictionary storage unit 403 by the dictionary creation unit 503 and used by the related information setting unit 504.

The control unit 500 includes the WEB page collection unit 501, the dictionary candidate extraction unit 502, the dictionary creation unit 503, and the related information setting unit 504, and controls the registration processing and the setting processing.

The WEB page collection unit 501 is connected to the WEB page storage unit 401, the dictionary candidate extraction unit 502, and the Internet 100. When, for example, a preset collection time arrives, it connects to the Internet 100, acquires WEB page content, and registers the WEB page content in the WEB page storage unit 401. For example, the WEB page collection unit 501 registers the WEB page content “president of <a href='aaa.jp'>AAA</a> Corporation” in the WEB page storage unit 401 in association with the URL “a.jp” (see FIG. 3).

After registering WEB page content in the WEB page storage unit 401, the WEB page collection unit 501 notifies the dictionary candidate extraction unit 502 that the WEB page content has been registered.

The dictionary candidate extraction unit 502 (also referred to as an “extraction step” or an “in-range extraction step”) is connected to the WEB page collection unit 501, the WEB page storage unit 401, and the dictionary candidate storage unit 402. On receiving notification from the WEB page collection unit 501 that WEB page content has been registered, the dictionary candidate extraction unit 502 extracts anchor character strings, link destination URLs, and context information from the WEB page storage unit 401.

Extraction of the anchor character string and the link destination URL is described first. The dictionary candidate extraction unit 502 reads one item of WEB page content stored in the WEB page storage unit 401 and extracts one link tag contained in the WEB page it has read. The dictionary candidate extraction unit 502 then extracts the link destination URL contained in the link tag, and also extracts the phrase on which the link tag is set (the anchor character string).

For example, the dictionary candidate extraction unit 502 extracts the link tag <a href='aaa.jp'></a> from the WEB page content “president of <a href='aaa.jp'>AAA</a> Corporation”, and extracts the anchor character string “AAA” and the link destination URL “aaa.jp”.
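A minimal sketch of this extraction is shown below, assuming the page is HTML and using the standard library `html.parser` module. The names `AnchorExtractor` and `extract_links` are illustrative and do not appear in the patent.

```python
from html.parser import HTMLParser


class AnchorExtractor(HTMLParser):
    """Collect (anchor character string, link destination URL) pairs from HTML."""

    def __init__(self):
        super().__init__()
        self.links = []     # extracted (anchor_text, href) pairs
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            anchor_text = "".join(self._text).strip()
            if anchor_text:
                self.links.append((anchor_text, self._href))
            self._href = None


def extract_links(page_html):
    """Return every (anchor text, href) pair found in the page, in document order."""
    parser = AnchorExtractor()
    parser.feed(page_html)
    parser.close()
    return parser.links
```

For the example content, `extract_links("president of <a href='aaa.jp'>AAA</a> Corporation")` yields the single pair `("AAA", "aaa.jp")`.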

Extraction of the context information is described next. After extracting an anchor character string, the dictionary candidate extraction unit 502 extracts, as context information, the phrases contained within a predetermined range whose base point is the position of that anchor character string in the extraction target document. Specifically, the dictionary candidate extraction unit 502 extracts, as context information, the phrases located within a predetermined number of characters (for example, 10 characters) before or after the anchor character string, or all the phrases contained in the single document that includes the anchor character string.

For example, for the WEB page content “president of <a href='aaa.jp'>AAA</a> Corporation”, the dictionary candidate extraction unit 502 extracts, as context information, “corporation” and “president”, which are phrases contained in the document that includes the anchor character string “AAA”.
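The character-window variant of context extraction might look like the following sketch. It makes several assumptions not stated in the patent: the text has already been stripped of tags, the anchor's character offsets are known, and a simple `\w+` regular expression stands in for phrase segmentation (Japanese text would need a morphological analyzer instead). The function name `context_words` is hypothetical.

```python
import re


def context_words(text, anchor_start, anchor_end, window=10):
    """Return words lying entirely within `window` characters of the anchor.

    `anchor_start` and `anchor_end` are the anchor's offsets in `text`
    (end exclusive). Words overlapping the anchor itself are skipped.
    """
    lo = max(0, anchor_start - window)
    hi = min(len(text), anchor_end + window)
    words = []
    for match in re.finditer(r"\w+", text):
        overlaps_anchor = match.start() < anchor_end and match.end() > anchor_start
        if overlaps_anchor:
            continue
        if match.start() >= lo and match.end() <= hi:
            words.append(match.group(0))
    return words
```

No stop-word filtering is applied here, so function words such as "of" are returned along with content words; a real system would presumably keep only meaningful phrases.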

The dictionary candidate extraction unit 502 also registers the anchor character string, the link destination URL, the link source URL, and the context information in the dictionary candidate storage unit 402. Specifically, in association with the anchor character string and the link destination URL extracted from the WEB page content, the dictionary candidate extraction unit 502 registers the “URL” associated with that WEB page content in the dictionary candidate storage unit 402 as the “link source URL”. It likewise registers the context information extracted from the WEB page content in the dictionary candidate storage unit 402 in association with the anchor character string and the link destination URL.

For example, the dictionary candidate extraction unit 502 registers the context information “corporation, president” and the link source URL “a.jp” in the dictionary candidate storage unit 402 in association with the anchor character string “AAA” and the link destination URL “aaa.jp”.

The dictionary candidate extraction unit 502 then determines whether the WEB page it has read contains any unprocessed link tag and, if so, continues the dictionary candidate registration processing until no unprocessed link tag remains. Similarly, the dictionary candidate extraction unit 502 determines whether any unprocessed WEB page exists and, if so, continues the dictionary candidate registration processing until no unprocessed WEB page remains.

The dictionary candidate extraction unit 502 also deletes, from the WEB page storage unit 401, the association between the URL and the WEB page content from which the anchor character strings and other items have been extracted. After registering the anchor character strings and other items in the dictionary candidate storage unit 402, the dictionary candidate extraction unit 502 notifies the dictionary creation unit 503 that the registration in the dictionary candidate storage unit 402 has been made.

The dictionary creation unit 503 (also referred to as a “registration step”, an “in-range registration step”, or a “count registration step”) is connected to the dictionary candidate storage unit 402, the dictionary storage unit 403, and the dictionary candidate extraction unit 502. On receiving notification from the dictionary candidate extraction unit 502 that a registration has been made in the dictionary candidate storage unit 402, the dictionary creation unit 503 extracts the dictionary heading, the link destination URL, and the context information from the dictionary candidate storage unit 402 and registers them in the dictionary storage unit 403. The dictionary creation unit 503 also calculates the counts and the threshold and registers them in the dictionary storage unit 403.

Specifically, as shown in FIG. 6, the dictionary creation unit 503 reads from the dictionary candidate storage unit 402 all the information associated with one and the same combination of anchor character string and link destination URL. The combination of an anchor character string and a link destination URL is also referred to as a dictionary DB key. FIG. 6 is a diagram for explaining the registration processing performed by the dictionary creation unit in the first embodiment. For example, as shown in (1) of FIG. 6, the dictionary creation unit 503 reads from the dictionary candidate storage unit 402, as the information associated with the combination of the anchor character string “AAA” and the link destination URL “aaa.jp”, the context information “corporation” and “president” and the context information “department manager” and “president”.
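Reading all candidates that share a dictionary DB key amounts to a group-by on the (anchor character string, link destination URL) pair. The sketch below assumes candidates are plain tuples of (anchor, destination URL, context words, source URL); the function name `group_by_db_key` and the second source URL "b.jp" are hypothetical.

```python
from collections import defaultdict


def group_by_db_key(candidates):
    """Group the context-word lists of candidates by (anchor, destination URL)."""
    groups = defaultdict(list)
    for anchor, destination_url, context, _source_url in candidates:
        groups[(anchor, destination_url)].append(list(context))
    return dict(groups)


# Two candidate rows for the same dictionary DB key ("b.jp" is made up for illustration).
CANDIDATES = [
    ("AAA", "aaa.jp", ["corporation", "president"], "a.jp"),
    ("AAA", "aaa.jp", ["department manager", "president"], "b.jp"),
]
```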

As shown in FIG. 7, the dictionary creation unit 503 registers the anchor character string and the link destination URL in the dictionary storage unit 403. Specifically, it registers the anchor character string as a “dictionary heading” and registers the link destination URL in association with that “dictionary heading”. FIG. 7 is a diagram for explaining the registration processing performed by the dictionary creation unit in the first embodiment. For example, as shown in (1) of FIG. 7, the dictionary creation unit 503 registers the anchor character string “AAA” in the dictionary storage unit 403 as the dictionary heading “AAA”, and registers the link destination URL “aaa.jp” in the dictionary storage unit 403 in association with the dictionary heading “AAA”.

For each piece of context information, the dictionary creation unit 503 also calculates the number of times it was extracted in association with one and the same combination of phrase and link destination URL, and registers each piece of context information in the dictionary storage unit 403 together with its count. For example, the dictionary creation unit 503 calculates a count of “1” for the context information “corporation”, a count of “2” for the context information “president”, and a count of “1” for the context information “department manager”. Then, as shown in (2) of FIG. 7, the dictionary creation unit 503 registers the context information “corporation: 1”, “president: 2”, and “department manager: 1” in the dictionary storage unit 403 in association with the dictionary heading “AAA”.
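The counting step can be sketched with `collections.Counter`. The function name `count_context_words` is illustrative, and the sketch assumes every occurrence in every candidate row adds one to the count, which the text does not spell out for words repeated within a single row.

```python
from collections import Counter


def count_context_words(context_lists):
    """Count how often each context word appears across rows sharing one DB key."""
    counts = Counter()
    for words in context_lists:
        counts.update(words)
    return counts


# Context information read for the key ("AAA", "aaa.jp") in the example.
EXAMPLE_CONTEXTS = [
    ["corporation", "president"],
    ["department manager", "president"],
]
```

With the example rows, "president" appears in both and therefore receives a count of 2, while the other two words receive 1 each.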

As shown in (3) of FIG. 7, the dictionary creation unit 503 calculates a threshold for each dictionary heading and registers it in the dictionary storage unit 403. For example, it registers the threshold “3” in association with the dictionary heading “AAA”.

An example of a method for calculating the threshold is now described. The description assumes that the context information “corporation” and “president” and the context information “department manager” and “president” have been read as the information associated with one and the same combination of anchor character string and link destination URL. The counts associated with the pieces of context information are assumed to be “corporation: 1”, “president: 2”, and “department manager: 1”.

ここで、辞書作成部５０３は、辞書候補記憶部４０２から読み出した組み合わせ各々について、文脈情報に対応付けられた回数の和を算出する。例えば、文脈情報「株式会社」「社長」が対応付けられた組み合わせについて、辞書作成部５０３は、「株式会社：１」「社長：２」であるので、回数の和が「３」であると算出する。また、辞書作成部５０３は、同様に、文脈情報「部長」「社長」が対応付けられた組み合わせについて、回数の和が「３」であると算出する。 Here, for each combination read from the dictionary candidate storage unit 402, the dictionary creation unit 503 calculates the sum of the counts associated with its context information. For example, for the combination associated with the context information “corporation” and “president”, the counts are “corporation: 1” and “president: 2”, so the dictionary creation unit 503 calculates the sum as “3”. Similarly, for the combination associated with the context information “department manager” and “president”, the dictionary creation unit 503 calculates the sum as “3”.

また、辞書作成部５０３は、組み合わせ各々について算出した回数の和の内、最も小さい値を閾値として辞書記憶部４０３に登録する。例えば、辞書見出し「ＡＡＡ」に対応付けて、閾値「３」を登録する。なお、辞書作成部５０３は、組み合わせ各々について算出した回数の和について、平均値を算出して閾値としてもよい。 Further, the dictionary creation unit 503 registers the smallest value in the dictionary storage unit 403 as a threshold value among the sums of the numbers calculated for each combination. For example, the threshold “3” is registered in association with the dictionary heading “AAA”. Note that the dictionary creation unit 503 may calculate an average value for the sum of the number of times calculated for each combination and use it as a threshold value.
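The threshold arithmetic described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function names `count_context_words` and `threshold` and the list-of-lists input format are assumptions made for the example, and each inner list stands for the context information of one candidate record that shares the same anchor character string and link destination URL.

```python
from collections import Counter


def count_context_words(combinations):
    """Count in how many candidate records each context word appears."""
    counts = Counter()
    for words in combinations:
        counts.update(set(words))  # count a word once per record
    return counts


def threshold(combinations, counts, use_average=False):
    """Sum the counts per record, then take the minimum (or the average)."""
    sums = [sum(counts[w] for w in set(words)) for words in combinations]
    if use_average:
        return sum(sums) / len(sums)
    return min(sums)


# The two records from the example: both share anchor "AAA" and URL "aaa.jp".
combinations = [["株式会社", "社長"], ["部長", "社長"]]
counts = count_context_words(combinations)
print(dict(counts))
print(threshold(combinations, counts))
```

With the example data both records sum to 3, so the minimum and the average variants give the same threshold.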

つまり、例えば、辞書作成部５０３は、辞書見出し「ＡＡＡ」に対応付けて、リンク先ＵＲＬ「aaa.jp」と、回数付文脈情報「株式会社：１」「社長：２」「部長：１」と、閾値「３」とを辞書記憶部４０３に登録する。 In other words, for example, the dictionary creation unit 503 registers, in the dictionary storage unit 403 and in association with the dictionary heading “AAA”, the link destination URL “aaa.jp”, the counted context information “corporation: 1”, “president: 2”, and “department manager: 1”, and the threshold value “3”.

また、辞書作成部５０３は、辞書候補記憶部４０２から読み出したアンカー文字列とリンク先ＵＲＬと文脈情報とを含む対応付けを削除する。例えば、辞書作成部５０３は、アンカー文字列「ＡＡＡ」とリンク先ＵＲＬ「aaa.jp」と文脈情報「株式会社」「社長」との対応付けを辞書作成部５０３が読み出した場合について説明する。また、辞書作成部５０３は、アンカー文字列「ＡＡＡ」とリンク先ＵＲＬ「aaa.jp」と文脈情報「部長」「社長」との対応付けを辞書作成部５０３が読み出したとする。ここで、辞書作成部５０３は、アンカー文字列「ＡＡＡ」とリンク先ＵＲＬ「aaa.jp」と文脈情報「株式会社」「社長」とリンク元ＵＲＬ「aaa.jp」との対応付けを辞書候補記憶部４０２から削除する。また、辞書作成部５０３は、アンカー文字列「ＡＡＡ」とリンク先ＵＲＬ「aaa.jp」と文脈情報「部長」「社長」とリンク元ＵＲＬ「aaa.jp」との対応付けを辞書候補記憶部４０２から削除する。 Further, the dictionary creation unit 503 deletes the association including the anchor character string, the link destination URL, and the context information read from the dictionary candidate storage unit 402. For example, the case where the dictionary creation unit 503 reads the association between the anchor character string “AAA”, the link destination URL “aaa.jp”, and the context information “corporation” “president” will be described. Further, it is assumed that the dictionary creation unit 503 reads the association between the anchor character string “AAA”, the link destination URL “aaa.jp”, and the context information “department manager” “president”. Here, the dictionary creation unit 503 associates the anchor character string “AAA” with the link destination URL “aaa.jp”, the context information “corporation”, “president”, and the link source URL “aaa.jp” as a dictionary candidate. Delete from the storage unit 402. Further, the dictionary creation unit 503 associates the anchor character string “AAA” with the link destination URL “aaa.jp”, the context information “department”, “president”, and the link source URL “aaa.jp” with a dictionary candidate storage unit. Delete from 402.

関連情報設定部５０４は、クライアント２００から受け付けた設定対象文書に対して、辞書記憶部４０３に記憶されたリンク先ＵＲＬを用いて、リンク先ＵＲＬを設定する。具体的には、関連情報設定部５０４は、設定対象文書に含まれる語句の内、辞書記憶部４０３に記憶されている辞書見出しと一致する語句に対して、辞書記憶部４０３に記憶されているリンク先ＵＲＬへのハイパーリンクを設定する。 The related information setting unit 504 sets a link destination URL for the setting target document received from the client 200 using the link destination URL stored in the dictionary storage unit 403. Specifically, the related information setting unit 504 stores words / phrases matching the dictionary heading stored in the dictionary storage unit 403 among the words / phrases included in the setting target document. A hyperlink to the link destination URL is set.

また、関連情報設定部５０４は、設定対象文書に含まれる語句の内、辞書記憶部４０３に記憶されている辞書見出しと一致する語句について、回数付文脈情報を用いて類似度を算出し、リンク先ＵＲＬを設定するか否かを類似度を用いて判定する。なお、類似度とは、リンク先ＵＲＬを設定するか否かを判定する際に用いる情報である。 In addition, the related information setting unit 504 calculates the degree of similarity using the number-of-times context information for a word that matches the dictionary heading stored in the dictionary storage unit 403 among the words included in the setting target document. Whether to set the destination URL is determined using the similarity. The similarity is information used when determining whether to set a link destination URL.

ここで、まず、類似度の算出手法の一例について説明する。例えば、関連情報設定部５０４は、設定対象文書に含まれる語句の内、辞書記憶部４０３に記憶されている辞書見出しと一致する語句を起点とする所定の範囲内にあるその他の語句を抽出する。そして、抽出した語句の内、文脈情報と一致する語句がある場合には、当該文脈情報に対応付けられた回数の和を類似度として算出する。 Here, first, an example of a similarity calculation method will be described. For example, the related information setting unit 504 takes a phrase in the setting target document that matches a dictionary heading stored in the dictionary storage unit 403 as a starting point, and extracts the other phrases within a predetermined range of it. Then, if any of the extracted phrases match the context information, the related information setting unit 504 calculates the sum of the counts associated with the matching context information as the similarity.

また、関連情報設定部５０４は、類似度が閾値以上の値である場合には、リンク先ＵＲＬを設定すると判定し、類似度が閾値未満の値である場合には、リンク先ＵＲＬを設定しないと判定する。 The related information setting unit 504 determines that the link destination URL is to be set when the similarity is equal to or greater than the threshold, and determines that the link destination URL is not to be set when the similarity is less than the threshold.

例えば、関連情報設定部５０４は、「ＡＡＡの社長の山田さんと部長の田中さん」という文書をクライアント２００から受け付けた場合について説明する。また、辞書記憶部４０３は、辞書見出し「ＡＡＡ」に対応付けて、回数付文脈情報「株式会社：１」「社長：２」「部長：１」と閾値「３」とリンク先ＵＲＬ「aaa.jp」とを記憶するとして説明する。また、辞書記憶部４０３は、辞書見出し「ＡＡＡ」に対応付けて、回数付文脈情報「格付け：１」「評価：１」と閾値「２」とリンク先ＵＲＬ「kakuduke.jp」とを記憶するとして説明する。 For example, a case will be described in which the related information setting unit 504 receives the document “Mr. Yamada, President of AAA and Mr. Tanaka, General Manager” from the client 200. It is assumed that the dictionary storage unit 403 stores, in association with the dictionary heading “AAA”, the counted context information “corporation: 1”, “president: 2”, and “department manager: 1”, the threshold “3”, and the link destination URL “aaa.jp”. It is also assumed that the dictionary storage unit 403 stores, in association with the dictionary heading “AAA”, the counted context information “rating: 1” and “evaluation: 1”, the threshold “2”, and the link destination URL “kakuduke.jp”.

ここで、「ＡＡＡの社長の山田さんと部長の田中さん」に含まれる語句の内、「ＡＡＡ」は辞書見出し「ＡＡＡ」と一致する。このため、関連情報設定部５０４は、「ＡＡＡの社長の山田さんと部長の田中さん」から「ＡＡＡ」の周辺にある語句を抽出し、例えば、語句「社長」「部長」を抽出する。そして、関連情報設定部５０４は、辞書見出し「ＡＡＡ」それぞれについて、類似度を算出する。すなわち、関連情報設定部５０４は、辞書見出し「ＡＡＡ」とリンク先ＵＲＬ「aaa.jp」との対応付けについて、「社長：２」「部長：１」であるので、類似度が「３」であると算出する。また、関連情報設定部５０４は、辞書見出し「ＡＡＡ」とリンク先ＵＲＬ「kakuduke.jp」との対応付けについて、類似度が「０」であると算出する。 Here, “AAA” matches the dictionary heading “AAA” among the phrases included in “Mr. Yamada, President of AAA and Mr. Tanaka, General Manager”. Therefore, the related information setting unit 504 extracts words / phrases around “AAA” from “AAA president Mr. Yamada and Mr. Tanaka”, and extracts, for example, the words “President” and “Manager”. Then, the related information setting unit 504 calculates the similarity for each of the dictionary headings “AAA”. That is, the related information setting unit 504 is “president: 2” and “department manager: 1” for the correspondence between the dictionary heading “AAA” and the link destination URL “aaa.jp”, so the similarity is “3”. Calculate that there is. Further, the related information setting unit 504 calculates that the similarity is “0” for the association between the dictionary heading “AAA” and the link destination URL “kakuduke.jp”.

また、関連情報設定部５０４は、算出した類似度の内、閾値以上の値がある場合に、リンク先ＵＲＬを設定する。例えば、関連情報設定部５０４は、「ＡＡＡの社長の山田さんと部長の田中さん」に含まれる語句「ＡＡＡ」に対して、リンク先ＵＲＬ「aaa.jp」を設定し、リンク先ＵＲＬ「kakuduke.jp」を設定しない。 In addition, the related information setting unit 504 sets the link destination URL when there is a value equal to or greater than the threshold value among the calculated similarities. For example, the related information setting unit 504 sets the link destination URL “aaa.jp” for the phrase “AAA” included in “AAA president Yamada-san and general manager Tanaka-san”, and the link destination URL “kakuduke”. .jp "is not set.

また、同様に、例えば、「わが社の格付けは最高評価のＡＡＡです」という文書をクライアント２００から受け付けた場合について説明する。関連情報設定部５０４は、辞書見出し「ＡＡＡ」とリンク先ＵＲＬ「aaa.jp」との対応付けについて、類似度が「０」であると判定する。また、関連情報設定部５０４は、辞書見出し「ＡＡＡ」とリンク先ＵＲＬ「kakuduke.jp」との対応付けについて、類似度が「２」であると算出する。そして、関連情報設定部５０４は、「わが社の格付けは最高評価のＡＡＡです」に含まれる語句「ＡＡＡ」に対して、リンク先ＵＲＬ「aaa.jp」を設定せず、リンク先ＵＲＬ「kakuduke.jp」を設定する。 Similarly, a case will be described where, for example, a document “A rating of our company is the highest rated AAA” is received from the client 200. The related information setting unit 504 determines that the similarity is “0” for the association between the dictionary heading “AAA” and the link destination URL “aaa.jp”. Further, the related information setting unit 504 calculates that the similarity is “2” for the association between the dictionary heading “AAA” and the link destination URL “kakuduke.jp”. Then, the related information setting unit 504 does not set the link destination URL “aaa.jp” for the phrase “AAA” included in “the rating of our company is the highest rated AAA”, but the link destination URL “kakuduke” .jp "is set.

また、同様に、例えば、「ＡＡＡとはグループ名です」という文書をクライアント２００から受け付けた場合について説明する。関連情報設定部５０４は、辞書見出し「ＡＡＡ」とリンク先ＵＲＬ「aaa.jp」との対応付けについて、類似度が「０」であると判定する。また、関連情報設定部５０４は、辞書見出し「ＡＡＡ」とリンク先ＵＲＬ「kakuduke.jp」との対応付けについて、類似度が「０」であると算出する。そして、関連情報設定部５０４は、「ＡＡＡとはグループ名です」に含まれる語句「ＡＡＡ」に対して、リンク先ＵＲＬ「aaa.jp」を設定せず、リンク先ＵＲＬ「kakuduke.jp」を設定しない。 Similarly, for example, a case where a document “AAA is a group name” is received from the client 200 will be described. The related information setting unit 504 determines that the similarity is “0” for the association between the dictionary heading “AAA” and the link destination URL “aaa.jp”. Further, the related information setting unit 504 calculates that the similarity is “0” for the association between the dictionary heading “AAA” and the link destination URL “kakuduke.jp”. Then, the related information setting unit 504 does not set the link destination URL “aaa.jp” for the word “AAA” included in “AAA is a group name”, but sets the link destination URL “kakuduke.jp”. Not set.
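The three worked examples above can be reproduced with a short sketch. The record layout (`url`, `context`, `threshold`) and the function names are assumptions made for illustration, and the words found near the heading are passed in already extracted, since the text does not specify how phrases are segmented.

```python
def similarity(context_counts, nearby_words):
    """Sum the counts of the registered context words found near the heading."""
    return sum(n for word, n in context_counts.items() if word in nearby_words)


def choose_links(records, nearby_words):
    """Return the URLs of the records whose similarity reaches their threshold."""
    return [
        r["url"]
        for r in records
        if similarity(r["context"], nearby_words) >= r["threshold"]
    ]


# Two records registered under the same dictionary heading "AAA".
records = [
    {"url": "aaa.jp",
     "context": {"株式会社": 1, "社長": 2, "部長": 1},
     "threshold": 3},
    {"url": "kakuduke.jp",
     "context": {"格付け": 1, "評価": 1},
     "threshold": 2},
]

print(choose_links(records, {"社長", "部長"}))    # company context
print(choose_links(records, {"格付け", "評価"}))  # rating context
print(choose_links(records, {"グループ名"}))      # no matching context
```

In the third call neither record reaches its threshold, so no link would be set, which matches the “AAA is a group name” example.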

また、関連情報設定部５０４は、リンク先ＵＲＬを設定した設定対象文書をクライアント２００に送る。なお、関連情報設定部５０４によるリンク先ＵＲＬ設定処理の詳細な流れの一例については、後述するため、ここでは説明を省略する。 In addition, the related information setting unit 504 sends the setting target document in which the link destination URL is set to the client 200. Note that an example of a detailed flow of the link destination URL setting process by the related information setting unit 504 will be described later, and the description thereof is omitted here.

なお、この関連情報登録装置３００は、既知のパーソナルコンピュータ、ワークステーション、携帯電話、ＰＨＳ（Personal Handyphone System）、移動体通信端末またはＰＤＡ（Personal Digital Assistant）などの情報処理装置にて実現することができる。具体的には、既知のＰＤＡ等に、ＷＥＢページ記憶部４０１、辞書候補記憶部４０２、辞書記憶部４０３、ＷＥＢページ収集部５０１、辞書候補抽出部５０２、辞書作成部５０３、および関連情報設定部５０４の各機能を搭載することによって実現することもできる。 The related information registration apparatus 300 can be realized by a known information processing apparatus such as a personal computer, workstation, mobile phone, PHS (Personal Handyphone System) terminal, mobile communication terminal, or PDA (Personal Digital Assistant). Specifically, it can also be realized by installing, on a known PDA or the like, the functions of the WEB page storage unit 401, the dictionary candidate storage unit 402, the dictionary storage unit 403, the WEB page collection unit 501, the dictionary candidate extraction unit 502, the dictionary creation unit 503, and the related information setting unit 504.

［関連情報登録装置による処理］ [Processing by the related information registration apparatus]
次に、実施例１に係る関連情報登録装置３００による処理について説明する。以下では、ＷＥＢページ登録処理の流れ、辞書候補登録処理の流れ、辞書登録処理の流れ、リンク先ＵＲＬ設定処理の流れについて順に説明する。 Next, processing by the related information registration apparatus 300 according to the first embodiment will be described. In the following, the flow of the WEB page registration process, the flow of the dictionary candidate registration process, the flow of the dictionary registration process, and the flow of the link destination URL setting process will be described in order.

［ＷＥＢページ登録処理］ [WEB page registration process]
図８を用いて、実施例１におけるＷＥＢページ登録処理の流れを説明する。図８は、実施例１におけるＷＥＢページ登録処理の流れを説明するためのフローチャートである。 The flow of the WEB page registration process in the first embodiment will be described with reference to FIG. 8. FIG. 8 is a flowchart for explaining the flow of the WEB page registration process in the first embodiment.

図８に示すように、ＷＥＢページ収集部５０１は、予め設定された登録タイミングとなると（ステップＳ１０１肯定）、インターネット１００に接続してＷＥＢページ内容を取得する（ステップＳ１０２）。そして、ＷＥＢページ収集部５０１は、ＷＥＢページ内容をＷＥＢページ記憶部４０１に登録する（ステップＳ１０３）。例えば、ＷＥＢページ収集部５０１は、ＵＲＬ「a.jp」に対応付けて、ＷＥＢページ内容「株式会社<a href=‘aaa.jp’>AAA</a>の社長」をＷＥＢページ記憶部４０１に登録する。 As shown in FIG. 8, when the preset registration timing arrives (Yes at step S101), the WEB page collection unit 501 connects to the Internet 100 and acquires WEB page content (step S102). Then, the WEB page collection unit 501 registers the WEB page content in the WEB page storage unit 401 (step S103). For example, the WEB page collection unit 501 registers the WEB page content “株式会社<a href=‘aaa.jp’>AAA</a>の社長” (roughly, “president of <a href=‘aaa.jp’>AAA</a> Corporation”) in the WEB page storage unit 401 in association with the URL “a.jp”.

［辞書候補登録処理］ [Dictionary candidate registration process]
図９を用いて、実施例１における辞書候補登録処理の流れを説明する。図９は、実施例１における辞書候補登録処理の流れを説明するためのフローチャートである。 The flow of the dictionary candidate registration process in the first embodiment will be described with reference to FIG. 9. FIG. 9 is a flowchart for explaining the flow of the dictionary candidate registration process in the first embodiment.

図９に示すように、辞書候補抽出部５０２は、辞書候補登録タイミングとなると（ステップＳ２０１肯定）、例えば、ＷＥＢページ内容を登録した旨をＷＥＢページ収集部５０１から受け付けると、ＷＥＢページ内容を一つ読み出す（ステップＳ２０２）。そして、辞書候補抽出部５０２は、読み出したＷＥＢページに含まれるリンクタグを一つ選択する（ステップＳ２０３）。 As shown in FIG. 9, when the dictionary candidate registration timing arrives (Yes at step S201), for example, when it receives from the WEB page collection unit 501 a notification that WEB page content has been registered, the dictionary candidate extraction unit 502 reads out one WEB page content (step S202). Then, the dictionary candidate extraction unit 502 selects one link tag included in the read WEB page (step S203).

そして、辞書候補抽出部５０２は、選択したリンクタグからアンカー文字列を抽出し（ステップＳ２０４）、リンク先ＵＲＬを抽出する（ステップＳ２０５）。例えば、辞書候補抽出部５０２は、ＷＥＢページ内容「株式会社<a href=‘aaa.jp’>AAA</a>の社長」からリンクタグ<a href=‘aaa.jp’></a>を抽出し、アンカー文字列「ＡＡＡ」とリンク先ＵＲＬ「aaa.jp」とを抽出する。 Then, the dictionary candidate extraction unit 502 extracts the anchor character string from the selected link tag (step S204) and extracts the link destination URL (step S205). For example, the dictionary candidate extraction unit 502 extracts the link tag <a href=‘aaa.jp’></a> from the WEB page content “株式会社<a href=‘aaa.jp’>AAA</a>の社長”, and extracts the anchor character string “AAA” and the link destination URL “aaa.jp”.

そして、辞書候補抽出部５０２は、文脈情報を抽出する（ステップＳ２０６）。つまり、辞書候補抽出部５０２は、アンカー文字列を抽出対象文書内での基点とする所定の範囲内に含まれる語句を抽出し、例えば、辞書候補抽出部５０２は、アンカー文字列「ＡＡＡ」を含む文書に含まれる語句である「株式会社」と「社長」とを抽出する。そして、辞書候補抽出部５０２は、アンカー文字列とリンク先ＵＲＬとに対応付けて、リンク元ＵＲＬと文脈情報とを辞書候補記憶部４０２に登録する（ステップＳ２０７）。 Then, the dictionary candidate extraction unit 502 extracts context information (step S206). That is, the dictionary candidate extraction unit 502 extracts the phrases contained within a predetermined range that takes the anchor character string as its base point in the extraction target document; for example, it extracts “corporation” and “president”, which are phrases contained in the document that contains the anchor character string “AAA”. Then, the dictionary candidate extraction unit 502 registers the link source URL and the context information in the dictionary candidate storage unit 402 in association with the anchor character string and the link destination URL (step S207).

そして、辞書候補抽出部５０２は、未処理のリンクタグがまだあるかを判定する（ステップＳ２０８）。ここで、辞書候補抽出部５０２は、未処理のリンクタグがあると判定する場合には（ステップＳ２０８肯定）、上記したステップＳ２０３〜Ｓ２０７までの処理を繰り返す。一方、辞書候補抽出部５０２は、未処理のリンクタグがないと判定する場合には（ステップＳ２０８否定）、未処理のＷＥＢページがあるかを判定する（ステップＳ２０９）。ここで、辞書候補抽出部５０２は、未処理のＷＥＢページがあると判定する場合には（ステップＳ２０９肯定）、上記したステップＳ２０２〜Ｓ２０８までの処理を繰り返す。一方、辞書候補抽出部５０２は、未処理のＷＥＢページがないと判定した場合には（ステップＳ２０９否定）、辞書候補登録処理を終了する。 Then, the dictionary candidate extraction unit 502 determines whether there are still unprocessed link tags (step S208). Here, when it is determined that there is an unprocessed link tag (Yes at Step S208), the dictionary candidate extraction unit 502 repeats the processes from Steps S203 to S207 described above. On the other hand, when determining that there is no unprocessed link tag (No at Step S208), the dictionary candidate extraction unit 502 determines whether there is an unprocessed WEB page (Step S209). Here, when it is determined that there is an unprocessed WEB page (Yes at Step S209), the dictionary candidate extraction unit 502 repeats the processes from Steps S202 to S208 described above. On the other hand, when it is determined that there is no unprocessed WEB page (No at step S209), the dictionary candidate extraction unit 502 ends the dictionary candidate registration process.
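Steps S203 to S207 for a single WEB page can be sketched as below. This is an illustrative sketch under several assumptions: link tags are found with a simple regular expression rather than a full HTML parser, the "predetermined range" is taken to be a character window, and `toy_tokenize` with its fixed vocabulary is a hypothetical stand-in for whatever phrase segmentation (for example, morphological analysis) a real system would use.

```python
import re

# Matches <a href='...'>anchor</a> with single or double quotes (simplified).
LINK_RE = re.compile(r"""<a\s+href=['"]([^'"]+)['"]\s*>(.*?)</a>""",
                     re.IGNORECASE | re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")


def toy_tokenize(text, vocabulary=("株式会社", "社長", "部長")):
    """Hypothetical tokenizer: keep known words in order of appearance."""
    found = [(text.find(w), w) for w in vocabulary if w in text]
    return [w for _, w in sorted(found)]


def extract_candidates(source_url, html, tokenize=toy_tokenize, window=10):
    """Return one candidate record per link tag found in the page."""
    candidates = []
    for m in LINK_RE.finditer(html):                      # S203: pick a link tag
        link_url = m.group(1)                             # S205: link destination
        anchor = TAG_RE.sub("", m.group(2)).strip()       # S204: anchor string
        before = TAG_RE.sub("", html[:m.start()])[-window:]
        after = TAG_RE.sub("", html[m.end():])[:window]
        context = tokenize(before) + tokenize(after)      # S206: context words
        candidates.append({"anchor": anchor, "link_url": link_url,
                           "source_url": source_url, "context": context})
    return candidates                                     # S207: to be stored


page = "株式会社<a href='aaa.jp'>AAA</a>の社長"
print(extract_candidates("a.jp", page))
```

The outer loop over pages (steps S202 and S209) is omitted; calling `extract_candidates` once per stored page would cover it.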

［辞書登録処理］ [Dictionary registration process]
図１０を用いて、実施例１における辞書登録処理の流れを説明する。図１０は、実施例１における辞書登録処理の流れを説明するためのフローチャートである。 The flow of the dictionary registration process in the first embodiment will be described with reference to FIG. 10. FIG. 10 is a flowchart for explaining the flow of the dictionary registration process in the first embodiment.

図１０に示すように、辞書作成部５０３は、辞書登録タイミングとなると（ステップＳ３０１肯定）、例えば、辞書候補記憶部４０２に登録した旨の情報を辞書候補抽出部５０２から受け付けると、同じ辞書ＤＢキーに対応付けられたレコード（情報）すべてを辞書候補ＤＢから読み出す（ステップＳ３０２）。すなわち、辞書作成部５０３は、辞書候補記憶部４０２から、アンカー文字列とリンク先ＵＲＬとが同一となる組み合わせに対応付けられた情報をすべて読み出す。 As shown in FIG. 10, when the dictionary registration timing arrives (Yes at step S301), for example, when it receives from the dictionary candidate extraction unit 502 a notification that registration in the dictionary candidate storage unit 402 has been performed, the dictionary creation unit 503 reads from the dictionary candidate DB all records (information) associated with the same dictionary DB key (step S302). That is, the dictionary creation unit 503 reads from the dictionary candidate storage unit 402 all information associated with combinations having the same anchor character string and link destination URL.

そして、辞書作成部５０３は、辞書ＤＢキーに含まれるアンカー文字列を辞書見出しとして辞書記憶部４０３に登録する（ステップＳ３０３）。例えば、辞書作成部５０３は、アンカー文字列「ＡＡＡ」を辞書見出し「ＡＡＡ」として辞書記憶部４０３に登録する。そして、辞書作成部５０３は、辞書ＤＢキーに含まれるリンク先ＵＲＬを辞書記憶部４０３に登録する（ステップＳ３０４）。例えば、辞書作成部５０３は、辞書見出し「ＡＡＡ」に対応付けてリンク先ＵＲＬ「aaa.jp」を辞書記憶部４０３に登録する。 Then, the dictionary creation unit 503 registers the anchor character string included in the dictionary DB key as a dictionary heading in the dictionary storage unit 403 (step S303). For example, the dictionary creation unit 503 registers the anchor character string “AAA” in the dictionary storage unit 403 as the dictionary heading “AAA”. Then, the dictionary creation unit 503 registers the link destination URL included in the dictionary DB key in the dictionary storage unit 403 (step S304). For example, the dictionary creation unit 503 registers the link destination URL “aaa.jp” in the dictionary storage unit 403 in association with the dictionary heading “AAA”.

そして、辞書作成部５０３は、文脈情報各々の回数を算出し（ステップＳ３０５）、文脈情報各々と回数とを対応付けて辞書記憶部４０３に登録する（ステップＳ３０６）。つまり、辞書作成部５０３は、文脈情報各々について、アンカー文字列とリンク先ＵＲＬとが同一となる組み合わせに対応付けられて抽出された回数を算出し、文脈情報各々を回数に対応付けて辞書記憶部４０３に登録する。例えば、辞書作成部５０３は、文脈情報「株式会社」について回数「１」を算出し、辞書見出し「ＡＡＡ」に対応付けて、文脈情報「株式会社：１」を登録する。 Then, the dictionary creation unit 503 calculates the number of times of each context information (step S305) and associates each context information with the number of times and registers it in the dictionary storage unit 403 (step S306). That is, the dictionary creation unit 503 calculates, for each context information, the number of times that the anchor character string and the link destination URL are extracted in association with the same combination, and stores the context information in correspondence with the number of times in the dictionary storage. Registered in the unit 403. For example, the dictionary creation unit 503 calculates the number of times “1” for the context information “corporation” and registers the context information “corporation: 1” in association with the dictionary heading “AAA”.

そして、辞書作成部５０３は、閾値を算出し（ステップＳ３０７）、閾値を辞書記憶部４０３に登録する（ステップＳ３０８）。例えば、辞書作成部５０３は、辞書見出しごとに閾値を算出し、例えば、辞書見出し「ＡＡＡ」に対応付けて、閾値「３」を登録する。 Then, the dictionary creation unit 503 calculates a threshold value (step S307), and registers the threshold value in the dictionary storage unit 403 (step S308). For example, the dictionary creation unit 503 calculates a threshold value for each dictionary heading, and registers the threshold value “3” in association with the dictionary heading “AAA”, for example.

そして、辞書作成部５０３は、未処理の辞書ＤＢキーがあるかを判定する（ステップＳ３０９）。ここで、辞書作成部５０３は、未処理の辞書ＤＢキーがあると判定する場合には（ステップＳ３０９肯定）、ステップＳ３０２〜Ｓ３０８までの処理を繰り返す。一方、辞書作成部５０３は、未処理の辞書ＤＢキーがないと判定する場合には（ステップＳ３０９否定）、辞書登録処理を終了する。 Then, the dictionary creation unit 503 determines whether there is an unprocessed dictionary DB key (step S309). Here, if the dictionary creation unit 503 determines that there is an unprocessed dictionary DB key (Yes at step S309), the process from steps S302 to S308 is repeated. On the other hand, if the dictionary creation unit 503 determines that there is no unprocessed dictionary DB key (No at step S309), the dictionary registration process ends.
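The loop over dictionary DB keys (steps S302 to S309) can be sketched as a grouping operation. The in-memory list of candidate dicts and the output record layout are assumptions for illustration; the threshold uses the minimum-of-sums variant described earlier, and the deletion of processed candidates from the dictionary candidate storage unit is not modeled.

```python
from collections import Counter, defaultdict


def build_dictionary(candidates):
    """Group candidates by (anchor, link URL) and build one record per key."""
    grouped = defaultdict(list)
    for c in candidates:                                   # S302: same DB key
        grouped[(c["anchor"], c["link_url"])].append(c["context"])

    dictionary = []
    for (anchor, link_url), contexts in grouped.items():
        # S305: in how many candidate records each context word appears
        counts = Counter(w for words in contexts for w in set(words))
        # S307: smallest per-record sum of counts
        threshold = min(sum(counts[w] for w in set(words)) for words in contexts)
        dictionary.append({"heading": anchor,              # S303
                           "url": link_url,                # S304
                           "context": dict(counts),        # S306
                           "threshold": threshold})        # S308
    return dictionary


candidates = [
    {"anchor": "AAA", "link_url": "aaa.jp", "context": ["株式会社", "社長"]},
    {"anchor": "AAA", "link_url": "aaa.jp", "context": ["部長", "社長"]},
    {"anchor": "AAA", "link_url": "kakuduke.jp", "context": ["格付け", "評価"]},
]
for record in build_dictionary(candidates):
    print(record)
```

Keying on the pair rather than on the anchor string alone is what lets the same heading “AAA” carry two different link destinations with separate context information.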

［リンク先ＵＲＬ設定処理］ [Link destination URL setting process]
図１１を用いて、実施例１におけるリンク先ＵＲＬ設定処理の流れを説明する。図１１は、実施例１におけるリンク先ＵＲＬ設定処理の流れを説明するためのフローチャートである。 The flow of the link destination URL setting process in the first embodiment will be described with reference to FIG. 11. FIG. 11 is a flowchart for explaining the flow of the link destination URL setting process in the first embodiment.

図１１に示すように、関連情報設定部５０４は、設定対象文書がクライアント２００から入力されると（ステップＳ４０１肯定）、辞書記憶部４０３から辞書見出しを読み出す（ステップＳ４０２）。例えば、関連情報設定部５０４は、辞書記憶部４０３に登録されている辞書見出しすべてを読み出す。そして、関連情報設定部５０４は、設定対象文書に辞書見出しと同一の語句があるかを判定する（ステップＳ４０３）。ここで、関連情報設定部５０４は、辞書見出しと同一の語句がないと判定した場合には（ステップＳ４０３否定）、処理結果を出力し（ステップＳ４１２）、リンク先ＵＲＬ設定処理を終了する。一方、関連情報設定部５０４は、辞書見出しと同一の語句があると判定した場合には（ステップＳ４０３肯定）、辞書見出しを一つ選択する（ステップＳ４０４）。つまり、関連情報設定部５０４は、設定対象文書に含まれる語句と一致する辞書見出しを一つ選択する。 As illustrated in FIG. 11, when the setting target document is input from the client 200 (Yes in step S401), the related information setting unit 504 reads a dictionary heading from the dictionary storage unit 403 (step S402). For example, the related information setting unit 504 reads all dictionary headings registered in the dictionary storage unit 403. Then, the related information setting unit 504 determines whether the setting target document has the same word / phrase as the dictionary heading (step S403). If the related information setting unit 504 determines that there is no phrase identical to the dictionary heading (No at Step S403), the processing result is output (Step S412), and the link destination URL setting process ends. On the other hand, if the related information setting unit 504 determines that there is the same phrase as the dictionary headline (Yes in step S403), the related information setting unit 504 selects one dictionary headline (step S404). In other words, the related information setting unit 504 selects one dictionary heading that matches the phrase included in the setting target document.

そして、関連情報設定部５０４は、選択した辞書見出しについてのレコードを一つ読み出す（ステップＳ４０５）。つまり、選択した辞書見出しに対応付けられた「リンク先ＵＲＬ」と「回数付文脈情報」と「閾値」とを読み出す。そして、関連情報設定部５０４は、設定対象文書から文脈情報を取得し（ステップＳ４０６）、類似度を算出する（ステップＳ４０７）。例えば、設定対象文書が「ＡＡＡの社長の山田さんと部長の田中さん」であり、辞書見出しが「ＡＡＡ」である場合には、「社長」と「部長」とを設定対象文書から抽出し、類似度が「３」であると算出する。 Then, the related information setting unit 504 reads one record for the selected dictionary heading (step S405). That is, the “link destination URL”, “number-of-times context information”, and “threshold value” associated with the selected dictionary heading are read out. Then, the related information setting unit 504 acquires context information from the setting target document (step S406), and calculates the similarity (step S407). For example, when the setting target document is “AAA president Yamada-san and department manager Tanaka-san” and the dictionary heading is “AAA”, “president” and “department manager” are extracted from the setting target document, The similarity is calculated to be “3”.

そして、関連情報設定部５０４は、類似度が閾値以上かを判定する（ステップＳ４０８）。ここで、関連情報設定部５０４は、類似度が閾値以上であると判定する場合には（ステップＳ４０８肯定）、辞書見出しが出現した位置にリンク先ＵＲＬを設定する（ステップＳ４０９）。例えば、関連情報設定部５０４は、設定対象文書が「ＡＡＡの社長の山田さんと部長の田中さん」の「ＡＡＡ」に対して、リンク先ＵＲＬを設定する。 Then, the related information setting unit 504 determines whether the similarity is greater than or equal to a threshold (step S408). Here, if the related information setting unit 504 determines that the similarity is greater than or equal to the threshold (Yes at Step S408), the related information setting unit 504 sets the link destination URL at the position where the dictionary heading appears (Step S409). For example, the related information setting unit 504 sets a link destination URL for “AAA” whose setting target document is “Mr. Yamada, President of AAA and Mr. Tanaka, General Manager”.

そして、関連情報設定部５０４は、類似度が閾値以上でないと判定する場合や（ステップＳ４０８否定）、類似度が閾値以上であると判定してリンク先ＵＲＬを設定した場合には（ステップＳ４０９）、選択した辞書見出しについて、未処理のレコードがあるかを判定する（ステップＳ４１０）。ここで、関連情報設定部５０４は、未処理のレコードがある場合には（ステップＳ４１０肯定）、例えば、設定対象文書内に、「ＡＡＡ」が二つ以上ある場合には、ステップＳ４０５〜Ｓ４０９までの処理を繰り返す。一方、関連情報設定部５０４は、未処理のレコードがない場合には（ステップＳ４１０否定）、未処理の辞書見出しがあるかを判定する（ステップＳ４１１）。ここで、関連情報設定部５０４は、未処理の辞書見出しがあると判定する場合には（ステップＳ４１１肯定）、例えば、設定対象文書に含まれる語句と一致する辞書見出しに、「ＡＡＡ」以外の辞書見出しがある場合には、ステップＳ４０４〜Ｓ４１０までの処理を繰り返す。一方、関連情報設定部５０４は、未処理の辞書見出しがないと判定する場合には（ステップＳ４１１否定）、処理結果を出力し（ステップＳ４１２）、リンク先ＵＲＬ設定処理を終了する。 Then, when the related information setting unit 504 determines that the similarity is not equal to or greater than the threshold (No at step S408), or when it has determined that the similarity is equal to or greater than the threshold and has set the link destination URL (step S409), it determines whether there is an unprocessed record for the selected dictionary heading (step S410). If there is an unprocessed record (Yes at step S410), for example, if “AAA” appears two or more times in the setting target document, the related information setting unit 504 repeats the processing of steps S405 to S409. On the other hand, if there is no unprocessed record (No at step S410), the related information setting unit 504 determines whether there is an unprocessed dictionary heading (step S411). If it determines that there is an unprocessed dictionary heading (Yes at step S411), for example, if the dictionary headings that match phrases in the setting target document include a heading other than “AAA”, the related information setting unit 504 repeats the processing of steps S404 to S410. On the other hand, if it determines that there is no unprocessed dictionary heading (No at step S411), the related information setting unit 504 outputs the processing result (step S412) and ends the link destination URL setting process.
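The overall setting flow can be sketched end to end as follows. This is a simplified illustration, not the flowchart itself: the window is measured in characters, context words are detected by substring matching in place of phrase extraction, and when several records for one heading reach their thresholds the sketch links to the first one, which is a choice made here and not stated in the text. The function name `set_links` and the record layout are likewise assumptions.

```python
def set_links(document, dictionary, window=20):
    """Wrap dictionary headings in <a> tags when the nearby context supports it."""
    headings = sorted({r["heading"] for r in dictionary}, key=len, reverse=True)
    out, pos = [], 0
    while pos < len(document):
        # S403/S404: is there a dictionary heading at this position?
        heading = next((h for h in headings if document.startswith(h, pos)), None)
        if heading is None:
            out.append(document[pos])
            pos += 1
            continue
        end = pos + len(heading)
        # S406: text within the window before and after the heading
        nearby = document[max(0, pos - window):pos] + "\n" + document[end:end + window]
        url = None
        for r in dictionary:                               # S405: each record
            if r["heading"] != heading:
                continue
            score = sum(n for w, n in r["context"].items() if w in nearby)  # S407
            if score >= r["threshold"]:                    # S408
                url = r["url"]
                break
        out.append(f"<a href='{url}'>{heading}</a>" if url else heading)   # S409
        pos = end
    return "".join(out)                                    # S412: result


dictionary = [
    {"heading": "AAA", "url": "aaa.jp",
     "context": {"株式会社": 1, "社長": 2, "部長": 1}, "threshold": 3},
    {"heading": "AAA", "url": "kakuduke.jp",
     "context": {"格付け": 1, "評価": 1}, "threshold": 2},
]
print(set_links("AAAの社長の山田さんと部長の田中さん", dictionary))
print(set_links("わが社の格付けは最高評価のAAAです", dictionary))
print(set_links("AAAとはグループ名です", dictionary))
```

The third document is returned unchanged because neither record reaches its threshold.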

［実施例１の効果］ [Effects of the first embodiment]
上記したように、実施例１によれば、関連情報登録装置３００は、アンカー文字列含む任意の文書から、アンカー文字列とリンク先ＵＲＬとを抽出する。また、関連情報登録装置３００は、他の文書に関連付けるリンク先ＵＲＬを語句（辞書見出し）に対応付けて記憶する辞書記憶部４０３に、抽出した語句とリンク先ＵＲＬとを対応付けて登録する。これにより、実施例１によれば、語句とリンク先ＵＲＬとを登録する際に、作成者への処理負荷を軽減することが可能である。 As described above, according to the first embodiment, the related information registration apparatus 300 extracts an anchor character string and a link destination URL from an arbitrary document that contains the anchor character string. Further, the related information registration apparatus 300 registers the extracted phrase and link destination URL, in association with each other, in the dictionary storage unit 403, which stores link destination URLs to be associated with other documents in association with phrases (dictionary headings). Thus, according to the first embodiment, it is possible to reduce the processing load on the creator when registering phrases and link destination URLs.

すなわち、実施例１によれば、テキスト内の特定語句にリンク先ＵＲＬを自動付与するオートリンク用辞書をＷＥＢページを参照しつつ自動登録するので、作成者への辞書作成処理負荷を軽減できる。 That is, according to the first embodiment, the automatic link dictionary that automatically assigns a link destination URL to a specific phrase in the text is automatically registered with reference to the WEB page, so that the load on the dictionary creation processing for the creator can be reduced.

また、実施例１によれば、抽出するアンカー文字列が位置する点を任意の文書内での基点とし、当該基点から所定の範囲内に含まれる語句である文脈情報を抽出し、抽出したアンカー文字列とリンク先ＵＲＬとの組み合わせごとに対応付けて、文脈情報を登録するので、リンク設定用辞書に文脈情報を登録することが可能である。 Further, according to the first embodiment, a point where an anchor character string to be extracted is located is set as a base point in an arbitrary document, context information that is a phrase included in a predetermined range from the base point is extracted, and the extracted anchor Since context information is registered in association with each combination of a character string and a link destination URL, it is possible to register context information in the link setting dictionary.

この結果、設定対象文書内の語句について、辞書見出しに一致するかのみを判定してリンク先ＵＲＬを設定していた従来の手法に対して、さらに、文脈情報が一致するかを判定してリンク先ＵＲＬを設定することが可能である。このため、リンク先ＵＲＬを設定する際に、適切なリンク先ＵＲＬのみを設定することが可能である。 As a result, it is determined whether or not the context information matches the conventional method in which the link destination URL is set by determining only whether or not the phrase in the setting target document matches the dictionary heading. It is possible to set a destination URL. For this reason, when setting a link destination URL, it is possible to set only an appropriate link destination URL.

また、実施例１によれば、文脈情報各々に対応付けて、アンカー文字列とリンク先ＵＲＬとが同一となる組み合わせに対応付けられて文脈情報各々が抽出される頻度を登録するので、文脈情報それぞれについて、重み付け（回数）を登録することが可能である。 Also, according to the first embodiment, the frequency of extracting each context information in association with the combination in which the anchor character string and the link destination URL are the same is registered in association with each context information. It is possible to register a weight (number of times) for each.

この結果、リンク先ＵＲＬを設定する際に、それぞれの文脈情報の重要度（重み付け、回数）を考慮した上で、リンク先ＵＲＬを設定するか否かを判定することができ、適切なリンク先ＵＲＬのみを設定することが可能である。 As a result, when setting the link destination URL, it is possible to determine whether or not to set the link destination URL in consideration of the importance (weighting, the number of times) of each context information. It is possible to set only the URL.

さて、これまで、実施例１として、リンク先ＵＲＬが少しでも異なる場合には、別の組み合わせとして辞書記憶部４０３に登録する手法について説明したが、本発明はこれに限定されるものではない。例えば、リンク先ＵＲＬが異なる場合であっても、文脈情報が同一であれば、一つの組み合わせに集約して辞書記憶部４０３に登録してもよい。 So far, as the first embodiment, the method of registering in the dictionary storage unit 403 as another combination when the link destination URL is slightly different has been described, but the present invention is not limited to this. For example, even if the link destination URLs are different, if the context information is the same, they may be aggregated into one combination and registered in the dictionary storage unit 403.

そこで、実施例２では、辞書見出しとリンク先ＵＲＬとの組み合わせにて、リンク先ＵＲＬが異なる場合に、文脈情報が同一であれば、辞書見出し各々に対応付けられたそれぞれ別個の文脈情報や閾値を集約する手法について説明する。なお、以下では、実施例１に係る関連情報登録装置３００と同様の点については、簡単に説明し、または、説明を省略する。 Therefore, the second embodiment describes a method in which, when combinations of a dictionary heading and a link destination URL differ in their link destination URLs but have the same context information, the separate context information and thresholds associated with the respective dictionary headings are aggregated. In the following, points that are the same as in the related information registration apparatus 300 according to the first embodiment are described only briefly or are omitted.

That is, as shown in FIG. 12, in the second embodiment, when multiple combinations with the same dictionary heading and context information are registered in the dictionary storage unit 403, the dictionary creation unit 503 aggregates them into a single combination. In the example of FIG. 12, the dictionary storage unit 403 stores, in association with the dictionary heading "AAA", the counted context information "Corporation: 1" and "President: 2", the threshold "3", and the link destination URL "aaa.jp". FIG. 12 is a diagram for explaining the dictionary storage unit in the second embodiment. The dictionary storage unit 403 also stores, in association with the dictionary heading "AAA", the counted context information "Corporation: 3" and "President: 1", the threshold "3", and the link destination URL "aaa.jp/test". It further stores, in association with the dictionary heading "AAA", the counted context information "Corporation: 1" and "President: 2", the threshold "3", and the link destination URL "aaa.jp/test3".
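The three records of this example can be written down as a small data structure. The following Python sketch is only one possible representation; the class name `DictionaryRecord` and its field names are hypothetical, and treating "the same context information" as "the same set of context words, regardless of counts" is an assumption drawn from the example values.

```python
from dataclasses import dataclass


@dataclass
class DictionaryRecord:
    heading: str          # dictionary heading (anchor character string)
    context_counts: dict  # context word -> registered count
    threshold: int
    url: str              # link destination URL


records = [
    DictionaryRecord("AAA", {"Corporation": 1, "President": 2}, 3, "aaa.jp"),
    DictionaryRecord("AAA", {"Corporation": 3, "President": 1}, 3, "aaa.jp/test"),
    DictionaryRecord("AAA", {"Corporation": 1, "President": 2}, 3, "aaa.jp/test3"),
]

# Assumed aggregation key: heading plus the set of context words.
# Counts and URLs are deliberately left out of the key.
keys = {(r.heading, frozenset(r.context_counts)) for r in records}
print(len(keys))  # 1: all three records are candidates for one aggregated entry
```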

In the related information registration apparatus 300, when the dictionary creation unit 503 receives, for example, an aggregation instruction from the user, it reads the records with the same dictionary heading from the dictionary storage unit 403. For example, the dictionary creation unit 503 reads all information associated with the dictionary heading "AAA" and identifies the associations whose context information is the same. For example, it identifies that the context information associated with the link destination URL "aaa.jp", the context information associated with the link destination URL "aaa.jp/test", and the context information associated with the link destination URL "aaa.jp/test3" are the same. Then, as shown in (1) of FIG. 12, the dictionary creation unit 503 aggregates the identified associations into a single association.

Further, for example, the dictionary creation unit 503 calculates, for each piece of context information to be aggregated, the average of the associated counts and associates the result with that context information. For example, since the counts associated with the context information "Corporation" are "1", "3", and "1", whose average is (1 + 3 + 1) / 3 ≈ 1.67, the dictionary creation unit 503 registers the counted context information "Corporation: 2" (here, as an example, rounded to the nearest integer) in the dictionary storage unit 403 in association with the dictionary heading "AAA". Similarly, the dictionary creation unit 503 registers the counted context information "President: 2" in the dictionary storage unit 403 in association with the dictionary heading "AAA".

Further, for example, the dictionary creation unit 503 calculates the average of the thresholds and registers it in the dictionary storage unit 403. For example, since the thresholds are "3", "3", and "3", the dictionary creation unit 503 registers the threshold "3" in the dictionary storage unit 403.

Further, for example, the dictionary creation unit 503 registers each of the link destination URLs to be aggregated in the dictionary storage unit 403 in association with the dictionary heading "AAA". For example, the dictionary creation unit 503 registers the link destination URLs "aaa.jp, aaa.jp/test, aaa.jp/test3" in the dictionary storage unit 403 in association with the dictionary heading "AAA".
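The averaging of counts and thresholds and the collection of URLs described above can be sketched as follows in Python. The record layout (plain dicts), the function names, and the choice to round the averaged threshold in the same way as the counts are assumptions for illustration. Half-up rounding is written out explicitly because Python's built-in `round` rounds halves to the nearest even integer.

```python
import math


def round_half_up(value):
    """Round a non-negative number to the nearest integer, with .5 rounded up."""
    return math.floor(value + 0.5)


def merge_records(records):
    """Merge records that share a heading and the same context words.

    Each record is a dict with the hypothetical keys "heading",
    "context" (word -> count), "threshold", and "url".
    """
    n = len(records)
    words = records[0]["context"].keys()
    return {
        "heading": records[0]["heading"],
        # Average the count of each context word across the merged records.
        "context": {
            w: round_half_up(sum(r["context"][w] for r in records) / n)
            for w in words
        },
        # Average the thresholds (rounding here is an assumption).
        "threshold": round_half_up(sum(r["threshold"] for r in records) / n),
        # Keep every link destination URL of the merged records.
        "urls": [r["url"] for r in records],
    }


records = [
    {"heading": "AAA", "context": {"Corporation": 1, "President": 2}, "threshold": 3, "url": "aaa.jp"},
    {"heading": "AAA", "context": {"Corporation": 3, "President": 1}, "threshold": 3, "url": "aaa.jp/test"},
    {"heading": "AAA", "context": {"Corporation": 1, "President": 2}, "threshold": 3, "url": "aaa.jp/test3"},
]
merged = merge_records(records)
print(merged)
```

With the values of FIG. 12 as given in the text, both "Corporation" and "President" average to about 1.67 and are registered as 2, and the threshold stays at 3.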

The following describes how a link destination URL is set using a single combination obtained by aggregating multiple combinations. When multiple link destination URLs are registered, the related information setting unit 504 selects one of them by a preset selection method and sets it. For example, when the link destination URLs "aaa.jp, aaa.jp/test, aaa.jp/test3" are registered, the related information setting unit 504 selects one of the registered link destination URLs at random, for example "aaa.jp". The related information setting unit 504 then sets the link destination URL "aaa.jp" on the target phrase.

The method by which the related information setting unit 504 selects one link destination URL from multiple link destination URLs is not limited to random selection. For example, the link destination URLs may be compared and the one at the higher level of the hierarchy selected. For example, of "aaa.jp, aaa.jp/test, aaa.jp/test3", the higher-level URL "aaa.jp" may be selected.
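The two selection methods mentioned here can be sketched in Python as follows. The function names are hypothetical, and interpreting "higher level" as "fewest path segments after the host" is an assumption; other definitions of URL hierarchy are possible.

```python
import random


def select_random(urls, rng=random):
    """Pick one registered link destination URL at random."""
    return rng.choice(urls)


def select_highest_level(urls):
    """Pick the URL with the fewest path segments (assumed meaning of 'higher level')."""
    def depth(url):
        # Drop a scheme such as "http://" if present, then count the
        # non-empty path segments that follow the host name.
        without_scheme = url.split("://", 1)[-1]
        return len([part for part in without_scheme.split("/")[1:] if part])
    return min(urls, key=depth)


urls = ["aaa.jp/test", "aaa.jp", "aaa.jp/test3"]
print(select_highest_level(urls))  # aaa.jp
print(select_random(urls) in urls)  # True
```

Random selection needs no knowledge of URL structure, whereas the hierarchy-based choice always returns the same URL for the same registered set.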

[Aggregation processing]
Next, the flow of the aggregation processing in the second embodiment is described with reference to FIG. 13. FIG. 13 is a flowchart for explaining the flow of the aggregation processing in the second embodiment.

As shown in FIG. 13, when the aggregation timing arrives (Yes in step S501), for example when an aggregation instruction is received from the user, the dictionary creation unit 503 reads the records with the same dictionary heading from the dictionary storage unit 403 (step S502). For example, the dictionary creation unit 503 reads all information associated with the dictionary heading "AAA". The dictionary creation unit 503 then identifies, among the records read, those with the same context information (step S503).

The dictionary creation unit 503 then aggregates the identified records (step S504). That is, it aggregates the identified associations into one association; for example, among the information associated with the dictionary heading "AAA", all associations with the same context information are aggregated into a single combination.
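Steps S501 to S504 can be outlined as a short Python sketch. The record layout, the function names, and the placeholder merge function `collect_urls` are hypothetical; how the counts and thresholds are combined is left to whatever merge function is passed in.

```python
from collections import defaultdict


def aggregate(store, aggregation_requested, merge):
    """Sketch of steps S501 to S504 over a list of record dicts."""
    # S501: do nothing until the aggregation timing arrives.
    if not aggregation_requested:
        return store
    # S502: read out the records grouped by identical dictionary heading.
    by_heading = defaultdict(list)
    for record in store:
        by_heading[record["heading"]].append(record)
    result = []
    for same_heading in by_heading.values():
        # S503: identify records whose context information is the same
        # (assumed here to mean the same set of context words).
        by_context = defaultdict(list)
        for record in same_heading:
            by_context[frozenset(record["context"])].append(record)
        # S504: aggregate each identified group into a single record.
        for group in by_context.values():
            result.append(merge(group) if len(group) > 1 else group[0])
    return result


def collect_urls(group):
    """Placeholder merge: keep the first record's fields and collect all URLs."""
    merged = {k: v for k, v in group[0].items() if k != "url"}
    merged["urls"] = [r["url"] for r in group]
    return merged


store = [
    {"heading": "AAA", "context": ["Corporation", "President"], "url": "aaa.jp"},
    {"heading": "AAA", "context": ["Corporation", "President"], "url": "aaa.jp/test"},
    {"heading": "BBB", "context": ["Bank"], "url": "bbb.jp"},
]
result = aggregate(store, True, collect_urls)
print(len(result))  # 2
```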

[Effects of the second embodiment]
As described above, according to the second embodiment, when combinations of a dictionary heading and a link destination URL differ in the link destination URL but have the same context information, the separate context information and thresholds associated with each dictionary heading are aggregated. This reduces the number of combinations of dictionary heading and link destination URL registered in the dictionary storage unit 403.

That is, in a method that does not aggregate combinations of dictionary heading and link destination URL, combinations whose link destination URLs differ even slightly are registered as separate combinations. However, even when link destination URLs differ, there are cases where merging them into a single link destination URL rarely causes problems. One such case is where there is a URL identifying a company's home page (for example, "aaa.jp") and a URL identifying a lower-level page of that home page (for example, "aaa.jp/test"), and either can be used as the link destination URL without problems. In such cases, the second embodiment makes it possible to aggregate the combinations of dictionary heading and link destination URL.

Embodiments of the present invention have been described above, but the present invention may also be implemented in embodiments other than those described. Other embodiments are therefore described below.

[Storage unit]
For example, the first embodiment was described on the assumption that information registered in the WEB page storage unit 401 and the dictionary candidate storage unit 402 is deleted after being read by the dictionary candidate extraction unit 502 or the dictionary creation unit 503. However, the present invention is not limited to this, and the dictionary candidate extraction unit 502 and the dictionary creation unit 503 need not delete information from the WEB page storage unit 401 or the dictionary candidate storage unit 402.

[Internet]
For example, the first embodiment described the case where WEB pages are collected over a connection to the Internet, but the present invention is not limited to this, and WEB pages may be collected over any network. For example, WEB pages may be collected over a connection to an intranet.

[Documents]
For example, the first embodiment described a method of extracting from the content of WEB pages, but the present invention is not limited to this, and extraction may be performed on any document. For example, when a user inputs a document in which link tags are set (for example, a document file), the anchor character string and the link destination URL may be extracted from that document.
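As one way to picture this extraction, the following Python sketch pulls (anchor character string, link destination URL) pairs out of a document using the standard library `html.parser` module. It assumes the link tags are HTML `<a href="...">` elements; the class and function names are hypothetical, and other document formats would need a different parser.

```python
from html.parser import HTMLParser


class AnchorExtractor(HTMLParser):
    """Collect (anchor text, href) pairs from <a href="..."> elements."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._href = None  # href of the <a> element currently open, if any
        self._text = []    # text fragments seen inside that element

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.pairs.append(("".join(self._text).strip(), self._href))
            self._href = None


def extract_links(document):
    parser = AnchorExtractor()
    parser.feed(document)
    parser.close()
    return parser.pairs


print(extract_links('<p>AAA Corporation: <a href="http://aaa.jp">AAA</a></p>'))
```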

[Count]
For example, the first and second embodiments described a method of storing a count in association with each piece of context information, but the present invention is not limited to this; for example, a frequency, weight, or importance may be associated instead of the count. For example, the dictionary storage unit 403 may store, as the frequency, the degree to which each piece of context information was extracted (for example, the number of times the context information was extracted divided by the number of times the "combination of anchor character string and link destination URL" with which the context information is associated was extracted).
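The frequency defined by this ratio can be computed as in the small Python sketch below. The function name and the sample numbers are hypothetical.

```python
def context_frequency(context_count, combination_count):
    """Times the context word was extracted / times the combination was extracted."""
    if combination_count <= 0:
        raise ValueError("the combination must have been extracted at least once")
    return context_count / combination_count


# Hypothetical case: the combination ("AAA", "aaa.jp") was extracted 4 times,
# and "President" appeared in its context in 2 of those extractions.
print(context_frequency(2, 4))  # 0.5
```

Unlike a raw count, this ratio does not grow simply because a combination appears on many pages, which makes values comparable across combinations.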

[Combination of embodiments]
For example, the first embodiment described, in addition to the method of automatically extracting anchor character strings and link destination URLs, the combined use of (1) a method of registering context information and (2) a method of registering a count for each piece of context information. The second embodiment described (3) a method of aggregating dictionary headings and link destination URLs. However, the present invention is not limited to the methods described in the first and second embodiments; for example, in addition to the method of automatically extracting anchor character strings and link destination URLs, any one or more of methods (1) to (3) may be used in combination.

[System configuration]
Of the processes described in the embodiments, all or part of those described as being performed automatically may be performed manually. For example, the aggregation processing may be performed manually, and the count, frequency, or importance associated with each piece of context information may be assigned manually. In addition, the processing procedures, control procedures, specific names, and information including various data and parameters shown in the above description and the drawings (for example, FIGS. 1 to 13) may be changed as desired unless otherwise specified.

The components of each illustrated apparatus are functional and conceptual, and need not be physically configured as illustrated. That is, the specific form of distribution and integration of each apparatus is not limited to that illustrated, and all or part of it may be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like. Taking the example shown in FIG. 2, the WEB page storage unit 401, the dictionary candidate storage unit 402, and the dictionary storage unit 403 may be integrated into a single storage unit, and the related information setting unit 504 may be separated out as a different apparatus.

[Program]
The various processes described in the above embodiments can be realized by executing a prepared program on a computer such as a personal computer or a workstation. An example of a computer that executes a related information registration program having the same functions as the above embodiments is described below with reference to FIG. 14. FIG. 14 is a diagram for explaining the program of the related information registration apparatus according to the first embodiment.

As shown in FIG. 14, the related information registration apparatus 3000 according to the first embodiment is configured by connecting an operation unit 3001, a microphone 3002, a speaker 3003, a display 3005, a communication unit 3006, a CPU 3010, a ROM 3011, an HDD 3012, and a RAM 3013 via a bus 3009 or the like.

The ROM 3011 stores in advance control programs that provide the same functions as the WEB page collection unit 501, the dictionary candidate extraction unit 502, the dictionary creation unit 503, and the related information setting unit 504 described in the first embodiment; that is, as shown in FIG. 14, a WEB page collection program 3011a, a dictionary candidate extraction program 3011b, a dictionary creation program 3011c, and a related information setting program 3011d. These programs 3011a to 3011d may be integrated or separated as appropriate, like the components of the related information registration apparatus shown in FIG. 2.

The CPU 3010 reads these programs 3011a to 3011d from the ROM 3011 and executes them, so that, as shown in FIG. 14, the programs 3011a to 3011d function as a WEB page collection process 3010a, a dictionary candidate extraction process 3010b, a dictionary creation process 3010c, and a related information setting process 3010d. The processes 3010a to 3010d correspond respectively to the WEB page collection unit 501, the dictionary candidate extraction unit 502, the dictionary creation unit 503, and the related information setting unit 504 shown in FIG. 2.

The HDD 3012 is provided with a WEB page table 3012a, a dictionary candidate table 3012b, and a dictionary table 3012c. The tables 3012a to 3012c correspond respectively to the WEB page storage unit 401, the dictionary candidate storage unit 402, and the dictionary storage unit 403 shown in FIG. 2.

The CPU 3010 reads the WEB page table 3012a, the dictionary candidate table 3012b, and the dictionary table 3012c, stores them in the RAM 3013, and executes the related information registration program using the WEB page data 3013a, the dictionary candidate data 3013b, and the dictionary data 3013c stored in the RAM 3013.

[Others]
The related information registration apparatus 300 described in the first embodiment can be realized by executing a prepared program on a computer such as a personal computer or a workstation. This program can be distributed via a network such as the Internet. The program can also be recorded on a computer-readable recording medium such as a hard disk, a flexible disk (FD), a CD-ROM, an MO, or a DVD, and executed by being read from the recording medium by the computer.

The following supplementary notes are further disclosed with respect to the embodiments including the first to third embodiments described above.

(Supplementary Note 1) A related information registration method comprising:
an extraction step of extracting, from any document that includes a component on which related information associating the component with another document is set, the component on which the related information is set and the related information; and
a registration step of registering the component and the related information extracted in the extraction step, in association with each other, in a related information storage unit that stores related information for association with other documents in association with components.

(Supplementary Note 2) The related information registration method according to Supplementary Note 1, further comprising:
an in-range extraction step of taking the point at which the component extracted in the extraction step is located as a base point in the document, and extracting in-range components, which are components included within a predetermined range from the base point; and
an in-range registration step of registering the in-range components extracted in the in-range extraction step in the related information storage unit in association with each combination of component and related information extracted in the extraction step.

(Supplementary Note 3) The related information registration method according to Supplementary Note 2, further comprising a count registration step of registering in the related information storage unit, in association with each in-range component registered in the in-range registration step, the number of times that in-range component is extracted in association with the same combination of component and related information.

(Supplementary Note 4) A related information registration apparatus comprising:
related information storage means for storing related information that associates a component included in a document with another document, in association with that component;
extraction means for extracting, from any document that includes a component on which related information for association with another document is set, the component on which the related information is set and the related information; and
registration means for registering the related information and the component extracted by the extraction means, in association with each other, in the related information storage means.

(Supplementary Note 5) A related information registration program that causes a computer to execute:
an extraction procedure of extracting, from any document that includes a component on which related information associating the component with another document is set, the component on which the related information is set and the related information; and
a registration procedure of registering the component and the related information extracted in the extraction procedure, in association with each other, in a related information storage unit that stores related information for association with other documents in association with components.
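The in-range extraction of Supplementary Note 2 and the count registration of Supplementary Note 3 can be pictured with the following Python sketch. It assumes that components are pre-split tokens and that the predetermined range is a fixed number of tokens on each side of the base point; the function names, the window size, and the sample sentences are hypothetical.

```python
from collections import Counter, defaultdict


def in_range_components(tokens, base_index, window):
    """Return the tokens within `window` positions of the base point, excluding it."""
    start = max(0, base_index - window)
    before = tokens[start:base_index]
    after = tokens[base_index + 1:base_index + 1 + window]
    return before + after


def register(store, tokens, base_index, url, window=2):
    """Register in-range components and their counts per (component, related information)."""
    key = (tokens[base_index], url)
    store[key].update(in_range_components(tokens, base_index, window))


store = defaultdict(Counter)
# Two hypothetical documents in which "AAA" is linked to "aaa.jp".
register(store, ["AAA", "Corporation", "President", "spoke"], 0, "aaa.jp")
register(store, ["The", "President", "of", "AAA", "resigned"], 3, "aaa.jp")
print(store[("AAA", "aaa.jp")]["President"])  # 2
```

Keying the store by the pair of component and related information keeps the counts for the same anchor string separate when it links to different destinations.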

FIG. 1 is a diagram for explaining an outline of the related information registration apparatus according to the first embodiment.
FIG. 2 is a block diagram for explaining the configuration of the related information registration apparatus according to the first embodiment.
FIG. 3 is a diagram for explaining an example of information stored in the WEB page storage unit in the first embodiment.
FIG. 4 is a diagram for explaining an example of information stored in the dictionary candidate storage unit in the first embodiment.
FIG. 5 is a diagram for explaining an example of information stored in the dictionary storage unit in the first embodiment.
FIG. 6 is a diagram for explaining registration processing by the dictionary creation unit in the first embodiment.
FIG. 7 is a diagram for explaining registration processing by the dictionary creation unit in the first embodiment.
FIG. 8 is a flowchart for explaining the flow of WEB page registration processing in the first embodiment.
FIG. 9 is a flowchart for explaining the flow of dictionary candidate registration processing in the first embodiment.
FIG. 10 is a flowchart for explaining the flow of dictionary registration processing in the first embodiment.
FIG. 11 is a flowchart for explaining the flow of link destination URL setting processing in the first embodiment.
FIG. 12 is a diagram for explaining the dictionary storage unit in the second embodiment.
FIG. 13 is a flowchart for explaining the flow of aggregation processing in the second embodiment.
FIG. 14 is a diagram for explaining the program of the related information registration apparatus according to the first embodiment.

Explanation of symbols

100 Internet
200 Client
300 Related information registration apparatus
400 Storage unit
401 WEB page storage unit
402 Dictionary candidate storage unit
403 Dictionary storage unit
500 Control unit
501 WEB page collection unit
502 Dictionary candidate extraction unit
503 Dictionary creation unit
504 Related information setting unit

Claims

A related information registration method in which a computer executes:
an extraction step of extracting, from a document including a component on which related information associating the component with another document is set, the component on which the related information is set and the related information;
a registration step of registering the component and the related information extracted in the extraction step, in association with each other, in a related information storage unit that stores related information for association with other documents in association with components;
an in-range extraction step of taking the point at which the component extracted in the extraction step is located as a base point in the document, and extracting in-range components, which are components included within a predetermined range from the base point;
an in-range registration step of registering the in-range components extracted in the in-range extraction step in the related information storage unit in association with each combination of component and related information extracted in the extraction step; and
a related information setting step of, when related information is to be newly set and a matching component, which is a component that matches a component stored in the related information storage unit, is present in a document to be processed, extracting in-range components for the matching component from the document to be processed, and setting the related information on the matching component based on whether the in-range components registered in the related information storage unit in association with that component match the in-range components for the matching component.

The related information registration method according to claim 1, wherein
the computer further executes a count registration step of registering in the related information storage unit, in association with each in-range component registered in the in-range registration step, the number of times that in-range component is extracted in association with the same combination of component and related information, and
the related information setting step sets the related information based on the in-range components weighted by the counts registered in the count registration step.

A related information registration apparatus comprising:
related information storage means for storing related information that associates a component included in a document with another document, in association with that component;
extraction means for extracting, from a document including a component on which related information for association with another document is set, the component on which the related information is set and the related information;
registration means for registering the related information and the component extracted by the extraction means, in association with each other, in the related information storage means;
in-range extraction means for taking the point at which the component extracted by the extraction means is located as a base point in the document, and extracting in-range components, which are components included within a predetermined range from the base point;
in-range registration means for registering the in-range components extracted by the in-range extraction means in the related information storage means in association with each combination of component and related information extracted by the extraction means; and
related information setting means for, when related information is to be newly set and a matching component, which is a component that matches a component stored in the related information storage means, is present in a document to be processed, extracting in-range components for the matching component from the document to be processed, and setting the related information on the matching component based on whether the in-range components registered in the related information storage means in association with that component match the in-range components for the matching component.

A related information registration program that causes a computer to execute:
an extraction procedure of extracting, from a document including a component on which related information associating the component with another document is set, the component on which the related information is set and the related information;
a registration procedure of registering the component and the related information extracted in the extraction procedure, in association with each other, in a related information storage unit that stores related information for association with other documents in association with components;
an in-range extraction procedure of taking the point at which the component extracted in the extraction procedure is located as a base point in the document, and extracting in-range components, which are components included within a predetermined range from the base point;
an in-range registration procedure of registering the in-range components extracted in the in-range extraction procedure in the related information storage unit in association with each combination of component and related information extracted in the extraction procedure; and
a related information setting procedure of, when related information is to be newly set and a matching component, which is a component that matches a component stored in the related information storage unit, is present in a document to be processed, extracting in-range components for the matching component from the document to be processed, and setting the related information on the matching component based on whether the in-range components registered in the related information storage unit in association with that component match the in-range components for the matching component.