JP5810046B2

JP5810046B2 - Document search keyword presentation apparatus, method, and program

Info

Publication number: JP5810046B2
Application number: JP2012161985A
Authority: JP
Inventors: Masaaki Nishino (西野正彬); Norihito Yasuda (安田宜仁); Ryoji Kataoka (片岡良治)
Original assignee: Nippon Telegraph and Telephone Corp; NTT Inc USA
Current assignee: NTT Inc; NTT Inc USA
Priority date: 2012-07-20
Filing date: 2012-07-20
Publication date: 2015-11-11
Anticipated expiration: 2032-07-20
Also published as: JP2014021861A

Description

The present invention relates to a document search keyword presentation apparatus, method, and program, and more particularly to a document search keyword presentation apparatus, method, and program for searching for documents within a document collection gathered from the web or similar sources.

Conventionally, for large-scale document collections such as information obtained from the Internet, document retrieval is known in which the user supplies a keyword together with the position coordinates of a target of interest, and documents related to both the keyword and the position are retrieved (see, for example, Patent Document 1).

Such a search (geographic information retrieval) can be used to look up information on a keyword of interest within a given area while browsing a map. However, a user viewing a map is not always looking for information on a specific keyword within the displayed range; sometimes the user simply wants to learn what is characteristic of that area. In such cases, conventional geographic information retrieval alone requires the user to think of an appropriate search keyword, yet a user whose goal is to discover characteristic information about the area lacks exactly that knowledge and therefore cannot come up with a keyword in the first place.

One approach for such cases is to recommend a keyword when statistics show that it is characteristic of the area (see, for example, Non-Patent Document 1). The recommendation is based on a statistical bias across place names: a given keyword appears more frequently together with place names inside a certain range than elsewhere. To this end, the frequencies of keyword/place-name pairs are counted in advance. However, the criterion for deciding that a keyword and a place name "appeared as a pair" is not self-evident. In the conventional technique, the decision is based on whether the place name and the keyword appear in the same document, that is, whether they co-occur within the document.

Patent Document 1: JP 2009-134463 A

Non-Patent Document 1: Nobuaki Hiroshima, Norihito Yasuda, Naoki Fujita, Ryoji Kataoka, "Presentation of Candidate Expressions for Query Input Support in Geographic Information Retrieval," Proceedings of the 26th Annual Conference of the Japanese Society for Artificial Intelligence, 2012.

However, when co-occurrence within a document is used, pairs with no semantic connection, or only a weak one, are also extracted. Consider, for example, the following passage:

“Yesterday there was a sepak takraw match in Yokosuka. We lost badly. Still, the yakisoba at the after-party we held at home afterward was delicious, so I'm satisfied!”
In this passage, the pair {Yokosuka, yakisoba} has no semantic connection, yet it is extracted as a pair when document-level co-occurrence is used.

Alternatively, one could extract only pairs with a strong semantic connection by restricting extraction to co-occurrence within a narrow span, such as a single sentence, or to cases where the place name and the keyword are joined by the particle "no", as in 「香川のうどん」 (Kagawa no udon, "Kagawa's udon"). In that case, however, the number of pairs drops sharply and may be insufficient for statistical processing of pair frequencies. This is because place names do not appear frequently in documents; a single mention is often meant to apply over a long stretch of text. Consider, for example, the following passage:

“Since moving to Yokosuka three years ago, I have come to really like this city. The hilly terrain, with its many tunnels and roads prone to traffic jams, is a minor drawback, but perhaps because the area is surrounded by sea on all sides, the climate is warm and we hardly need heating even in winter. It's also nice that the Keihin Electric Railway makes it easy to get into central Tokyo.”
In this passage, the pairs {Yokosuka, warm} and {Yokosuka, Keihin Electric Railway} are considered to have a strong semantic connection, but if extraction is limited to co-occurrence within a single sentence, they are not treated as pairs. As a result, too few pairs are obtained to treat their frequencies as a statistical bias.

In summary, collecting only high-quality pairs may not yield enough pairs for statistical processing, whereas prioritizing the number of pairs leads to collecting low-quality pairs.
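To make the trade-off concrete, the sketch below reduces the two example passages to hypothetical per-sentence annotations (which place names and candidate keywords each sentence mentions; the annotation itself is an assumption made for illustration) and compares sentence-scope and document-scope pair extraction.

```python
from itertools import product

# Hypothetical per-sentence annotations of the two example passages:
# each sentence -> (place names it mentions, candidate keywords it mentions).
PASSAGE_1 = [  # sepak takraw / yakisoba passage, three sentences
    ({"Yokosuka"}, {"sepak takraw"}),
    (set(), set()),
    (set(), {"yakisoba"}),
]
PASSAGE_2 = [  # "moved to Yokosuka" passage, three sentences
    ({"Yokosuka"}, set()),
    (set(), {"warm"}),
    (set(), {"Keihin Electric Railway"}),
]


def sentence_pairs(passage):
    """Pairs that co-occur inside a single sentence (narrow scope)."""
    pairs = set()
    for places, keywords in passage:
        pairs |= set(product(places, keywords))
    return pairs


def document_pairs(passage):
    """Pairs that co-occur anywhere in the passage (broad scope)."""
    places = set().union(*(p for p, _ in passage))
    keywords = set().union(*(k for _, k in passage))
    return set(product(places, keywords))
```

Under these annotations, document scope adds the spurious {Yokosuka, yakisoba} pair from the first passage, while sentence scope finds no pair at all in the second passage, losing {Yokosuka, warm} and {Yokosuka, Keihin Electric Railway}.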

The present invention has been made in view of the above, and aims to provide a document search keyword presentation apparatus, method, and program that can compute recommended keywords from statistics over a sufficient number of pairs without sacrificing the effectiveness of the recommended keywords.

To solve the above problems, the present invention is a document search keyword presentation apparatus that presents keywords corresponding to a map range input by a user, the apparatus comprising:
document analysis means for extracting, from a document set, candidate keyword expressions A, the positions of the candidate expressions A in the documents, and the place-name expressions in the documents;
frequency calculation means for obtaining, for each candidate expression A and each mesh (a rectangle obtained by dividing the latitude and longitude of a reference map range at a predetermined interval in degrees), and for each of a plurality of extraction criteria of differing quality, either the co-occurrence frequency of the candidate expression A and the mesh within the co-occurrence range corresponding to that criterion, or the appearance frequency of the candidate expression A or of the mesh within the corresponding appearance range, and storing the frequencies in storage means;
score calculation means for, based on pairs of a candidate expression B and a mesh included in the map range of interest input by the user, applying the extraction criteria in order starting from the one with the narrowest co-occurrence or appearance range, referring to the co-occurrence or appearance frequencies in the storage means, and computing, as a score, whether the number of pairs has reached a sufficient quantity derived from values stored in the storage means; if the score has reached that quantity, the current co-occurrence or appearance range is adopted, and otherwise the co-occurrence or appearance range is widened and the score is computed again; and
output means for outputting the top N pairs by the score calculated by the score calculation means.

Further, the document analysis means of the present invention includes:
means for extracting, from the document set, the positions of headings, sentence breaks, and paragraph breaks, the candidate expressions in each document and their positions in the document, and the place-name expressions described in each document.

Further, the frequency calculation means of the present invention sets the extraction criteria, in stepwise order, as: co-occurrence of the candidate expression A within the same sentence (first extraction criterion); co-occurrence within the same paragraph (second extraction criterion); co-occurrence in which one member is in the document's heading and the other is anywhere in the body (third extraction criterion); and co-occurrence within the same document (fourth extraction criterion). It includes:
same-sentence co-occurrence frequency calculation means for, based on the first extraction criterion, computing for each document obtained from the document analysis means the same-sentence co-occurrence frequency of place names and candidate expressions, storing each mesh included in the range implied by the place name in association with the candidate expression and the co-occurrence frequency in the mesh-candidate expression frequency table of the storage means, storing the sentence-level appearance frequency of the candidate expression in the candidate expression frequency table of the storage means, and storing the sentence-level appearance frequency of the mesh in the mesh frequency table of the storage means;
same-paragraph co-occurrence frequency calculation means for, based on the second extraction criterion, computing for each paragraph in the documents obtained from the document analysis means the same-paragraph co-occurrence frequency of place names and candidate expressions, storing each mesh included in the range implied by the place name in association with the candidate expression and that frequency in the mesh-candidate expression frequency table, storing the number of paragraphs in which the candidate expression appears in the candidate expression frequency table, and storing the paragraph-level appearance frequency of the mesh in the mesh frequency table;
heading-body co-occurrence frequency calculation means for, based on the third extraction criterion, computing the heading-body co-occurrence frequency of pairs consisting of a candidate expression appearing in the heading of a document obtained from the document analysis means and a place name appearing in the document, or of a place name appearing in the heading and a candidate expression appearing in the document, and storing it in the mesh-candidate expression frequency table; and
same-document co-occurrence frequency calculation means for, based on the fourth extraction criterion, storing, for each document obtained from the document analysis means, the same-document co-occurrence frequency of each candidate expression in the document and each mesh included in the range implied by a place name in the mesh-candidate expression frequency table, storing the document-level appearance frequency of the candidate expression in the candidate expression frequency table, and storing the document-level appearance frequency of the mesh in the mesh frequency table.
The score calculation means includes means for obtaining, as the sufficient quantity, the sum k of the weighted sentence appearance frequencies of the meshes included in the map range of interest input by the user, by referring to the mesh frequency table; applying the extraction criteria in order from the highest-ranked one; and, as soon as the score obtained by referring to the mesh-candidate expression frequency table, the candidate expression frequency table, and the mesh frequency table exceeds k, outputting the mesh-candidate expression pairs and their scores.
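A sketch of the criterion cascade, under loud assumptions: pair frequencies are held per criterion in a hypothetical `pair_freq` dict keyed by (mesh, candidate), and the quantity compared with k is the total weighted pair frequency inside the map range under the current criterion, since this section does not define the score precisely. Ranking uses a caller-supplied scoring function, which in this embodiment would be the chi-square value.

```python
from typing import Callable, Dict, List, Tuple

Pair = Tuple[int, str]  # (hypothetical mesh id, candidate expression)

# Extraction criteria, narrowest (highest quality) first.
CRITERIA = ["sentence", "paragraph", "heading_body", "document"]


def select_criterion(
    pair_freq: Dict[str, Dict[Pair, float]],
    mesh_sentence_freq: Dict[int, float],
    meshes_in_view: List[int],
) -> Tuple[str, Dict[Pair, float]]:
    """Return the narrowest criterion whose in-view pair mass exceeds k."""
    # k: sum of the weighted sentence frequencies of the meshes in view.
    k = sum(mesh_sentence_freq.get(m, 0.0) for m in meshes_in_view)
    view = set(meshes_in_view)
    in_view: Dict[Pair, float] = {}
    for criterion in CRITERIA:
        in_view = {p: f for p, f in pair_freq.get(criterion, {}).items() if p[0] in view}
        if sum(in_view.values()) > k:  # enough evidence: stop widening
            return criterion, in_view
    # Never exceeded k: fall back to the broadest criterion.
    return CRITERIA[-1], in_view


def top_n(pairs: Dict[Pair, float], score: Callable[[Pair, float], float], n: int):
    """Rank pairs by a scoring function (e.g. a chi-square value) and keep N."""
    ranked = sorted(((p, score(p, f)) for p, f in pairs.items()),
                    key=lambda item: item[1], reverse=True)
    return ranked[:n]
```

Stopping at the first criterion that clears k is what lets high-quality pairs decide the result whenever they are plentiful; wider criteria are consulted only for sparsely described areas.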

According to the present invention, keywords corresponding to the map range displayed by the user can be presented, so appropriate keywords for the current geographic range can be offered. If a sufficient number of pairs for statistical processing is obtained from the high-quality criteria, the result is output at that point, so keywords can be output without relying on low-quality pairs. If a sufficient number is not obtained, progressively lower-quality criteria are applied until it is, so recommended keywords can be computed on a statistically sufficient basis for that geographic range.

FIG. 1 is a block diagram of the document search keyword presentation apparatus according to an embodiment of the present invention.
FIG. 2 is an example of the mesh-candidate expression frequency table according to the embodiment.
FIG. 3 is an example of the candidate expression frequency table according to the embodiment.
FIG. 4 is an example of the mesh frequency table according to the embodiment.
FIG. 5 is a flowchart of the preprocessing according to the embodiment.
FIG. 6 is a flowchart of the main processing according to the embodiment.
FIG. 7 is the 2×2 contingency table used by the score calculation unit according to the embodiment.

Embodiments of the present invention are described below with reference to the drawings.

FIG. 1 shows the configuration of a document search keyword presentation apparatus according to an embodiment of the present invention.

This apparatus addresses the problem described above, namely that collecting only high-quality (place name, keyword) pairs may not yield enough pairs for statistical processing, whereas prioritizing the number of pairs leads to collecting low-quality pairs. It does so by computing recommended keywords from statistics over a sufficient number of pairs without sacrificing the effectiveness of the recommended keywords.

The apparatus has four pair extraction units whose pairs differ in quality: the same-sentence co-occurrence frequency calculation unit 6, the same-paragraph co-occurrence frequency calculation unit 7, the heading-body co-occurrence frequency calculation unit 8 (co-occurrence in which one member is in the document's heading and the other is somewhere in the body), and the same-document co-occurrence frequency calculation unit 9. The present invention is not limited to these four pair extraction criteria; any plurality of extraction criteria of differing quality can be used. However, when criteria other than those of this embodiment are used, it is desirable to be able to predict in advance whether the pairs a criterion extracts will be of higher or lower quality than those of the other criteria.

After the pair frequencies are obtained, any statistical technique known to those skilled in the art can be used to decide from those frequencies whether a pair is characteristic. This embodiment describes a method based on the chi-square value, which measures the bias between two factors. Alternatively, a method based on Poisson probability, which measures bias in the ratio of a single factor, can be used, as shown in Non-Patent Document 1.
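A minimal sketch of the chi-square statistic for a 2×2 table. The cell layout is an assumption, since the contingency table of FIG. 7 is not reproduced in this section: `a` counts units (e.g. sentences) containing both the mesh and the candidate expression, `b` the mesh without the candidate, `c` the candidate without the mesh, and `d` neither, with the total number of units supplied by the caller.

```python
def chi_square_2x2(a: float, b: float, c: float, d: float) -> float:
    """Pearson chi-square for the table [[a, b], [c, d]] (no continuity correction)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:  # an empty row or column carries no evidence
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def pair_chi_square(pair: float, mesh: float, cand: float, total: float) -> float:
    """Build the 2x2 cells from marginal frequencies and score the pair."""
    a = pair                        # mesh and candidate together
    b = mesh - pair                 # mesh without the candidate
    c = cand - pair                 # candidate without the mesh
    d = total - mesh - cand + pair  # neither
    return chi_square_2x2(a, b, c, d)
```

The statistic alone does not distinguish over-representation from under-representation, so in practice one would usually keep only pairs with a·d > b·c, i.e. candidates that appear more often in the mesh than independence predicts.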

To fix the smallest unit of range on which keyword computation is based, consider the rectangles obtained by dividing all of Japan (or the world) by latitude and longitude at a fixed interval in degrees. Such a rectangle is hereinafter called a "mesh". The mesh size can be set arbitrarily to suit the target document set and the properties of the map; this specification assumes meshes of roughly 500 m square. If needed, several instances of the apparatus with different mesh sizes can be built so that candidate expressions can be presented at several levels of granularity.
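A sketch of one way to number meshes, assuming a grid of 15 arc-seconds of latitude by 22.5 arc-seconds of longitude (roughly 500 m on a side at Japanese latitudes) and identifying each mesh by a hypothetical (row, column) pair rather than whatever numbering the embodiment uses.

```python
import math
from typing import Tuple

# Assumed cell size: 15" of latitude x 22.5" of longitude.
ROWS_PER_DEGREE = 240  # 3600 / 15
COLS_PER_DEGREE = 160  # 3600 / 22.5

Mesh = Tuple[int, int]  # hypothetical (row, column) mesh id


def mesh_of(lat: float, lon: float) -> Mesh:
    """Return the mesh containing a point given in decimal degrees."""
    return (math.floor(lat * ROWS_PER_DEGREE), math.floor(lon * COLS_PER_DEGREE))


def mesh_bounds(mesh: Mesh) -> Tuple[float, float, float, float]:
    """Return (south, west, north, east) of a mesh in decimal degrees."""
    row, col = mesh
    return (row / ROWS_PER_DEGREE, col / COLS_PER_DEGREE,
            (row + 1) / ROWS_PER_DEGREE, (col + 1) / COLS_PER_DEGREE)
```

Multiplying by cells-per-degree instead of dividing by a fractional cell size keeps the arithmetic simple; a coarser or finer grid only changes the two constants.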

The apparatus consists of document DB 1, document analysis unit 2, mesh-candidate expression frequency table storage unit 3, candidate expression frequency table storage unit 4, mesh frequency table storage unit 5, same-sentence co-occurrence frequency calculation unit 6, same-paragraph co-occurrence frequency calculation unit 7, heading-body co-occurrence frequency calculation unit 8, same-document co-occurrence frequency calculation unit 9, score calculation unit 10, and output unit 11.

Document DB 1 is a database storing the documents from which recommended keywords are computed. Ideally, the stored documents are the same as those that will be searched with the keywords output by this apparatus. However, when, for example, ten years of newspaper articles are the search target, document DB 1 may hold a smaller set that still reflects the whole target collection, such as one year of newspaper articles.

The mesh-candidate expression frequency table storage unit 3 stores the mesh-candidate expression frequency table. As shown in FIG. 2, this table lists, for each mesh number, the candidate expressions that co-occurred with that mesh under any of the co-occurrence extraction criteria, together with the weighted co-occurrence frequency under each criterion. These values are stored during preprocessing by the same-sentence co-occurrence frequency calculation unit 6, the same-paragraph co-occurrence frequency calculation unit 7, the heading-body co-occurrence frequency calculation unit 8, and the same-document co-occurrence frequency calculation unit 9.

The candidate expression frequency table storage unit 4 stores the candidate expression frequency table. As shown in FIG. 3, this table records, for each candidate expression, the number of sentences, paragraphs, and documents in which it appears. Each value is stored during preprocessing by the same-sentence co-occurrence frequency calculation unit 6, the same-paragraph co-occurrence frequency calculation unit 7, and the same-document co-occurrence frequency calculation unit 9.

The mesh frequency table storage unit 5 stores the mesh frequency table. As shown in FIG. 4, this table records, for each mesh number, the weighted frequencies of the number of sentences, paragraphs, and documents in which the mesh appears. A weighted frequency takes into account the extent implied by the place name: for example, when a place name covering M meshes appears once, that occurrence counts as 1/M for each of those meshes. Each value is stored during preprocessing by the same-sentence co-occurrence frequency calculation unit 6, the same-paragraph co-occurrence frequency calculation unit 7, and the same-document co-occurrence frequency calculation unit 9.
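A minimal sketch of the weighted count, using exact fractions and hypothetical mesh labels: each mention contributes a total weight of 1, split evenly over the M meshes the place name covers.

```python
from collections import defaultdict
from fractions import Fraction


def add_weighted_mention(mesh_freq, meshes):
    """Count one place-name mention spread evenly over the M meshes it covers."""
    unique = set(meshes)
    if not unique:  # place name with no known extent: nothing to count
        return
    weight = Fraction(1, len(unique))  # 1/M
    for mesh in unique:
        mesh_freq[mesh] += weight


mesh_freq = defaultdict(Fraction)
add_weighted_mention(mesh_freq, ["A", "B", "C", "D"])  # a wide place name, M = 4
add_weighted_mention(mesh_freq, ["A"])                  # a narrow place name, M = 1
```

After these two mentions, mesh "A" holds 5/4 and each of "B" to "D" holds 1/4, so a broad place name such as a prefecture cannot swamp the meshes inside it with full counts.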

The operation of this apparatus can be divided into preprocessing and main processing, which outputs recommended keywords once a geographic range of interest is given.

FIG. 5 is a flowchart of the preprocessing in an embodiment of the present invention.

Step 101) The document analysis unit 2 takes the target document group in document DB 1 as input and outputs:
the list of candidate expressions appearing in each document and their positions in the document;
the list of place names appearing in each document and their positions in the document;
the latitude/longitude of each place name appearing in the document and the geographic range it implies;
the positions of the document's headings;
the positions of sentence breaks in the document; and
the positions of paragraph breaks in the document.
The procedure is as follows.

First, the positions of headings, sentence breaks, and paragraph breaks in the input document group are determined. When the input is HTML, headings can be identified from the HTML tags. For plain text, the first sentence of each document is regarded as its heading. Sentence breaks can be taken to be the positions of the Japanese full stop (「。」); alternatively, a machine-learning-based method known to those skilled in the art may be used. Paragraph breaks can be taken to be positions where a line break follows a sentence break; again, a machine-learning-based method known to those skilled in the art may be used instead.
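A sketch of the plain-text rules above, assuming that the only sentence terminator is 「。」, that a paragraph break is a newline immediately after 「。」, and that the first sentence of the document is its heading. For simplicity it returns the heading and the body as paragraphs of sentences rather than character offsets.

```python
import re
from typing import List, Tuple


def split_plain_text(text: str) -> Tuple[str, List[List[str]]]:
    """Split a plain-text document into (heading, paragraphs of sentences)."""
    paragraphs: List[List[str]] = []
    # Paragraph break: a newline that directly follows a sentence-final 「。」.
    for block in re.split(r"(?<=。)\n", text):
        # Any other newline is treated as line wrapping inside the paragraph.
        joined = block.replace("\n", "")
        # Sentence break: the zero-width position right after each 「。」.
        sentences = [s for s in re.split(r"(?<=。)", joined) if s]
        if sentences:
            paragraphs.append(sentences)
    if not paragraphs:
        return "", []
    heading = paragraphs[0].pop(0)  # first sentence of the document = heading
    if not paragraphs[0]:
        paragraphs.pop(0)
    return heading, paragraphs
```

Splitting on a zero-width lookbehind keeps each 「。」 attached to its sentence, which preserves the text needed later to locate candidate expressions and place names.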

Next, the candidate expressions in each document and their positions are identified. Any method may be used to select candidate expressions; for example, named entities found with named-entity recognition techniques known to those skilled in the art can serve as candidate expressions. Alternatively, a list of product names, agricultural products, and the like can be prepared and matched against the document, and the listed terms that appear in the document can be used as candidate expressions.
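A sketch of the list-matching alternative, assuming a hypothetical lexicon of terms. It reports every occurrence with its character offset and does not resolve overlapping matches (for example, a shorter term nested inside a longer one), which a real system might handle with longest-match rules.

```python
from typing import Iterable, List, Tuple


def find_candidates(text: str, lexicon: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (offset, term) for every occurrence of every lexicon term, by offset."""
    hits: List[Tuple[int, str]] = []
    for term in set(lexicon):
        start = text.find(term)
        while start != -1:
            hits.append((start, term))
            start = text.find(term, start + 1)  # allow overlapping occurrences
    return sorted(hits)
```

For example, matching 「うどん」 and 「讃岐うどん」 against 「香川のうどんと讃岐うどん」 yields offsets 3, 7, and 9, where the match at 9 lies inside the match at 7.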

Next, expressions that appear to be place names in the document are identified, and for each one its position in the document, the latitude/longitude of its representative point, and the range it implies are determined.

Place names and the latitude/longitude of their representative points can be identified with conventional methods such as the one shown in Non-Patent Document 2 (Toru Hirano, Yoshihiro Matsuo, Genichiro Kikui, "Disambiguation of Place Names Using Geographical Distance and Popularity," Proceedings of the IPSJ National Convention, 2008).

For the range implied by each place name, the minimum bounding rectangle containing the place can be obtained with a conventional method such as the one shown in Non-Patent Document 3 (Norihito Yasuda, Hiroyuki Toda, "Geographic Information Retrieval Targeting the Immediate Vicinity of the Search Location," Transactions of the Japanese Society for Artificial Intelligence, Vol. 23, No. 5, pp. 364-373, July 2008), or an existing digital map can be used.
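Once a place name's implied range is available as a minimum bounding rectangle, the meshes it covers can be enumerated. A sketch assuming a hypothetical 15" × 22.5" grid with (row, column) mesh IDs: every cell between the rectangle's south-west and north-east corners is listed, so a point-like place yields M = 1. A rectangle edge lying exactly on a grid line pulls in one extra row or column, which a production version might trim.

```python
import math
from typing import List, Tuple

ROWS_PER_DEGREE = 240  # assumed 15" latitude cells
COLS_PER_DEGREE = 160  # assumed 22.5" longitude cells


def meshes_in_bbox(south: float, west: float, north: float, east: float) -> List[Tuple[int, int]]:
    """List the (row, col) meshes overlapped by a bounding rectangle in degrees."""
    r0, r1 = math.floor(south * ROWS_PER_DEGREE), math.floor(north * ROWS_PER_DEGREE)
    c0, c1 = math.floor(west * COLS_PER_DEGREE), math.floor(east * COLS_PER_DEGREE)
    return [(r, c) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]
```

The length of the returned list is the M used for the 1/M weights in the following steps.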

Step 102) The same-sentence co-occurrence frequency calculation unit 6 takes the documents analyzed by the document analysis unit 2 as input and performs the following processing for every sentence.

First, for each distinct candidate expression in the sentence (duplicates removed), 1 is added to the sentence-count column of that term's entry in the candidate expression frequency table in storage unit 4.

If at least one place name and at least one candidate expression appear in the sentence, the following processing is performed for every place name-candidate expression pair formed from them.

First, the list of mesh numbers contained in the range implied by the place name is computed; let M be the number of meshes in that list. 1/M is added to the sentence appearance frequency of each of these meshes in the mesh frequency table in storage unit 5. For each mesh, if the mesh-candidate expression frequency table in storage unit 3 has no entry for the candidate expression, one is added; then 1/M is added to that candidate expression's same-sentence co-occurrence frequency.
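A minimal sketch of step 102 for a single sentence, assuming each sentence has already been reduced to its place names and candidate expressions, and that a hypothetical `place_meshes` dict maps each place name to the distinct meshes in its implied range. Two choices here are assumptions: the mesh-frequency update is done once per distinct place name in the sentence (the step above performs it inside the per-pair loop, which taken literally would repeat it for every co-occurring candidate), and place names that map to no mesh are skipped.

```python
from collections import defaultdict
from fractions import Fraction


def count_sentence(places, candidates, place_meshes, cand_freq, mesh_freq, pair_freq):
    """Apply the step-102 updates for one sentence (sketch)."""
    cands = set(candidates)  # duplicates within the sentence count once
    for cand in cands:
        cand_freq[cand] += 1  # sentence count of the candidate expression
    places = {p for p in places if place_meshes.get(p)}  # skip places with no mesh
    if not places or not cands:
        return
    for place in places:
        meshes = place_meshes[place]
        weight = Fraction(1, len(meshes))  # 1/M
        for mesh in meshes:
            mesh_freq[mesh] += weight  # assumption: once per distinct place name
            for cand in cands:
                pair_freq[(mesh, cand)] += weight  # same-sentence co-occurrence
```

`defaultdict` plays the role of "add the entry if it is missing", so new (mesh, candidate) rows appear on first use.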

Step 103) The same-paragraph co-occurrence frequency calculation unit 7 takes the documents analyzed by the document analysis unit 2 as input and performs the following processing for every paragraph in each document.

First, for each distinct candidate expression in the paragraph (duplicates removed), 1 is added to the paragraph-count column of that term's entry in the candidate expression frequency table in storage unit 4.

If at least one place name and at least one candidate expression appear in the paragraph, the following processing is performed for every place name-candidate expression pair formed from them.

First, the list of mesh numbers contained in the range implied by the place name is computed; let M be the number of meshes in that list. 1/M is added to the paragraph appearance frequency of each of these meshes in the mesh frequency table in storage unit 5. For each mesh, if the mesh-candidate expression frequency table in storage unit 3 has no entry for the candidate expression, one is added; then 1/M is added to that candidate expression's same-paragraph co-occurrence frequency.

Step 104) The heading-body co-occurrence frequency calculation unit 8 performs the following processing for every document analyzed by the document analysis unit 2.

If a candidate expression appears in the document's heading and a place name appears in the document, or a place name appears in the heading and a candidate expression appears in the document, the following processing is performed for every place name-candidate expression pair formed from them.

まず、地名の含意する範囲に含まれるメッシュ番号一覧を算出する。各メッシュについて、もし、メッシュ−候補表現頻度表記憶部３のメッシュ−候補表現頻度表に候補表現のエントリがなければ追加する。このとき含まれたメッシュの個数をＭとすると、当該メッシュ−候補表現頻度表の見出し−本文共起頻度に1/Mを加える。 First, a list of mesh numbers included in the range implied by the place name is calculated. For each mesh, if there is no entry for a candidate expression in the mesh-candidate expression frequency table of the mesh-candidate expression frequency table storage unit 3, it is added. If the number of included meshes is M, 1 / M is added to the headline-text co-occurrence frequency of the mesh-candidate expression frequency table.
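The pairing condition for the headline-body criterion can be sketched as a set construction. The sketch assumes that "appears in the document" covers the whole document, headline included, so the document-level sets contain the headline's terms as well; the function name `headline_body_pairs` is hypothetical. Each resulting pair would then receive the same 1/M mesh weighting described above.

```python
from typing import Iterable, Set, Tuple


def headline_body_pairs(headline_places: Iterable[str],
                        headline_candidates: Iterable[str],
                        doc_places: Iterable[str],
                        doc_candidates: Iterable[str]) -> Set[Tuple[str, str]]:
    """Return the (place name, candidate expression) pairs that qualify
    under the headline-body criterion.

    doc_places and doc_candidates are assumed to cover the whole document,
    headline included.
    """
    all_places = set(doc_places)
    all_candidates = set(doc_candidates)
    pairs: Set[Tuple[str, str]] = set()
    # Case 1: candidate expression in the headline, place name anywhere in the document.
    for cand in headline_candidates:
        for place in all_places:
            pairs.add((place, cand))
    # Case 2: place name in the headline, candidate expression anywhere in the document.
    for place in headline_places:
        for cand in all_candidates:
            pairs.add((place, cand))
    return pairs


pairs = headline_body_pairs(["Nara"], ["deer"], ["Nara", "Osaka"], ["deer", "park"])
```

Here ("Osaka", "park") is not produced, because neither term appears in the headline; this is what makes the criterion stricter than plain same-document co-occurrence.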

Step 105) The frequency calculation unit 9 based on co-occurrence within the same document takes the documents analyzed by the document analysis unit 2 as input and performs the following processing on every document.

First, for every distinct candidate expression in the document (duplicates removed), 1 is added to the document appearance count column of that expression's entry in the candidate expression frequency table of the candidate expression frequency table storage unit 4. If at least one place name and at least one candidate expression appear in the document, the following processing is performed for every place name-candidate expression pair formed from them.

First, the list of mesh numbers contained in the range implied by the place name is calculated. Let M be the number of meshes in this list. For each mesh, 1/M is added to its document appearance frequency in the mesh frequency table of the mesh frequency table storage unit 5. If the mesh-candidate expression frequency table of the mesh-candidate expression frequency table storage unit 3 has no entry for the candidate expression, one is added, and 1/M is added to the same-document co-occurrence frequency of that candidate expression for the mesh.
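The same-paragraph and same-document updates above (steps 103 and 105) follow one pattern that differs only in the text unit and the columns written. A minimal sketch of that pattern for a single unit is shown below, assuming the meshes for each place name have already been computed. `FrequencyTables` and `update_for_unit` are hypothetical stand-ins for storage units 3, 4, and 5, with one instance kept per extraction criterion. Following the literal wording of the procedure, the mesh frequency is incremented once per place name-candidate expression pair; an implementation could reasonably increment it once per place name instead.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, List, Tuple

Mesh = Tuple[int, int]  # hypothetical (row, column) mesh number


class FrequencyTables:
    """Hypothetical in-memory stand-ins for storage units 3, 4 and 5,
    holding the columns of a single extraction criterion."""

    def __init__(self) -> None:
        # Unit 4: candidate expression -> number of units it appears in.
        self.candidate_freq: DefaultDict[str, float] = defaultdict(float)
        # Unit 5: mesh -> weighted appearance frequency.
        self.mesh_freq: DefaultDict[Mesh, float] = defaultdict(float)
        # Unit 3: (mesh, candidate expression) -> weighted co-occurrence frequency.
        self.mesh_candidate_freq: DefaultDict[Tuple[Mesh, str], float] = defaultdict(float)


def update_for_unit(tables: FrequencyTables,
                    candidates: Iterable[str],
                    place_meshes: Dict[str, List[Mesh]]) -> None:
    """Update the tables for one paragraph (or one document).

    candidates: candidate expressions found in the unit, duplicates allowed.
    place_meshes: each distinct place name in the unit -> meshes it implies.
    """
    distinct = set(candidates)
    # Count each distinct candidate expression once per unit.
    for cand in distinct:
        tables.candidate_freq[cand] += 1
    # Pairs exist only if the unit contains a place name and a candidate.
    if not distinct or not place_meshes:
        return
    for meshes in place_meshes.values():
        m = len(meshes)
        if m == 0:
            continue  # a place name that maps to no mesh contributes nothing
        weight = 1.0 / m
        for cand in distinct:
            for mesh in meshes:
                # Literal reading: the mesh frequency grows once per pair.
                tables.mesh_freq[mesh] += weight
                # defaultdict creates a missing entry, i.e. "add it if absent".
                tables.mesh_candidate_freq[(mesh, cand)] += weight


tables = FrequencyTables()
update_for_unit(tables, ["ramen", "ramen", "temple"], {"Kyoto": [(30, 38), (30, 39)]})
```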

Next, the main processing is described.

FIG. 6 is a flowchart of the main processing in an embodiment of the present invention.

Step 201) The score calculation unit 10 takes the geographic range of interest indicated by the user as input and outputs, to the output unit 11, a score indicating whether each candidate expression appears characteristically concentrated in that range. The procedure is as follows.

Let α, β, γ, and δ be predetermined weight parameters for, respectively, co-occurrence within the same sentence, co-occurrence within the same paragraph, co-occurrence in which one term is in the document headline and the other is anywhere in the body, and co-occurrence within the same document. In addition, let k be a predetermined frequency parameter that is large enough to serve as a statistical basis.

First, for co-occurrence within the same sentence, the following procedure determines whether any candidate expression appears with a statistical bias.

1. Initialize the basis frequency number z (z = 0).

2. Let n be the total number of sentences.

3. Referring to the mesh frequency table of the mesh frequency table storage unit 5, obtain the sum (k) of the weighted sentence appearance frequencies of the meshes contained in the geographic range of interest given as input.

4. Referring to the mesh-candidate expression frequency table of the mesh-candidate expression frequency table storage unit 3, obtain the set I of candidate expressions i that appear in the meshes contained in the geographic range of interest.

5. Repeat the following for each candidate expression i in the set I.

5-1. Referring to the candidate expression frequency table of the candidate expression frequency table storage unit 4, obtain the sentence appearance frequency $s_i$ of i.

5-2. Referring to the mesh-candidate expression frequency table of the mesh-candidate expression frequency table storage unit 3, obtain the sum $r_i$ of the weighted sentence appearance frequencies of i within the geographic range of interest.

5-3. Using n, k, $s_i$, and $r_i$ above, calculate the values A, B, C, and D of the 2 × 2 contingency table shown in FIG. 7 by the following equations.

5-5. Set the score of i to S(i) = αX(i).

6. After X(i) has been obtained for all candidates i above, add to z the value $r_i$ (the frequency within the range of interest) of the candidate i for which X(i) is largest.
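The equations for A, B, C, and D, and the step 5-4 that defines X(i), are not reproduced in this text. The sketch below fills them in with one common choice, and every part of it is an assumption rather than the patent's stated formula: the contingency table is laid out as (in range, with i) = $r_i$, (in range, without i) = k − $r_i$, (outside, with i) = $s_i$ − $r_i$, (outside, without i) = n − k − $s_i$ + $r_i$, and X(i) is taken to be Pearson's chi-square statistic for that table. Because the source uses k both for the in-range total of step 3 and for the sufficiency threshold, the in-range total is named `k_range` here. Step 6 is included: the pass returns $r_i$ of the top-scoring candidate as the increment to z. All function names are hypothetical.

```python
from typing import Dict, Tuple


def contingency(n: float, k_range: float, s_i: float, r_i: float) -> Tuple[float, float, float, float]:
    """Assumed 2 x 2 layout (FIG. 7 is not available here)."""
    a = r_i                      # in range, with candidate i
    b = k_range - r_i            # in range, without i
    c = s_i - r_i                # outside range, with i
    d = n - k_range - s_i + r_i  # outside range, without i
    return a, b, c, d


def chi_square(a: float, b: float, c: float, d: float) -> float:
    """Pearson's chi-square for a 2 x 2 table (no continuity correction)."""
    total = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: treat as no evidence of bias
    return total * (a * d - b * c) ** 2 / denom


def score_pass(n: float, k_range: float,
               s: Dict[str, float], r: Dict[str, float]) -> Tuple[Dict[str, float], float]:
    """One criterion's pass: X(i) for every i in I, plus the increment to z.

    s maps each candidate to s_i; r maps each candidate in I to r_i.
    """
    x = {i: chi_square(*contingency(n, k_range, s[i], r[i])) for i in r}
    if not x:
        return x, 0.0
    best = max(x, key=x.get)  # candidate with the largest X(i)
    return x, float(r[best])


x, z_increment = score_pass(100, 10, {"a": 10, "b": 50}, {"a": 8, "b": 5})
```

In the example, candidate "a" occurs 8 times among the 10 in-range occurrences but only 10 times overall, so it scores high; "b" is spread evenly and scores 0. Note that the chi-square statistic also rewards under-representation; a production version might zero X(i) when A·D < B·C.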

In the above procedure, if z exceeds the parameter k, processing ends: the list of pairs of S(i) and i is output, and the processing of the score calculation unit 10 is complete.

If z does not exceed k, the same processing is performed for the next extraction criterion, co-occurrence within the same paragraph. In this case, the score S(i) of i is the previous score plus βX(i), that is, S(i) = S(i) + βX(i).

In the same manner, the processing is repeated until z exceeds k, up to the final extraction criterion, co-occurrence within the same document, and the list of pairs of S(i) and i is output.
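The staged application of the criteria, together with the top-N selection performed by the output unit 11, can be sketched as follows. Each criterion's pass is assumed to have already produced its X(i) values and its increment to z. Two points are interpretations rather than statements of the source: z is accumulated across criteria instead of being reset at step 1 of each pass, and a candidate absent from a pass contributes nothing for that criterion. The weights correspond to α, β, γ, and δ; the function name `staged_scores` and the `top_n` parameter are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# One pass per criterion: (X(i) for each candidate i, increment to z).
Pass = Tuple[Dict[str, float], float]


def staged_scores(passes: Sequence[Pass],
                  weights: Sequence[float],
                  threshold_k: float,
                  top_n: int) -> List[Tuple[str, float]]:
    """Apply criteria narrowest-first, stopping once z exceeds threshold_k.

    passes and weights follow the order: same sentence, same paragraph,
    headline-body, same document (weights alpha, beta, gamma, delta).
    """
    scores: Dict[str, float] = {}
    z = 0.0
    for (x, z_increment), w in zip(passes, weights):
        for cand, x_i in x.items():
            # S(i) = S(i) + w * X(i)
            scores[cand] = scores.get(cand, 0.0) + w * x_i
        z += z_increment  # assumption: z accumulates across criteria
        if z > threshold_k:
            break  # enough statistical basis; wider criteria are skipped
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_n]  # output unit 11: the N highest-scoring candidates


passes = [({"a": 2.0, "b": 1.0}, 3.0),
          ({"a": 1.0, "b": 4.0}, 5.0),
          ({"a": 10.0}, 100.0)]
result = staged_scores(passes, [1.0, 0.5, 0.25, 0.1], threshold_k=6.0, top_n=2)
```

In the example, z reaches 8 after the paragraph criterion, so the third pass, which would have favoured "a", is never applied. This is the intended trade-off: wider, noisier criteria are consulted only when the narrower, higher-quality ones lack enough evidence.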

For headline-body co-occurrence, the document appearance frequency columns are used when referring to the candidate expression frequency table and the mesh frequency table.

The output unit 11 outputs, as the output of this apparatus, a predetermined number N of the candidate expressions obtained by the score calculation unit 10, in descending order of score.

As described above, the present invention is based on two observations: among place name-candidate expression pairs, some are few in number but high in quality while others are numerous but low in quality; and recommending keywords for a place name requires a sufficient number of pairs as a statistical basis. By applying a plurality of criteria in stages, high-quality recommended keywords can be determined.

The operation of each component of the document search keyword presentation apparatus shown in FIG. 1 can be implemented as a program that is installed and executed on a computer used as the document search keyword presentation apparatus, or distributed via a network.

The present invention is not limited to the above embodiment, and various modifications and applications are possible within the scope of the claims.

1 Document DB
2 Document analysis unit
3 Mesh-candidate expression frequency table storage unit
4 Candidate expression frequency table storage unit
5 Mesh frequency table storage unit
6 Frequency calculation unit based on co-occurrence within the same sentence
7 Frequency calculation unit based on co-occurrence within the same paragraph
8 Frequency calculation unit based on headline-body co-occurrence
9 Frequency calculation unit based on co-occurrence within the same document
10 Score calculation unit
11 Output unit

Claims

A document search keyword presentation device that presents keywords corresponding to a map range input by a user, comprising:
document analysis means for extracting, from a document set, a set of keyword candidate expressions A, the position of each candidate expression A in each document, and the place names in each document;
frequency calculation means for, for each of a plurality of extraction criteria that differ in their corresponding document range, obtaining, for each candidate expression A and each place name in the document range corresponding to the extraction criterion, a co-occurrence frequency for each mesh implied by the place name, a mesh being a rectangle obtained by dividing a reference map range at predetermined latitude and longitude intervals; obtaining the appearance frequency of each candidate expression A in the document range; obtaining the appearance frequency for each mesh implied by a place name in the document range; and storing the co-occurrence frequencies and the appearance frequencies in storage means;
score calculation means for applying the extraction criteria in order, starting from the one with the narrowest document range, to the pairs of meshes and candidate expressions B contained in the map range of interest input by the user; referring to the co-occurrence frequencies and appearance frequencies in the storage means to calculate a basis frequency number z, derived from the values stored in the storage means, that indicates whether the number of pairs has reached a sufficient quantity; adopting the current document range when the basis frequency number z has reached the sufficient quantity and a wider document range when it has not; and calculating, over the adopted document range, a score S indicating whether a candidate expression B appears characteristically concentrated in the map range of interest; and
output means for outputting pairs having relatively high scores S calculated by the score calculation means.

The document search keyword presentation device according to claim 1, wherein the extraction criterion with the narrowest document range is occurrence or co-occurrence within the same sentence, and
the frequency calculation means:
weights the co-occurrence frequency associated with each mesh according to the number of meshes implied by the place name, and stores the mesh, the candidate expression A, and the weighted co-occurrence frequency in association with one another in the mesh-candidate expression frequency table of the storage means;
stores the candidate expression A and its appearance frequency in association with each other in the candidate expression frequency table of the storage means; and
weights the appearance frequency associated with each mesh according to the number of meshes implied by the place name, and stores the mesh and the weighted appearance frequency in association with each other in the mesh frequency table of the storage means.

A document search keyword presentation method for presenting keywords corresponding to a map range input by a user, wherein a computer performs:
a document analysis step of extracting, from a document set, a set of keyword candidate expressions A, the position of each candidate expression A in each document, and the place names in each document;
a frequency calculation step of, for each of a plurality of extraction criteria that differ in their corresponding document range, obtaining, for each candidate expression A and each place name in the document range corresponding to the extraction criterion, a co-occurrence frequency for each mesh implied by the place name, a mesh being a rectangle obtained by dividing a reference map range at predetermined latitude and longitude intervals; obtaining the appearance frequency of each candidate expression A in the document range; obtaining the appearance frequency for each mesh implied by a place name in the document range; and storing the co-occurrence frequencies and the appearance frequencies in storage means;
a score calculation step of applying the extraction criteria in order, starting from the one with the narrowest document range, to the pairs of meshes and candidate expressions B contained in the map range of interest input by the user; referring to the co-occurrence frequencies and appearance frequencies in the storage means to calculate a basis frequency number z, derived from the values stored in the storage means, that indicates whether the number of pairs has reached a sufficient quantity; adopting the current document range when the basis frequency number z has reached the sufficient quantity and a wider document range when it has not; and calculating, over the adopted document range, a score S indicating whether a candidate expression B appears characteristically concentrated in the map range of interest; and
an output step of outputting pairs having relatively high scores S calculated in the score calculation step.

The document search keyword presentation method according to claim 3, wherein the extraction criterion with the narrowest document range is occurrence or co-occurrence within the same sentence, and
the frequency calculation step includes:
weighting the co-occurrence frequency associated with each mesh according to the number of meshes implied by the place name, and storing the mesh, the candidate expression A, and the weighted co-occurrence frequency in association with one another in the mesh-candidate expression frequency table of the storage means;
storing the candidate expression A and its appearance frequency in association with each other in the candidate expression frequency table of the storage means; and
weighting the appearance frequency associated with each mesh according to the number of meshes implied by the place name, and storing the mesh and the weighted appearance frequency in association with each other in the mesh frequency table of the storage means.

A document search keyword presentation program for causing a computer to function as each means of the document search keyword presentation device according to claim 1 or 2.