JP6592237B2 - Information acquisition server, information acquisition method, and information acquisition and distribution system

Info

Publication number: JP6592237B2
Application number: JP2014208546A
Authority: JP
Inventors: 孝利石井
Original assignee: Jcc株式会社
Priority date: 2014-10-10
Filing date: 2014-10-10
Publication date: 2019-10-16
Anticipated expiration: 2034-10-10
Also published as: JP2016081096A

Description

The present invention relates to an information acquisition server, an information acquisition method, and an information acquisition and distribution system that acquire information from websites on the Internet with high accuracy.

To obtain needed information from the many websites connected to the Internet, a user connects a computer to the Internet, uses Internet browser software installed on that computer to connect to a search site such as "Google (registered trademark)", and enters a keyword in the search field of that site to run a search. The user then selects one website from the many websites extracted for the keyword and displayed as a list, and browses that website to obtain the needed information. This operation is repeated as necessary until the needed information is obtained.

However, when searching for specific information, for example topics or news about a specific company, a keyword search on such a search site can produce "search omissions", in which the target information is not obtained, and can extract a large amount of "noise", that is, information different from the target information.

This is because many search sites work on information collected by so-called robot-type search engines, which automatically patrol websites on the Internet and collect data. Because a robot-type search engine mechanically acquires information from sites all over the world, the volume of information to be searched is extremely large, and a simple keyword search either misses results or extracts unnecessary information. In addition, a robot-type search engine may fail to acquire information from pages deep in a website's hierarchy, which also causes search omissions.

For this reason, an information search requires complicated settings, such as combinations of keywords, rather than a simple keyword search. As a result, search accuracy varies greatly depending on how the keywords are set. Moreover, unless the user browses each site along its site map, or browses with an accurate grasp of where information is located, search omissions cannot be prevented and accuracy drops. Actually carrying out this operation for every website is difficult.

In view of these circumstances, various proposals have been made for news search on websites on the Internet.

Patent Document 1 describes a web information collecting device that downloads character data across multiple levels of a designated homepage, searches the downloaded character data file with preset keywords, and, when the date of the character data file does not match the registered data, downloads the entire homepage hit by the keyword search and notifies the relevant users by e-mail of the changes to the downloaded homepage. For homepages hit by the keywords, the device also edits headlines and distributes them as news.

Patent Document 2 describes an information distribution method in which the topic information collecting means of an information providing server collects news articles from news sites, the commentary information collecting means collects blog content from blog management servers, and the topic information searching means searches the collected news articles for those matching a search keyword obtained from the user. As the search result, the information distribution means extracts blog content related to the news articles found, generates search result information in which the extracted blog content is classified and displayed by news article, and distributes it to the user PC.

JP 2005-31867 A; JP 2006-309515 A

Today, for organizations such as companies, having themselves or their products covered in the news carries great significance. Favorable coverage of a company or its products generates substantial advertising value, can be expected to raise sales, and improves the image, recognition, and favorability of the company and its products.

Conversely, unfavorable coverage of a company or its products leads to lower product sales and damages the image of the company and its products. Unless the company responds appropriately in such a case, it may suffer a large, and in some cases fatal, loss.

Such news about companies and products is published by major newspapers, regional newspapers, news distribution companies, and television companies, as well as on dedicated news sites, news aggregation sites, and other sites. Acquiring news about one's own company or products comprehensively, accurately, and quickly from such a variety of sites is difficult with keyword searches on conventional search sites.

The device described in Patent Document 1 can perform a detailed search of a designated homepage, but it cannot obtain desired information with high accuracy from the large volume of news information distributed on the web.

The method described in Patent Document 2 distributes the topic information found, together with the commentary information related to it, to the user in a batch. It likewise cannot obtain desired information with high accuracy from the large volume of news information distributed on the web.

Thus, no information acquisition method has conventionally been provided that accurately acquires and distributes the information desired by a user such as a company. Accordingly, an object of the present invention is to provide an information acquisition server, an information acquisition method, and an information acquisition and distribution system that can acquire desired information with high accuracy.

The invention according to claim 1, which solves the above problem, is an information acquisition server that is connectable to a plurality of websites via the Internet and acquires desired information from the websites, the server comprising: means for exhaustively surveying the plurality of websites and extracting, as information sites, websites that match a preselected field; means for analyzing the site structure of each extracted information site; means for patrolling each extracted information site and acquiring, based on the analyzed structure, information described on the information site that is either newly described or different in content from the record of the previous patrol; means for storing the information acquired from each information site in a database; and means for acquiring information stored in the database based on a designation. Here, analysis of a website's structure includes analysis performed automatically based on a site map, published on the website, that explicitly shows the structure of the site. For a site that has no site map, it includes analyzing the structure of the website automatically or having an operator analyze the structure manually.
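
As a rough illustration of the site-map-based structure analysis mentioned above, the following Python sketch lists the page URLs named in a site map. It assumes the site map follows the sitemaps.org XML format, which the description does not specify; the function name `urls_from_sitemap` and the example URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: the site publishes a sitemaps.org-style XML site map.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text: str) -> list[str]:
    """Return the page URLs listed in a sitemaps.org XML site map."""
    root = ET.fromstring(xml_text)
    # Each <url> element carries one <loc> child holding the page address.
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


example = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://example.com/news/1</loc></url>"
    "<url><loc>https://example.com/news/2</loc></url>"
    "</urlset>"
)
print(urls_from_sitemap(example))
```

Sites without such a file would need the automatic or manual structure analysis that the description also allows for.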

According to the invention of claim 1, a plurality of target websites are exhaustively surveyed and websites that match a preselected field are extracted as information sites. The structure of each extracted information site is analyzed, each information site is patrolled, the information displayed is acquired based on the analyzed structure and stored in a database, and information stored in the database is acquired based on a designation. A database is therefore built by extracting information from preselected information sites in light of the structure of each site, and information can be extracted from this database based on a designation. As a result, information can be acquired efficiently, without omission, and with high accuracy.
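
The step of acquiring only information that is new or that differs from the record of the previous patrol can be sketched as follows. This is a minimal illustration, not the patented implementation: keeping a content hash per URL as the "record of the previous patrol" is an assumption, and the names `fingerprint` and `collect_new_or_changed` are hypothetical.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Return a stable hash of an item's text content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def collect_new_or_changed(
    current_items: dict[str, str], previous_record: dict[str, str]
) -> dict[str, str]:
    """Return items that are new or changed since the previous patrol.

    current_items maps URL -> text found on this patrol.
    previous_record maps URL -> fingerprint saved on the previous patrol
    (an assumed storage format); it is updated in place.
    """
    acquired = {}
    for url, text in current_items.items():
        digest = fingerprint(text)
        if previous_record.get(url) != digest:  # unseen URL or changed content
            acquired[url] = text
            previous_record[url] = digest
    return acquired


record: dict[str, str] = {}
first = collect_new_or_changed({"a": "x", "b": "y"}, record)
second = collect_new_or_changed({"a": "x", "b": "y2", "c": "z"}, record)
```

On the second patrol only the changed item `b` and the new item `c` are returned; the unchanged item `a` is skipped.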

Likewise, the invention according to claim 2 is the information acquisition server according to claim 1, which is connected to at least one user terminal, comprises means for distributing the information stored in the database to the user terminal in response to a request from the user terminal, searches the database based on a keyword designated from the user terminal, and distributes the acquired information to the user terminal.

According to the invention of claim 2, the information stored in the database can be searched by keyword from a user terminal connected to the information acquisition server, so information can be acquired from the user terminal easily, without omission, and with high accuracy.
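
A keyword search over the stored information might look like the sketch below. The table layout (`site`, `title`, `body`), the use of SQLite, and the function names are assumptions made for illustration; the description only states that the database is searched based on a keyword designated from the user terminal.

```python
import sqlite3


def build_database(articles):
    """Create an in-memory database holding (site, title, body) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE articles (site TEXT, title TEXT, body TEXT)")
    conn.executemany("INSERT INTO articles VALUES (?, ?, ?)", articles)
    return conn


def search_by_keyword(conn, keyword):
    """Return (site, title) for rows whose title or body contains keyword."""
    pattern = f"%{keyword}%"  # note: % and _ inside keyword act as wildcards
    rows = conn.execute(
        "SELECT site, title FROM articles "
        "WHERE title LIKE ? OR body LIKE ? ORDER BY rowid",
        (pattern, pattern),
    )
    return rows.fetchall()


conn = build_database(
    [
        ("news-a", "Acme launches product", "Details of the launch."),
        ("news-b", "Weather", "Sunny tomorrow."),
    ]
)
hits = search_by_keyword(conn, "Acme")
```

Because the search runs against a database built only from the preselected information sites, the result set is narrower than a general web search.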

Likewise, the invention according to claim 3 is the information acquisition server according to claim 1, which is connected to at least one user terminal, comprises means for distributing the information stored in the database to the user terminal in response to a request from the user terminal, and, when information containing a predetermined keyword set in advance from the user terminal is stored in the database, distributes to the user terminal at least one of a notice that information related to the keyword has been acquired and the information related to the keyword.

According to the invention of claim 3, a notice that information containing the preset keyword has been acquired, and the information containing the keyword, are distributed to the user terminal connected to the information acquisition server, so information can be distributed to the user terminal easily, without omission, and with high accuracy.
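
The keyword-triggered distribution of claim 3 can be sketched as a store that checks registered keywords whenever an item is saved. The class `AlertingStore`, its `outbox` list standing in for actual delivery, and plain substring matching are all assumptions made for illustration.

```python
from collections import defaultdict


class AlertingStore:
    """Stores items and queues a delivery when a registered keyword appears."""

    def __init__(self):
        self.items = []
        self.subscriptions = defaultdict(set)  # keyword -> user ids
        self.outbox = []  # (user_id, item) pairs awaiting delivery

    def subscribe(self, user_id, keyword):
        """Register a keyword set in advance from a user terminal."""
        self.subscriptions[keyword].add(user_id)

    def store(self, item):
        """Save the item, then queue it for every user whose keyword matches."""
        self.items.append(item)
        for keyword, users in self.subscriptions.items():
            if keyword in item:
                for user_id in sorted(users):
                    self.outbox.append((user_id, item))


store = AlertingStore()
store.subscribe("u1", "Acme")
store.store("Weather update")
store.store("Acme recalls product")
```

After these calls, `store.outbox` holds a single entry for user `u1`, because only the second item contains the registered keyword.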

Likewise, the invention according to claim 4 is the information acquisition server according to claim 2 or claim 3, comprising means for evaluating the advertising value of the acquired information, wherein the advertising value is distributed to the user terminal together with the information to be distributed.

According to the invention of claim 4, the advertising value of the information is distributed to the user terminal together with the acquired information, so the advertising value of the information can be known at the user terminal.

Likewise, the invention according to claim 5 is the information acquisition server according to claim 4, wherein the advertising value is determined based on the advertising fee for the page on the information site where the information is displayed.

According to the invention of claim 5, the advertising value is determined based on the advertising fee on the information site, so the advertising value can be judged against an objective standard.

Likewise, the invention according to claim 6 is the information acquisition server according to claim 4, wherein, when the advertising fee for the page on the information site where the information is displayed cannot be identified, the advertising value is determined based on an amount, equivalent to the advertising fee, that is estimated by a mathematical method through comparison with other sites for which advertising fees are set.

According to the invention of claim 6, the advertising value is determined by analyzing data on the information site with a mathematical method, so the advertising value can always be judged objectively against the same standard.
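
The description does not say which mathematical method is used to estimate a missing advertising fee. Purely as an illustration, the sketch below scales the fee in proportion to page views, using reference sites whose fees are known. Both the choice of page views as the comparison metric and the function name `estimate_ad_fee` are assumptions.

```python
def estimate_ad_fee(target_views, reference_sites):
    """Estimate an advertising fee from sites with known fees.

    reference_sites is a list of (page_views, ad_fee) pairs. The estimate
    applies the pooled fee-per-view rate of those sites to target_views.
    """
    total_views = sum(views for views, _ in reference_sites)
    total_fee = sum(fee for _, fee in reference_sites)
    if total_views <= 0:
        raise ValueError("reference sites must have positive page views")
    return target_views * total_fee / total_views


# Two reference sites, both charging 0.2 per view.
estimate = estimate_ad_fee(500, [(1000, 200.0), (3000, 600.0)])
print(estimate)  # 100.0
```

Applying one fixed rule to every site is what allows the value to be judged against the same standard each time, whichever estimation method is actually chosen.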

Likewise, the invention according to claim 7 is the information acquisition server according to any one of claims 2 to 6, comprising means for evaluating the favorability of the acquired information, wherein the favorability is distributed to the user terminal together with the information to be distributed.

According to the invention of claim 7, the favorability of the information is distributed to the user terminal together with the acquired information, so the favorability of the information can be known at the user terminal.

Likewise, the invention according to claim 8 is the information acquisition server according to claim 7, wherein the favorability is evaluated based on big data accumulated in the past and on knowledge management.

According to the invention of claim 8, the favorability is evaluated based on big data accumulated in the past and on knowledge management, so the evaluation can be performed automatically and objectively.

Likewise, the invention according to claim 9 is the information acquisition server according to any one of claims 2 to 7, comprising means for acquiring at least one of information about citing sites that cite the acquired information and the number of such citing sites, wherein at least one of the information about the citing sites and the number of the citing sites is distributed together with the information to be distributed.

According to the invention of claim 9, the information acquisition server acquires at least one of the information about the citing sites and their number, so the information can be evaluated based on at least one of the information about the citing sites and their number.
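
One simple way to obtain the citing sites and their number is to check which already-fetched pages mention the URL of the acquired article. The sketch below does this with a substring test. The input format `pages_by_site` and the function name `find_citing_sites` are hypothetical, and the description does not state how citations are actually detected.

```python
def find_citing_sites(article_url, pages_by_site):
    """Return (sorted citing site names, count) for one article URL.

    pages_by_site maps a site name to the text of a page fetched from it.
    A site counts as citing when its page text contains the article URL.
    """
    citing = sorted(
        site for site, text in pages_by_site.items() if article_url in text
    )
    return citing, len(citing)


pages = {
    "blog-b": 'See <a href="https://example.com/news/1">this report</a>.',
    "blog-a": "Source: https://example.com/news/1",
    "other": "No reference here.",
}
sites, count = find_citing_sites("https://example.com/news/1", pages)
```

Here `sites` is `["blog-a", "blog-b"]` and `count` is 2; both values could accompany the distributed article as additional information.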

Likewise, the invention according to claim 10 is the information acquisition server according to any one of claims 1 to 8, wherein information is acquired with a period designated from the user terminal.

According to the invention of claim 10, information covering a specific period designated from the user terminal can be acquired.
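
Restricting the acquired information to a designated period amounts to a date-range filter over the stored records. In this minimal sketch each record is assumed to be a `(published_date, title)` tuple, and the function name `filter_by_period` is hypothetical.

```python
from datetime import date


def filter_by_period(records, start, end):
    """Keep records whose date falls within [start, end], inclusive."""
    return [record for record in records if start <= record[0] <= end]


records = [
    (date(2014, 10, 1), "a"),
    (date(2014, 10, 10), "b"),
    (date(2014, 11, 1), "c"),
]
selected = filter_by_period(records, date(2014, 10, 5), date(2014, 10, 31))
```

Only the record dated 2014-10-10 falls inside the designated period, so `selected` contains that record alone.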

Likewise, the invention according to claim 11 is the information acquisition server according to any one of claims 1 to 10, wherein the specific field is news.

According to the invention of claim 11, the information acquisition server can acquire news information easily, without omission, and with high accuracy, and distribute it to the user terminal.

The invention according to claim 12 is an information acquisition method for connecting to a plurality of websites via the Internet and acquiring desired information from the websites, the method comprising: a step of exhaustively surveying the plurality of websites and extracting, as information sites, websites that match a preselected field; a step of analyzing the site structure of each extracted information site; a step of patrolling each extracted information site and acquiring, based on the analyzed structure, information described on the information site that is either newly described or different in content from the record of the previous patrol; a step of storing the information acquired from each information site in a database; and a step of acquiring information stored in the database based on a designation.

According to the invention of claim 12, a plurality of websites are exhaustively surveyed and websites that match a preselected field are extracted as information sites. The structure of each information site is analyzed, each extracted information site is patrolled, the information displayed is acquired based on the analyzed structure and stored in a database, and information stored in the database is acquired based on a designation. A database is therefore built by extracting information from preselected information sites in light of the structure of each site, and information can be extracted from this database based on a designation. As a result, information can be acquired efficiently, without omission, and with high accuracy.

The invention according to claim 13 is an information acquisition and distribution system that comprises an information acquisition server connected via the Internet to a plurality of websites and at least one user terminal, and that distributes, out of the information acquired from the information acquisition server, the information requested by the user terminal, the system comprising: means for exhaustively surveying the plurality of websites and extracting, as information sites, websites that match a preselected field; means for analyzing the site structure of each extracted information site; means for patrolling each extracted information site and acquiring, based on the analyzed structure, information described on the information site that is either newly described or different in content from the record of the previous patrol; means for storing the information acquired from each information site in a database; means for acquiring information stored in the database based on a designation from the user terminal; and means for distributing the acquired information to the user terminal.

According to the invention of claim 13, a plurality of websites are exhaustively surveyed and websites that match a preselected field are extracted as information sites. The structure of each extracted information site is analyzed, each information site is patrolled, the information displayed is acquired based on the analyzed structure and stored in a database, information stored in the database is acquired based on a designation, and the information is distributed in response to a request from the user terminal. A database can therefore be built by extracting information from preselected information sites in light of the structure of each site, and information can be extracted from this database based on a designation. As a result, information can be acquired efficiently, without omission, and with high accuracy, and distributed to the user terminal.

According to the information acquisition server and the information acquisition method of the present invention, a database is built by extracting information from preselected information sites in light of the structure of each site, and information is extracted from this database based on a designation, so information can be acquired efficiently, without omission, and with high accuracy.

Further, according to the information acquisition and distribution system of the present invention, a database is built by extracting information from preselected information sites in light of the structure of each site, and information can be extracted from this database based on a designation, so information can be acquired efficiently, without omission, and with high accuracy, and distributed to the user terminal.

FIG. 1 is a block diagram showing the configuration of the information acquisition and distribution system and the information acquisition server according to an embodiment of the present invention.
FIG. 2 is a schematic diagram showing the operation of the information acquisition server.
FIG. 3 is a block diagram showing the hardware configuration of the information acquisition server.
FIG. 4 is a flowchart showing the flow of processing in the information acquisition server.
FIG. 5 is a flowchart showing the advertising value evaluation process in the information acquisition server.
FIG. 6 is a flowchart showing the nuance evaluation process in the information acquisition server.
FIG. 7 is a flowchart showing the citation evaluation process in the information acquisition server.
FIG. 8 is a table comparing search results obtained with the information processing server against search results from conventional search sites.

An information acquisition server, an information acquisition method, and an information acquisition and distribution system according to an embodiment for carrying out the present invention will now be described.

First, the schematic configuration of the information acquisition server and the information acquisition and distribution system according to the embodiment of the present invention will be described. FIG. 1 is a block diagram showing the configuration of the information acquisition server and the information acquisition and distribution system according to the embodiment. The information acquisition and distribution system 100 according to the embodiment includes an information acquisition server 200.

The information acquisition server 200 is connectable to a large number of websites 130 (111 to 11n, 121 to 12m) connected to the Internet 140. From these, a plurality of websites, n in this example, are selected in advance as information sites 111, 112, 113, ..., 11n that distribute news, news being the specific field in this embodiment. The information acquisition server 200 is also connectable via the Internet 140 to at least one user terminal, in this embodiment N user terminals 151, 152, 153, ..., 15N. In the following description, where there is no need to identify them individually, websites are denoted by reference numeral 130, information sites by 110, and user terminals by 150.

The information acquisition server 200 includes a database 230, patrols the information sites 111, 112, 113, ..., 11n, acquires newly described information, and accumulates it in the database 230. It then distributes the acquired information to specific user terminals 150 that wish to receive it. In FIG. 1, for convenience of drawing, the Internet 140 connecting the information acquisition server 200 to the websites 111 to 11n and 121 to 12m and the Internet 140 connecting the information acquisition server 200 to the user terminals 151 to 15N are drawn separately, but they are one and the same.

The information acquisition server 200 exhaustively surveys the websites 130 (111 to 11n, 121 to 12m) and extracts the information sites 111 to 11n that match a preselected field. For the extracted information sites 111 to 11n, it analyzes the site structure of each, patrols each site, and acquires newly described information based on the analyzed structure. It then stores the information acquired from each of the information sites 111 to 11n in the database 230.

The information acquisition server 200 also distributes information stored in the database 230 to a user terminal 150 in response to a request from that terminal. As one distribution method, the server can search the database 230 based on a keyword designated from the user terminal 150 and distribute the acquired information to the user terminal 150.

As another distribution method, when information containing a predetermined keyword set in advance from the user terminal 150 is stored in the database 230, the server can distribute to the user terminal 150 at least one of a notice that information related to the keyword has been acquired and the information related to the keyword.

Furthermore, together with the information distributed to the user terminal 150, the information acquisition server 200 can distribute additional information such as the advertising value of an article and its favorability. The advertising value can be determined based on the advertising fee for the page on the information site where the information is displayed, and the favorability of the article can be evaluated based on big data accumulated in the past and on knowledge management.

The information acquisition server 200 can also deliver, as additional information together with the delivered information, information about citing sites (websites that cite the information) and the number of such citing sites.

The period covered by the information delivered to the user terminal 150 can be specified from the user terminal 150. For example, it can be set to a fixed span running from the present into the past or the future, or to a designated period in the past or the future.

To realize these functions, the information acquisition server 200 has the following configuration. As shown in FIG. 1, it includes an information acquisition unit 210 that patrols the information sites 110 and acquires information; an additional information unit 220 that generates additional information, such as advertising value and favorability, for the acquired information; a database 230 that stores the acquired information and the additional information; and information distribution means 240 that delivers specified information from the database 230 to the user terminal 150. The server also includes website connection means 201 for connecting to the websites 130 via the Internet 140 and user terminal connection means 202 for connecting to the user terminals 150 via the Internet 140.

First, the information acquisition unit 210 is described. It includes site extraction means 211, site structure analysis means 212, site patrol means 213, and new information acquisition means 214. The additional information unit 220 includes advertising value evaluation means 221, favorability evaluation means 222, and citing-site information acquisition means 223.

The site extraction means 211 comprehensively surveys the many websites 130 connected to the Internet 140 and extracts those that match the preselected field, which in this example is news. In this way, the information sites 111 to 11n to be patrolled are extracted from all the websites 111 to 11n and 121 to 13m and recorded in the database 230.

This extraction can be performed using a list of sites selected in advance as news sites, or by automatically patrolling websites to detect new information sites. It can also be performed based on an operator's judgment.

Information sites extracted here include the sites of major newspapers, local newspapers, news distribution companies, television stations, and radio stations, as well as portal sites and news aggregation sites. The selection of these sites should preferably be reviewed periodically.

The site structure analysis means 212 analyzes the structure of the information sites 111 to 11n extracted by the site extraction means 211. The analysis is based on the sitemap of the target information site, or is otherwise carried out automatically or by an operator's judgment. The structure analysis is performed periodically or as needed so that the latest structure of each information site is kept on hand. The structure of each information site is stored in the database 230. If a site's structure follows a standard pattern, the analysis is performed automatically by software; sites that do not follow a standard pattern are analyzed individually.
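
As one possible reading of sitemap-based analysis, the sketch below assumes the site publishes an XML sitemap in the sitemaps.org format and groups its page URLs by first path segment. The text does not say which sitemap format is used, so the format, the `structure_from_sitemap` function, and the sample URLs are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespace of the sitemaps.org XML format (assumed; the source does not name a format).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def structure_from_sitemap(xml_text):
    """Group page URLs by their first path segment, as a rough picture of site sections."""
    root = ET.fromstring(xml_text)
    sections = {}
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        parts = [p for p in urlparse(url).path.split("/") if p]
        section = parts[0] if parts else "(top)"
        sections.setdefault(section, []).append(url)
    return sections


# Made-up sitemap for a hypothetical news site.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://news.example.test/</loc></url>
  <url><loc>https://news.example.test/business/a1.html</loc></url>
  <url><loc>https://news.example.test/business/a2.html</loc></url>
  <url><loc>https://news.example.test/sports/b1.html</loc></url>
</urlset>"""

structure = structure_from_sitemap(SAMPLE)
```

The resulting mapping is the kind of per-site structure record that could be stored and later followed during a patrol.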

The site patrol means 213 patrols all of the information sites 111 to 11n extracted by the site extraction means 211 periodically, for example once an hour, or at any time as needed. On each patrol, it follows the analyzed structure to check whether new information has been described on each information site 111 to 11n. The patrol frequency can be changed according to how often each information site is updated. The new information acquisition means 214 then collects the new information described on the patrolled information sites.

The patrol is performed automatically. New information is collected by acquiring items that the site marks as new, and also by acquiring items whose content differs from the record of the previous patrol. During collection, every page is visited based on the structure of each information site so that nothing is missed.
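
Comparison with the record of the previous patrol can be sketched with content fingerprints. The use of SHA-256 hashes and the names `fingerprint` and `detect_new` are assumptions for this example; the text only says that content differing from the previous record is acquired.

```python
import hashlib


def fingerprint(text):
    """Stable digest of a page's text, used as the stored record of a patrol."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def detect_new(previous, current_pages):
    """Compare this patrol with the last one.

    previous: dict mapping URL -> fingerprint recorded on the previous patrol.
    current_pages: dict mapping URL -> page text fetched on this patrol.
    Returns (urls that are new or changed, updated record).
    """
    changed = []
    record = dict(previous)
    for url, text in current_pages.items():
        digest = fingerprint(text)
        if previous.get(url) != digest:
            changed.append(url)
        record[url] = digest
    return changed, record


first_changed, record = detect_new({}, {"u1": "a", "u2": "b"})
second_changed, record = detect_new(record, {"u1": "a", "u2": "b2", "u3": "c"})
```

On the first patrol every page counts as new; on the second, only the edited page `u2` and the newly appeared page `u3` are reported.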

The information collected by the new information acquisition means 214 is organized by information site and acquisition time and stored in the database 230 in a predetermined format.
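
The text leaves the "predetermined format" open. As a sketch under that caveat, the example below uses SQLite with a hypothetical `articles` table indexed by site and acquisition time; the table, column, and function names are illustrative and do not come from the source.

```python
import sqlite3

# Hypothetical schema: one row per acquired item, organized by site and acquisition time.
SCHEMA = """
CREATE TABLE IF NOT EXISTS articles (
    id          INTEGER PRIMARY KEY,
    site        TEXT NOT NULL,
    url         TEXT NOT NULL UNIQUE,
    acquired_at TEXT NOT NULL,   -- ISO 8601 timestamp, so text order equals time order
    body        TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_articles_site_time ON articles (site, acquired_at);
"""


def open_store(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def store_article(conn, site, url, acquired_at, body):
    """Insert one item; a URL that is already stored is silently skipped."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO articles (site, url, acquired_at, body) VALUES (?, ?, ?, ?)",
            (site, url, acquired_at, body),
        )


def articles_for_site(conn, site):
    """Return (url, acquired_at) pairs for one site, oldest first."""
    rows = conn.execute(
        "SELECT url, acquired_at FROM articles WHERE site = ? ORDER BY acquired_at",
        (site,),
    )
    return rows.fetchall()


conn = open_store()
store_article(conn, "news.example.test", "u1", "2014-10-10T09:00:00", "later item")
store_article(conn, "news.example.test", "u0", "2014-10-10T08:00:00", "earlier item")
store_article(conn, "news.example.test", "u1", "2014-10-10T10:00:00", "duplicate URL")
```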

Next, the additional information unit 220 is described. The advertising value evaluation means 221 evaluates the advertising value of information, for example news about a product. The evaluation is based on the advertising-equivalent value of the site carrying the news. The advertising-equivalent value can be determined, for example, with reference to the advertising fee for the page on which the news appears. When the information appears on the front page of a major newspaper site with high advertising fees, the advertising-equivalent value is large and the advertising value is high.

For a website that carries no advertising and has set no advertising fee, the advertising fee for the page on which the information appears cannot be identified. In that case, the advertising value is determined from an amount equivalent to the advertising fee, estimated mathematically by comparison with other sites for which advertising fees are set. That is, the advertising value is estimated mathematically from the advertising fees of sites with the same page views or citation counts.

This estimation can be performed, for example, as follows. First, for many sites that have set advertising fees, the relationship between values such as readership and page views and the advertising fee is formulated using a mathematical estimation method such as the least-squares method. Next, the readership, page views, and similar values of a site with no advertising fee are substituted into this formula. The result is then weighed together with other evaluations to estimate an amount equivalent to the advertising fee. The advertising-equivalent value of the site with no advertising fee is determined from this amount. The estimation is not limited to this method and can be based on other known methods.
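
The least-squares step can be illustrated with an ordinary straight-line fit of advertising fee against page views. This is a sketch under assumptions: a single predictor and a linear form are chosen for brevity, the figures are made up, and `fit_line` and `estimate_fee` are hypothetical names. The text also weighs in readership and other evaluations, which are omitted here.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def estimate_fee(page_views, intercept, slope):
    """Fee-equivalent amount for a site with no published advertising fee."""
    return max(0.0, intercept + slope * page_views)


# Made-up figures for sites that do publish fees: daily page views and fee per placement.
page_views = [10_000, 20_000, 30_000]
fees = [50.0, 90.0, 130.0]
intercept, slope = fit_line(page_views, fees)

# Site without a published fee, assumed to have 25,000 daily page views.
estimated = estimate_fee(25_000, intercept, slope)
```

With several predictors (readership, page views, citation counts), the same idea extends to multiple linear regression.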

The favorability evaluation means 222 evaluates the favorability (nuance) of information. The evaluation is high (positive) if the description of, for example, a product or company is favorable (good), and low (negative) if it is unfavorable (bad).

Favorability can be evaluated automatically based on big data accumulated in the past and on knowledge management. In this case, the tendency is calculated from data on the vast number of articles accumulated in the past. The evaluation can also be made by an operator's judgment.
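
The source does not specify how the tendency is calculated from past articles. Purely as an illustration, the sketch below substitutes a toy word-count polarity model: words are scored by how often they appeared in past articles labeled favorable or unfavorable, and a new article is rated by the sum of its word scores. The labels, the sample history, and the names `learn_polarity` and `favorability` are all assumptions.

```python
from collections import Counter


def learn_polarity(labeled_articles):
    """Build word scores from past articles.

    labeled_articles: iterable of (text, label), label +1 (favorable) or -1 (unfavorable).
    Each word gains or loses one point per article it appears in.
    """
    polarity = Counter()
    for text, label in labeled_articles:
        for word in set(text.casefold().split()):
            polarity[word] += label
    return polarity


def favorability(text, polarity):
    """Return +1 (favorable), -1 (unfavorable), or 0 (neutral or unknown)."""
    score = sum(polarity.get(word, 0) for word in text.casefold().split())
    return (score > 0) - (score < 0)


# Made-up history standing in for the accumulated article data.
history = [
    ("record profit praised", 1),
    ("strong sales praised", 1),
    ("recall after defect", -1),
    ("defect lawsuit filed", -1),
]
polarity = learn_polarity(history)
```

A real evaluation would need far more data and a proper language model for Japanese text; whitespace tokenization is used here only to keep the example short.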

The citing-site information acquisition means 223 detects that information described on one site has been cited or reused on other information sites (citing sites), and detects information about the citing sites and the number of citations. For example, when news from a news distribution company is carried on a newspaper site, or when an article on a newspaper site is linked from a news aggregation site, it acquires the number of citing sites, their regional distribution, and so on, and stores these in the database 230 together with the information.
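
A sketch of counting citing sites and their regional distribution, assuming citations are detected through outgoing links only. The text also covers reuse of content without a link, which this example does not attempt to detect. The page records, region labels, and the `citation_summary` function are hypothetical.

```python
from collections import Counter


def citation_summary(original_url, pages):
    """Summarize which sites link to original_url.

    pages: iterable of dicts with 'site', 'region', and 'links' (URLs the page links to).
    A site is counted once even if several of its pages cite the original.
    """
    citing = {}
    for page in pages:
        if original_url in page["links"]:
            citing[page["site"]] = page["region"]
    return {
        "count": len(citing),
        "sites": sorted(citing),
        "regions": dict(Counter(citing.values())),
    }


# Made-up patrol results.
pages = [
    {"site": "daily.example.test", "region": "Tokyo", "links": ["https://wire.example.test/n1"]},
    {"site": "matome.example.test", "region": "Osaka", "links": ["https://wire.example.test/n1"]},
    {"site": "other.example.test", "region": "Tokyo", "links": []},
    {"site": "daily.example.test", "region": "Tokyo", "links": ["https://wire.example.test/n1"]},
]
summary = citation_summary("https://wire.example.test/n1", pages)
```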

The database 230 stores the various data described above, organized so that they can be searched by the time of description on the information site, the collection time, keywords, and the like. The storage format can be chosen as appropriate; a format that keeps the data volume small, stores data efficiently, and allows fast searching is desirable.

The information distribution means 240 delivers the information stored in the database 230 to the user terminal 150. It searches the database 230 based on the keyword sent from each user terminal 150. The search can be restricted by the time the information was described on the information site or the time it was acquired. Any period can be selected, for example from a certain time ago up to the present, or a fixed period in the past.
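
Restricting a search by period can be sketched as below. The record layout, the field names `published_at` and `acquired_at`, and the helpers `resolve_period` and `filter_by_period` are assumptions for illustration; they model the two time bases the text mentions (time of description on the site and time of acquisition).

```python
from datetime import datetime, timedelta


def resolve_period(now, last=None, start=None, end=None):
    """Turn a request into a concrete (start, end) pair.

    Pass either `last` (a timedelta counted back from `now`) or an explicit start and end.
    """
    if last is not None:
        return now - last, now
    if start is None or end is None or start > end:
        raise ValueError("give either `last` or a valid start/end pair")
    return start, end


def filter_by_period(records, period, field="published_at"):
    """Keep records whose `field` timestamp falls inside the period (inclusive)."""
    start, end = period
    return [r for r in records if start <= r[field] <= end]


now = datetime(2014, 10, 10, 12, 0)
records = [
    {"url": "u1", "published_at": datetime(2014, 10, 9, 8, 0), "acquired_at": datetime(2014, 10, 9, 9, 0)},
    {"url": "u2", "published_at": datetime(2014, 9, 1, 8, 0), "acquired_at": datetime(2014, 10, 8, 9, 0)},
]
last_week = resolve_period(now, last=timedelta(days=7))
by_publication = filter_by_period(records, last_week)
by_acquisition = filter_by_period(records, last_week, field="acquired_at")
```

The second record was published long before the window but acquired inside it, so it is returned only when filtering by acquisition time.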

In addition, when information related to a keyword specified from the user terminal 150 is acquired, the information distribution means 240 can deliver a notice to that effect, and the information itself, to the user terminal 150. Here too, any period for acquiring information can be specified, for example from the present until a specific date and time, or a fixed period in the future.

With either delivery method, the advertising value of the information, its favorability, the number of citing sites, and information about the citing sites can be delivered as additional information. Delivery of this additional information is optional and is performed at the request of the user terminal.

The information acquisition server 200 is configured as a computer. FIG. 3 is a block diagram showing its hardware configuration. The server comprises a CPU (Central Processing Unit) 310, a RAM (Random Access Memory) 320, a ROM (Read Only Memory) 330, a hard disk drive 340, communication control means 350 for connecting to the Internet, input means 360 such as a keyboard and mouse, and output means 370 such as a printer and monitor, all connected by a bus 380. The input means 360 and output means 370 are connected to the bus 380 via interfaces 361 and 371, respectively. The database 230 is installed separately from the hard disk drive 340, although it can instead be built within the hard disk drive 340.

The information acquisition server 200 realizes each of the functions above by having the CPU 310 execute a program implementing the information acquisition method according to the embodiment of the present invention, using the RAM 320 as a work area.

Next, the flow of processing in the information acquisition and distribution system 100 and the information acquisition server 200 is described. FIG. 4 is a flowchart showing the processing flow in the server. The information acquisition server 200 proceeds through the following steps. First, it comprehensively surveys the websites 130 connected to the Internet 140 (S1). Next, the site extraction means 211 extracts the information sites 111 to 11n as websites relating to a specific field, for example news (S2). Because the information sites to be patrolled are selected in this way, no time is wasted collecting unnecessary information and no noise is collected, so accuracy is high.

In this state, the site structure analysis means 212 analyzes the structure of the extracted information sites (S3) and stores it in the database 230 (S4). Next, the site patrol means 213 patrols each information site periodically or in sequence, and the new information acquisition means 214 acquires new information. The acquired information is stored in the database 230 in searchable form (S6).

When a user terminal 150 requests keyword-based information delivery, the user terminal connection means 202 searches the database 230 and delivers the extracted information to the user terminal 150. The user terminal 150 can select the time of publication or the time of acquisition (S7). The keyword search can cover all the data in the database 230 or, if necessary, only a limited range of data.

Next, the process of delivering advertising value together with the information as additional information is described. FIG. 5 is a flowchart showing the advertising value delivery process in the server. This process is performed at the request of the user terminal 150. The advertising value evaluation means 221 evaluates the advertising value of the acquired information (SA1, SA2) and stores it in the database 230 in association with that information (SA3). When a request arrives from the user terminal 150, the user terminal connection means 202 searches the database 230 using the keyword or other criteria from the user terminal 150 and delivers the matching information and its advertising value to the user terminal 150 (SA4).

The information delivered in this process includes, for example, the keyword, the number of hits, the details of each hit article (for example site name, URL, publication date and time, page, and article content such as text, photographs, and figures), the advertising-equivalent value, the nuance, the number of citing sites, and the distribution of citing sites. These items can be changed as needed.

Next, the process of delivering favorability together with the information as additional information is described. FIG. 6 is a flowchart showing the favorability delivery process in the server. This process is performed at the request of the user terminal 150. The favorability evaluation means 222 evaluates the favorability of the acquired information (SB1, SB2) and stores it in the database 230 in association with that information (SB3). When a request arrives from the user terminal 150, the user terminal connection means 202 searches the database 230 using the desired keyword or other criteria from the user terminal 150 and delivers the matching information and its favorability to the user terminal 150 (SB4). In the information delivered by this process, particular weight can be given to unfavorable evaluations.

Next, the process of delivering information about citing sites together with the information as additional information is described. FIG. 7 is a flowchart showing the citation evaluation process in the server. This process is performed at the request of the user terminal 150. The citing-site information acquisition means 223 acquires the number of citing sites and information about them for the acquired information (SC1, SC2) and stores these in the database 230 in association with that information (SC3). When a request arrives from the user terminal 150, the user terminal connection means 202 searches the database 230 using the keyword or other criteria from the user terminal 150 and delivers the matching information together with at least one of the number of citing sites and the information about the citing sites to the user terminal 150 (SC4).

As noted above, the information delivered in this process can include the keyword, the number of hits, the details of each hit article (for example site name, URL, publication date and time, page, and article content such as text, photographs, and figures), the advertising-equivalent value, the nuance, the number of citing sites, and the distribution of citing sites. These items can be changed as appropriate, for example according to the contract with the user.

Next, an experiment conducted using the information acquisition server 200 according to the embodiment is described. FIG. 8 is a table showing an example of the results of a search using the information acquisition server and a search on a conventional search site.

In this experiment, the information acquisition server 200 according to the embodiment was searched using the specific company names "Company A", "Company B", "Company C", "Company D", and "Company E" as keywords. The search period was one week. The results are shown in the table of FIG. 8. The embodiment hit rate was calculated as (number of items extracted by the embodiment) / (number of items extracted by the embodiment + number of items extracted by the news search site).

With "Company A" as the keyword, 56 of the 56 items extracted by the information acquisition server 200 could not be found by the conventional news search.
With "Company B" as the keyword, 14 of the 18 items extracted by the information acquisition server 200 could not be found by the conventional news search.
With "Company C" as the keyword, 94 of the 109 items extracted by the information acquisition server 200 could not be found by the conventional news search.
With "Company D" as the keyword, 108 of the 118 items extracted by the information acquisition server 200 could not be found by the conventional news search.
With "Company E" as the keyword, 20 of the 24 items extracted by the information acquisition server 200 could not be found by the conventional news search.
Conversely, the numbers of items found only by the news search site were 1, 1, 8, 2, and 1, respectively.
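
The hit-rate formula can be applied to the counts reported above. The sketch rests on one loudly stated assumption: the "number of items extracted by the news search site" in the formula is taken to mean the items found only by the news search site (1, 1, 8, 2, and 1). The text does not say this explicitly, and FIG. 8 may use a different count, so the computed rates are illustrative rather than the values in the figure. The function name `hit_rate` is hypothetical.

```python
def hit_rate(server_count, news_only_count):
    """Embodiment hit rate: server items / (server items + news-search items).

    Assumption: news_only_count is the number of items found only by the
    news search site; the source does not state which count the formula uses.
    """
    total = server_count + news_only_count
    if total == 0:
        raise ValueError("no items extracted by either search")
    return server_count / total


# (items extracted by server 200, items found only by the news search site),
# taken from the counts reported in the text.
results = {
    "Company A": (56, 1),
    "Company B": (18, 1),
    "Company C": (109, 8),
    "Company D": (118, 2),
    "Company E": (24, 1),
}
rates = {name: hit_rate(server, news_only) for name, (server, news_only) in results.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.1%}")
```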

Thus, simple keyword searching on a conventional news search site misses many items, whereas the information acquisition server 200 according to this embodiment extracts information reliably.

The example above describes the case where the selected information sites are websites delivering general news, but information sites can be selected in finer categories. For example, they can be limited to industry, entertainment, or sports, or to aggregation sites, posting sites, or bulletin board sites. This allows specialized requests to be accommodated.

The information acquisition server, information acquisition method, and information acquisition and distribution system according to the present invention can efficiently and comprehensively collect information described on Internet websites in response to requests from companies, organizations, individuals, and others, and therefore have industrial applicability.

100: Information acquisition and distribution system
110 (111 to 11n): Information sites
130 (111 to 11n, 121 to 12m): Websites
140: Internet
150 (151 to 15N): User terminals
200: Information acquisition server
201: Website connection means
202: User terminal connection means
210: Information acquisition unit
211: Site extraction means
212: Site structure analysis means
213: Site patrol means
214: New information acquisition means
220: Additional information unit
221: Advertising value evaluation means
222: Favorability evaluation means
223: Citing-site information acquisition means
230: Database
240: Information distribution means

Claims

An information acquisition server that is connectable to a plurality of websites via the Internet and acquires desired information from the websites,
Means for exhaustively investigating a plurality of the websites and extracting websites suitable for a preselected field as information sites;
Means for analyzing the site structure of each information site for the extracted information sites;
Means for patrolling each of the extracted information sites and acquiring, from the information described on the information site based on the analyzed structure, newly described information and information whose content differs from the record of the previous patrol;
Means for storing the information obtained from each information site in a database;
Means for obtaining information stored in the database based on a designation;
An information acquisition server comprising:

Connected to at least one user terminal,
Means for delivering to the user terminal the information stored in the database in response to a request from the user terminal;
The information acquisition server according to claim 1, wherein the database is searched based on a keyword specified from the user terminal and the acquired information is distributed to the user terminal.

Connected to at least one user terminal,
Means for delivering to the user terminal the information stored in the database in response to a request from the user terminal;
The information acquisition server according to claim 1, wherein, when information including a predetermined keyword set in advance from the user terminal is stored in the database, at least one of a notice that information related to the keyword has been acquired and the information related to the keyword itself is distributed to the user terminal.

Means for evaluating the advertising value of the acquired information,
The information acquisition server according to any one of claims 2 to 3, wherein the advertising value is distributed to the user terminal together with the information to be distributed.

5. The information acquisition server according to claim 4, wherein the advertising value is determined based on an advertising fee on a page on the information site where the information is displayed.

The information acquisition server according to claim 4, wherein, when the advertising fee on the page on the information site where the information is displayed cannot be specified, the advertising value is determined based on an amount, equivalent to the advertising fee, that is estimated by a mathematical method through comparison with other sites for which advertising fees are set.

A means for evaluating the likability of the acquired information;
The information acquisition server according to claim 2, wherein the preference is delivered to the user terminal together with the information to be delivered.

The information acquisition server according to claim 7, wherein the preference is evaluated based on big data accumulated in the past and knowledge management.

Means for obtaining information about at least one of the citation sites that cite the acquired information and the number of the citation sites;
The information acquisition server according to any one of claims 2 to 8, wherein at least one of the information about the cited site and the number of the cited sites is distributed together with the information to be distributed.

The information acquisition server according to any one of claims 1 to 9, wherein information is acquired from the user terminal by specifying a period.

The information acquisition server according to any one of claims 1 to 9, wherein the specific field is news.

An information acquisition method capable of connecting to a plurality of websites via the Internet and acquiring desired information from the websites,
Exhaustively investigating a plurality of the websites, and extracting a website suitable for a preselected field as an information site;
Analyzing the site structure of each information site for the extracted information sites;
Patrolling each of the extracted information sites and acquiring, from the information described on the information site based on the analyzed structure, newly described information and information whose content differs from the record of the previous patrol;
Storing the information obtained from each information site in a database;
Obtaining information stored in the database based on a designation;
An information acquisition method comprising:

In an information acquisition and distribution system comprising an information acquisition server connected to a plurality of websites and at least one user terminal via the Internet, and distributing information requested by the user terminal out of information acquired from the information acquisition server,
Means for exhaustively investigating a plurality of the websites and extracting websites suitable for a preselected field as information sites;
Means for analyzing the site structure of each information site for the extracted information sites;
Means for patrolling each of the extracted information sites and acquiring, from the information described on the information site based on the analyzed structure, newly described information and information whose content differs from the record of the previous patrol;
Means for storing the information obtained from each information site in a database;
Means for obtaining information stored in the database based on designation from the user terminal;
Means for distributing the acquired information to the user terminal;
An information acquisition and delivery system comprising: