JP7075348B2

JP7075348B2 - How to analyze the source and destination of Internet traffic

Info

Publication number: JP7075348B2
Application number: JP2018554481A
Authority: JP
Inventors: ダーシュンジャン
Original assignee: Yamu Technology Co Ltd
Current assignee: Yamu Technology Co Ltd
Priority date: 2016-04-14
Filing date: 2016-08-17
Publication date: 2022-05-25
Anticipated expiration: 2036-08-17
Also published as: GB2564057A; JP2019514303A; CN105704260A; RU2702048C1; WO2017177591A1; CN105704260B

Description

The present invention relates to the field of Internet DNS domain name resolution, and in particular to a method of analyzing the sources and destinations of Internet traffic.

The so-called source and destination of Internet traffic refer to a user's access path across a series of websites, such as the website the user visited first and the websites visited after it. The industry has only one mainstream way to identify where a website's traffic comes from: adding JavaScript monitoring code to the website's pages. The most commonly used tools are third-party analytics services such as Google Analytics and Baidu Tongji (Baidu Statistics).

This statistical approach has a major limitation. Each website can learn only the single website a visitor came from immediately before; it cannot learn the several websites the visitor went through before that, nor which website the visitor goes to after leaving. DNS (Domain Name System) is a distributed database on the Internet that maps domain names and IP addresses to each other, so users can reach the Internet conveniently without memorizing the numeric IP addresses that machines read directly. "DNS domain name resolution" works as follows: when a user visits a website, they enter its domain name in the browser and press Enter; the browser first issues a DNS request, obtains through DNS the IP address of the server corresponding to that domain name, and then sends an HTTP request to that IP address.

A DNS log records the response content of every DNS request and can therefore capture nearly all of the domain names a user requests. However, the log also contains a great deal of anomalous and invalid information: servers issue DNS requests of their own that generate large volumes of domain-name records, and web crawlers, and even network attacks, likewise generate large numbers of DNS requests. Such requests do not faithfully reflect a user's actual access path.

At present, no method on the market can properly analyze the complete access path of Internet visitors. The present invention fills this gap: by reprocessing DNS logs, it determines which websites a site's traffic came from and which websites visitors went to after leaving.

In view of the above shortcomings, the present invention provides a method of analyzing the source and destination of Internet traffic that removes as much non-human (machine-generated) access behavior from the logs as possible and thereby obtains the source and destination of Internet traffic effectively.

The method of the present invention processes DNS logs to obtain the source and destination of Internet traffic. It comprises: a log filtering step that filters out DNS logs that cannot reflect a user's actual access path; a log splitting step that successively splits the DNS logs remaining after the log filtering step by source IP, by timestamp difference and by central domain, yielding split access paths; and a data aggregation step that aggregates all of the split access paths. Splitting by timestamp difference means that the logs already split by source IP are further split according to the differences between DNS log timestamps: if the timestamps of two DNS logs differ by more than a predetermined length of time, the two logs are separated. Then, within the DNS logs split by timestamp difference, the domain name requests generated by the user's actual access actions are distinguished from the domain name requests generated along with them.

Preferably, in the log filtering step, a blacklist and a whitelist are configured so that DNS logs containing domain name requests of interest are retained and DNS logs containing non-human domain name requests generated by servers are removed.

Preferably, removing DNS logs further includes removing logs of accesses from corporate IPs and removing logs whose IP was not resolved.

Preferably, splitting the DNS logs by source IP means obtaining the consecutive DNS logs of the same source IP within a certain period of time.

Preferably, the predetermined length of time is 3 seconds.

Preferably, the method further includes a merging step after the step of splitting the DNS logs by timestamp difference. In this step, the domain names in the access paths obtained by splitting are converted into domains and consecutive identical domains are merged to obtain the path of the source IP.

The analysis method of the present invention makes it possible to determine the source and destination of Internet traffic, which better supports the analysis and optimization of website traffic. Moreover, a complete view of traffic flows across the entire Internet allows analysis from a global perspective and an understanding of other websites' traffic as well, so that one knows both oneself and one's competitors.

FIG. 1a is a flowchart of the method of the present invention for analyzing the source and destination of Internet traffic.
FIG. 1b is a flowchart of the method of the present invention for analyzing the source and destination of Internet traffic.
FIG. 2a is a schematic diagram of traffic sources obtained by the method of the present invention for analyzing the source and destination of Internet traffic.
FIG. 2b is a schematic diagram of traffic sources obtained by the method of the present invention for analyzing the source and destination of Internet traffic.

The invention is described in detail below with reference to the drawings and embodiments. The following embodiments do not limit the present invention. Any changes and advantages that a person skilled in the art could conceive without departing from the spirit and scope of the inventive concept are included in the present invention.

As mentioned above, DNS (Domain Name System) is a distributed database on the Internet that maps domain names and IP addresses to each other, so users can reach the Internet conveniently without memorizing the numeric IP addresses that machines read directly. When a user visits a website, they first enter its domain name in the browser and press Enter. The browser then issues a DNS request, obtains through DNS the IP address of the server corresponding to that domain name, and can then send an HTTP request to that IP address. This is DNS domain name resolution.

A DNS log is generated in the course of domain name resolution. It records the response content of every DNS request and can capture nearly all of the domain names a user requests. A DNS log entry has the following format:
14.***.***.10|www.baidu.com|20141211035932|180.***.***.107;180.***.***.108|0
Source IP|Domain name|Timestamp|Resolved IP|Status code
That is, a DNS log entry consists of five fields: "source IP", "domain name", "timestamp", "resolved IP" and "status code".
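Because the record format is fixed, it can be parsed mechanically. The following Python sketch makes three assumptions: fields are separated by `|`, multiple resolved IPs are separated by `;`, and the timestamp has the 14-digit YYYYMMDDhhmmss form shown above. The `DnsLogEntry` class and the `parse_dns_log_line` function are hypothetical names and do not come from the patent. The status code stays a string because the text does not define its values.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class DnsLogEntry:
    source_ip: str
    domain: str
    timestamp: datetime
    resolved_ips: List[str]  # empty list means the name was not resolved
    status_code: str         # kept raw; code meanings are not specified


def parse_dns_log_line(line: str) -> DnsLogEntry:
    """Parse one 'source IP|domain|timestamp|resolved IPs|status' record.

    Raises ValueError if the line does not have exactly five fields.
    """
    source_ip, domain, ts, resolved, status = line.strip().split("|")
    return DnsLogEntry(
        source_ip=source_ip,
        domain=domain,
        # 14-digit timestamp: year, month, day, hour, minute, second
        timestamp=datetime.strptime(ts, "%Y%m%d%H%M%S"),
        # several resolved addresses are separated by semicolons
        resolved_ips=[ip for ip in resolved.split(";") if ip],
        status_code=status,
    )


entry = parse_dns_log_line(
    "14.***.***.10|www.baidu.com|20141211035932|180.***.***.107;180.***.***.108|0"
)
```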

Because DNS logs contain all of the domain names requested by users, the inventors conceived of analyzing the sources and destinations of website traffic by reprocessing DNS logs. However, DNS logs also contain a great deal of anomalous and invalid information: servers issue DNS requests that generate large volumes of domain-name records, and web crawlers, and even network attacks, likewise generate large numbers of DNS requests. Such requests do not faithfully reflect users' actual access paths. In view of this, the inventors conceived of removing as much non-human access behavior from the logs as possible in order to obtain the sources and destinations of Internet traffic effectively.

FIG. 1 is a flowchart of the method of the present invention for analyzing the source and destination of Internet traffic. As shown in FIG. 1, the method includes the following steps.

First, DNS logs that cannot reflect a user's actual access path are filtered out (step S1). As noted above, DNS requests include many domain names that do not faithfully reflect a user's actual access path, and these must be removed. For example, a blacklist and a whitelist are configured so that DNS logs containing domain name requests of interest are retained and DNS logs containing non-human domain name requests generated by servers are removed. The blacklist removes the non-human domain name requests generated by servers, and the whitelist retains certain domain names of interest. The whitelist has higher priority than the blacklist. Removing DNS logs further includes removing the access logs of corporate IPs and removing logs whose IP was not resolved. Corporate IPs are removed because a single corporate IP produces the simultaneous access logs of many people, which distorts the determination of any one individual's access path. Logs with an unresolved IP, that is, logs of failed accesses, are also removed. Filtering the logs along these different dimensions yields DNS logs that reflect users' actual access paths.
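A minimal Python sketch of filtering step S1 follows. The blacklist, whitelist and corporate-IP set are hypothetical inputs, and the example domains are made up. Two rules are assumptions the patent does not state: a list entry also matches its subdomains, and corporate-IP and unresolved-IP removal run before the blacklist/whitelist check. Only the rule that the whitelist overrides the blacklist comes from the text.

```python
from typing import Iterable, List, NamedTuple, Set


class DnsRecord(NamedTuple):
    source_ip: str
    domain: str
    resolved_ips: str  # raw "ip1;ip2" field; empty string = not resolved


def matches(domain: str, patterns: Set[str]) -> bool:
    """True if domain equals a pattern or is a subdomain of one (assumed rule)."""
    return any(domain == p or domain.endswith("." + p) for p in patterns)


def filter_logs(records: Iterable[DnsRecord], blacklist: Set[str],
                whitelist: Set[str], corporate_ips: Set[str]) -> List[DnsRecord]:
    kept = []
    for rec in records:
        if rec.source_ip in corporate_ips:  # many people share one IP
            continue
        if not rec.resolved_ips:            # resolution failed
            continue
        # The whitelist takes priority over the blacklist.
        if matches(rec.domain, blacklist) and not matches(rec.domain, whitelist):
            continue
        kept.append(rec)
    return kept


# Hypothetical example data
logs = [
    DnsRecord("2.2.2.2", "www.baidu.com", "180.0.0.107"),
    DnsRecord("2.2.2.2", "api.example-cdn.net", "1.2.3.4"),  # blacklisted
    DnsRecord("2.2.2.2", "img.example-cdn.net", "1.2.3.4"),  # whitelisted
    DnsRecord("10.0.0.1", "www.qq.com", "1.1.1.1"),          # corporate IP
    DnsRecord("2.2.2.2", "www.sina.com", ""),                # unresolved
]
kept = filter_logs(logs, blacklist={"example-cdn.net"},
                   whitelist={"img.example-cdn.net"}, corporate_ips={"10.0.0.1"})
```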

Next, the DNS logs remaining after the log filtering step are successively split by source IP, by timestamp difference and by central domain to obtain the split domains (step S2).

The detailed steps are as follows.
1) Split by source IP (step S21). Splitting the DNS logs by source IP means obtaining the consecutive DNS logs of the same source IP within a certain period of time.
For example, because 1.1.1.1 and 2.2.2.2 are different source IPs, the log is split between them as follows:
Source IP|Domain name|Timestamp|Resolved IP|Status code
1.1.1.1|www.baidu.com|20141211035932|180.***.***.107;180.***.***.108|0
1.1.1.1|www.qq.com|20141211035932|180.***.***.107;180.***.***.108|0
---------------------- log split line ----------------------
2.2.2.2|www.baidu.com|20141211035932|180.***.***.107;180.***.***.108|0
2.2.2.2|www.qq.com|20141211035932|180.***.***.107;180.***.***.108|0
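Step S21 amounts to grouping records by source IP and ordering each group in time. Here is a Python sketch in which records are reduced to hypothetical `(source_ip, domain, timestamp)` tuples. Restricting the data to "a certain period of time" is left to the caller, for example by passing in one log window at a time.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (source_ip, domain, timestamp as 14-digit "YYYYMMDDhhmmss")
Record = Tuple[str, str, str]


def split_by_source_ip(records: List[Record]) -> Dict[str, List[Record]]:
    groups: Dict[str, List[Record]] = defaultdict(list)
    for rec in records:
        groups[rec[0]].append(rec)
    # Fixed-width timestamps sort chronologically as strings; sorted() is
    # stable, so records sharing a timestamp keep their original log order.
    return {ip: sorted(recs, key=lambda r: r[2]) for ip, recs in groups.items()}


# Interleaved input, as it might appear in a raw resolver log
logs = [
    ("1.1.1.1", "www.baidu.com", "20141211035932"),
    ("2.2.2.2", "www.baidu.com", "20141211035932"),
    ("1.1.1.1", "www.qq.com", "20141211035932"),
    ("2.2.2.2", "www.qq.com", "20141211035932"),
]
groups = split_by_source_ip(logs)
```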

2) Next, the logs split by source IP are further split by timestamp difference (step S22). That is, the logs already split by source IP are further split according to the differences between the timestamps of the DNS logs. If the timestamps of two DNS logs differ by more than a predetermined length of time, the two logs are separated, because logs separated by too long an interval are regarded as two different actions. The predetermined length of time can be adjusted as needed. In this embodiment it is 3 seconds, so the logs are split wherever the timestamp difference exceeds 3 seconds.

For example, the DNS logs of source IP 2.2.2.2 are further split by timestamp difference as follows (a timestamp such as 20141211035932 denotes 03:59:32 on December 11, 2014):
Source IP|Domain name|Timestamp|Resolved IP|Status code
2.2.2.2|www.baidu.com|20141211000001|180.***.***.107;180.***.***.108|0
2.2.2.2|a.qq.com|20141211000002|180.***.***.107;180.***.***.108|0
2.2.2.2|b.baidu.com|20141211000003|180.***.***.107;180.***.***.108|0
2.2.2.2|c.tanx.com|20141211000004|180.***.***.107;180.***.***.108|0
2.2.2.2|c.allyes.com|20141211000005|180.***.***.107;180.***.***.108|0
---------------------- log split line ----------------------
2.2.2.2|www.sina.com|20141211000009|180.***.***.107;180.***.***.108|0
---------------------- log split line ----------------------
2.2.2.2|www.qq.com|20141211000015|180.***.***.107;180.***.***.108|0
---------------------- log split line ----------------------
2.2.2.2|www.qq.com|20141211000019|180.***.***.107;180.***.***.108|0
---------------------- log split line ----------------------
2.2.2.2|www.a.com|20141211000024|180.***.***.107;180.***.***.108|0
---------------------- log split line ----------------------
2.2.2.2|www.b.com|20141211000029|180.***.***.107;180.***.***.108|0

As shown above, the log is split between timestamps 20141211000005 and 20141211000009 because the 05-second and 09-second marks differ by 4 seconds, which is more than 3 seconds. Likewise, it is split between 20141211000009 and 20141211000015 because they differ by 6 seconds.
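Step S22 compares each log with the previous log of the same source IP and starts a new segment when the gap exceeds the threshold, which is 3 seconds in this embodiment. The Python sketch below assumes its input is one source IP's records already sorted by time. `split_by_time_gap` is a hypothetical name. The data is the 2.2.2.2 example written with 14-digit timestamps.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Record = Tuple[str, str]  # (domain, timestamp "YYYYMMDDhhmmss") for one source IP


def split_by_time_gap(records: List[Record],
                      max_gap: timedelta = timedelta(seconds=3)) -> List[List[str]]:
    segments: List[List[str]] = []
    prev = None
    for domain, ts in records:
        t = datetime.strptime(ts, "%Y%m%d%H%M%S")
        # Start a new segment on the first record or when the gap is too long.
        if prev is None or t - prev > max_gap:
            segments.append([])
        segments[-1].append(domain)
        prev = t
    return segments


logs_2222 = [
    ("www.baidu.com", "20141211000001"), ("a.qq.com", "20141211000002"),
    ("b.baidu.com", "20141211000003"), ("c.tanx.com", "20141211000004"),
    ("c.allyes.com", "20141211000005"), ("www.sina.com", "20141211000009"),
    ("www.qq.com", "20141211000015"), ("www.qq.com", "20141211000019"),
    ("www.a.com", "20141211000024"), ("www.b.com", "20141211000029"),
]
segments = split_by_time_gap(logs_2222)
```

A gap of exactly 3 seconds does not trigger a split, matching "greater than" in the text.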

As shown above, the log is split into six segments in total. In the first segment, source IP 2.2.2.2 accesses five domain names: www.baidu.com, a.qq.com, b.baidu.com, c.tanx.com and c.allyes.com. The user-access determination method shows that the user actually accessed only www.baidu.com. The remaining four domain names are requests generated along with the user's click on www.baidu.com, not actual user access actions. Therefore, the first segment of the log yields the path on which the user accessed the domain name www.baidu.com.

The user-access determination method works as follows. When a user clicks a URL, the browser requests several other domain names in addition to the domain name of that URL. A crawler collects all of the other domain name requests made after the request for that URL's domain name. The crawled series of domain name requests is then matched against the domain name segments split from the DNS log, which yields the correspondence between the DNS log and the domain names the user actually accessed. The correspondence obtained in this way shows that this segment of the log reflects the user's actual access to www.baidu.com. The second segment contains only www.sina.com, so www.sina.com is a domain name on the user's access path.
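The patent does not specify how crawled request sequences are matched against log segments. Under that caveat, here is one plausible Python sketch. It assumes a crawler has already produced, for each clicked domain, the set of follow-up domains that loading it triggers; the `signatures` table below is illustrative data, not crawl output. The sketch then picks the domain in a segment whose signature covers the most of the segment's other requests.

```python
from typing import Dict, List, Set


def identify_user_domain(segment: List[str],
                         signatures: Dict[str, Set[str]]) -> str:
    """Return the domain in a non-empty `segment` that best explains the rest.

    signatures maps a clicked domain to the follow-up domains a crawler
    observed when loading it (hypothetical precomputed data).
    """
    def score(i: int) -> int:
        candidate = segment[i]
        others = set(segment) - {candidate}
        return len(signatures.get(candidate, set()) & others)

    # max() returns the first index with the highest score, so a segment
    # with no signature matches falls back to its earliest domain.
    best = max(range(len(segment)), key=score)
    return segment[best]


# Illustrative crawl result for the first segment of the example
signatures = {
    "www.baidu.com": {"a.qq.com", "b.baidu.com", "c.tanx.com", "c.allyes.com"},
}
first = ["www.baidu.com", "a.qq.com", "b.baidu.com", "c.tanx.com", "c.allyes.com"]
user_domain = identify_user_domain(first, signatures)
```

Scoring by overlap rather than requiring an exact match tolerates requests that were cached or blocked, which are absent from the log even though the crawler saw them.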

Connecting the paths from the logs above gives:
www.baidu.com>www.sina.com>www.qq.com>www.qq.com>www.a.com>www.b.com
Next, the paths obtained by splitting on timestamp difference are merged by identical domain, here at the second-level domain. The merged result is:
baidu.com>sina.com>qq.com>a.com>b.com
This is one of the paths among all the access actions of this source IP. Following the same rules, all access paths of all source IPs can be computed.
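The merge step converts each domain name to its domain and collapses consecutive repeats. This Python sketch uses `itertools.groupby`, which groups only adjacent equal items, so a domain revisited later in the path is kept. The last-two-labels rule in the hypothetical `to_second_level_domain` is a simplifying assumption. It mishandles multi-label public suffixes such as `.com.cn`, which would need a Public Suffix List lookup in practice.

```python
from itertools import groupby
from typing import List


def to_second_level_domain(name: str) -> str:
    """Naively keep the last two labels: 'www.baidu.com' -> 'baidu.com'."""
    labels = name.rstrip(".").split(".")
    return ".".join(labels[-2:])


def merge_path(domain_names: List[str]) -> List[str]:
    domains = [to_second_level_domain(n) for n in domain_names]
    # groupby yields one key per run of adjacent equal domains.
    return [d for d, _ in groupby(domains)]


path = merge_path(["www.baidu.com", "www.sina.com", "www.qq.com",
                   "www.qq.com", "www.a.com", "www.b.com"])
print(">".join(path))
```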

3) Next, the above result is further split by central domain (step S23). The central domain is the domain to be analyzed in depth according to the needs of the user or system: where users came from before reaching the central domain, and which domains they went to after it. For example, taking a.com in the log as the central domain gives:
baidu.com>sina.com>qq.com>a.com>b.com
The following are four example paths for the source IP described above. For each path, only the three layers of source domains preceding the central domain are listed. Paths after the central domain are processed with the same logic as paths before it. The actual number of layers can be adjusted to specific requirements. These paths are also shown in FIG. 2(a).
Source domain 3>Source domain 2>Source domain 1>Central domain
Path 1: baidu.com>sina.com>qq.com>a.com (central domain)
Path 2: sina.com>baidu.com>qq.com>a.com (central domain)
Path 3: youku.com>sina.com>baidu.com>a.com (central domain)
Path 4: baidu.com>qq.com>youku.com>a.com (central domain)
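Step S23 can be sketched in Python as extracting up to three domains before and up to three after each occurrence of the central domain in a merged path. `windows_around` is a hypothetical helper. The returned `before` list runs from source domain 3 down to source domain 1, matching the layout above.

```python
from typing import List, Tuple


def windows_around(path: List[str], center: str,
                   layers: int = 3) -> List[Tuple[List[str], List[str]]]:
    """For each occurrence of `center`, return (sources before, destinations after)."""
    result = []
    for i, domain in enumerate(path):
        if domain == center:
            before = path[max(0, i - layers):i]  # farthest source first
            after = path[i + 1:i + 1 + layers]   # nearest destination first
            result.append((before, after))
    return result


windows = windows_around(["baidu.com", "sina.com", "qq.com", "a.com", "b.com"], "a.com")
```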

Finally, in the data aggregation step, all four access paths of the source IP above are aggregated. The aggregated diagram is shown in FIG. 2b.
Central domain: 4 × a.com.
Source domain 1: 2 × qq.com, 1 × baidu.com, 1 × youku.com.
Source domain 2: 2 × sina.com, 1 × baidu.com, 1 × qq.com.
Source domain 3: 2 × baidu.com, 1 × sina.com, 1 × youku.com.
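The aggregation is a per-layer tally. This Python sketch uses `collections.Counter` and assumes each input path lists its source domains in visiting order and ends with the central domain. Applied to the four example paths, it reproduces the counts listed above. The function name `aggregate` is hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def aggregate(paths: List[List[str]],
              layers: int = 3) -> Tuple[Counter, Dict[int, Counter]]:
    """Tally central domains and, per layer k, the domain k steps before them."""
    centers: Counter = Counter()
    by_layer: Dict[int, Counter] = {k: Counter() for k in range(1, layers + 1)}
    for path in paths:
        *sources, center = path
        centers[center] += 1
        for k in range(1, min(layers, len(sources)) + 1):
            by_layer[k][sources[-k]] += 1  # sources[-1] is source domain 1
    return centers, by_layer


paths = [
    ["baidu.com", "sina.com", "qq.com", "a.com"],
    ["sina.com", "baidu.com", "qq.com", "a.com"],
    ["youku.com", "sina.com", "baidu.com", "a.com"],
    ["baidu.com", "qq.com", "youku.com", "a.com"],
]
centers, by_layer = aggregate(paths)
```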

As the visualization in FIG. 2b makes clear, one can infer which domains users of the central domain a.com visited just before it, and which domains they visited before those.
Processing all source IPs with this logic reveals the sources and destinations of traffic across the entire Internet.

The above method of the present invention determines the sources and destinations of Internet traffic for a central domain name under analysis. This better supports analysis and optimization of the traffic to the website at that central domain name. Moreover, a complete view of traffic flows across the entire Internet allows analysis from a global perspective and an understanding of other websites' traffic as well, so that one knows both oneself and one's competitors.

The above description is merely a preferred embodiment of the present invention and does not limit the present invention. Any equivalent change or modification made based on the content of the claims of the present invention should belong to the technical scope of the present invention.

Claims

A method of analyzing the source and destination of Internet traffic by processing DNS logs to obtain the source and destination of Internet traffic, the method comprising:
a log filtering step of filtering out DNS logs that cannot reflect a user's actual access path; and
a log splitting step of successively splitting the DNS logs obtained after the log filtering step by source IP and by timestamp difference to obtain split access paths;
wherein splitting the logs by timestamp difference comprises:
further splitting the logs split by source IP according to the differences between the timestamps of the DNS logs, such that if the timestamps of two DNS logs differ by more than a predetermined length of time, the two DNS logs are separated; and then
distinguishing, in the DNS logs split by timestamp difference, the domain name requests generated by the user's actual access actions from the domain name requests generated along with them.

The analysis method according to claim 1, wherein, in the log filtering step, a blacklist and a whitelist are configured so that DNS logs containing domain name requests of interest are retained and DNS logs containing non-human domain name requests generated by servers are removed.

The analysis method according to claim 2, wherein removing DNS logs further comprises removing logs of accesses from corporate IPs and removing logs whose IP was not resolved.

The analysis method according to claim 3, wherein splitting the DNS logs by source IP comprises obtaining the consecutive DNS logs of the same source IP within a certain period of time.

The analysis method according to claim 4, wherein the predetermined time has a length of 3 seconds.

The analysis method according to claim 5, further comprising, after the step of splitting the DNS logs by timestamp difference, a merging step of converting the domain names in the access paths obtained by splitting into domains and merging consecutive identical domains to obtain the path of the source IP.