JP3715577B2

JP3715577B2 - Communication traffic analysis method and communication traffic analysis device

Info

Publication number: JP3715577B2
Application number: JP2002043309A
Authority: JP
Inventors: 弘文横井; 康森岡
Original assignee: Nippon Telegraph and Telephone Corp; NTT Inc USA
Current assignee: NTT Inc; NTT Inc USA
Priority date: 2002-02-20
Filing date: 2002-02-20
Publication date: 2005-11-09
Anticipated expiration: 2022-02-20
Also published as: JP2003244195A

Description

【０００１】
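【Technical Field of the Invention】
The present invention relates to a communication traffic analysis method and a communication traffic analysis apparatus for analyzing time-series data obtained by measuring the traffic volume, i.e., the amount of information transmitted over a line on a communication network. In particular, it relates to a communication traffic analysis method and apparatus capable of providing data that can be used to manage communication quality or to design communication facilities.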
【０００２】
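【Description of the Related Art】
As is well known, a communication network such as a telephone network or the Internet consists of terminals, nodes, and links. Terminals communicate with each other by transmitting data carrying a certain amount of information over the lines of the network. This data is called traffic, and analyzing its volume makes it possible to analyze the communication conditions in the network.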
【０００３】
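A terminal sends data to a node over a link; the receiving node forwards it to another node over a link, and this is repeated until the destination terminal finally receives it. Carriers measure and analyze traffic in order to grasp and manage the quality of the services they provide and to properly design the facilities needed to provide those services smoothly.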
【０００４】
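For example, when a provider of a typical telephone service designs the bandwidth of the links of a telephone network, it uses time-series data of the call volume measured at nodes. The call volume is defined as the average number of simultaneously connected circuits between nodes and is output every hour. The conventional method of analyzing the call-volume time series to compute a representative value (called the basic call volume) for bandwidth design, which is carried out once every few years, is as follows.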
【０００５】
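(1) The average number of simultaneously connected circuits between nodes over each hour is taken as the hourly call volume.
(2) From the time series of hourly call volumes, the maximum hourly call volume of each day is taken as the daily busy-hour call volume.
(3) The average of the 30 highest daily busy-hour call volumes over the 365 days of the year is taken as the basic call volume.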
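Steps (2) and (3) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a carrier's actual tooling: the function name `basic_call_volume` and the input layout (one list of hourly call volumes per day, i.e., step (1) already applied upstream) are hypothetical.

```python
from statistics import mean


def basic_call_volume(hourly_call_volumes, top_days=30):
    """Hypothetical helper for steps (2) and (3).

    hourly_call_volumes: one list per day holding that day's hourly call
    volumes; step (1), the hourly average of simultaneous connections,
    is assumed to have been computed upstream.
    """
    # Step (2): the daily busy-hour call volume is the largest hourly value of each day.
    daily_busy_hour = [max(day) for day in hourly_call_volumes]
    # Step (3): average the `top_days` largest daily busy-hour call volumes.
    top = sorted(daily_busy_hour, reverse=True)[:top_days]
    return mean(top)


# Synthetic year in which day d has a constant call volume of d erlangs.
year = [[float(d)] * 24 for d in range(365)]
print(basic_call_volume(year))  # 349.5, the mean of days 335..364
```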
【０００６】
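This method is adopted because of the following characteristics at three time scales.
(a) Fluctuation within one hour is assumed to be random at a constant rate (i.e., to follow a Poisson process).
(b) Daily fluctuation is assumed to be high during the daytime and low at night around the change of date.
(c) Yearly fluctuation is assumed to be a steady increase or decrease, and sudden fluctuations can be ignored to a certain extent. "To a certain extent" is taken to mean the days whose values exceed the average of the 30 highest daily busy-hour call volumes of the year.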
【０００７】
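The link bandwidth is then designed for the basic call volume determined in this way. In link bandwidth design, the randomly fluctuating component is considered to be estimable by the Poisson process of reason (a). However, because a call volume exceeding the basic call volume, including that fluctuation, can be offered to a link, some calls may fail to connect. A prescribed value (the call loss rate) is therefore set so that the probability of such an event stays below a given level.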
【０００８】
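The call loss rate to be met is specified in Ministry of Posts and Telecommunications Ordinance No. 30, which requires the call loss rate from the originating node to the terminating node to be 0.15 or less. To comply with the ordinance, carriers set a stricter target than the ordinance between originating and terminating nodes and allocate it among the links that make up the route. The link bandwidth is then designed from the allocated value and the basic call volume using the Erlang B formula.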
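The text names the Erlang B formula but not how it is evaluated. A common way is the standard recursion shown below; the helper names and the example figures (10 erlangs of basic call volume, a 1 % loss value allocated to the link) are hypothetical and only illustrate sizing a link from an allocated call loss rate.

```python
def erlang_b(offered_erlangs: float, circuits: int) -> float:
    """Erlang B blocking probability via the standard recursion
    B(E, 0) = 1,  B(E, n) = E * B(E, n-1) / (n + E * B(E, n-1))."""
    b = 1.0
    for n in range(1, circuits + 1):
        b = offered_erlangs * b / (n + offered_erlangs * b)
    return b


def circuits_needed(offered_erlangs: float, target_loss: float) -> int:
    """Smallest circuit count whose blocking probability is <= target_loss."""
    n = 0
    while erlang_b(offered_erlangs, n) > target_loss:
        n += 1
    return n


# Hypothetical link: 10 erlangs of basic call volume, 1 % allocated call loss rate.
print(circuits_needed(10.0, 0.01))  # 18 (17 circuits would give about 1.3 % loss)
```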
【０００９】
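As another example, consider an Internet service provider (ISP) designing the bandwidth of Internet links. There are many ISPs, and their bandwidth design methods vary. Because a bandwidth design method embodies each ISP's know-how and sales strategy, it is an important matter for the ISP. One of the simplest methods takes time-series data of the number of bytes transferred on a line from the MIB (Management Information Base) and designs the line so that its utilization stays at or below a predetermined value. The measurement interval used in this method is determined by the ISP's operating policy, as is the extent to which packet loss caused by stochastically fluctuating traffic is tolerated.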
【００１０】
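【Problems to Be Solved by the Invention】
However, the Internet continues to grow, and for Internet link bandwidth design, justifications corresponding to (a), (b), and (c) above cannot be established. This is because of the following three characteristics.
(A) Fluctuation within one hour is bursty.
(B) Daily fluctuation has several peaks, including peaks around the change of date.
(C) Yearly fluctuation is unstable and its characteristics are unknown, as are the characteristics of sudden fluctuations.
A method of analyzing time-series data that provides a logical justification for Internet links is therefore needed.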
【００１１】
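In telephone networks as well, Internet access services that use the telephone network as the access line have begun to spread, so reasons (a), (b), and (c) can no longer be considered valid. That is, telephone networks also often exhibit characteristics (A), (B), and (C), just as the Internet does.
A method of analyzing time-series data that provides a logical justification is therefore needed for telephone networks as well.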
【００１２】
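In view of these problems in the prior art, an object of the present invention is to provide a communication traffic analysis method and a communication traffic analysis apparatus that, through statistical analysis of time-series data of communication traffic, make it possible to manage communication quality appropriately and to design appropriate facilities.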
【００１３】
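【Means for Solving the Problems】
The communication traffic analysis method of the present invention is
a communication traffic analysis method for analyzing the traffic volume, i.e., the amount of information transmitted over a line on a communication network, comprising:
acquiring, in time series, the traffic volume at a measurement point connected to a line on the communication network in order to measure the traffic volume;
storing the acquired traffic volumes;
extracting, from the plurality of stored traffic volumes, traffic volumes that are larger than the traffic volumes acquired at the preceding and following times; and
calculating, based on the extracted traffic volumes, statistical characteristics of the traffic volume on the connected line;
wherein the extracting step comprises:
calculating an average value of the traffic volume based on the plurality of stored traffic volumes;
extracting, as a wave height, the value obtained by subtracting the average value from the maximum traffic volume within each period during which the traffic volume exceeds the average value;
storing the extracted wave heights; and
calculating an average value of the wave heights based on the plurality of stored wave heights;
and wherein the step of calculating the statistical characteristics comprises converting the acquired traffic volumes into normalized values obtained by subtracting the average traffic volume from each stored traffic volume and dividing the result by the average wave height.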
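The extraction and normalization steps of the method can be sketched in Python as follows. This is an illustrative reading of the claim, not the patentee's implementation: the function names are hypothetical, the average is taken over the whole series, and treating an above-average run that is still open at the end of the data as a wave is an assumption the text does not address.

```python
from statistics import mean


def wave_heights(series, average=None):
    """Wave heights as defined in the claim: for every maximal run of samples
    strictly above the average, take (largest sample in the run - average)."""
    if average is None:
        average = mean(series)
    heights = []
    run_max = None  # largest value seen in the current above-average run
    for value in series:
        if value > average:
            run_max = value if run_max is None else max(run_max, value)
        elif run_max is not None:  # the run has just ended
            heights.append(run_max - average)
            run_max = None
    if run_max is not None:  # assumption: a run still open at the end also counts
        heights.append(run_max - average)
    return heights


def normalize(series):
    """(traffic volume - average traffic volume) / average wave height."""
    average = mean(series)
    mean_wave_height = mean(wave_heights(series, average))
    return [(value - average) / mean_wave_height for value in series]


cells = [1, 5, 3, 1, 1, 7, 1, 1]  # hypothetical cell counts per interval
print(wave_heights(cells))         # [2.5, 4.5]  (the average is 2.5)
print(normalize(cells))
```

After normalization the reference level is 0 and the average wave height is 1, which is what allows series from different measurement points to be compared.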
【００１４】
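The communication traffic analysis apparatus of the present invention is
a communication traffic analysis apparatus for analyzing the traffic volume, i.e., the amount of information transmitted over a line on a communication network, comprising:
acquisition means for acquiring, in time series, the traffic volume at a measurement point connected to a line on the communication network in order to measure the traffic volume;
storage means for storing the acquired traffic volumes;
extraction means for extracting, from the plurality of stored traffic volumes, traffic volumes that are larger than the traffic volumes acquired at the preceding and following times; and
characteristic calculation means for calculating, based on the extracted traffic volumes, statistical characteristics of the traffic volume on the connected line;
wherein the extraction means comprises:
average calculation means for calculating an average value of the traffic volume based on the plurality of stored traffic volumes;
wave height extraction means for extracting, as a wave height, the value obtained by subtracting the average value from the maximum traffic volume within each period during which the traffic volume exceeds the average value;
wave height storage means for storing the extracted wave heights; and
wave height average calculation means for calculating an average value of the wave heights based on the plurality of stored wave heights;
and wherein the characteristic calculation means comprises conversion means for converting the acquired traffic volumes into normalized values obtained by subtracting the average traffic volume from each stored traffic volume and dividing the result by the average wave height.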
【００１５】
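Whereas the conventional technique extracts statistics over predetermined periods such as one hour, one day, or one year, the present invention is characterized by extracting from the time-series data of communication traffic the wave heights, i.e., traffic volumes larger than those acquired at the preceding and following times. Extracting wave heights yields data with weak correlation; in other words, the present invention yields data that are statistically more meaningful than the raw time-series data.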
【００１６】
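Moreover, extracting wave heights from the time-series data of communication traffic yields representative values that characterize traffic-increase events independently of the period of the time-series data. Therefore, even when a traffic increase straddles the boundary of a predetermined period, the magnitude of the event's impact can be expressed without depending on the aggregation period. The fluctuation characteristics of traffic can thus be analyzed independently of the period of the time-series data and of how it is divided.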
【００１７】
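The present invention is further characterized by normalizing the wave-height data. Referring to the normalized data is expected to allow accurate comparison between data sets that differ in sample count or in the measurement point where they were acquired. In other words, normalizing the wave heights makes it possible to compare time-series data spread across space.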
【００１８】
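Furthermore, whereas the conventional technique assumes a Poisson process whose rate is a constant representative value, the present invention assumes no such specific stochastic process. It is therefore possible to analyze data that do not follow a specific stochastic process.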
【００１９】
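The invention is further characterized in that data may be extracted either by defining the wave height as the maximum value while the time-series data exceed their average, or by defining it as the value immediately before the time-series data turn from increasing to decreasing. The former allows analysis restricted to events whose impact exceeds the average, whereas the latter allows analysis of every event in which the traffic volume increased.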
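The second definition (the value just before the series turns from increasing to decreasing) can be sketched as below. The function name is hypothetical, it returns raw peak values rather than values minus the average, and the handling of flat steps is an assumption because the text does not discuss plateaus.

```python
def turning_point_peaks(series):
    """Values immediately before the series turns from increasing to
    decreasing. Flat steps do not change the tracked direction
    (an assumption; the text does not cover plateaus)."""
    peaks = []
    rising = False
    for previous, current in zip(series, series[1:]):
        if current > previous:
            rising = True
        elif current < previous:
            if rising:
                peaks.append(previous)
            rising = False
    return peaks


series = [2, 5, 4, 6, 2]            # hypothetical samples; their average is 3.8
print(turning_point_peaks(series))  # [5, 6]
```

For the same samples, the first definition sees a single above-average run (5, 4, and 6 all exceed 3.8) and yields one wave height of 6 − 3.8 = 2.2, whereas the turning-point definition counts two increase events, which is why it covers every increase rather than only above-average ones.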
【００２０】
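In addition, by fitting an approximation to the wave-height data, which are statistically more meaningful than the raw time-series data, statistical values in ranges with little or no data can be estimated on a sound basis.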
【００２１】
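As a result, for communication traffic spread across time and space, every event in which the traffic volume increased can be analyzed comprehensively, without omission and independently of the period of the time-series data and of how it is divided.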
【００２２】
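【Embodiments of the Invention】
A communication traffic analysis method and a communication traffic analysis apparatus according to an embodiment of the present invention are described below with reference to the drawings.
Figs. 1, 2, and 3 each show a configuration in which the communication traffic analysis apparatus according to this embodiment is connected to a line: in Fig. 1 it is connected to a node, in Fig. 2 to a link, and in Fig. 3 to a terminal.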
【００２３】
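A communication network (sometimes simply called a "network") is a collection of terminals, nodes, and links. As shown in Figs. 1, 2, and 3, for example, it consists of terminals (101, 102), nodes (201, 202), links (L1, L2, L3), and a network, where the network is itself a set of nodes and links.
The nodes (201, 202) are devices whose only function is to forward received data. The links (L1, L2, L3) are connection paths between nodes or between a node and a terminal. The terminals (101, 102) are devices capable of sending and receiving information.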
【００２４】
In Figs. 1, 2, and 3, terminal 101 and node 201 are connected by link L1; node 201 and node 202 are connected by link L2 via the network; and node 202 and terminal 102 are connected by link L3.
【００２５】
M1, M2, and M3 are communication traffic analysis apparatuses according to this embodiment. M1 is connected to node 201, M2 to link L2, and M3 to terminal 101.
【００２６】
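When the apparatus is connected as M1 is, the traffic forwarded by the node can be measured and analyzed. When it is connected as M2 is, the traffic received at M2's measurement point on link L2 can be measured and analyzed. Measurements such as those by M1 and M2 are passive and therefore do not affect the traffic itself.
Two methods of outputting measured data are commonly used: outputting at every predetermined period, and keeping a running count in a counter and outputting its current value when a trigger is received.
Each data item carries the period or the trigger time as an attribute. An example of periodic output is the call volume defined for telephone networks; examples of trigger-based output are the MIB objects defined for the Internet, such as the number of transferred packets or bytes.
Because measurements such as those by M1 and M2 are passive, they can measure forwarded or received traffic but not arbitrary traffic.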
【００２７】
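When the apparatus is connected as M3 is, on the other hand, active as well as passive measurement is possible: besides measuring received traffic, it can send traffic for measurement purposes. This affects the traffic itself, but any traffic the terminal can send can be measured.
Examples are test-call measurements in telephone networks and measurements using ICMP (Internet Control Message Protocol) ping packets on the Internet.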
【００２８】
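Fig. 4 is a functional block diagram of the communication traffic analysis apparatus according to the embodiment of the present invention; it corresponds to M1, M2, and M3 in Figs. 1, 2, and 3.
The communication traffic analysis apparatus M according to the embodiment comprises a measurement probe P, a measurement block B1, a database block (DB block) B2, a wave height calculation block B3, and a characteristic analysis block B4.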
【００２９】
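The measurement probe P is connected to the line to be measured on the communication network, that is, to a node, link, or terminal as described above, and acquires the traffic volume from the measurement point to which it is connected. The traffic volume is the amount of information transmitted at that measurement point.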
【００３０】
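The measurement block B1 attaches the acquisition time to each acquired traffic volume; the measurement block B1 is, for example, a counter. The database block B2 is a storage device that stores the traffic volumes together with their acquisition times.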
【００３１】
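The wave height calculation block B3 extracts wave-height data from the time-series data. A wave height is defined as a traffic volume, among those stored in the database block B2, that is larger than the traffic volumes acquired at the preceding and following times. The wave height may also be defined with reference to an average value.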
【００３２】
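When defined with reference to the average value, the wave height is the value obtained by subtracting the average from the maximum traffic volume within a period during which the acquired traffic volume exceeds the average.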
【００３３】
The wave height calculation block B3 also selects some of the acquired traffic volumes and calculates their average, and it further calculates the average of the wave heights.
【００３４】
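The characteristic analysis block B4 normalizes the time-series data. It also converts the normalized data into a frequency distribution and then converts that frequency distribution into a probability distribution. Finally, it fits the best straight line to the plotted data by the least-squares method.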
【００３５】
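Next, the operation of the communication traffic analysis apparatus according to this embodiment is described.
Traffic is acquired from the measurement point on the line of the communication network to which the measurement probe P is connected.
The measurement block B1 attaches to each acquired traffic volume the time at which the probe P acquired it, and the time-stamped traffic volumes are stored in the database block B2.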
【００３６】
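The wave height calculation block B3 reads the time-stamped traffic volumes stored in the database block B2, calculates the wave heights described above, and outputs them to the next block, the characteristic analysis block B4.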
【００３７】
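Adjacent samples of the time-series data are considered to be strongly correlated, whereas wave heights are considered to be only weakly correlated because the time series drops below its average between consecutive wave heights. This means that the independence of the data is strengthened, so statistical methods can be applied more rigorously.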
【００３８】
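In this case, the wave height calculation block B3 selects some of the acquired traffic volumes and calculates their average. For example, the traffic volumes acquired between one time and another are selected, and their average is calculated.
Alternatively, a predetermined number of traffic-volume samples may be selected and averaged, the average may be calculated by a moving-average method, or these methods may be combined.
The wave height calculation block B3 further calculates the average of the wave heights.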
【００３９】
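The characteristic analysis block B4 calculates, based on the extracted wave heights, statistical characteristics of the traffic on the line to which the measurement probe P is connected. For example, it calculates a probability distribution or the like whose variable is the traffic volume. It also normalizes the traffic volumes with reference to their average and calculates a probability distribution or the like whose variable is the normalized value.
The characteristic analysis block B4 further calculates an approximation formula for the obtained probability distribution.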
【００４０】
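The communication traffic analysis method executed by the apparatus is described below with a concrete example. Because the analysis targets a service in operation, the traffic data extend over both time and space; once the location is fixed, the analysis deals with time-series data.
Fig. 5 shows an example of time-series data measured by the communication traffic analysis apparatus shown in Fig. 4. The horizontal axis represents time, and the vertical axis represents the number of ATM (Asynchronous Transfer Mode) cells arriving in each 20-second period. One cell is a fixed-length packet of 53 bytes. The time-series data shown in Fig. 5 are the data stored in the database block B2.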
【００４１】
This measurement was performed in the M1 configuration shown in Fig. 1, i.e., with the apparatus connected to a node. The network (nodes and links) was used mainly as an Internet backbone.
【００４２】
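A first characteristic of such traffic time-series data is that there are several places where the traffic increases in bursts. Although not shown in this figure, traffic on Internet backbone lines increases not only during the daytime but sometimes also at night. Furthermore, yearly fluctuation may show steep increases, with trend changes that vary over a wide range.
In other words, the time-series data in Fig. 5 are a typical example of data that cannot be analyzed by the conventional method.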
【００４３】
Fig. 6 marks the traffic volumes in the time-series data of Fig. 5 that are wave heights. These marked traffic volumes are extracted by the wave height calculation block B3.
【００４４】
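In Fig. 6, the wave height extracted is the maximum traffic volume within each period during which the acquired traffic volume exceeds the average, minus that average. The wave height calculation block B3 calculates the average traffic volume from the traffic volumes stored in the database block B2.
Unlike in Fig. 6, the wave height may instead be defined as the value immediately before the time series turns from increasing to decreasing. In that case, every event in which the traffic volume increased can be analyzed.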
【００４５】
For the time-series data in Fig. 6, the average is calculated to be 399,043 cells per 20 seconds. Converted, this corresponds to an average used bandwidth of 1.057 Mbps.
【００４６】
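The wave height calculation block B3 also calculates the average of the wave-height data. For the time-series data in Fig. 6, the average wave height is calculated to be 43,458 cells per 20 seconds (0.115 Mbps).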
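The unit conversion behind these figures can be checked with a short sketch, assuming only the 53-byte cell and the 20-second period stated above; the helper name is hypothetical. Working it through suggests that the values quoted as "Mbps" correspond to megabytes per second: 399,043 × 53 bytes / 20 s ≈ 1.057 × 10^6 bytes/s, which would be roughly 8.46 Mbit/s in bits. Because the later arithmetic uses the same unit throughout, only the unit label appears to be affected.

```python
CELL_BYTES = 53   # one ATM cell, as stated in the text
INTERVAL_S = 20   # measurement period of the time series


def cell_count_to_rates(cells_per_interval: float) -> tuple[float, float]:
    """Return (megabytes per second, megabits per second) for a cell count per interval."""
    bytes_per_second = cells_per_interval * CELL_BYTES / INTERVAL_S
    return bytes_per_second / 1e6, bytes_per_second * 8 / 1e6


for label, cells in (("mean traffic", 399_043), ("mean wave height", 43_458)):
    mbytes, mbits = cell_count_to_rates(cells)
    print(f"{label}: {mbytes:.3f} MB/s = {mbits:.3f} Mbit/s")
```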
【００４７】
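Since the reference level of the wave-height data is the average of the time-series data, these averages mean that bursts with an average wave height of 0.115 Mbps are added on top of traffic flowing at an average of 1.057 Mbps.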
【００４８】
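Although Fig. 6 was calculated from the average of all the data in Fig. 5, normalization is equally possible with a moving average, i.e., an average over a predetermined number of samples before and after each point that is shifted along the series.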
【００４９】
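In this moving-average method, an average is calculated from three values, a value and its preceding and following values; this is applied to each point of the time-series data, and the overall average is calculated from the resulting averages.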
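A minimal sketch of the three-point moving average follows. The function name is hypothetical, and using only the available neighbours at the two ends of the series is an assumption, since the text does not say how the edges are handled.

```python
from statistics import mean


def centered_moving_average(series, half_width=1):
    """Average each sample with `half_width` neighbours on each side;
    half_width=1 is the three-point average described in the text.
    At the ends only the neighbours that exist are used (an assumption)."""
    return [mean(series[max(0, i - half_width): i + half_width + 1])
            for i in range(len(series))]


local = centered_moving_average([1, 2, 3, 4, 5])
print(local)        # local three-point averages
print(mean(local))  # overall average computed from the local averages
```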
【００５０】
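In particular, applying the moving average only to the arrival cell counts measured in the early part of the time series has the advantage of immediacy: the normalized wave height can be determined soon after traffic measurement begins.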
【００５１】
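Fig. 7 shows the time-series data of Fig. 6 with the arrival cell counts on the vertical axis normalized. This normalization is performed by the characteristic analysis block B4.
In this example, block B4 subtracts the average of the time-series data from each arrival cell count so that the reference level becomes 0, and divides the result by the average of the wave-height data so that the average wave height becomes 1.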
【００５２】
This normalization makes it easy and logically sound to compare time-series data measured at different nodes or links, whose arrival cell counts often differ.
【００５３】
Normalization by the average of the whole time series also allows many data to be compared, which enables detailed analysis.
【００５４】
Fig. 8 shows the frequency distribution of the time-series data in Fig. 7. It is shown here for reference.
【００５５】
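In this example, the class width is the average wave height, and frequencies are counted for the classes [(n−1) × class width, n × class width), where n is an integer and each interval includes its left end but not its right end.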
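Counting frequencies over these half-open classes can be sketched as follows: a value x falls in class n exactly when (n − 1) ≤ x / w < n, i.e., n = floor(x / w) + 1. The function name and sample values are hypothetical; with data already normalized by the average wave height, the class width w is 1.

```python
import math
from collections import Counter


def class_frequencies(normalized, class_width=1.0):
    """Count samples per class [(n-1)*w, n*w) (left end included, right end
    excluded) and return {n: count} sorted by n."""
    counts = Counter(math.floor(x / class_width) + 1 for x in normalized)
    return dict(sorted(counts.items()))


print(class_frequencies([-0.5, 0.0, 0.2, 1.0, 2.7]))  # {0: 1, 1: 2, 2: 1, 3: 1}
```

The class value (midpoint) of class n is (n − 0.5) × w, so class 1 has class value 0.5.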
【００５６】
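For the time-series data, the mode appears in a negative class, and the negative classes extend only to the third class, whereas the positive classes extend to the fifth. The frequency distribution in Fig. 8 thus has a left tail (from the class with the highest frequency toward more negative class values) that vanishes quickly and a right tail (toward more positive class values) that is relatively heavy.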
【００５７】
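In this frequency distribution of the time-series data, adjacent samples are considered to be strongly correlated (i.e., independence between adjacent samples cannot be assumed), so it is not appropriate to use the distribution as is for statistical analysis.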
【００５８】
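This embodiment therefore calculates the frequency distribution from the wave-height data. Because the time series falls below its average between successive wave heights, the wave heights are considered to be only weakly correlated, and adjacent wave-height values are therefore considered highly independent. Even when the wave height is defined as the value immediately before the time series turns from increasing to decreasing, its correlation is considered weaker than that between the individual samples of the time series.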
【００５９】
Fig. 9 shows the frequency distribution of the wave-height data in Fig. 7. This frequency distribution is calculated by the characteristic analysis block B4.
【００６０】
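As in Fig. 8, the class width in this example is the average wave height and the classes are [(n−1) × class width, n × class width), each including its left end but not its right end. Because only positive classes appear by definition, n is a natural number rather than an arbitrary integer.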
【００６１】
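The frequency in each class is lower than in Fig. 8 because extracting the wave heights omits the neighbouring samples, including those at or below the average arrival cell count.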
【００６２】
This strengthens the independence of the data, allowing statistical methods to be applied more rigorously to the shape of the right tail.
【００６３】
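Fig. 10 shows the probability distribution calculated from the frequency distribution in Fig. 9. This probability distribution is calculated by the characteristic analysis block B4. In addition to the data of Fig. 9, a further 210 points of data are used, which is why a point with probability 0.01 appears at class value 5.5. The other points in Fig. 10 broadly follow the frequency distribution of Fig. 9.
The traffic data used in Fig. 9 are 83 points extracted from 998 samples, whereas Fig. 10 uses 293 points extracted from 3,035 samples, including data from other days.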
【００６４】
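The probability distribution in Fig. 10 is a conversion of the wave-height frequency distribution in Fig. 9. A point is plotted for each frequency in Fig. 9, and the points are connected by a thick line. The vertical axis shows, on a logarithmic scale, the probability P(X > x) that the wave height X is x or more, where X and x correspond to the class values on the horizontal axis. As in Fig. 9, the horizontal axis shows class values normalized by the average wave height. The probability is 1 at class value 0.5 because, by the definition of the class values, it is the probability that a wave height falls in the class [0, average wave height) containing 0.5 or in any higher class, which every wave height satisfies.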
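The conversion from class frequencies to the plotted exceedance probabilities can be sketched as a cumulative sum from the highest class downward. The function name and the class counts are hypothetical; the sketch simply reproduces the rule stated above, so the lowest class (class value 0.5) always has probability 1.

```python
def class_exceedance(class_counts):
    """From {n: count} for classes [(n-1)w, nw), return {class value: P},
    where the class value is n - 0.5 (in units of w) and P is the share of
    samples lying in class n or any higher class."""
    total = sum(class_counts.values())
    result, tail = {}, 0
    for n in sorted(class_counts, reverse=True):
        tail += class_counts[n]
        result[n - 0.5] = tail / total
    return dict(sorted(result.items()))


# Hypothetical wave-height counts for classes 1 to 4.
print(class_exceedance({1: 50, 2: 30, 3: 15, 4: 5}))
# {0.5: 1.0, 1.5: 0.5, 2.5: 0.2, 3.5: 0.05}
```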
【００６５】
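Based on the points plotted in Fig. 10, the distribution is approximated by the straight line log₁₀ P(X > x) = −ax + b. The least-squares method selects the line closest to the points corresponding to the frequencies in Fig. 9, which determines the parameters a and b. This approximation not only interpolates between the points in Fig. 10 but also allows extrapolation to class values for which there are no data.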
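The least-squares fit can be sketched as an ordinary linear regression of log₁₀ P on x, with a taken as the negated slope. The function names are hypothetical, and the points below are synthetic: they are generated from the parameters reported later in the text (a = 0.396, b = 0.181) rather than taken from the measured data of Fig. 10, so the fit simply recovers those parameters.

```python
import math


def fit_log_linear(points):
    """Ordinary least squares of log10(P) on x for (x, P) pairs with P > 0.
    Returns (a, b) for the model log10 P = -a*x + b."""
    xs = [x for x, _ in points]
    ys = [math.log10(p) for _, p in points]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return -slope, mean_y - slope * mean_x


def tail_probability(a, b, x):
    """Interpolated or extrapolated P(X > x) from the fitted line."""
    return 10 ** (-a * x + b)


# Synthetic points generated from the text's reported parameters, then re-fitted.
pts = [(x, tail_probability(0.396, 0.181, x)) for x in (0.5, 1.5, 2.5, 3.5, 4.5, 5.5)]
a, b = fit_log_linear(pts)
print(round(a, 3), round(b, 3), tail_probability(a, b, 8))  # extrapolated to x = 8
```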
【００６６】
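It is therefore possible to calculate the probability even for large class values where there are no data. In principle, this approximating line makes it possible to calculate the probability that a wave height is n or more times the average wave height.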
【００６７】
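Furthermore, by converting the wave-height selection rate and the probability into time, using the sampling period of the collected time series (here, one cell count every 20 seconds), the interval at which such wave heights occur can be estimated. The network provider can then judge whether sudden traffic increases occurring at that interval are acceptable relative to the link bandwidth, and reflect this in facility design or communication quality management.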
【００６８】
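For the example data shown in this embodiment, the traffic data referred to in Fig. 7 are 83 points extracted from 998 samples, and those referred to in Fig. 10, including data from other days, are 293 points extracted from 3,035 samples.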
【００６９】
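The wave-height selection rate is therefore 293/3,035 = 9.65%. The least-squares estimates of the parameters are a = 0.396 and b = 0.181. In this case, for example, the probability that a wave height is x = 8 or more times the average wave height is about 0.1%. Consequently, the interval at which traffic with a wave height of 8 or more times the average occurs is estimated to be 20 s / (0.1% × 9.65%) ≈ 58 hours.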
【００７０】
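Since the average of the time-series data is 1.057 Mbps and the average wave height is 0.115 Mbps, this means that roughly once every 58 hours the traffic in a 20-second period can exceed 1.057 Mbps + 8 × 0.115 Mbps = 1.977 Mbps.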
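The arithmetic of the two preceding paragraphs can be reproduced as follows, assuming the interpretation implied by the text's formula: one sample every 20 seconds, a fraction 293/3,035 of samples are wave heights, and a fraction P of wave heights reach 8 times the average. The function name is hypothetical. Note that the text rounds P to 0.1%; using the unrounded value from the fitted line gives a somewhat shorter interval.

```python
def recurrence_interval_hours(sample_period_s, selection_rate, p_exceed):
    """Mean time between wave heights above a level: one sample per
    sample_period_s, a fraction selection_rate of samples are wave heights,
    and a fraction p_exceed of wave heights exceed the level."""
    return sample_period_s / (selection_rate * p_exceed) / 3600


selection_rate = 293 / 3035          # about 9.65 % of samples are wave heights
p_8x = 10 ** (-0.396 * 8 + 0.181)    # about 1.03e-3; the text rounds this to 0.1 %
print(recurrence_interval_hours(20, selection_rate, 0.001))  # about 57.5 h, "about 58 hours"
print(recurrence_interval_hours(20, selection_rate, p_8x))   # about 56 h without rounding P
print(round(1.057 + 8 * 0.115, 3))   # 1.977, in the text's stated units
```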
【００７１】
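Whether sudden traffic increases occurring at the interval calculated in this way are acceptable relative to the link bandwidth is then judged, and the result is used in facility design or communication management.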
【００７２】
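In the above example, the distribution was approximated by the straight line log₁₀ P(X > x) = −ax + b, but other straight lines may be used. For example, log₁₀ P(X > x) = −a log₁₀ x + b may be used; this form gives a good approximation when the traffic is bursty.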
【００７３】
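When this formula is applied, the horizontal axis of the graph in Fig. 10 is also taken on a logarithmic scale. Ultimately, between Formula 1 (log₁₀ P(X > x) = −ax + b) and Formula 2 (log₁₀ P(X > x) = −a log₁₀ x + b), the one whose straight-line approximation better fits the tail of the distribution is chosen. In some cases, as in Fig. 10, it is obvious from the graph which formula is appropriate; for statistical rigor, however, the more appropriate formula can be selected by comparing the sums of squared approximation errors.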
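Selecting between the two formulas by their sums of squared errors can be sketched as below. Both models are fitted by ordinary least squares to the same log₁₀ P values, so their residual sums are directly comparable. The function names are hypothetical and the test points are synthetic (one exact exponential tail, one exact power-law tail); class values must be positive for the logarithmic axis.

```python
import math


def _ols(xs, ys):
    """Fit ys = slope*xs + intercept; return (a, b, sse) with a = -slope."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return -slope, intercept, sse


def choose_tail_model(points):
    """points: (x, P) pairs with x > 0 and P > 0. Fits
    Formula 1: log10 P = -a*x + b          (straight line on a semi-log plot)
    Formula 2: log10 P = -a*log10(x) + b   (straight line on a log-log plot)
    and returns (name, a, b) of the fit with the smaller sum of squared errors."""
    xs = [x for x, _ in points]
    ys = [math.log10(p) for _, p in points]
    a1, b1, sse1 = _ols(xs, ys)
    a2, b2, sse2 = _ols([math.log10(x) for x in xs], ys)
    return ("formula 1", a1, b1) if sse1 <= sse2 else ("formula 2", a2, b2)


grid = (1.5, 2.5, 3.5, 4.5, 5.5)
print(choose_tail_model([(x, 10 ** (-0.396 * x + 0.181)) for x in grid]))
print(choose_tail_model([(x, x ** -1.5) for x in grid]))
```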
【００７４】
The present invention is not limited to the embodiment described above and can be implemented with various modifications within its technical scope.
【００７５】
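【Effects of the Invention】
As described above, the communication traffic analysis method and apparatus of the present invention can extract the peak characteristics of traffic time-series data while ensuring their independence, and thereby, through statistical analysis of the time-series data, make it possible to manage communication quality appropriately and to design appropriate facilities.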
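【Brief Description of the Drawings】
【Fig. 1】System configuration when the communication traffic analysis apparatus according to the embodiment of the present invention is connected to a node.
【Fig. 2】System configuration when the communication traffic analysis apparatus according to the embodiment of the present invention is connected to a link.
【Fig. 3】System configuration when the communication traffic analysis apparatus according to the embodiment of the present invention is connected to a terminal.
【Fig. 4】Functional block diagram of the communication traffic analysis apparatus according to the embodiment of the present invention.
【Fig. 5】Example of time-series data measured by the communication traffic analysis apparatus shown in Fig. 4.
【Fig. 6】The time-series data of Fig. 5 with the traffic volumes that are wave heights marked.
【Fig. 7】The time-series data of Fig. 6 with the arrival cell counts on the vertical axis normalized.
【Fig. 8】Frequency distribution of the time-series data shown in Fig. 7.
【Fig. 9】Frequency distribution of the wave-height data shown in Fig. 7.
【Fig. 10】Probability distribution calculated from the frequency distribution shown in Fig. 9.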
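【Explanation of Reference Signs】
101 Terminal
102 Terminal
201 Node
202 Node
L1 Link
L2 Link
L3 Link
M Communication traffic analysis apparatus
P Measurement probe
B1 Measurement block
B2 Database block
B3 Wave height calculation block
B4 Characteristic analysis block
[0001]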
BACKGROUND OF THE INVENTION
The present invention relates to a communication traffic analysis method and a communication traffic analysis apparatus that analyze time-series data obtained by measuring the traffic volume, i.e., the amount of information transmitted over a line on a communication network, and in particular to a communication traffic analysis method and apparatus that can provide data usable for managing communication quality or designing communication facilities.
[0002]
[Prior art]
As is well known, a communication network such as a telephone network or the Internet consists of terminals, nodes, and links. Terminals can communicate with each other because data carrying a certain amount of information is transmitted over lines on the network. This data is called traffic, and analyzing its volume makes it possible to analyze the communication status of the network.
[0003]
A terminal transmits data to a node over a link; the receiving node forwards it to another node over a link, and this is repeated until the destination terminal finally receives the data. Carriers measure and analyze traffic in order to grasp and manage the quality of the services they provide and to properly design the facilities needed to provide those services smoothly.
[0004]
For example, when a provider of a typical telephone service designs the bandwidth of telephone network links, it uses time-series data of the call volume measured at nodes. The call volume is defined as the average number of simultaneously connected lines between nodes and is output every hour. The conventional method of analyzing the call-volume time series to calculate a representative value (called the basic call volume) for bandwidth design, which is carried out once every few years, is as follows.
[0005]
(1) The average value of the number of simultaneously connected lines between nodes is defined as a one-hour periodic call volume.
(2) From the time series of hourly call volumes, the maximum hourly call volume of each day is taken as the daily busy-hour call volume.
(3) The basic call volume is the average of the 30 highest daily busy-hour call volumes over the 365 days of the year.
[0006]
This method is adopted because of the following characteristics at three time scales.
(a) Fluctuation within one hour is assumed to be random at a constant rate (i.e., to follow a Poisson process).
(b) Daily fluctuation is assumed to be high during the daytime and low at night around the change of date.
(c) Yearly fluctuation is assumed to be a steady increase or decrease, and sudden fluctuations can be ignored to a certain extent, namely on days whose value exceeds the average of the 30 highest daily busy-hour call volumes of the year.
[0007]
The link bandwidth is then designed for the basic call volume determined as described above. In link bandwidth design, the randomly fluctuating component is considered to be estimable by the Poisson process of reason (a). However, because a call volume exceeding the basic call volume, including this fluctuation, may be offered to a link, some calls may fail to connect. A prescribed value (the call loss rate) is therefore set so that the probability of such an event stays below a given level.
[0008]
The call loss rate to be met is specified in Ministry of Posts and Telecommunications Ordinance No. 30, which requires the call loss rate from the originating node to the terminating node to be 0.15 or less. To comply with the ordinance, carriers set a stricter value than the ordinance between originating and terminating nodes and allocate it among the links along the route. The link bandwidth is then designed from the allocated value and the basic call volume using the Erlang B formula.
[0009]
As another example, consider an Internet service provider (ISP) designing the bandwidth of Internet links. There are many ISPs, and their bandwidth design methods vary. A bandwidth design method is an important matter for an ISP because it embodies that ISP's know-how and sales strategy. One of the simplest methods takes time-series data of the number of bytes transferred on a line from the MIB (Management Information Base) and designs the line so that its utilization stays at or below a predetermined value. The measurement interval used in this method is determined by the ISP's operating policy, as is the packet discard rate that is tolerated for stochastically fluctuating traffic.
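The utilization-threshold design check described above can be sketched as follows. The text leaves the measurement interval and the threshold to each ISP's policy, so the 300-second interval, 10 Mbit/s link speed, and 50 % threshold below are hypothetical, as are the function names.

```python
def utilization(bytes_transferred: int, interval_s: float, link_bps: float) -> float:
    """Fraction of link capacity used during one measurement interval."""
    return bytes_transferred * 8 / (interval_s * link_bps)


def intervals_over_threshold(byte_counts, interval_s, link_bps, max_utilization):
    """Indices of intervals whose utilization exceeds the design threshold."""
    return [i for i, count in enumerate(byte_counts)
            if utilization(count, interval_s, link_bps) > max_utilization]


# Hypothetical policy: 300 s intervals on a 10 Mbit/s line, 50 % utilization limit.
counts = [150_000_000, 225_000_000, 90_000_000]
print(intervals_over_threshold(counts, 300, 10e6, 0.5))  # [1]
```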
[0010]
[Problems to be solved by the invention]
However, the Internet continues to grow, and for Internet link bandwidth design, justifications corresponding to (a), (b), and (c) above cannot be established, because of the following three characteristics.
(A) Fluctuation within one hour is bursty.
(B) Daily fluctuation has several peaks, including peaks around the change of date.
(C) Yearly fluctuation is unstable and its characteristics are unknown, as are the characteristics of sudden fluctuations.
Therefore, a time-series data analysis method that provides a logical justification for Internet links is needed.
[0011]
In telephone networks as well, Internet access services that use the telephone network as the access line have begun to spread, so reasons (a), (b), and (c) are no longer valid. That is, telephone networks often exhibit characteristics (A), (B), and (C), as the Internet does.
Therefore, a time-series data analysis method that provides a logical justification is also needed for telephone networks.
[0012]
In view of these problems in the prior art, an object of the present invention is to provide a communication traffic analysis method and a communication traffic analysis apparatus that, through statistical analysis of time-series data of communication traffic, make it possible to manage communication quality appropriately and to design appropriate facilities.
[0013]
[Means for Solving the Problems]
The communication traffic analysis method of the present invention is
a communication traffic analysis method for analyzing the traffic volume, which is the amount of information transmitted over a line on a communication network, comprising:
acquiring, in time series, the traffic volume at a measurement point connected to a line on the communication network in order to measure the traffic volume;
storing the acquired traffic volumes;
extracting, from the plurality of stored traffic volumes, traffic volumes larger than the traffic volumes acquired at the preceding and following times; and
calculating, based on the extracted traffic volumes, a statistical characteristic of the traffic volume on the connected line;
wherein the extracting step comprises:
calculating an average value of the traffic volume based on the plurality of stored traffic volumes;
extracting, as a wave height, the value obtained by subtracting the average value from the maximum traffic volume in a period during which the traffic volume exceeds the average value;
storing the extracted wave heights; and
calculating an average value of the wave heights based on the plurality of stored wave heights;
and wherein the step of calculating the statistical characteristic comprises converting the acquired traffic volumes into normalized values obtained by subtracting the average traffic volume from each stored traffic volume and dividing the result by the average wave height.
[0014]
The communication traffic analysis apparatus of the present invention is
a communication traffic analysis apparatus for analyzing the traffic volume, which is the amount of information transmitted over a line on a communication network, comprising:
acquisition means for acquiring, in time series, the traffic volume at a measurement point connected to a line on the communication network in order to measure the traffic volume;
storage means for storing the acquired traffic volumes;
extraction means for extracting, from the plurality of stored traffic volumes, traffic volumes larger than the traffic volumes acquired at the preceding and following times; and
characteristic calculation means for calculating, based on the extracted traffic volumes, a statistical characteristic of the traffic volume on the connected line;
wherein the extraction means comprises:
average calculation means for calculating an average value of the traffic volume based on the plurality of stored traffic volumes;
wave height extraction means for extracting, as a wave height, the value obtained by subtracting the average value from the maximum traffic volume in a period during which the traffic volume exceeds the average value;
wave height storage means for storing the extracted wave heights; and
wave height average calculation means for calculating an average value of the wave heights based on the plurality of stored wave heights;
and wherein the characteristic calculation means comprises conversion means for converting the acquired traffic volumes into normalized values obtained by subtracting the average traffic volume from each stored traffic volume and dividing the result by the average wave height.
[0015]
According to the above, whereas the conventional technique extracts statistical values over predetermined periods such as one hour, one day, or one year, the present invention is characterized by extracting from the time-series data of communication traffic a wave height, i.e., a traffic volume larger than the traffic volumes acquired at the preceding and following times. Extracting wave heights yields data with weak correlation; that is, the present invention yields data that are statistically more meaningful than the raw time-series data.
[0016]
In addition, extracting wave heights from the time-series data of communication traffic yields representative values that characterize traffic-increase events without depending on the period of the time-series data. Therefore, even if a traffic increase straddles the boundary of a predetermined period, the magnitude of the event's impact can be expressed independently of the aggregation period. Traffic fluctuation characteristics can thus be analyzed independently of the period of the time-series data and of how it is divided.
[0017]
Furthermore, the present invention is characterized by normalizing the wave-height data. Referring to the normalized data is expected to allow accurate comparison between data sets that differ in the number of samples or in the measurement point where they were acquired. That is, normalizing the wave heights makes it possible to compare time-series data spread across space.
[0018]
Furthermore, whereas the prior art assumes a Poisson process whose rate is a constant representative value, the present invention assumes no such specific stochastic process. It is therefore possible to analyze data that do not follow a specific stochastic process.
[0019]
Furthermore, the invention is characterized in that data may be extracted either by defining the wave height as the maximum value while the time-series data exceed their average value, or by defining it as the value immediately before the time-series data turn from increasing to decreasing. The former makes it possible to analyze only events whose impact exceeds the average, and the latter makes it possible to analyze every event in which the traffic volume increased.
[0020]
In addition, by approximating the wave-height data, which are statistically more meaningful than the raw time-series data, statistical values in ranges with little or no data can be estimated on a sound basis.
[0021]
As a result, for communication traffic spread across time and space, every event in which the traffic volume increased can be analyzed comprehensively, without depending on the period of the time-series data or on how it is divided.
[0022]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, a communication traffic analysis method and a communication traffic analysis apparatus according to an embodiment of the present invention will be described with reference to the drawings.
FIGS. 1, 2, and 3 show configurations in which the communication traffic analysis apparatus according to the present embodiment is connected to a line: FIG. 1 shows the apparatus connected to a node, FIG. 2 shows it connected to a link, and FIG. 3 shows it connected to a terminal.
[0023]
A communication network (sometimes simply referred to as a “network”) is a collection of terminals, nodes, and links. For example, as shown in FIGS. 1, 2, and 3, it consists of terminals (101, 102), nodes (201, 202), links (L1, L2, L3), and a network, where the network is itself a set of nodes and links.
The nodes (201, 202) are devices having only a transfer function for transferring transmitted data. The links (L1, L2, L3) are connection paths for connecting between nodes or between a node and a terminal. The terminals (101, 102) are devices that can transmit and receive information.
[0024]
In FIG. 1, FIG. 2, and FIG. 3, the terminal 101 and the node 201 are connected by a link L1. Node 201 and node 202 are connected by a link L2 via a network. The node 202 and the terminal 102 are connected by a link L3.
[0025]
M1, M2, and M3 are communication traffic analyzers according to the present embodiment. M1 is connected to the node 201, M2 is connected to the link L2, and M3 is connected to the terminal 101.
[0026]
When a communication traffic analyzer is connected as in M1, it is possible to measure and analyze the traffic transferred at the node. Further, when a communication traffic analyzer is connected as in M2, it is possible to measure and analyze the traffic received at the M2 measurement point on the link L2. Since measurements like M1 and M2 are passive, they do not affect the traffic itself.
Two methods of outputting measured data are commonly used: outputting at every predetermined period, and keeping a running count in a counter and outputting its current value when a trigger is received.
Each data item carries the period or the trigger time as an attribute. An example of periodic output is the call volume defined for the telephone network; examples of trigger-based output are the MIB objects defined for the Internet, such as the number of transferred packets and the number of transferred bytes.
Because measurements such as those by M1 and M2 are passive, they can measure forwarded or received traffic, but not arbitrary traffic.
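With trigger-based output, each poll returns a cumulative counter, so per-interval traffic volumes are the differences between successive readings. For example, the IF-MIB object ifInOctets is a 32-bit Counter32 that wraps to zero after 2^32 − 1. The sketch below assumes at most one wrap between polls and no counter reset (for example after a device reboot); the function name and the readings are hypothetical.

```python
COUNTER32_MODULUS = 2 ** 32


def counter_deltas(readings, modulus=COUNTER32_MODULUS):
    """Turn successive (time, cumulative counter value) readings into
    (interval end time, count during the interval) pairs. Assumes at most
    one wrap per interval and no counter reset."""
    return [(t1, (c1 - c0) % modulus)
            for (_, c0), (t1, c1) in zip(readings, readings[1:])]


readings = [(0, 4_294_967_000), (20, 200), (40, 1_200)]  # hypothetical polls
print(counter_deltas(readings))  # [(20, 496), (40, 1000)]
```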
[0027]
On the other hand, when a communication traffic analyzer is connected as M3 is, active as well as passive measurement is possible: in addition to measuring received traffic, it can transmit traffic for measurement purposes. This affects the traffic itself, but any traffic that the terminal can transmit can be measured.
Examples are test-call measurement in the telephone network and measurement with ICMP (Internet Control Message Protocol) ping packets on the Internet.
[0028]
FIG. 4 is a functional block diagram of the communication traffic analysis apparatus according to the embodiment of the present invention; it corresponds to M1, M2, and M3 shown in FIGS. 1, 2, and 3.
The communication traffic analyzer M according to the embodiment of the present invention includes a measurement probe P, a measurement block B1, a database block (DB block) B2, a wave height calculation block B3, and a characteristic analysis block B4.
[0029]
The measurement probe P is connected to the line to be measured on the communication network, that is, to a node, a link, or a terminal as described above, and acquires the traffic volume from the measurement point to which it is connected. The traffic volume is the amount of information transmitted at that measurement point.
[0030]
The measurement block B1 attaches the acquired time to each acquired traffic amount. For example, the measurement block B1 is a counter. The database block B2 is a storage device that stores the traffic volume together with the acquired time.
[0031]
The wave height calculation block B3 extracts wave-height data from the time-series data. A wave height is defined as a traffic volume, among those stored in the database block B2, that is larger than the traffic volumes acquired at the preceding and following times. The wave height may also be defined based on an average value.
[0032]
When defined based on the average value, the wave height is defined as a value obtained by subtracting the average value from the maximum traffic amount during the period in which the acquired traffic volume exceeds the average value.
[0033]
Further, the wave height calculation block B3 selects some of the acquired traffic volumes and calculates an average value. Furthermore, the wave height calculation block B3 calculates an average value of the wave heights.
[0034]
The characteristic analysis block B4 normalizes the time-series data. It also converts the normalized time-series data into a frequency distribution and then converts that frequency distribution into a probability distribution. Finally, it approximates the plotted data with the best-fitting straight line by the least-squares method.
[0035]
Next, the operation of the communication traffic analyzer according to this embodiment will be described.
Traffic is acquired at the measurement point on the communication network line to which the measurement probe P is connected.
The measurement block B1 attaches to each acquired traffic volume the time at which the measurement probe P acquired it, and the timestamped traffic volume is then stored in the database block B2.
[0036]
The wave height calculation block B3 reads the timestamped traffic volumes stored in the database block B2, calculates the wave heights described above, and outputs them to the characteristic analysis block B4.
[0037]
Adjacent samples of the time series data are considered to be strongly correlated. The wave height data, by contrast, are considered to be only weakly correlated, because the time series falls below its average at least once between any two consecutive wave heights. The wave height data are therefore closer to independent, which allows statistical methods to be applied more rigorously.
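The following Python sketch illustrates this argument on synthetic data. It generates a strongly autocorrelated AR(1) series, extracts average-based wave heights, and compares the lag-1 autocorrelation of the series with that of the wave heights. The AR(1) model, the seed, and the use of lag-1 autocorrelation as a measure of dependence are illustrative assumptions standing in for measured traffic; they are not part of the described method.

```python
import random
from statistics import mean
from typing import List, Optional, Sequence


def lag1_autocorr(xs: Sequence[float]) -> float:
    """Sample lag-1 autocorrelation: correlation of each value with the next one."""
    m = mean(xs)
    num = sum((a - m) * (b - m) for a, b in zip(xs, xs[1:]))
    den = sum((a - m) ** 2 for a in xs)
    return num / den


def wave_heights(xs: Sequence[float]) -> List[float]:
    """Peak minus mean for each maximal run of samples above the mean."""
    m = mean(xs)
    out: List[float] = []
    peak: Optional[float] = None
    for x in xs:
        if x > m:
            peak = x if peak is None else max(peak, x)
        elif peak is not None:
            out.append(peak - m)
            peak = None
    if peak is not None:
        out.append(peak - m)
    return out


# Hypothetical stand-in for traffic: AR(1) series level[t] = 0.9 * level[t-1] + noise.
rng = random.Random(1)
series: List[float] = []
level = 0.0
for _ in range(20000):
    level = 0.9 * level + rng.gauss(0.0, 1.0)
    series.append(level)

heights = wave_heights(series)
print(f"time series:  lag-1 r = {lag1_autocorr(series):.2f}")
print(f"wave heights: lag-1 r = {lag1_autocorr(heights):.2f} ({len(heights)} values)")
```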
[0038]
To do so, the wave height calculation block B3 selects some of the acquired traffic volumes and calculates their average value. For example, the traffic volumes acquired between one time and another are selected, and their average value is calculated.
Alternatively, a predetermined number of traffic samples may be selected and averaged, or the average traffic volume may be calculated by a moving-average method. These calculation methods may also be combined.
The wave height calculation block B3 also calculates the average value of the wave heights.
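The first two selection methods can be sketched as follows, assuming that each stored record is a hypothetical `(acquisition time in seconds, traffic volume)` tuple. The half-open interval `[start, end)` and the function names are illustrative choices, not taken from the text.

```python
from statistics import mean
from typing import Sequence, Tuple

Sample = Tuple[float, float]  # (acquisition time in seconds, traffic volume)


def average_between(samples: Sequence[Sample], start: float, end: float) -> float:
    """Average of the traffic volumes acquired in the interval [start, end)."""
    selected = [volume for t, volume in samples if start <= t < end]
    if not selected:
        raise ValueError("no samples in the requested interval")
    return mean(selected)


def average_first_n(samples: Sequence[Sample], n: int) -> float:
    """Average of the first n traffic volumes (a predetermined number of samples)."""
    if not 0 < n <= len(samples):
        raise ValueError("n must be between 1 and the number of samples")
    return mean(volume for _, volume in samples[:n])


# Hypothetical timestamped samples taken every 20 seconds.
samples = [(0, 10.0), (20, 14.0), (40, 12.0), (60, 20.0)]
print(average_between(samples, 20, 60))  # 13.0
print(average_first_n(samples, 3))       # 12.0
```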
[0039]
Based on the extracted wave heights, the characteristic analysis block B4 calculates statistical characteristics of the traffic on the line to which the measurement probe P is connected, for example a probability distribution with the traffic volume as the variable. The traffic volume may also be normalized based on its average value, and a probability distribution calculated with the normalized value as the variable.
The characteristic analysis block B4 further calculates an approximate expression for the resulting probability distribution.
[0040]
The communication traffic analysis method executed by the communication traffic analysis device will now be described with a specific example. Because the object of analysis is a service in operation, the traffic data extend over both time and space. Fixing the spatial point therefore reduces the analysis to time series data.
FIG. 5 shows an example of time-series data measured by the communication traffic analyzer shown in FIG. The horizontal axis of FIG. 5 represents time, and the vertical axis represents the number of Asynchronous Transfer Mode (ATM) cells arriving in each 20-second period. One cell is a 53-byte fixed-length packet. The time series data shown in FIG. 5 are the data stored in the database block B2.
[0041]
This measurement was taken with the analyzer M1 connected as shown in FIG. 1, that is, with the communication traffic analyzer connected to a node. The network (nodes and links) was used mainly as an Internet backbone.
[0042]
A first feature of such traffic time series is that traffic increases in bursts at several points. Although not visible in this figure, traffic on an Internet backbone can rise at night as well as during the day, and over a year it can grow rapidly, with large shifts in trend.
The time series in FIG. 5 is therefore typical of data that the conventional analysis method cannot handle.
[0043]
FIG. 6 marks the traffic volumes that are wave heights in the time series data shown in FIG. 5. The marked traffic volumes are extracted by the wave height calculation block B3.
[0044]
In FIG. 6, the wave height extracted is the value obtained by subtracting the average value from the maximum traffic volume within each period during which the acquired traffic volume exceeds the average value. The wave height calculation block B3 calculates this average value from the traffic volumes stored in the database block B2.
Alternatively, unlike the case of FIG. 6, the wave height may be defined as the value immediately before a decrease, that is, the value at which the time series turns from increasing to decreasing. In this case, every event in which the traffic volume increases can be analyzed.
[0045]
For the time series data shown in FIG. 6, the average value is calculated as 399,043 cells/20 seconds. Converted to a bandwidth, this corresponds to an average used bandwidth of 1.057 Mbps.
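The unit conversion can be reproduced with the 53-byte cell size and 20-second period stated in the text. Working it through shows that the printed figure of 1.057 is obtained when the rate is expressed in megabytes (10^6 bytes) per second; the same rate in megabits per second is about 8.46. The wave-height figure of 0.115 scales the same way. The helper names are hypothetical.

```python
CELL_BYTES = 53  # one ATM cell, as stated in the text
PERIOD_S = 20    # measurement period covered by one sample, in seconds


def cells_to_bytes_per_s(cells_per_period: float) -> float:
    """Convert a cell count per measurement period into bytes per second."""
    return cells_per_period * CELL_BYTES / PERIOD_S


def cells_to_bits_per_s(cells_per_period: float) -> float:
    """Convert a cell count per measurement period into bits per second."""
    return cells_to_bytes_per_s(cells_per_period) * 8


avg_cells = 399_043
print(f"{cells_to_bytes_per_s(avg_cells) / 1e6:.3f} MB/s")   # 1.057 MB/s
print(f"{cells_to_bits_per_s(avg_cells) / 1e6:.3f} Mbit/s")  # 8.460 Mbit/s
```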
[0046]
Further, the average value of the wave height data is also calculated by the wave height calculation block B3. In the case of the time-series data shown in FIG. 6, the average value of the wave height data is calculated as 43,458 cells / 20 seconds (0.115 Mbps).
[0047]
Because the reference level of the wave height data is the average value of the time series data, these averages mean that, on top of traffic flowing at an average of 1.057 Mbps, bursts add an average wave height of 0.115 Mbps.
[0048]
The wave heights in FIG. 6 are calculated relative to the average value of all the data shown in FIG. 5. Normalization is equally possible with a moving average, in which the average is taken over a predetermined number of samples before and after each point and shifted along the series.
[0049]
In this moving-average method, each value is averaged together with its immediately preceding and following values, that is, over three values. This is done for every point of the time series, and the overall average value is then calculated from the resulting local averages.
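A minimal sketch of the three-point moving average and of the overall average taken over the local averages follows. How the window is handled at the two ends of the series is not stated in the text; truncating it there is an assumption, as are the function names.

```python
from typing import List, Sequence


def centered_moving_average(xs: Sequence[float], half_width: int = 1) -> List[float]:
    """Average each sample with up to `half_width` neighbours on each side.

    half_width=1 gives the three-point average. Near the ends fewer neighbours
    exist, so the window is truncated there (an assumption).
    """
    out: List[float] = []
    for i in range(len(xs)):
        window = xs[max(0, i - half_width): i + half_width + 1]
        out.append(sum(window) / len(window))
    return out


def overall_average(xs: Sequence[float]) -> float:
    """Mean of the local moving averages."""
    local = centered_moving_average(xs)
    return sum(local) / len(local)


print(centered_moving_average([3, 6, 9, 6]))  # [4.5, 6.0, 7.0, 7.5]
print(overall_average([3, 6, 9, 6]))          # 6.25
```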
[0050]
In particular, if the moving average is applied only to the arrival-cell counts measured in the earlier part of the time series, the normalized wave height can be determined soon after the traffic is measured, which gives the method good immediacy.
[0051]
FIG. 7 shows the time series data of FIG. 6 with the number of arriving cells on the vertical axis normalized. This normalization is performed by the characteristic analysis block B4.
In this example, the characteristic analysis block B4 subtracts the average value of the time series data from each arrival-cell count so that the reference level becomes 0, and divides the result by the average value of the wave height data so that the average wave height becomes 1.
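The normalization is a shift followed by a scaling. The sketch below uses the two averages quoted for this data example (399,043 and 43,458 cells per 20 seconds); the input counts are hypothetical values chosen so that the mapping is easy to check.

```python
from typing import List, Sequence


def normalize(traffic: Sequence[float], series_mean: float, mean_wave_height: float) -> List[float]:
    """Shift so that the series mean maps to 0, then scale so that the mean wave height maps to 1."""
    return [(v - series_mean) / mean_wave_height for v in traffic]


# Averages quoted in the text (cells per 20 s).
SERIES_MEAN = 399_043
MEAN_WAVE = 43_458

# Hypothetical counts: the mean, the mean plus one average wave height, and the mean minus one.
print(normalize([399_043, 442_501, 355_585], SERIES_MEAN, MEAN_WAVE))  # [0.0, 1.0, -1.0]
```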
[0052]
This normalization makes it easy to compare, on a consistent basis, time series data measured at different nodes or links, whose arrival-cell counts often differ.
[0053]
Normalization based on the average value of the entire time series data also allows a thorough analysis, because a large amount of data can be compared.
[0054]
FIG. 8 shows the frequency distribution of the time-series data shown in FIG. It is included here for reference.
[0055]
In this example, the class width equals the average wave height, and frequencies are counted over the classes [(n−1) × class width, n × class width), where n is an integer; each class includes its left endpoint and excludes its right endpoint.
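The class assignment can be sketched as follows: a value v belongs to class n = ⌊v / width⌋ + 1, which places it in [(n−1) × width, n × width) with the left endpoint included. Because the data are normalized by the average wave height, the class width becomes 1. The function name and the dictionary output are illustrative assumptions.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def class_frequencies(values: Iterable[float], width: float = 1.0) -> Dict[int, int]:
    """Count values per class [(n-1)*width, n*width), keyed by the integer n.

    The class value (midpoint) of class n is (n - 0.5) * width.
    """
    counts = Counter(math.floor(v / width) + 1 for v in values)
    return dict(sorted(counts.items()))


# Hypothetical normalized samples; 0.0 and 1.0 fall into the classes they start.
print(class_frequencies([-0.7, -0.2, 0.0, 0.4, 1.0, 2.5]))  # {0: 2, 1: 2, 2: 1, 3: 1}
```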
[0056]
In the time series data, the mode falls in a negative class, and the negative classes extend only to the third class, whereas the positive classes extend to the fifth class. In the frequency distribution of the time series data shown in FIG. 8, the left tail (that is, the direction in which the class value becomes more negative, starting from the class with the highest frequency) falls off sharply, whereas the right tail (the direction in which the class value becomes more positive from that class) is relatively heavy and long.
[0057]
Because adjacent samples in this time series frequency distribution are considered to be strongly correlated (that is, adjacent samples cannot be assumed to be independent), it is not appropriate to use this distribution as it is for statistical analysis.
[0058]
Therefore, in this embodiment, the frequency distribution is calculated from the wave height data. Because the time series falls below its average at least once between consecutive wave heights, the wave heights are considered to be only weakly correlated, and adjacent wave-height samples can be regarded as largely independent. Even when the wave height is defined as the value immediately before a decrease, where the time series turns from increasing to decreasing, the wave heights are expected to be less correlated than the individual samples of the time series.
[0059]
FIG. 9 shows the frequency distribution of the wave height data shown in FIG. 7. The frequency distribution of the wave height data is calculated by the characteristic analysis block B4.
[0060]
As in FIG. 8, the class width in this example is the average wave height, and the classes are [(n−1) × class width, n × class width), each including its left endpoint and excluding its right endpoint. Because wave heights are positive by definition, only positive classes appear, so n is a natural number rather than an arbitrary integer.
[0061]
The frequencies in each class are lower than in FIG. 8 because extracting the wave heights discards the neighboring samples, keeping only the peak of each excursion above the average arrival-cell count.
[0062]
This strengthens the independence of the data, so that statistical methods can be applied more rigorously to the shape of the right tail.
[0063]
FIG. 10 shows a probability distribution calculated from the frequency distribution shown in FIG. 9. This probability distribution is calculated by the characteristic analysis block B4. In addition to the data of FIG. 9, a further 210 points are used, which is why a point with a probability of 0.01 appears at a class value of 5.5. The other points in FIG. 10 generally follow the frequency distribution of FIG. 9.
The traffic data used in FIG. 9 consist of 83 points extracted from 998 samples. FIG. 10 uses 293 points extracted from 3035 samples, including data from other days.
[0064]
The probability distribution shown in FIG. 10 is obtained by converting the frequency distribution of the wave heights shown in FIG. 9. A point is plotted for each class frequency in FIG. 9, and the points are connected by bold lines. The vertical axis shows, on a logarithmic scale, the probability P(X > x) that the wave height X is greater than or equal to x, where X and x correspond to class values on the horizontal axis. The horizontal axis, as in FIG., is the class value normalized by the average wave height. The probability is 1 at class value 0.5 because, by the definition of the classes of the frequency distribution, class value 0.5 stands for the class [0, average wave height), and the plotted probability is that of a wave height falling in this class or a higher one, which every wave height satisfies.
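A minimal sketch of this conversion, under one reading of the text: each class n (class value n − 0.5 in normalized units) is assigned the fraction of wave heights that fall in that class or a higher one, which gives probability 1 at class value 0.5. The function name, the dictionary input format, and the frequencies in the usage line are hypothetical.

```python
from typing import Dict, List, Tuple


def exceedance_points(freq: Dict[int, int]) -> List[Tuple[float, float]]:
    """Map class n (class value n - 0.5) to the fraction of data in class n or above."""
    total = sum(freq.values())
    points: List[Tuple[float, float]] = []
    remaining = total  # number of values in the current class or any higher class
    for n in sorted(freq):
        points.append((n - 0.5, remaining / total))
        remaining -= freq[n]
    return points


# Hypothetical wave-height frequencies per class (normalized units).
print(exceedance_points({1: 50, 2: 30, 3: 15, 4: 5}))
# [(0.5, 1.0), (1.5, 0.5), (2.5, 0.2), (3.5, 0.05)]
```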
[0065]
The points plotted in FIG. 10 are approximated by the straight line $\log_{10} P(X > x) = -ax + b$. The parameters a and b are determined by the least-squares method, selecting the straight line closest to the points corresponding to the frequencies in FIG. 9. This approximation not only interpolates between the points shown in FIG. 10 but also makes it possible to extrapolate to class values for which there are no data.
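The least-squares fit amounts to an ordinary linear regression of log₁₀ P on the class value. A minimal sketch follows; the function name is hypothetical, and the usage data are generated from assumed parameters a = 0.4 and b = 0.2 so that the fit can be checked against them.

```python
import math
from typing import Sequence, Tuple


def fit_log_linear(xs: Sequence[float], ps: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of log10(p) = -a*x + b; returns (a, b)."""
    ys = [math.log10(p) for p in ps]
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return -slope, intercept  # the expression uses -a as the slope


# Points generated from hypothetical parameters a = 0.4, b = 0.2.
xs = [0.5, 1.5, 2.5, 3.5]
ps = [10 ** (-0.4 * x + 0.2) for x in xs]
a, b = fit_log_linear(xs, ps)
print(round(a, 3), round(b, 3))  # 0.4 0.2
```

Once a and b are known, 10 ** (-a * x + b) extrapolates the probability to class values with no data.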
[0066]
It is therefore also possible to calculate probabilities for large class values where no data exist. In principle, the fitted line makes it possible to calculate the probability that a wave height is n or more times the average wave height.
[0067]
Furthermore, by converting the wave height adoption rate (the fraction of samples that are wave heights) and the probability into time, using the period covered by one sample of the collected time series data, the interval at which such wave heights occur can be estimated. The network provider can then determine whether bursts of traffic occurring at the interval obtained in this way are tolerable for the link bandwidth, and can reflect the result in equipment design or communication quality control.
[0068]
In the data example of this embodiment, the traffic data referred to in FIG. 7 consist of 83 points extracted from 998 samples. Together with the data from the other days referred to in FIG. 10, FIG. 10 uses 293 points extracted from 3035 samples.
[0069]
The wave height adoption rate is therefore 293/3035 = 9.65%. The least-squares method gives the parameter estimates a = 0.396 and b = 0.181. In this case, for example, the probability that the wave height is x = 8 or more times the average wave height is about 0.1%, since $\log_{10} P(X > 8) = -0.396 \times 8 + 0.181 \approx -2.99$. Consequently, traffic with a wave height of 8 or more times the average wave height is estimated to occur at intervals of 20 seconds / (0.1% × 9.65%) ≈ 58 hours.
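The arithmetic of this example can be checked with a few lines of Python that use only the figures quoted above. Evaluating the fitted line without rounding gives P(X > 8) ≈ 0.00103 and an interval of about 56 hours; rounding the probability to 0.1% first, as the text does, gives about 58 hours.

```python
a, b = 0.396, 0.181         # least-squares parameters quoted in the text
adoption_rate = 293 / 3035  # fraction of samples that are wave heights
period_s = 20               # one sample covers 20 seconds
x = 8                       # multiple of the average wave height

p = 10 ** (-a * x + b)      # P(X > 8) from the fitted line
interval_h = period_s / (p * adoption_rate) / 3600
interval_rounded_h = period_s / (0.001 * adoption_rate) / 3600  # p rounded to 0.1%

print(f"adoption rate = {adoption_rate:.2%}")           # 9.65%
print(f"P(X > {x}) = {p:.5f}")                          # 0.00103
print(f"interval = {interval_h:.0f} h")                 # 56 h
print(f"interval (p = 0.1%) = {interval_rounded_h:.0f} h")  # 58 h
```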
[0070]
Therefore, since the average value of the time series data is 1.057 Mbps and the average wave height is 0.115 Mbps, traffic exceeds 1.057 Mbps + 8 × 0.115 Mbps ≈ 1.98 Mbps for a 20-second interval about once every 58 hours.
[0071]
Whether bursts occurring at the interval calculated in this way are acceptable for the link bandwidth is then determined, and the result is used for facility design or communication management.
[0072]
In the above example the distribution is approximated by the straight line $\log_{10} P(X > x) = -ax + b$ (Expression 1), but it may be approximated by a different straight line, for example $\log_{10} P(X > x) = -a \log_{10} x + b$ (Expression 2). Expression 2 can be used when the traffic is bursty.
[0073]
When this expression is applied, the horizontal axis of the graph of FIG. is also taken on a logarithmic scale. Between Expression 1 and Expression 2, the one that gives the better linear approximation to the tail of the distribution is selected. In some cases, such as the graph in FIG. 10, it is obvious from inspection which expression is appropriate; where statistical rigor is required, the more appropriate expression can be selected by comparing the sums of squared approximation errors.
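The selection by sum of squared errors can be sketched as follows. Expression 1 fits log₁₀ P against x and Expression 2 fits log₁₀ P against log₁₀ x; both residuals are measured in log₁₀ P, so their sums are comparable. The function names and the power-law test data are hypothetical.

```python
import math
from typing import Sequence, Tuple


def fit_line(us: Sequence[float], ys: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares line ys ~ slope*us + intercept; returns (slope, intercept, SSE)."""
    n = len(us)
    mu, my = sum(us) / n, sum(ys) / n
    slope = sum((u - mu) * (y - my) for u, y in zip(us, ys)) / sum((u - mu) ** 2 for u in us)
    intercept = my - slope * mu
    sse = sum((y - (slope * u + intercept)) ** 2 for u, y in zip(us, ys))
    return slope, intercept, sse


def choose_expression(xs: Sequence[float], ps: Sequence[float]) -> str:
    """Pick Expression 1 (log P vs x) or Expression 2 (log P vs log x) by smaller SSE."""
    ys = [math.log10(p) for p in ps]
    sse1 = fit_line(xs, ys)[2]
    sse2 = fit_line([math.log10(x) for x in xs], ys)[2]
    return "Expression 1" if sse1 <= sse2 else "Expression 2"


# Hypothetical power-law tail P = 0.3 * x**-1.5, which is a straight line on log-log axes.
xs = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5]
ps = [0.3 * x ** -1.5 for x in xs]
print(choose_expression(xs, ps))  # Expression 2
```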
[0074]
The present invention is not limited to the embodiment described above, and can be implemented with various modifications within the technical scope thereof.
[0075]
【Effects of the Invention】
As described above, the communication traffic analysis method and communication traffic analysis apparatus of the present invention can extract the peak characteristics of traffic from traffic time series data while ensuring the independence of the extracted data. Applying statistical analysis to these data makes it possible to manage communication quality appropriately and to design suitable equipment.
[Brief description of the drawings]
FIG. 1 is a configuration diagram of an entire system when a communication traffic analysis apparatus according to an embodiment of the present invention is connected to a node.
FIG. 2 is a configuration diagram of the entire system when a communication traffic analysis apparatus according to an embodiment of the present invention is connected to a link.
FIG. 3 is a configuration diagram of the entire system when a communication traffic analysis apparatus according to an embodiment of the present invention is connected to a terminal.
FIG. 4 is a functional block diagram of a communication traffic analysis apparatus according to an embodiment of the present invention.
FIG. 5 is a diagram showing an example of time-series data measured by the communication traffic analysis device shown in FIG.
FIG. 6 is a diagram in which the traffic volumes that are wave heights in the time-series data shown in FIG. 5 are marked.
FIG. 7 is a diagram in which the number of arriving cells on the vertical axis of the time-series data shown in FIG. 6 is normalized.
FIG. 8 is a diagram showing a frequency distribution of the time series data shown in FIG.
FIG. 9 is a diagram showing a frequency distribution of the wave height data shown in FIG. 7.
FIG. 10 is a diagram showing a probability distribution calculated based on the frequency distribution shown in FIG. 9.
[Explanation of symbols]
101 Terminal
102 Terminal
201 Node
202 Node
L1 Link
L2 Link
L3 Link
M Communication traffic analyzer
P Measurement probe
B1 Measurement block
B2 Database block
B3 Wave height calculation block
B4 Characteristic analysis block

Claims

A communication traffic analysis method for analyzing a traffic volume, which is the amount of information transmitted over a line on a communication network, the method comprising:
acquiring, in time series, the traffic volume at a measurement point connected to the line on the communication network in order to measure the traffic volume;
storing the acquired traffic volumes;
extracting, from the plurality of stored traffic volumes, a traffic volume that is larger than the traffic volumes acquired at the preceding and following times; and
calculating a statistical characteristic of the traffic volume on the connected line based on the extracted traffic volume;
wherein the step of extracting the traffic volume comprises:
calculating an average value of the traffic volume based on the plurality of stored traffic volumes;
extracting, as a wave height, a value obtained by subtracting the average value from the maximum traffic volume during a period in which the traffic volume exceeds the average value;
storing the extracted wave heights; and
calculating an average value of the wave heights based on the plurality of stored wave heights; and
wherein the step of calculating the statistical characteristic comprises converting the stored traffic volume into a normalized value by subtracting the average value of the traffic volume from the stored traffic volume and dividing the result by the average value of the wave heights.

The communication traffic analysis method according to claim 1, wherein the step of calculating the statistical characteristic further comprises calculating a probability distribution with the normalized value as a variable.

The communication traffic analysis method according to claim 1 or 2, wherein the step of calculating the statistical characteristic further comprises calculating a probability distribution, with the normalized value as the random variable, by approximating it so that the logarithmically transformed probability lies on a straight line as a function of the random variable.

The communication traffic analysis method according to claim 1 or 2, wherein the step of calculating the statistical characteristic further comprises calculating a probability distribution, with the normalized value as the random variable, by approximating it so that the logarithmically transformed probability lies on a straight line as a function of the logarithmically transformed random variable.

The communication traffic analysis method according to any one of claims 2 to 4, wherein the probability distribution indicates the probability that the normalized value is larger than a given value.

The communication traffic analysis method according to claim 1 or 2, wherein, with the normalized value as a random variable X, the probability that X is larger than a value x denoted by P(X > x), and a and b as parameters, the step of calculating the statistical characteristic further comprises approximating the probability distribution by adjusting a and b in the expression
$\log_{10} P(X > x) \approx -ax + b$.

The communication traffic analysis method according to claim 1 or 2, wherein, with the normalized value as a random variable X, the probability that X is larger than a value x denoted by P(X > x), and a and b as parameters, the step of calculating the statistical characteristic further comprises approximating the probability distribution by adjusting a and b in the expression
$\log_{10} P(X > x) \approx -a \log_{10} x + b$.

A communication traffic analysis device for analyzing a traffic volume, which is the amount of information transmitted over a line on a communication network, the device comprising:
acquisition means for acquiring, in time series, the traffic volume at a measurement point connected to the line on the communication network in order to measure the traffic volume;
storage means for storing the acquired traffic volumes;
extraction means for extracting, from the plurality of stored traffic volumes, a traffic volume that is larger than the traffic volumes acquired at the preceding and following times; and
characteristic calculation means for calculating a statistical characteristic of the traffic volume on the connected line based on the extracted traffic volume;
wherein the extraction means comprises:
average calculation means for calculating an average value of the traffic volume based on the plurality of stored traffic volumes;
wave height extraction means for extracting, as a wave height, a value obtained by subtracting the average value from the maximum traffic volume during a period in which the traffic volume exceeds the average value;
wave height storage means for storing the extracted wave heights; and
wave height average calculation means for calculating an average value of the wave heights based on the plurality of stored wave heights; and
wherein the characteristic calculation means comprises conversion means for converting the acquired traffic volume into a normalized value by subtracting the average value of the traffic volume from the stored traffic volume and dividing the result by the average value of the wave heights.

9. The communication traffic analysis apparatus according to claim 8, wherein the characteristic calculation means further includes probability calculation means for calculating a probability distribution with the normalized value as a variable.

The communication traffic analysis apparatus according to claim 8 or 9, wherein the characteristic calculation means further includes probability distribution calculation means for calculating a probability distribution, with the normalized value as the random variable, by approximating it so that the logarithmically transformed probability lies on a straight line as a function of the random variable.

The communication traffic analysis apparatus according to claim 8 or 9, wherein the characteristic calculation means further includes calculation means for calculating a probability distribution, with the normalized value as the random variable, by approximating it so that the logarithmically transformed probability lies on a straight line as a function of the logarithmically transformed random variable.

The communication traffic analyzer according to any one of claims 9 to 11, wherein the probability distribution indicates the probability that the normalized value is larger than a given value.

The communication traffic analyzer according to claim 8 or 9, wherein, with the normalized value as a random variable X, the probability that X is larger than a value x denoted by P(X > x), and a and b as parameters, the characteristic calculation means further includes probability distribution calculation means for approximating the probability distribution by adjusting a and b in the expression
$\log_{10} P(X > x) \approx -ax + b$.

The communication traffic analyzer according to claim 8 or 9, wherein, with the normalized value as a random variable X, the probability that X is larger than a value x denoted by P(X > x), and a and b as parameters, the characteristic calculation means further includes probability distribution calculation means for approximating the probability distribution by adjusting a and b in the expression
$\log_{10} P(X > x) \approx -a \log_{10} x + b$.