JP7701700B2

JP7701700B2 - NETWORK FAILURE LOCALIZATION DEVICE, NETWORK FAILURE LOCALIZATION METHOD, AND PROGRAM

Info

Publication number: JP7701700B2
Application number: JP2021196292A
Authority: JP
Inventors: 光希池内; 康太郎松田; 洋斎藤
Original assignee: Nippon Telegraph and Telephone Corp; University of Tokyo NUC; NTT Inc USA
Current assignee: University of Tokyo NUC; NTT Inc; NTT Inc USA
Priority date: 2021-12-02
Filing date: 2021-12-02
Publication date: 2025-07-02
Anticipated expiration: 2041-12-02
Also published as: JP2023082481A

Description

The present invention relates to technology for identifying fault locations in a network.

In recent years, networks have grown more complex, and failures now occur that cannot be detected by conventional methods relying solely on data such as logs and metrics output by network devices.

"Network tomography" has attracted attention as a means of detecting such failures. Network tomography measures the end-to-end communication status between multiple distant nodes (this is called path measurement) and integrates the resulting connectivity records to identify the failure location (a failed node or failed link).

In particular, methods that estimate the binary state (faulty or not faulty) of each network component (node or link) are called binary network tomography and are being actively researched [Non-Patent Document 1].

Most existing techniques for binary network tomography assume that routing is deterministic. In reality, however, load balancing mechanisms and protocols such as ECMP, which distributes traffic across equal-weight paths, cause routing to behave probabilistically. A small number of existing methods address network tomography under probabilistic routing [Non-Patent Documents 2, 3].

[Non-Patent Document 1] N. Duffield, "Simple network performance tomography," in Proceedings of the 3rd ACM SIGCOMM conference on Internet measurement. ACM, 2003, pp. 210-215.
[Non-Patent Document 2] H. Herodotou, B. Ding, S. Balakrishnan, G. Outhred, and P. Fitter, "Scalable near real-time failure localization of data center networks," in Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2014, pp. 1689-1698.
[Non-Patent Document 3] R. Tagyo, D. Ikegami, and R. Kawahara, "Network tomography using routing probability for undeterministic routing," IEICE Transactions on Communications, vol. E104.B, no. 7, pp. 837-848, 2021.
[Non-Patent Document 4] T. Soma and Y. Yoshida, "Maximizing monotone submodular functions over the integer lattice," Mathematical Programming, vol. 172, no. 1, pp. 539-563, 2018.
[Non-Patent Document 5] T. Soma, N. Kakimura, K. Inaba, and K.-i. Kawarabayashi, "Optimal budget allocation: Theoretical guarantee and efficient algorithm," in Proceedings of International Conference on Machine Learning (ICML). PMLR, 2014, pp. 351-359.
[Non-Patent Document 6] S. Knight, H. X. Nguyen, N. Falkner, R. Bowden, and M. Roughan, "The internet topology zoo," IEEE Journal on Selected Areas in Communications, vol. 29, no. 9, pp. 1765-1775, 2011.

Under probabilistic routing, fixing a node pair does not uniquely determine the path connecting it, so a large number of path measurements is required to identify the fault location correctly. A large number of path measurements generally prolongs the time needed to identify the fault location and places a heavy load on the network, and should therefore be avoided as far as possible.

However, the existing network tomography methods for probabilistic routing [Non-Patent Documents 2, 3] assume that a large amount of path measurement data has already been obtained, so an unnecessarily large number of path measurements may be performed.

The present invention has been made in view of the above points, and its object is to provide a fault location identification technology that identifies the fault location as accurately as possible with a limited number of path measurements under probabilistic routing.

According to the disclosed technology, there is provided a network fault location identification device for identifying a fault location in a target network, comprising:
a test optimization unit that calculates path measurement tests to be executed on the target network;
a test execution unit that executes the tests calculated by the test optimization unit on the target network; and
a test result analysis unit that narrows down the state of the target network according to the results of the tests executed by the test execution unit,
wherein the device repeats, one or more times, a process in which the test optimization unit calculates a further test using the analysis result obtained by the test result analysis unit, the test execution unit executes the further test, and the test result analysis unit narrows down the state of the target network according to the result of the further test, and
wherein the test optimization unit calculates, as the further test, an optimal or approximately optimal solution to the problem of maximizing the mutual information between a random variable representing the test execution result and a random variable representing the state of the target network.

The disclosed technology makes it possible to identify the fault location as accurately as possible with a limited number of path measurements under probabilistic routing.

FIG. 1 is a system configuration diagram according to an embodiment of the present invention.
FIG. 2 is a diagram showing Algorithm 1.
FIG. 3 is a diagram showing Procedure 2.
FIG. 4 is a diagram showing Procedure 3.
FIG. 5 is a diagram showing a configuration example of the network fault location identification device 100.
FIG. 6 is a flowchart showing the processing procedure of the network fault location identification device 100.
FIG. 7 is a diagram showing the topologies used in the evaluation.
FIG. 8 is a diagram showing the evaluation results for Missouri.
FIG. 9 is a diagram showing the evaluation results for ION.
FIG. 10 is a diagram showing the evaluation results for Ntelos.
FIG. 11 is a diagram showing a hardware configuration example of the device.

An embodiment of the present invention (the present embodiment) is described below with reference to the drawings. The embodiment described below is merely an example, and embodiments to which the present invention applies are not limited to it.

(Overall system configuration and operation overview)
Fig. 1 shows an example of the overall configuration of the system in this embodiment. As shown in Fig. 1, the system comprises a network fault location identification device 100 connected to a target network 200, the network in which fault locations are to be identified. Based on topology information of the target network 200, routing information that may change dynamically, constraint information of the path measurement device, and the like, the network fault location identification device 100 identifies the fault location by repeating, one or more times, a step in which the path measurement device executes a path measurement test and obtains its result.

The target network 200 has multiple nodes and multiple links, and data is transmitted and received by probabilistic routing. A link may also be called an edge.

In this embodiment, the network fault location identification device 100 identifies the fault location as accurately as possible with a limited number of path measurements under probabilistic routing. An overview of its operation follows.

The network fault location identification device 100 uses "mutual information" to express how effective a path measurement is at narrowing down the fault location. Mutual information is naturally defined even under probabilistic routing. Path measurements with large mutual information, that is, those expected to narrow down the fault location the most, are performed preferentially to obtain measurement data.

Based on this measurement data, the network fault location identification device 100 updates the probability distribution over possible fault locations within the framework of Bayesian estimation. Repeating these steps narrows down the fault location step by step.

By using highly effective path measurements and by sequentially determining the measurements to perform according to Bayesian estimation, the network fault location identification device 100 narrows down the fault location efficiently with a small number of measurements, thereby solving the above problem.

The processing executed by the network fault location identification device 100 is described in detail below.

(Problem setting)
First, the formulation of the problem in this embodiment is described. The technology according to this embodiment is, however, also applicable in situations that do not strictly follow the definitions below. For example, the formulation below concerns the identification of link failures, but the technology can be applied in the same way to the identification of node failures.

The target network is an undirected graph G(V, E), where V = {v_i}_i is the vertex set and E = {e_j}_j is the edge set. The "network state", or simply the "state", is represented by a binary vector s = (s_1, ..., s_|E|) ∈ S ⊂ {0, 1}^|E|. Here s_j = 1 (s_j = 0) indicates that edge e_j is abnormal (normal), and S is the set of all possible states.

The monitoring path set A = {a_1, ..., a_|A|} is a set of "monitoring paths", each represented by a binary vector a_j = (a_j1, ..., a_j|E|) ∈ {0, 1}^|E|. If a_jl = 1, monitoring path a_j includes edge e_l. Because a monitoring path is regarded simply as a set of edges, paths containing loops are also allowed.

When the test of monitoring path a_j is executed, result 1 is obtained if at least one edge in a_j is abnormal, and result 0 is obtained if all of its edges are normal. In other words, result 1 means that the monitoring path was not reachable, and result 0 means that it was reachable. Because this embodiment assumes probabilistic routing, a_j cannot be specified directly; only a "group" of monitoring paths sharing the same source vertex (S node) and destination vertex (D node) can be specified.

When a group c_i ∈ C is specified, the test of monitoring path a_j is executed independently with probability p_ij (i = 1, ..., |C|, j = 1, ..., |A|), where

$$\sum_{j=1}^{|A|} p_{ij} = 1$$

holds. By analyzing routing statistics in advance, (p_ij)_{i,j} can be regarded as known. When c_i is executed under state s, result 1 is obtained with probability

$$u_i(s) = \sum_{j=1}^{|A|} p_{ij}\, \mathbb{1}(a_j \cdot s \geq 1),$$

where $\mathbb{1}(\cdot)$ is the indicator function that returns 1 if its argument is true and 0 if it is false. The matrix form of u_i(s) is written U = (u_i(s))_{i,s}.

In actual network operation, the monitoring system is sometimes designed so that the path measurement device can send probe packets in several directions simultaneously, or send several packets in the same direction at once in order to obtain statistics.
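As a minimal sketch of how the matrix U could be tabulated, the snippet below computes, for each group and state, the total routing probability of the monitoring paths that contain at least one abnormal edge, which is the probability of obtaining result 1. The function name `result_one_probability` and the toy paths, probabilities, and states are illustrative assumptions, not values from the embodiment.

```python
def result_one_probability(p_i, paths, state):
    """u_i(s): probability that running group c_i under state s yields result 1.

    p_i[j]  : probability that group c_i is routed over monitoring path a_j
    paths[j]: binary edge-membership vector of monitoring path a_j
    state   : binary vector s, where 1 marks an abnormal edge
    """
    return sum(
        p
        for p, a in zip(p_i, paths)
        if any(a_l and s_l for a_l, s_l in zip(a, state))
    )


# Hypothetical toy network with 3 edges and 3 monitoring paths.
paths = [(1, 1, 0), (0, 1, 1), (1, 0, 1)]   # a_1, a_2, a_3
p = [
    (0.5, 0.5, 0.0),   # group c_1 is routed over a_1 or a_2 with equal probability
    (0.0, 0.0, 1.0),   # group c_2 is always routed over a_3
]
states = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]  # every single-edge failure

# U[i][k] = u_i(s_k): one row per group, one column per state.
U = [[result_one_probability(p_i, paths, s) for s in states] for p_i in p]
print(U)
```

Entries equal to 0 or 1 mark group/state pairs whose outcome is deterministic, which is what later allows states to be ruled out from a single observation.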

Taking such situations into account, a "probe test", or simply a "test", ξ_i ∈ X = {ξ_1, ..., ξ_|X|} is defined. In this embodiment, executing

$$\xi_i = (\xi_{i1}, \ldots, \xi_{i|C|}) \in \mathbb{Z}_+^{|C|}$$

executes c_j ξ_ij times (j = 1, ..., |C|). Here Z_+ denotes the set of all non-negative integers. That is, ξ_i packages simultaneous measurements between different node pairs, or repeated measurements, into a single test. It is natural to make |ξ_i| = Σ_j ξ_ij a constant independent of i, so that every test contains the same number of measurements.

The problem addressed in this embodiment is as follows.

Suppose that the state s ∈ S is unknown, and that the prior distribution of s and the number N of tests ξ that can be executed are given. With what strategy should the tests be executed (which test, at what point, and how many times) in order to identify the true state with the highest possible accuracy using N tests? And how should the true state be estimated from the test results? The following description assumes that the state s does not change dynamically, but the technology according to this embodiment is applicable even if it changes along the way.

(Details of processing by the network fault location identification device 100)
This embodiment takes an "adaptive measurement" approach, in which the network fault location identification device 100 sequentially determines the next test to execute according to the test results obtained so far. Adaptive measurement is more complex to implement than a non-adaptive approach that fixes all N tests at once, but because the information available for choosing the best test grows step by step, it is expected to identify the state with high accuracy using fewer tests.

Specifically, it is formulated as the following batch process.

The N tests are divided into B batches of size N_B, so that N_B × B = N. In the b-th batch (b ∈ [1, B] ∩ Z_+), the test design

$$M_b = (M_{b1}, \ldots, M_{b|X|}) \in \mathbb{Z}_+^{|X|}$$

is determined. It gives the number of times each test ξ_i (i = 1, ..., |X|) is executed, with |M_b| = N_B. After M_b is determined, it is executed and the result

$$y^{M_b} = (y^{M_b}_1, \ldots, y^{M_b}_{|C|})$$

is obtained, where y^{M_b}_i is the number of times result 1 was obtained among the

$$\sum_{k=1}^{|X|} M_{bk}\, \xi_{ki}$$

executions of group c_i. When b < B (that is, for every batch except the last), the next test design M_{b+1} is determined based on y^{M_b} and the results of the earlier batches. The question to be considered is therefore how to design M_b so as to narrow down the state efficiently in each batch. The batch size and the number of batches (N_B, B) are hyperparameters determined according to operational conditions (such as the time required for one test execution and the permissible network load).
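The bookkeeping in the batch formulation can be illustrated with a short sketch: given a test design M_b (how often each test is run) and the tests ξ (how often each test runs each group), the number of executions of group c_i in the batch is the sum over tests of M_bk · ξ_ki. The helper name `group_run_counts` and the numbers below are hypothetical.

```python
def group_run_counts(design, tests):
    """Return, for each group c_i, how many times it runs in one batch.

    design[k]  : number of times test xi_k is executed in the batch (M_b)
    tests[k][i]: number of executions of group c_i packaged in test xi_k
    """
    num_groups = len(tests[0])
    return [
        sum(m_k * xi[i] for m_k, xi in zip(design, tests))
        for i in range(num_groups)
    ]


# Three hypothetical tests over three groups; every test packs two measurements.
tests = [(2, 0, 0), (1, 1, 0), (0, 1, 1)]
design = [1, 0, 2]          # batch size N_B = 3: run xi_1 once and xi_3 twice

counts = group_run_counts(design, tests)
print(counts)               # each entry is the number of trials behind y_i
```

These counts are the trial numbers of the per-group binomial outcomes, so they are all that the likelihood and mutual information computations need to know about a design.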

In designing M (the subscript b of M_b is omitted where appropriate), the "goodness" of M must be expressed quantitatively. Since the goal is to identify the true state, a natural strategy is to sharpen the probability distribution of the state as much as possible, that is, to lower its entropy.

Therefore, in this embodiment, the mutual information I(S; Y^M) between Y^M and S is used as an index of the usefulness of the result y^M obtained by executing M (when y^M and s are regarded as random variables, they are written with capital letters as Y^M and S). I(S; Y^M) expresses how much, on average, the entropy of the state distribution decreases when Y^M is observed. The problem considered in this embodiment can thus be stated as follows.

[Problem]: Given the prior distribution Pr(s) and the measurement results obtained up to the (b-1)-th batch

$$D_{b-1} = \{y^{M_1}, \ldots, y^{M_{b-1}}\},$$

find

$$\operatorname*{arg\,max}_{M \in \mathbb{Z}_+^{|X|},\ |M| = N_B} I(S; Y^M \mid D_{b-1}).$$

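Here, I(S; Y^M | D_{b-1}) is the mutual information between Y^M and S given D_{b-1}, and is given by Equation 1 below.

$$I(S; Y^M \mid D_{b-1}) = H(S \mid D_{b-1}) - H(S \mid Y^M, D_{b-1}) \tag{1}$$

The quantities on the right-hand side of Equation 1 are given by Equations 2 and 3 below.

$$H(S \mid D_{b-1}) = -\sum_{s \in S} \Pr(s \mid D_{b-1}) \log \Pr(s \mid D_{b-1}) \tag{2}$$

$$H(S \mid Y^M, D_{b-1}) = -\sum_{y^M} \Pr(y^M \mid D_{b-1}) \sum_{s \in S} \Pr(s \mid y^M, D_{b-1}) \log \Pr(s \mid y^M, D_{b-1}) \tag{3}$$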
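For a very small instance, the mutual information can be evaluated exactly as the entropy of the current state distribution minus the expected entropy after observing the outcome. The sketch below enumerates every possible outcome vector, which is only feasible for toy sizes. It assumes the per-group outcomes are binomial counts, stores the probabilities u_i(s) with one row per state (`U_by_state`, the transpose of U), and uses hypothetical function names and numbers.

```python
from itertools import product
from math import comb, log


def entropy(dist):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(q * log(q) for q in dist if q > 0.0)


def likelihood(y, n, u_s):
    """Pr(y | s): product over groups of binomial probabilities."""
    value = 1.0
    for y_i, n_i, u in zip(y, n, u_s):
        value *= comb(n_i, y_i) * u ** y_i * (1.0 - u) ** (n_i - y_i)
    return value


def mutual_information(prior, U_by_state, n):
    """Exact I(S; Y) for group run counts n, enumerating every outcome y."""
    expected_posterior_entropy = 0.0
    for y in product(*(range(n_i + 1) for n_i in n)):
        joint = [pr * likelihood(y, n, u_s) for pr, u_s in zip(prior, U_by_state)]
        pr_y = sum(joint)
        if pr_y > 0.0:
            posterior = [j / pr_y for j in joint]
            expected_posterior_entropy += pr_y * entropy(posterior)
    return entropy(prior) - expected_posterior_entropy


prior = [0.5, 0.5]
# A group that fails surely in state 1 and never in state 2 settles the question.
mi_decisive = mutual_information(prior, [[1.0], [0.0]], n=[1])
# A group that behaves identically in both states carries no information.
mi_useless = mutual_information(prior, [[0.5], [0.5]], n=[2])
print(mi_decisive, mi_useless)
```

The number of outcome vectors grows as the product of (n_i + 1) over groups, which is why exhaustive enumeration does not scale beyond small designs.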

The above problem is a combinatorial optimization problem and can be shown to be NP-hard. The network fault location identification device 100 therefore obtains an approximate solution using, as one example, the greedy algorithm described below.

The technology according to this embodiment is not, however, restricted to the greedy method; other approximate or exact optimization methods can also be used. In addition, although the description here assumes that the routing probabilities do not change dynamically, the technology is applicable even when they change, provided that this information can be obtained.

The procedure (algorithm) executed by the network fault location identification device 100 is shown as Algorithm 1 in Fig. 2. Line 2 first performs StateSpaceReduction, which reduces the set of possible states; this is described later.

In the batch processing that starts at line 5, lines 6 to 10 construct M by the greedy method. Inside the while loop, the test ξ_{i_max} that yields the largest increase in I(S; Y^M | D_{b-1}) is selected one at a time, and the i_max-th component of M is incremented. Once M has been constructed, it is executed to obtain y^M, and lines 11 to 13 update the posterior distribution Pr(s | D_b).
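The greedy construction of M can be sketched as follows: starting from an empty design, repeatedly add the single test whose inclusion gives the largest mutual information, until the batch size is reached. This is a toy illustration, not a transcription of Algorithm 1: it swaps in an exact enumeration of the mutual information (workable only for tiny instances), stores u_i(s) with one row per state, and all names and numbers are assumptions.

```python
from itertools import product
from math import comb, log


def entropy(dist):
    return -sum(q * log(q) for q in dist if q > 0.0)


def mutual_information(prior, U_by_state, n):
    """Exact I(S; Y) for group run counts n (toy sizes only)."""
    expected = 0.0
    for y in product(*(range(n_i + 1) for n_i in n)):
        joint = []
        for pr, u_s in zip(prior, U_by_state):
            lik = 1.0
            for y_i, n_i, u in zip(y, n, u_s):
                lik *= comb(n_i, y_i) * u ** y_i * (1.0 - u) ** (n_i - y_i)
            joint.append(pr * lik)
        pr_y = sum(joint)
        if pr_y > 0.0:
            expected += pr_y * entropy([j / pr_y for j in joint])
    return entropy(prior) - expected


def greedy_design(prior, U_by_state, tests, batch_size):
    """Build a test design M by adding one test at a time."""
    design = [0] * len(tests)
    runs = [0] * len(tests[0])          # current run count of every group
    for _ in range(batch_size):
        best_k, best_mi = None, float("-inf")
        for k, xi in enumerate(tests):
            candidate = [r + x for r, x in zip(runs, xi)]
            mi = mutual_information(prior, U_by_state, candidate)
            if mi > best_mi:            # largest total MI = largest increment
                best_k, best_mi = k, mi
        design[best_k] += 1
        runs = [r + x for r, x in zip(runs, tests[best_k])]
    return design


prior = [0.5, 0.5]
U_by_state = [[0.9, 0.5],   # state 1: group 1 usually fails, group 2 is a coin flip
              [0.1, 0.5]]   # state 2: group 1 rarely fails, group 2 is a coin flip
tests = [(1, 0), (0, 1)]    # test 1 probes group 1, test 2 probes group 2
design = greedy_design(prior, U_by_state, tests, batch_size=2)
print(design)
```

In this toy setting only the first group distinguishes the two states, so the greedy rule spends the whole batch on the test that probes it.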

Next, the method of computing the mutual information on line 8 is described. The mutual information is computed according to Procedure 2 shown in Fig. 3. It is based on the expression for I(S; Y^M | D_{b-1}) in Equation 1, but the sum over y^M in Equation 3 has exponentially many terms and is difficult to evaluate exactly. Monte Carlo sampling is therefore used, as follows.

First, line 4 draws N_y samples of y^M according to the binomial distribution. For each sample, lines 6 and 7 compute the posterior distribution by the method described later and then its entropy. Line 8 takes the average of these entropies. Procedure 2 assumes that all possible states s can be enumerated, but when |S| is large, the sum over s may also be replaced by a sample average.
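A Monte Carlo estimate in the spirit of Procedure 2 might look like the sketch below: outcomes are sampled by first drawing a state from the current distribution and then drawing binomial counts per group, the posterior entropy is computed for each sample, and the average is subtracted from the current entropy. The sampling scheme, the function names, the state-major layout `U_by_state`, and the fixed seed are assumptions made for illustration.

```python
import random
from math import comb, log


def entropy(dist):
    return -sum(q * log(q) for q in dist if q > 0.0)


def likelihood(y, n, u_s):
    """Pr(y | s) as a product of binomial probabilities, one per group."""
    value = 1.0
    for y_i, n_i, u in zip(y, n, u_s):
        value *= comb(n_i, y_i) * u ** y_i * (1.0 - u) ** (n_i - y_i)
    return value


def sample_outcome(prior, U_by_state, n, rng):
    """Draw y from its marginal: pick a state, then binomial counts per group."""
    s = rng.choices(range(len(prior)), weights=prior)[0]
    return [
        sum(rng.random() < u for _ in range(n_i))   # Binomial(n_i, u) draw
        for n_i, u in zip(n, U_by_state[s])
    ]


def mc_mutual_information(prior, U_by_state, n, num_samples, seed=0):
    """Estimate I(S; Y) as H(S) minus the sample mean of posterior entropies."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        y = sample_outcome(prior, U_by_state, n, rng)
        joint = [pr * likelihood(y, n, u_s) for pr, u_s in zip(prior, U_by_state)]
        pr_y = sum(joint)               # positive, since y was drawn from the model
        total += entropy([j / pr_y for j in joint])
    return entropy(prior) - total / num_samples


estimate = mc_mutual_information([0.5, 0.5], [[0.9], [0.1]], n=[1], num_samples=2000)
print(estimate)
```

The cost is linear in the number of samples N_y and in |S|, instead of exponential in the number of groups, at the price of sampling noise in the greedy comparison.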

Next, the computation of the posterior distribution Pr(s | D_b) (and likewise Pr(s | D_{b-1}), etc.), which appears on line 13 of Algorithm 1 and on lines 2, 6, and 8 of Procedure 2, is described. Pr(s | D_b) can be written as

$$\Pr(s \mid D_b) = \frac{\Pr(y^{M_b} \mid s)\, \Pr(s \mid D_{b-1})}{\sum_{s' \in S} \Pr(y^{M_b} \mid s')\, \Pr(s' \mid D_{b-1})}.$$

Here, Bayes' theorem and the conditional independence of y^{M_b} and D_{b-1} given s were used. Pr(s | D_{b-1}) is known, and Pr(y^{M_b} | s) can be computed by Equation 4 below, where n_i = Σ_k M_bk ξ_ki is the number of executions of group c_i.

$$\Pr(y^{M_b} \mid s) = \prod_{i=1}^{|C|} \binom{n_i}{y^{M_b}_i}\, u_i(s)^{y^{M_b}_i}\, \bigl(1 - u_i(s)\bigr)^{n_i - y^{M_b}_i} \tag{4}$$

In an implementation, it is advisable to memoize the values of log u_i(s), log(1 - u_i(s)), and the logarithms of the binomial coefficients, and to compute

$$\log \Pr(y^{M_b} \mid s)$$

instead of

$$\Pr(y^{M_b} \mid s).$$
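One way to follow this advice is sketched below: the log-likelihood of an outcome under each state is accumulated from logarithms, the log binomial coefficients are memoized, and the posterior is normalized after subtracting the largest log value so that the exponentials stay in range. The names `log_comb`, `log_likelihood`, and `update_posterior`, the use of `lgamma` and `lru_cache`, and the state-major layout `U_by_state` are choices made for this illustration.

```python
from functools import lru_cache
from math import exp, lgamma, log


@lru_cache(maxsize=None)
def log_comb(n, k):
    """Memoized log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def log_likelihood(y, n, u_s):
    """log Pr(y | s) for binomial group outcomes; -inf if y is impossible under s."""
    total = 0.0
    for y_i, n_i, u in zip(y, n, u_s):
        if (y_i > 0 and u == 0.0) or (y_i < n_i and u == 1.0):
            return float("-inf")
        total += log_comb(n_i, y_i)
        if y_i > 0:
            total += y_i * log(u)
        if y_i < n_i:
            total += (n_i - y_i) * log(1.0 - u)
    return total


def update_posterior(prior, U_by_state, y, n):
    """Bayes update of the state distribution, carried out in log space."""
    log_post = [
        log(pr) + log_likelihood(y, n, u_s) if pr > 0.0 else float("-inf")
        for pr, u_s in zip(prior, U_by_state)
    ]
    peak = max(log_post)
    if peak == float("-inf"):
        raise ValueError("observation has zero probability under every state")
    weights = [exp(v - peak) for v in log_post]   # largest weight is exactly 1
    total = sum(weights)
    return [w / total for w in weights]


# Two states; the single group returns 1 with probability 0.9 or 0.1.
posterior = update_posterior([0.5, 0.5], [[0.9], [0.1]], y=[2], n=[2])
print(posterior)
```

Working with logarithms avoids underflow when many group executions are multiplied together, and the same cache could be extended to the log u_i(s) terms if they are reused across candidate designs.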

Algorithm 1 is an approximation algorithm based on the greedy method, but it can be shown to have a constant approximation ratio with respect to mutual information maximization, as stated below. In other words, even in the worst case the mutual information achieved is guaranteed to be at least a fixed fraction of the optimum.

[Theorem]: For sufficiently large N_y, let $\tilde{M}_b$ denote the M_b obtained by Algorithm 1, and let the optimal M_b be

$$M_b^* = \operatorname*{arg\,max}_{M \in \mathbb{Z}_+^{|X|},\ |M| = N_B} I(S; Y^M \mid D_{b-1}).$$

Then

$$I(S; Y^{\tilde{M}_b} \mid D_{b-1}) \geq \left(1 - \frac{1}{e}\right) I(S; Y^{M_b^*} \mid D_{b-1})$$

holds.

(Proof sketch)
Using the fact that the routing probabilities are mutually independent, it can be shown that the mutual information I(S; Y^M | D_{b-1}) is monotone DR-submodular [Non-Patent Document 4] as a function of M ∈ Z_+^|X|. It is known that, in general, the greedy method achieves a (1 - 1/e)-approximation for the maximization of a monotone DR-submodular function [Non-Patent Document 5]. (End of proof)

Finally, the StateSpaceReduction process on line 2 of Algorithm 1 is described. This process shortens the execution time of Algorithm 1 by removing impossible states from S in advance. In general, the matrix U contains many entries equal to 0 or 1.

Indeed, a single group c usually contains only a small fraction of the edges of the whole network, so many entries are 0; and the monitoring paths within one group often share some edges, so many entries are 1. In addition, the following trivial proposition holds: "Suppose u_i(s) = 1 (0). If group c_i is executed and the result is 0 (1), then the true state is not s." Based on this, the following definition is introduced.

[Definition] (Removable state): When group c_i is executed and result 1 (0) is obtained, any state s satisfying u_i(s) = 0 (1) is removable.

Removable states may clearly be excluded from the state space S. The following lemma also holds.
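The definition of a removable state translates into a simple filter, sketched below: after observing a result from one group, every state whose probability of result 1 makes that observation impossible is dropped, and the corresponding columns of U are dropped with it. The function name and the toy matrix are hypothetical, and the exact comparison against 0.0 and 1.0 assumes those entries of U are stored exactly.

```python
def remove_contradicted_states(states, U, group, result):
    """Drop states that cannot have produced `result` (0 or 1) for `group`.

    U[i][k] is u_i(s_k), the probability that group c_i yields result 1
    under state s_k. Result 1 rules out states with u = 0; result 0 rules
    out states with u = 1.
    """
    contradicted_u = 0.0 if result == 1 else 1.0
    keep = [k for k in range(len(states)) if U[group][k] != contradicted_u]
    reduced_states = [states[k] for k in keep]
    reduced_U = [[row[k] for k in keep] for row in U]
    return reduced_states, reduced_U


# Hypothetical example: three single-edge failure states and two groups.
states = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
U = [
    [0.5, 1.0, 0.5],
    [1.0, 0.0, 1.0],
]

# The second group was reachable (result 0), so states where it surely fails go.
states_after, U_after = remove_contradicted_states(states, U, group=1, result=0)
print(states_after, U_after)
```

Because each removal shrinks both S and U, later mutual information computations iterate over fewer states.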

[Lemma]: The expected number R(U, ξ) of removable states when test ξ is executed is given by Equation 5 below.

[Equation 5]

Here, δ_d(x) = 1 if x = d and δ_d(x) = 0 otherwise (d = 0, 1), and 0^0 = 1. In particular, when ξ = e_l (the unit vector whose l-th component alone is 1), Equation 6 below is obtained.

[Equation 6]

Here, [equation].

(Proof sketch)
By linearity of expectation, [equation]. Since each group execution is a Bernoulli trial, one can write [equation]; taking the complementary event and summing over i gives the result. Equation 6 is computed by substituting ξ = e_l, and the inequality follows from [equation]. (End of proof)

Based on the above lemma, the StateSpaceReduction process is summarized as Procedure 3 in Fig. 4. Equation 6 is computed for each group c_l, and the group c_{l_max} that maximizes it is obtained. Next, a test ξ that includes c_{l_max} is chosen arbitrarily and executed. Depending on the result, the removable states are excluded from S, and the matrix U is reduced accordingly. These steps are repeated N_iter times.

Alternatively, instead of using Equation 6, a test ξ that maximizes Equation 5 may be selected and executed. Because Equations 5 and 6 are cheaper to evaluate than the mutual information, the total computation time of Algorithm 1 is reduced as a result.

(Example)
As an example of the above processing, a configuration example of the network fault location identification device 100 and an example of the processing procedure using that configuration are described.

図５に、ネットワーク障害箇所特定装置１００の構成例を示す。図５に示すように、ネットワーク障害箇所特定装置１００は、入力用ＵＩ１１０、状態数削減部１２０、テスト実行部１３０、相互情報量最大化部１４０、事後分布計算部１５０、出力用ＵＩ１６０を有する。状態数削減部１２０、テスト実行部１３０、事後分布計算部１５０は、図示のとおりに対象ネットワーク２００と接続している。なお、「状態数削減部１２０＋相互情報量最大化部１４０」をテスト最適化部と呼んでもよい。また、事後分布計算部１５０をテスト結果分析部と呼んでもよい。 Figure 5 shows an example of the configuration of the network fault location identification device 100. As shown in Figure 5, the network fault location identification device 100 has an input UI 110, a state number reduction unit 120, a test execution unit 130, a mutual information maximization unit 140, a posterior distribution calculation unit 150, and an output UI 160. The state number reduction unit 120, the test execution unit 130, and the posterior distribution calculation unit 150 are connected to the target network 200 as shown in the figure. Note that the "state number reduction unit 120 + mutual information maximization unit 140" may be called a test optimization unit. Also, the posterior distribution calculation unit 150 may be called a test result analysis unit.

上記の構成を備えるネットワーク障害箇所特定装置１００の処理手順を図６のフローチャートを参照して説明する。 The processing procedure of the network fault location identification device 100 having the above configuration will be explained with reference to the flowchart in Figure 6.

Ｓ１０１において、まず入力用ＵＩ１１０にアルゴリズム実行に必要なデータやパラメータ（事前分布Ｐｒ（ｓ）、Ｕ、Ｎ_Ｂ、Ｂなど）を入力する。Ｓ１０２において、これらを基に状態数削減部１２０がＰｒｏｃｅｄｕｒｅ３に従いξを決定し、それをテスト実行部１３０に渡す。 In S101, data and parameters (prior distribution Pr(s), U, N _B , B, etc.) required for executing the algorithm are first input to the input UI 110. In S102, the state number reduction unit 120 determines ξ based on this in accordance with Procedure 3 and passes it to the test execution unit 130.

Ｓ１０３において、テスト実行部１３０は、ｐｉｎｇを始めとする疎通性確認プログラムなどを用いて、テストを対象ネットワーク２００で実行する。再びＳ１０２において、状態数削減部１２０が、得られた結果を基に、Ｐｒｏｃｅｄｕｒｅ３に従って、次のξを決定する。 In S103, the test execution unit 130 executes a test on the target network 200 using a connectivity check program such as ping. In S102 again, the state number reduction unit 120 determines the next ξ based on the obtained results in accordance with Procedure 3.

After S102 and S103 have been performed a predetermined number of times, the mutual information maximization unit 140 creates M in S104 using the greedy method of Algorithm 1. M is passed to the test execution unit 130, which executes the test on the target network 200 in S105.

The result is passed to the posterior distribution calculation unit 150, which calculates the posterior distribution in S106 according to the Bayesian estimation framework. The obtained posterior distribution is passed to the mutual information maximization unit 140, and steps S104 to S106 are repeated according to the loop of Algorithm 1. After a predetermined number of repetitions, the estimated state, taken as the maximum-likelihood value of the final posterior distribution, is passed to the output UI 160. In S107, the output UI 160 outputs the estimated state.
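The loop of S104 to S106 can be sketched as follows. This is a simplified illustration, not Algorithm 1: it assumes a finite set of states and outcomes, selects a single test per round (whereas Algorithm 1 builds a set M greedily), and uses a hypothetical callback `likelihood(y, s, xi)` returning Pr(y | s, ξ) and a hypothetical `run_test(xi)` that executes the test on the network and returns the observed outcome.

```python
import math
from typing import Callable, Dict, Hashable, Sequence, Tuple

State = Hashable
Test = Hashable
Outcome = Hashable
Likelihood = Callable[[Outcome, State, Test], float]


def posterior(prior: Dict[State, float], xi: Test, y: Outcome,
              likelihood: Likelihood) -> Dict[State, float]:
    """Bayes update: Pr(s | y) is proportional to Pr(y | s, xi) * Pr(s)."""
    unnorm = {s: p * likelihood(y, s, xi) for s, p in prior.items()}
    z = sum(unnorm.values())
    if z == 0.0:
        raise ValueError("observed outcome has zero probability under the prior")
    return {s: p / z for s, p in unnorm.items()}


def mutual_information(prior: Dict[State, float], xi: Test,
                       outcomes: Sequence[Outcome],
                       likelihood: Likelihood) -> float:
    """I(S; Y) in bits between the network state S and the test result Y."""
    p_y = {y: sum(p * likelihood(y, s, xi) for s, p in prior.items())
           for y in outcomes}
    mi = 0.0
    for s, p_s in prior.items():
        for y in outcomes:
            p_ys = likelihood(y, s, xi)
            if p_s > 0.0 and p_ys > 0.0:  # then p_y[y] > 0 as well
                mi += p_s * p_ys * math.log2(p_ys / p_y[y])
    return mi


def localize(prior: Dict[State, float], tests: Sequence[Test],
             outcomes: Sequence[Outcome], likelihood: Likelihood,
             run_test: Callable[[Test], Outcome],
             rounds: int) -> Tuple[State, Dict[State, float]]:
    """Repeat: pick the most informative test, run it, update the belief."""
    belief = dict(prior)
    for _ in range(rounds):
        xi = max(tests, key=lambda t: mutual_information(
            belief, t, outcomes, likelihood))
        y = run_test(xi)
        belief = posterior(belief, xi, y, likelihood)
    # Report the state with the highest posterior probability.
    return max(belief, key=belief.get), belief
```

Because the belief is updated after every test, a test that was informative in one round can carry zero mutual information in the next, which is what makes the selection adaptive.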

(Effects)
The technique according to the present embodiment makes it possible to identify a fault location as accurately as possible with a limited number of path measurements under probabilistic routing. Because the technique prioritizes the path measurements that are most useful for fault identification, fewer path measurements are needed before the fault is identified, which is expected to shorten fault identification and reduce the load on the network.

The evaluation used the three network data sets [Non-Patent Document 6] listed in Figure 7. In Figure 7, #failures is the number of edges that fail simultaneously, and |S_k| is the total number of possible states. The following settings were used for each network.

The number of groups was |C| = 3|V|: each node in turn was taken as the S node, and three randomly chosen nodes were taken as D nodes to form the node pairs (groups). For each node pair, the shortest, second-shortest, and third-shortest paths were treated as monitoring paths (|A| = 9|V|). For each group, the i-th shortest path was selected with probability

[equation not available in the source text]

where l_i is the path length. Executing each of the three groups that share an S node once was regarded as one probe test ξ, giving |X| = |V| test patterns in total. The initial state distribution Pr(s) was uniform, and N_y = 30. State reduction by Procedure 3 was performed with N_iter = 10.
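How a probe outcome depends on the network state under probabilistic routing can be sketched as follows. The sketch assumes that a probe succeeds exactly when the path it happens to take contains no failed edge, and that only the path-selection probabilities are known, not the path actually taken. The function names are hypothetical, and the probabilities passed in are placeholders chosen by the caller, not the length-dependent probabilities used in the evaluation.

```python
from typing import FrozenSet, Sequence, Tuple

Edge = Tuple[str, str]          # assumed to be written in one fixed orientation
Path = Sequence[Edge]
Group = Tuple[Sequence[Path], Sequence[float]]  # (candidate paths, selection probs)


def success_probability(paths: Sequence[Path], probs: Sequence[float],
                        failed: FrozenSet[Edge]) -> float:
    """Pr(probe succeeds | failed edges): total probability of fault-free paths."""
    return sum(p for path, p in zip(paths, probs) if failed.isdisjoint(path))


def outcome_likelihood(outcome: Sequence[bool], groups: Sequence[Group],
                       failed: FrozenSet[Edge]) -> float:
    """Pr(outcome | failed edges) for one probe test that probes each group once.

    Probes of different groups are assumed to choose their paths independently.
    """
    like = 1.0
    for ok, (paths, probs) in zip(outcome, groups):
        q = success_probability(paths, probs, failed)
        like *= q if ok else 1.0 - q
    return like
```

A function of this shape can serve as the likelihood in a Bayesian update: a state in which every candidate path of a group is broken gives success probability 0, so a single successful probe rules that state out.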

In addition to the technique according to the present embodiment (denoted PM), Random, LS [Non-Patent Document 2], and LASSO [Non-Patent Document 3] were run for comparison. Random is the method of the present embodiment with ξ selected at random instead of by mutual information.

LS and LASSO are existing non-adaptive methods proposed for network tomography under probabilistic routing. LASSO was run with the hyperparameter λ set to 0.0001, 0.001, and 0.01.

The accuracy rate was used as the evaluation metric. For a state with two failed edges, the answer was counted as correct only when both edges were identified. For Missouri, the experiment was run on every true-state pattern; for ION and Ntelos, it was run on 50 randomly selected true-state patterns, and the accuracy rate was calculated from the results.
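The scoring rule described above can be written as an exact-match accuracy. The sketch below assumes each state is represented as the set of its failed edges; the function name and the edge labels in the usage example are hypothetical.

```python
from typing import Hashable, Iterable, Sequence


def exact_match_accuracy(estimates: Sequence[Iterable[Hashable]],
                         truths: Sequence[Iterable[Hashable]]) -> float:
    """Fraction of trials in which the estimated failed-edge set equals the truth.

    A trial with two failed edges counts as correct only if both are identified
    and no extra edge is reported.
    """
    if len(estimates) != len(truths) or not truths:
        raise ValueError("estimates and truths must be non-empty and equal in length")
    correct = sum(1 for est, true in zip(estimates, truths)
                  if frozenset(est) == frozenset(true))
    return correct / len(truths)
```

For example, with estimates `[{"e1", "e2"}, {"e1"}, {"e3"}]` and true states `[{"e1", "e2"}, {"e1", "e2"}, {"e3"}]`, the second trial identifies only one of two failed edges and is counted as wrong.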

The experimental results are shown in Figures 8 to 10. In each figure, the horizontal axis is the total number of tests N and the vertical axis is the accuracy rate. The method PM according to the present embodiment outperforms the existing methods LS and LASSO. For example, to exceed an accuracy rate of 0.96, the best existing method requires 160 (Missouri), 320 (ION), and 640 (Ntelos) tests, whereas PM needs only 22, 34, and 34 tests, respectively. PM also outperforms Random, which shows the benefit of using mutual information.

(Hardware configuration example)
The network fault location identification device 100 can be realized, for example, by causing a computer to execute a program. The computer may be a physical computer or a virtual machine in the cloud.

That is, the network fault location identification device 100 can be realized by using hardware resources such as a CPU and memory built into a computer to execute a program corresponding to the processing performed by the device. The program can be recorded on a computer-readable recording medium (such as portable memory) for storage or distribution. The program can also be provided over a network, for example via the Internet or email.

Figure 11 shows an example of the hardware configuration of the computer. The computer in Figure 11 has a drive device 1000, an auxiliary storage device 1002, a memory device 1003, a CPU 1004, an interface device 1005, a display device 1006, an input device 1007, an output device 1008, and so on, all interconnected by a bus BS.

The program that implements the processing on the computer is provided on a recording medium 1001 such as a CD-ROM or a memory card. When the recording medium 1001 storing the program is set in the drive device 1000, the program is installed from the recording medium 1001 into the auxiliary storage device 1002 via the drive device 1000. The program does not have to be installed from the recording medium 1001, however; it may instead be downloaded from another computer over a network. The auxiliary storage device 1002 stores the installed program together with the necessary files and data.

When an instruction to start the program is received, the memory device 1003 reads the program from the auxiliary storage device 1002 and stores it. The CPU 1004 implements the functions of the network fault location identification device 100 in accordance with the program stored in the memory device 1003. The interface device 1005 is used as an interface for connecting to a network or the like. The display device 1006 displays a GUI (Graphical User Interface) and similar output produced by the program. The input device 1007 consists of a keyboard and mouse, buttons, a touch panel, or the like, and is used to enter various operating instructions. The output device 1008 outputs the calculation results.

(Appendix)
This specification discloses at least the network fault location identification device, network fault location identification method, and program described in the following items.
(Item 1)
A network fault location identification device for identifying a fault location in a target network, comprising:
a test optimization unit that calculates an optimal path measurement test to be executed on the target network;
a test execution unit that executes the test calculated by the test optimization unit on the target network; and
a test result analysis unit that narrows down the network state according to the result of the test executed by the test execution unit,
wherein a process in which the test optimization unit calculates a test using the analysis result obtained by the test result analysis unit, the test execution unit executes the test, and the test result analysis unit narrows down the network state according to the test result is repeated one or more times.
(Item 2)
The network fault location identification device according to Item 1, wherein, in the routing information of the target network, the path connecting a source node and a destination node is not uniquely determined but is determined probabilistically, the path selected in a test cannot be observed, and only its probability distribution is available.
(Item 3)
The network fault location identification device according to Item 1 or 2, wherein the test optimization unit selects a test that is an optimal or near-optimal solution to the problem of maximizing the mutual information between a random variable representing the test execution result and a random variable representing the state of the target network.
(Item 4)
The network fault location identification device according to any one of Items 1 to 3, wherein the test optimization unit selects a test that increases the expected number of network states that can be excluded as candidates based on the test execution result.
(Item 5)
The network fault location identification device according to any one of Items 1 to 3, wherein the test result analysis unit updates a probability distribution representing the network state based on the test execution result in accordance with the Bayesian estimation framework.
(Item 6)
A network fault location identification method executed by a network fault location identification device for identifying a fault location in a target network, comprising:
a test optimization step of calculating an optimal path measurement test to be executed on the target network;
a test execution step of executing the test calculated in the test optimization step on the target network; and
a test result analysis step of narrowing down the network state according to the result of the test executed in the test execution step,
wherein a process in which a test is calculated in the test optimization step using the analysis result obtained in the test result analysis step, the test is executed in the test execution step, and the network state is narrowed down in the test result analysis step according to the test result is repeated one or more times.
(Item 7)
A program for causing a computer to function as each unit of the network fault location identification device according to any one of Items 1 to 5.

Although the present embodiment has been described above, the present invention is not limited to this specific embodiment, and various modifications and changes are possible within the scope of the gist of the invention as set out in the claims.

100 Network fault location identification device
110 Input UI
120 State number reduction unit
130 Test execution unit
140 Mutual information maximization unit
150 Posterior distribution calculation unit
160 Output UI
1000 Drive device
1001 Recording medium
1002 Auxiliary storage device
1003 Memory device
1004 CPU
1005 Interface device
1006 Display device
1007 Input device
1008 Output device

Claims

A network fault location identification device for identifying a fault location in a target network, comprising:
a test optimization unit for calculating path measurement tests to be performed on the target network;
a test execution unit that executes the test calculated by the test optimization unit on the target network;
a test result analysis unit that narrows down the state of the target network according to the result of the test by the test execution unit,
wherein the network fault location identification device repeats, one or more times, a process in which the test optimization unit calculates a further test using an analysis result obtained by the test result analysis unit, the test execution unit executes the further test, and the test result analysis unit narrows down the state of the target network according to the result of the further test, and
wherein the test optimization unit calculates, as the further test, a test that is an optimal or near-optimal solution to the problem of maximizing the mutual information between a random variable representing the test execution result and a random variable representing the state of the target network.

A network fault location identification device for identifying a fault location in a target network, comprising:
a test optimization unit for calculating path measurement tests to be performed on the target network;
a test execution unit that executes the test calculated by the test optimization unit on the target network;
a test result analysis unit that narrows down the state of the target network according to the result of the test by the test execution unit,
wherein the network fault location identification device repeats, one or more times, a process in which the test optimization unit calculates a further test using an analysis result obtained by the test result analysis unit, the test execution unit executes the further test, and the test result analysis unit narrows down the state of the target network according to the result of the further test, and
wherein the test optimization unit calculates, as the further test, a test that increases the expected number of states of the target network that can be excluded as candidates based on the test execution result.

A network fault location identification device for identifying a fault location in a target network, comprising:
a test optimization unit for calculating path measurement tests to be performed on the target network;
a test execution unit that executes the test calculated by the test optimization unit on the target network;
a test result analysis unit that narrows down the state of the target network according to the result of the test by the test execution unit,
wherein the network fault location identification device repeats, one or more times, a process in which the test optimization unit calculates a further test using an analysis result obtained by the test result analysis unit, the test execution unit executes the further test, and the test result analysis unit narrows down the state of the target network according to the result of the further test, and
wherein the test result analysis unit updates a probability distribution representing the state of the target network based on the test execution result in accordance with a Bayesian estimation framework.

A network fault location identification method executed by a network fault location identification device for identifying a fault location in a target network, comprising:
a test optimization step for computing path measurement tests to be performed on the target network;
a test execution step of executing the test calculated by the test optimization step on the target network;
a test result analysis step of narrowing down the state of the target network according to the result of the test performed by the test execution step;
wherein a process is repeated one or more times in which a further test is calculated in the test optimization step using an analysis result obtained in the test result analysis step, the further test is executed in the test execution step, and the state of the target network is narrowed down in the test result analysis step according to the result of the further test, and
wherein, in the test optimization step, the further test is calculated as a test that is an optimal or near-optimal solution to the problem of maximizing the mutual information between a random variable representing the test execution result and a random variable representing the state of the target network.

A network fault location identification method executed by a network fault location identification device for identifying a fault location in a target network, comprising:
a test optimization step for computing path measurement tests to be performed on the target network;
a test execution step of executing the test calculated by the test optimization step on the target network;
a test result analysis step of narrowing down the state of the target network according to the result of the test performed by the test execution step;
wherein a process is repeated one or more times in which a further test is calculated in the test optimization step using an analysis result obtained in the test result analysis step, the further test is executed in the test execution step, and the state of the target network is narrowed down in the test result analysis step according to the result of the further test, and
wherein, in the test optimization step, the further test is calculated as a test that increases the expected number of states of the target network that can be excluded as candidates based on the test execution result.

A network fault location identification method executed by a network fault location identification device for identifying a fault location in a target network, comprising:
a test optimization step for computing path measurement tests to be performed on the target network;
a test execution step of executing the test calculated by the test optimization step on the target network;
a test result analysis step of narrowing down the state of the target network according to the result of the test performed by the test execution step;
wherein a process is repeated one or more times in which a further test is calculated in the test optimization step using an analysis result obtained in the test result analysis step, the further test is executed in the test execution step, and the state of the target network is narrowed down in the test result analysis step according to the result of the further test, and
wherein, in the test result analysis step, a probability distribution representing the state of the target network is updated based on the test execution result in accordance with a Bayesian estimation framework.

A program for causing a computer to function as each part of a network fault location identification device according to any one of claims 1 to 3.