JP7765348B2

JP7765348B2 - Fault-tolerant system and data processing method

Info

Publication number: JP7765348B2
Application number: JP2022086961A
Authority: JP
Inventors: 英宏河合
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2022-05-27
Filing date: 2022-05-27
Publication date: 2025-11-06
Anticipated expiration: 2042-05-27
Also published as: JP2023174221A

Description

本発明は、フォールトトレラントシステムの技術を適用する多重系処理システムに関する。 The present invention relates to a multi-processing system that applies fault-tolerant system technology.

多重化された計算機（ノード）間で同じ処理を実施し、その何れかの計算機上で障害が発生しても無停止で業務を継続可能とする計算機システムとして、フォールトトレラントシステムが知られている。 A fault-tolerant system is a computer system that performs the same processing across multiple computers (nodes) and allows operations to continue uninterrupted even if a failure occurs on one of the computers.

フォールトトレラントシステムを実現するアプローチの１つとして、ステートマシンレプリケーションが知られている。ステートマシンレプリケーションでは、同一のステートを持つ各複製ノードに対して同一の入力を与え、その入力とステートを用いて決定論的に処理を行う。結果として各ノードは同一の出力をして、同一のステートに更新される。これにより、一部のノードに障害が発生したとしても、残りのノードで不整合が生じることなく、業務を継続することができる。 State machine replication is known as one approach to achieving fault-tolerant systems. In state machine replication, the same input is provided to each replicated node, which has the same state, and processing is performed deterministically using that input and state. As a result, each node produces the same output and is updated to the same state. This means that even if a failure occurs in some nodes, the remaining nodes can continue operating without inconsistencies.

ここで、複数のタスクを並列実行するよう構成されたアプリケーションの場合、全体としての振る舞いをノード間で一致化させるには、共有ステートへのアクセス順を一致化させればよい。 Here, in the case of an application configured to execute multiple tasks in parallel, the overall behavior can be made consistent between nodes by making the order in which shared state is accessed consistent.

例えば特許文献１に開示されたデータ処理方法では、リーダノードが、共有データへアクセスを行う都度、そのアクセス順をフォロワノードに伝達し、一方でフォロワノードは、共有データにアクセスを行う都度、リーダノードからのアクセス順の指示を待ち、その順に従って処理する。これにより、特許文献１のデータ処理方法は、複製ノード間の振る舞いを一致化している。 For example, in the data processing method disclosed in Patent Document 1, the leader node communicates the access order to the follower nodes each time it accesses shared data. Meanwhile, the follower nodes wait for instructions from the leader node regarding the access order each time they access the shared data, and process the data in that order. In this way, the data processing method of Patent Document 1 ensures that the behavior of replicated nodes is consistent.

国際公開第２０１２／１２７６５２号International Publication No. 2012/127652

しかし、上述した特許文献１では、タスク間の共有データへのアクセスの順序を、アクセスの都度、ノード間で一致化することによって決定性を保証している。すなわち、特許文献１のデータ処理方法においては、タスク間で共有されるデータへのアクセス要求が発生する都度、リーダノードとフォロワノード間で通信が発生するため、ワークロードによっては同期のための負荷が増大するおそれがあった。 However, in the above-mentioned Patent Document 1, determinism is guaranteed by matching the order of access to shared data between tasks between nodes each time an access is made. In other words, with the data processing method of Patent Document 1, communication occurs between the leader node and follower node each time an access request is made to data shared between tasks, which could increase the load for synchronization depending on the workload.

本発明は以上の点を考慮してなされたもので、同期による負荷を抑制しながらノード間の振る舞いを一致化できる、低遅延でスケーラブルなフォールトトレラントシステム及びデータ処理方法を提供しようとするものである。 The present invention was made in consideration of the above points, and aims to provide a low-latency, scalable fault-tolerant system and data processing method that can harmonize the behavior between nodes while reducing the load caused by synchronization.

かかる課題を解決するため本発明においては、多重化された複数のノードが通信ネットワークを介して接続されるフォールトトレラントシステムであって、前記複数のノードは、それぞれプロセッサ、メモリ、及び記憶領域を有し、各前記ノードにおいて前記プロセッサが前記記憶領域からプログラムを前記メモリに読み出して実行することによって動作するアプリケーションは、入力に対して決定論的に出力を生成する１以上のタスクから構成され、前記複数のノードのうちの１つがリーダノードとなり、リーダノード以外のノードがフォロワノードとなり、前記リーダノードと前記フォロワノードは、それぞれ同一のタスクを実行するように構成され、各前記タスクは、自タスクの状態を示すローカルステートに１対１で紐付けられ、前記タスクの入力は、システム外部または他のタスクから受信したメッセージ、及び当該タスクに紐付けられた前記ローカルステートへのアクセスを含み、前記タスクの出力は、当該タスクがシステム外部または他のタスクに送信するメッセージ、及び当該タスクに紐付けられた前記ローカルステートの更新を含み、前記タスクが複数のメッセージ送信元を持つ送信先タスクにメッセージを送信するとき、前記複数のノードのプロセッサは、前記送信先タスクに対するメッセージの配送順について前記複数のノード間で合意形成し、合意形成された配送順に従って前記メッセージを送信することを特徴とするフォールトトレラントシステムが提供される。 In order to solve this problem, the present invention provides a fault-tolerant system in which multiple nodes are connected via a communication network, each of which has a processor, memory, and storage area. The processor in each node reads a program from the storage area into the memory and executes it. The application is comprised of one or more tasks that deterministically generate an output for an input. One of the multiple nodes is a leader node, and the other nodes are follower nodes. The leader node and the follower nodes are each configured to execute the same task. Each task has a status indicating the state of the task. A fault-tolerant system is provided in which a task is linked one-to-one to a local state, the task inputs include messages received from outside the system or from other tasks, and access to the local state linked to the task, and the task outputs include messages sent by the task to outside the system or to other tasks, and updates to the local state linked to the task, and when the task sends a message to a destination task having multiple message senders, the processors of the multiple nodes reach a consensus among the multiple nodes on the order in which messages should be delivered to the destination task, and send the messages according to the agreed-upon delivery order.

また、かかる課題を解決するため本発明においては、多重化された複数のノードが通信ネットワークを介して接続されるフォールトトレラントシステムによるデータ処理方法であって、前記複数のノードは、それぞれプロセッサ、メモリ、及び記憶領域を有し、各前記ノードにおいて前記プロセッサが前記記憶領域からプログラムを前記メモリに読み出して実行することによって動作するアプリケーションは、入力に対して決定論的に出力を生成する１以上のタスクから構成され、前記複数のノードのうちの１つがリーダノードとなり、リーダノード以外のノードがフォロワノードとなり、前記リーダノードと前記フォロワノードは、それぞれ同一のタスクを実行するように構成され、各前記タスクは、自タスクの状態を示すローカルステートに１対１で紐付けられ、前記タスクの入力は、システム外部または他のタスクから受信したメッセージ、及び当該タスクに紐付けられた前記ローカルステートへのアクセスを含み、前記タスクの出力は、当該タスクがシステム外部または他のタスクに送信するメッセージ、及び当該タスクに紐付けられた前記ローカルステートの更新を含み、前記タスクが複数のメッセージ送信元を持つ送信先タスクにメッセージを送信するとき、前記複数のノードのプロセッサは、前記送信先タスクに対するメッセージの配送順について前記複数のノード間で合意形成し、合意形成された配送順に従って前記メッセージを送信することを特徴とするデータ処理方法が提供される。 In order to solve this problem, the present invention provides a data processing method using a fault-tolerant system in which multiplexed nodes are connected via a communication network, each of the multiple nodes having a processor, memory, and storage area, and an application that runs in each of the nodes by the processor reading a program from the storage area into the memory and executing it, and the application is composed of one or more tasks that deterministically generate output for an input, one of the multiple nodes is a leader node, and the nodes other than the leader node are follower nodes, and the leader node and the follower node are each configured to execute the same task, and each of the tasks is A data processing method is provided in which a task is linked one-to-one to a local state that indicates the state of the task itself, the task's input includes messages received from outside the system or from other tasks, and access to the local state linked to the task, the task's output includes messages sent by the task to outside the system or to other tasks, and updates to the local state linked to the task, and when the task sends a message to a destination task having multiple message senders, the processors of the multiple nodes reach a consensus among the multiple nodes on the order of message delivery to the destination task, and send the messages according to the agreed-upon delivery order.

本発明によれば、同期による負荷を抑制しながらノード間の振る舞いを一致化できる、低遅延でスケーラブルなフォールトトレラントシステム及びデータ処理方法を提供することができる。 The present invention provides a low-latency, scalable fault-tolerant system and data processing method that can harmonize the behavior of nodes while reducing the load caused by synchronization.

本発明の一実施形態に係るフォールトトレラントシステム（ＦＴシステム）１の構成例を示すブロック図である。1 is a block diagram showing an example of the configuration of a fault-tolerant system (FT system) 1 according to an embodiment of the present invention. ＦＴシステム１におけるイベント発生時の全体的な処理の特徴を示すイメージ図である。FIG. 1 is an image diagram showing the overall characteristics of processing when an event occurs in the FT system 1. ノード間の相関関係を説明するイメージ図である。FIG. 10 is an image diagram illustrating the correlation between nodes. メッセージの特徴を説明するための図である。FIG. 10 is a diagram illustrating the characteristics of a message. タスク２０の動作例を示すイメージ図である。FIG. 10 is an image diagram showing an example of the operation of task 20. メッセージパス管理テーブル３５の一例を示す図である。FIG. 10 is a diagram showing an example of a message path management table 35. ノード間におけるイベントログの合意形成の特徴を説明するための図である。FIG. 10 is a diagram for explaining characteristics of consensus building on an event log between nodes. イベントログの合意形成におけるリーダの処理手順の一例を示すフローチャートである。10 is a flowchart illustrating an example of a processing procedure of a leader in consensus building of an event log. イベントログの合意形成におけるフォロワの処理手順の一例を示すフローチャートである。10 is a flowchart illustrating an example of a processing procedure of a follower in consensus building of an event log. リーダの入力処理部３１による受信処理の処理手順例を示すフローチャートである。10 is a flowchart showing an example of a procedure for a receiving process by an input processing unit 31 of a reader. イベントログ３４に保持される外部入力タイプのログエントリのデータ構成例を示す図である。10 is a diagram showing an example of the data structure of an external input type log entry stored in the event log 34. FIG. メッセージ要求受付処理の処理手順例を示すフローチャートである。10 is a flowchart illustrating an example of a processing procedure for a message request acceptance process. メッセージ配送タイプのログエントリの具体例を示す図である。FIG. 10 illustrates an example of a message delivery type log entry. メッセージパスを追加するときのノード間の関係を説明するためのモデル図である。FIG. 10 is a model diagram for explaining the relationship between nodes when a message path is added. メッセージパス追加処理の処理手順例を示すフローチャートである。10 is a flowchart illustrating an example of a processing procedure for a message path addition process. メッセージパス追加タイプのログエントリの一例を示す図である。FIG. 10 illustrates an example of a message path addition type log entry. メッセージパス追加タイプのログエントリの別例を示す図である。FIG. 10 is a diagram illustrating another example of a log entry of the message path addition type. メッセージパス情報の一例を示す図である。FIG. 10 is a diagram illustrating an example of message path information. メッセージパス情報受信処理の処理手順例を示すフローチャートである。10 is a flowchart illustrating an example of a processing procedure for message path information reception processing. メッセージパス管理テーブル３５の具体例を示す図である。FIG. 10 is a diagram showing a specific example of a message path management table 35. メッセージパス情報タイプのログエントリの一例を示す図である。FIG. 10 illustrates an example of a log entry of a message path information type. 第２の改善方法によるメッセージパス追加の処理手順例を示すシーケンス図である。FIG. 10 is a sequence diagram illustrating an example of a processing procedure for adding a message path according to a second improvement method. 第２の改善方法によるメッセージパス追加処理の処理手順例を示すフローチャートである。10 is a flowchart illustrating an example of a processing procedure for message path addition processing according to a second improvement method. 第２の改善方法によるメッセージパス情報受信処理の処理手順例を示すフローチャートである。10 is a flowchart illustrating an example of a processing procedure for message pass information reception processing according to a second improvement method. 外部出力の順序保証を説明するイメージ図である。FIG. 10 is a conceptual diagram illustrating order assurance of external outputs. 複数のノードからの外部出力を調整して出力する構成を説明するイメージ図である。FIG. 10 is a conceptual diagram illustrating a configuration in which external outputs from a plurality of nodes are adjusted and output.

以下、図面を参照して、本発明の実施形態を詳述する。 Embodiments of the present invention will be described in detail below with reference to the drawings.

なお、以下の記載及び図面は、本発明を説明するための例示であって、説明の明確化のため、適宜、省略及び簡略化がなされている。また、実施形態の中で説明されている特徴の組み合わせの全てが発明の解決手段に必須であるとは限らない。本発明が実施形態に制限されることは無く、本発明の思想に合致するあらゆる応用例が本発明の技術的範囲に含まれる。本発明は、当業者であれば本発明の範囲内で様々な追加や変更等を行うことができる。本発明は、他の種々の形態でも実施する事が可能である。特に限定しない限り、各構成要素は複数でも単数でも構わない。 The following description and drawings are illustrative of the present invention, and have been omitted or simplified as appropriate for clarity of explanation. Furthermore, not all of the combinations of features described in the embodiments are necessarily essential to the solution of the invention. The present invention is not limited to the embodiments, and all application examples that conform to the concept of the present invention are included within the technical scope of the present invention. Those skilled in the art will be able to make various additions and modifications to the present invention within the scope of the present invention. The present invention can also be implemented in a variety of other forms. Unless otherwise specified, each component may be plural or singular.

以下の説明では、「テーブル」、「表」、「リスト」、「キュー」等の表現にて各種情報を説明することがあるが、各種情報は、これら以外のデータ構造で表現されていてもよい。データ構造に依存しないことを示すために「ＸＸテーブル」、「ＸＸリスト」等を「ＸＸ情報」と呼ぶことがある。各情報の内容を説明する際に、「識別情報」、「識別子」、「名」、「ＩＤ」、「番号」等の表現を用いるが、これらについてはお互いに置換が可能である。 In the following explanation, various types of information may be described using terms such as "table," "list," and "queue," but the various types of information may also be expressed using data structures other than these. To indicate that the information is not dependent on the data structure, "XX table," "XX list," etc. may be referred to as "XX information." When describing the content of each piece of information, terms such as "identification information," "identifier," "name," "ID," and "number" are used, but these terms are interchangeable.

また、以下の説明では、同種の要素を区別しないで説明する場合には、参照符号又は参照符号における共通番号を使用し、同種の要素を区別して説明する場合は、その要素の参照符号を使用又は参照符号に代えてその要素に割り振られたＩＤを使用することがある。 In addition, in the following explanation, when describing elements of the same type without distinguishing between them, reference signs or common numbers in reference signs will be used, and when describing elements of the same type with distinction between them, the reference signs of those elements will be used, or the ID assigned to those elements will be used instead of the reference signs.

また、以下の説明では、プログラムを実行して行う処理を説明する場合があるが、プログラムは、少なくとも１以上のプロセッサ（例えばＣＰＵ）によって実行されることで、定められた処理を、適宜に記憶資源（例えばメモリ）及び／又はインターフェースデバイス（例えば通信ポート）等を用いながら行うため、処理の主体がプロセッサとされてもよい。同様に、プログラムを実行して行う処理の主体が、プロセッサを有するコントローラ、装置、システム、計算機、ノード、ストレージシステム、ストレージ装置、サーバ、管理計算機、クライアント、又は、ホストであってもよい。プログラムを実行して行う処理の主体（例えばプロセッサ）は、処理の一部又は全部を行うハードウェア回路を含んでもよい。例えば、プログラムを実行して行う処理の主体は、暗号化及び復号化、又は圧縮及び伸張を実行するハードウェア回路を含んでもよい。プロセッサは、プログラムに従って動作することによって、所定の機能を実現する機能部として動作する。プロセッサを含む装置及びシステムは、これらの機能部を含む装置及びシステムである。 In addition, while the following description may describe processing performed by executing a program, a program is executed by at least one processor (e.g., a CPU) to perform a predetermined process using storage resources (e.g., memory) and/or interface devices (e.g., communication ports) as appropriate, and therefore the processor may be the entity performing the processing. Similarly, the entity performing the processing by executing a program may be a controller, device, system, computer, node, storage system, storage device, server, management computer, client, or host having a processor. The entity performing the processing by executing a program (e.g., a processor) may include a hardware circuit that performs some or all of the processing. For example, the entity performing the processing by executing a program may include a hardware circuit that performs encryption and decryption, or compression and decompression. A processor operates as a functional unit that realizes a specified function by operating in accordance with a program. Devices and systems that include a processor are devices and systems that include these functional units.

プログラムは、プログラムソースから計算機のような装置にインストールされてもよい。プログラムソースは、例えば、プログラム配布サーバ又は計算機が読み取り可能な記憶メディアであってもよい。プログラムソースがプログラム配布サーバの場合、プログラム配布サーバはプロセッサ（例えばＣＰＵ）と記憶資源を含み、記憶資源はさらに配布プログラムと配布対象であるプログラムとを記憶してよい。そして、プログラム配布サーバのプロセッサが配布プログラムを実行することで、プログラム配布サーバのプロセッサは配布対象のプログラムを他の計算機に配布してよい。また、以下の説明において、２以上のプログラムが１つのプログラムとして実現されてもよいし、１つのプログラムが２以上のプログラムとして実現されてもよい。 A program may be installed on a device such as a computer from a program source. The program source may be, for example, a program distribution server or a computer-readable storage medium. When the program source is a program distribution server, the program distribution server includes a processor (e.g., a CPU) and storage resources, and the storage resources may further store a distribution program and a program to be distributed. The processor of the program distribution server may then execute the distribution program, causing the processor of the program distribution server to distribute the program to be distributed to other computers. Also, in the following description, two or more programs may be realized as one program, and one program may be realized as two or more programs.

（１）構成
図１は、本発明の一実施形態に係るフォールトトレラントシステム（ＦＴシステム）１の構成例を示すブロック図である。図１に示すように、ＦＴシステム１は、通信ネットワーク２を介して接続される複数の計算機（ノード１０）から構成される計算機システムである。 (1) Configuration Fig. 1 is a block diagram showing an example of the configuration of a fault-tolerant system (FT system) 1 according to one embodiment of the present invention. As shown in Fig. 1, the FT system 1 is a computer system made up of multiple computers (nodes 10) connected via a communication network 2.

本実施形態に係るＦＴシステム１は、いわゆるリーダ－フォロワ型分散システムであって、複数のノード１０のうちの１台がリーダノード（リーダとも呼ぶ）となり、リーダノード以外のノードがフォロワノード（フォロワとも呼ぶ）となる。具体的には例えば、分散合意アルゴリズムのＲＡＦＴにおけるリーダ及びフォロワを想定する。ＦＴシステム１では、複数のノード１０上に複製されたイベント駆動型のアプリケーションに対して同じ入力が与えられ、同じ出力を得る。イベント駆動型のアプリケーションは、独立動作可能な複数の処理モジュールであるタスク２０から構成される。すなわち、ＦＴシステム１は、イベントの発生時に、リーダノードとフォロワノードとで、それぞれ同一のタスク２０が実行されるように構成される。 The FT system 1 according to this embodiment is a so-called leader-follower distributed system, in which one of the multiple nodes 10 serves as a leader node (also called the leader), and the nodes other than the leader node serve as follower nodes (also called followers). Specifically, consider the leader and follower in the RAFT distributed consensus algorithm. In the FT system 1, the same input is given to an event-driven application replicated on multiple nodes 10, and the same output is obtained. An event-driven application is composed of tasks 20, which are multiple processing modules capable of operating independently. In other words, the FT system 1 is configured so that when an event occurs, the same task 20 is executed on both the leader node and the follower node.

なお、ＦＴシステム１が正常に稼働している間は、リーダのノード１０とフォロワのノード１０が入れ替わることはないが、例えばリーダのノード１０に何らかの障害が発生した可能性があるときは、当該ノード１０はＦＴシステム１から切り離され、フォロワのノード１０のうちの１台が新たにリーダとなってＦＴシステム１は処理を継続する。このような構成により、スケーラブルなＦＴシステム１が実現される。 While the FT system 1 is operating normally, the leader node 10 and follower node 10 will not switch positions. However, if, for example, there is a possibility that a failure has occurred in the leader node 10, that node 10 will be disconnected from the FT system 1, and one of the follower nodes 10 will become the new leader, and the FT system 1 will continue processing. This configuration allows for the realization of a scalable FT system 1.

また、ＦＴシステム１は、通信ネットワーク２を介して１台以上の端末３と接続され、端末３とデータの送受信を行う。通信ネットワーク２は、例えばＬＡＮ（Local Area Network）であるが、これに限定されない。端末３は、ユーザが使用する計算機であって、例えば、クライアントとしてＦＴシステム１にイベントを送信したり、イベントの処理結果をＦＴシステム１から受信したりする。なお、後述する図２に示すように、ＦＴシステム１ではリーダノードがイベントを最初に受け取る。 The FT system 1 is also connected to one or more terminals 3 via a communications network 2, and transmits and receives data to and from the terminals 3. The communications network 2 is, for example, a local area network (LAN), but is not limited to this. The terminals 3 are computers used by users, and, for example, act as clients to send events to the FT system 1 and receive the results of event processing from the FT system 1. As shown in Figure 2, which will be described later, in the FT system 1, the leader node is the first to receive an event.

ノード１０は、ＣＰＵ（Central Processing Unit）等のプロセッサ、メモリ、記憶装置（記憶領域）、及び通信インタフェース等を備えた計算機である。記憶装置には、ノード１０で実行されるイベント処理基盤３０及びタスク２０等の各種プログラム及びデータが記憶される。そして、プロセッサが、記憶装置からこれらの各種プログラムを読み出してメモリに展開し、実行する。 Node 10 is a computer equipped with a processor such as a CPU (Central Processing Unit), memory, a storage device (storage area), a communication interface, etc. The storage device stores various programs and data, such as the event processing platform 30 and tasks 20, that are executed by node 10. The processor then reads these various programs from the storage device, expands them into memory, and executes them.

タスク２０は、１以上のノード１０の複数のタスク群でアプリケーションを構成する、独立動作可能なイベント駆動型の処理モジュール（プログラム）である。タスク２０は、具体的には例えば、マイクロサービスや、ＦａａＳ（Function as a Service）上のFunctionや、ＯＳ（Operating System）上のプロセス等である。 A task 20 is an independently operable event-driven processing module (program) that constitutes an application with a group of multiple tasks on one or more nodes 10. Specific examples of a task 20 include a microservice, a function on a FaaS (Function as a Service), or a process on an OS (Operating System).

本実施形態において、アプリケーションのステートは、タスク２０ごとに独立してステート（ローカルステート）２１によって管理される。ステート２１は、排他的にアクセスされる。各タスク２０は、自タスクの状態を示すステート２１と１対１で紐付けられ、他のタスク２０と疎結合に連係動作する。すなわち、各タスク２０は、直接的な共有のアプリデータを持たず、他のタスク２０が管理するアプリデータ（ステート２１）にアクセスする際には、処理を要求する一方向性のメッセージ（以後は、単にメッセージと記載する）によって排他的に実施される。そして、各タスク２０は決定論的に振る舞うように構成される。すなわち、入力と現在のステートが決定すれば、出力と処理後のステートの組み合わせが一意に定まる。 In this embodiment, the application state is managed independently for each task 20 by a state (local state) 21. The state 21 is accessed exclusively. Each task 20 is associated one-to-one with the state 21 that indicates the state of the task itself, and operates in loosely coupled cooperation with other tasks 20. In other words, each task 20 does not have directly shared application data, and when accessing application data (state 21) managed by another task 20, this is done exclusively by a unidirectional message (hereinafter simply referred to as a message) requesting processing. Furthermore, each task 20 is configured to behave deterministically. In other words, once the input and current state are determined, the combination of the output and post-processing state is uniquely determined.

イベント処理基盤３０は、イベントを処理するオペレーションシステムである。イベント処理基盤３０は、マルチタスク処理が可能で、複数のタスク２０を並列に実行することができる。イベント処理基盤３０は、入力処理部３１、メッセージ処理部３２、出力処理部３３、イベントログ３４、メッセージパス管理テーブル３５、及びメッセージ待機キュー３６を有する。 The event processing infrastructure 30 is an operation system that processes events. The event processing infrastructure 30 is capable of multitasking and can execute multiple tasks 20 in parallel. The event processing infrastructure 30 has an input processing unit 31, a message processing unit 32, an output processing unit 33, an event log 34, a message path management table 35, and a message waiting queue 36.

入力処理部３１は、発生したイベントを外部（端末３）あるいはリーダノードから受け付け、イベント発生順に当該イベントを処理する適切なタスクを呼び出す。 The input processing unit 31 receives events from the outside (terminal 3) or the leader node, and calls the appropriate task to process the events in the order in which they occur.

メッセージ処理部３２は、タスク２０からのメッセージ要求を受け付け、対象のタスク２０へのメッセージを実施する。このとき、メッセージ処理部３２は、システム全体の決定性保証に必要なメッセージパス管理テーブル３５の管理やノード間同期処理を行う。 The message processing unit 32 accepts message requests from tasks 20 and sends messages to the target tasks 20. At this time, the message processing unit 32 manages the message path management table 35, which is necessary to ensure determinism throughout the system, and performs inter-node synchronization processing.

出力処理部３３は、必要に応じて各ノード１０の処理結果の調停を行い、最終的にそれを外部に送信する。処理結果の調停は、例えば、多数決により選ばれた結果を送信する方法、または、先着優先で送信する方法等を採用することができる。 The output processing unit 33 arbitrates the processing results of each node 10 as necessary and ultimately transmits them to the outside. The arbitration of processing results can be achieved, for example, by transmitting the result selected by majority vote, or by transmitting on a first-come, first-served basis.

イベントログ３４は、ノード１０間で合意形成する（または合意形成した）情報を管理するデータ構造である。具体的には例えば、分散合意アルゴリズムのＲＡＦＴにおけるログを想定する。イベントログ３４は、ログエントリの集合によって構成される。ログエントリは、合意形成する情報の単位であり、最初にリーダがログに追記し、それをフォロワに複製する。リーダが過半数のフォロワからログエントリの複製完了の報告を受けたとき、古いログエントリから順に、ログエントリがコミットされる。そしてリーダは、ログエントリをどこまでコミットしたかをフォロワに通知する。各ノード１０は、コミット済みのログエントリを、古いものから順に適用する。どのように適用するかは、ログエントリのタイプによって異なる。 The event log 34 is a data structure that manages information for which consensus is to be reached (or for which consensus has been reached) between nodes 10. Specifically, consider the log in the distributed consensus algorithm RAFT. The event log 34 is composed of a collection of log entries. A log entry is a unit of information for which consensus is to be reached, and is first added to the log by the leader, who then replicates it to the followers. When the leader receives reports from the majority of followers that replication of the log entries is complete, the log entries are committed, starting with the oldest. The leader then notifies the followers of how many log entries have been committed. Each node 10 applies the committed log entries, starting with the oldest. How they are applied depends on the type of log entry.

メッセージパス管理テーブル３５は、メッセージパスを管理するデータである。メッセージパスは、メッセージの送信元とメッセージの送信先とを結ぶ経路である。詳細は後述するが、メッセージパス管理テーブル３５は、自ノードを含む各ノードから受信するメッセージパス情報に基づいて登録される。 The message path management table 35 is data that manages message paths. A message path is a route that connects the message sender and the message destination. Details will be described later, but the message path management table 35 is registered based on message path information received from each node, including the local node.

メッセージ待機キュー３６は、メッセージパスごとに用意される。メッセージ要求が発生した場合に、メッセージ要求はメッセージ待機キュー３６に順にキューイングされる。メッセージ待機キュー３６がブロック状態ではない場合、所定のタイミングでメッセージ待機キュー３６の先頭からメッセージ要求がデキューされ、メッセージ送信先に送信される。 A message waiting queue 36 is provided for each message path. When a message request occurs, the message request is queued in the message waiting queue 36 in order. If the message waiting queue 36 is not in a blocked state, the message request is dequeued from the top of the message waiting queue 36 at a specified timing and sent to the message destination.

（２）イベント処理の概要
図２は、ＦＴシステム１におけるイベント発生時の全体的な処理の特徴を示すイメージ図である。図２に示した外部入力３８は、イベントの外部入力またはフォロワノード１０Ｂ，１０Ｃへの転送、及び外部入力されたイベントに関するノード間の合意形成を行う。外部入力３８は、具体的には図１に示した入力処理部３１に相当する。外部出力３９は、ノード１０における処理結果をリーダノード１０Ａまたは外部に出力する。外部出力３９は、具体的には図１に示した出力処理部３３に相当する。 (2) Overview of Event Processing Figure 2 is an image diagram showing the overall processing characteristics when an event occurs in the FT system 1. The external input 38 shown in Figure 2 performs external input of an event or transfers the event to follower nodes 10B and 10C, and forms a consensus between nodes regarding the externally input event. The external input 38 specifically corresponds to the input processing unit 31 shown in Figure 1. The external output 39 outputs the processing result in the node 10 to the leader node 10A or to the outside. The external output 39 specifically corresponds to the output processing unit 33 shown in Figure 1.

図２に示すように、本実施形態に係るＦＴシステム１では、リーダノード１０Ａの外部入力３８が最初にイベントを受け取り、受け取ったイベントをフォロワノード１０Ｂ，１０Ｃの外部入力３８に転送し、ノード間で合意形成を行う。そして各ノード１０Ａ，１０Ｂ，１０Ｃは、受け取ったイベントに対して、リーダノード１０Ａが決定した順に、適切なタスク２０を呼び出して処理を行う。 As shown in Figure 2, in the FT system 1 according to this embodiment, the external input 38 of the leader node 10A first receives an event, then forwards the received event to the external inputs 38 of the follower nodes 10B and 10C, and consensus is reached between the nodes. Then, each of the nodes 10A, 10B, and 10C calls the appropriate task 20 in the order determined by the leader node 10A to process the received event.

図２において各タスク２０は受信メッセージに対し逐次的、かつ決定論的に処理を行う。例えばタスクＡがメッセージＭａ１、Ｍａ２を順に受信した場合、まずメッセージＭａ１を処理し、その結果としてメッセージＭｂ１をタスクＢに送信し、次にメッセージＭａ２を処理し、その結果としてメッセージＭｂ２をタスクＢに送信する。このとき、タスクＢは単一のメッセージ送信元しか持たないため、どのノードにおいてもメッセージＭｂ１、Ｍｂ２の順に逐次的、かつ決定論的に処理する。このように、ＦＴシステム１において、単一パスのメッセージ配送については、同期を取ることなく決定性が維持される。 In Figure 2, each task 20 processes received messages sequentially and deterministically. For example, if task A receives messages Ma1 and Ma2 in that order, it first processes message Ma1 and sends the resulting message Mb1 to task B, then processes message Ma2 and sends the resulting message Mb2 to task B. In this case, since task B has only a single message sender, messages Mb1 and Mb2 are processed sequentially and deterministically at every node. In this way, determinism is maintained for single-path message delivery in the FT system 1 without synchronization.

また、図２においてタスクＤからタスクＢに破線で示したように、１のタスクから別の１のタスクに対して初回にメッセージパスが追加されるときは、当該メッセージがあったことについて、ノード間で合意形成する。これにより、パスに複数のメッセージ送信元が生じた場合、以降はメッセージ処理順の合意形成を行うようにする。 Also, as shown by the dashed line from task D to task B in Figure 2, when a message path is added from one task to another task for the first time, a consensus is reached between the nodes about the existence of that message. As a result, if multiple message senders appear on the path, a consensus is reached on the order in which messages will be processed thereafter.

また、図２においてタスクＢ，またはＤからのメッセージ受信によりタスクＣが動作するといったように、複数パスからのメッセージ配送がある場合については、メッセージ処理順をノード間で合意形成する。これにより、複数パスを持つタスク（例えばタスクＣ）の振る舞いがノード間で一致する。 Furthermore, in cases where messages are delivered via multiple paths, such as when task C operates upon receiving a message from task B or D in Figure 2, the order in which messages are processed is agreed upon between nodes. This ensures that the behavior of a task with multiple paths (for example, task C) is consistent between nodes.

そして、各ノード１０Ａ，１０Ｂ，１０Ｃでタスク２０の処理が行われた後は、外部出力３９が、必要に応じて多数決等の調停を行い、処理結果を外部（端末３）に出力する。 After task 20 has been processed at each node 10A, 10B, and 10C, external output 39 performs arbitration such as majority voting as necessary and outputs the processing results to the outside (terminal 3).

（３）タスク動作の決定性保証
図３は、ノード間の相関関係を説明するイメージ図である。図３では、図面の左半分にリーダノード（リーダ）１０Ａを示し、図面の右半分に１つのフォロワノード（フォロワ）１０Ｂを示し、それらの相関関係を表している。なお、詳細は後述するが、外部出力は様々な方法を採用できることから、出力処理部３３は、リーダ１０Ａ，フォロワ１０Ｂの区別を付けずに図示している。図３に示すように、ノード間では、外部入力、メッセージパス追加、及びメッセージ配送順（タスクの呼び出し順）の３種類の合意形成が行われる。 (3) Ensuring determinism of task operation Figure 3 is an image diagram illustrating the correlation between nodes. In Figure 3, a leader node (leader) 10A is shown in the left half of the drawing, and one follower node (follower) 10B is shown in the right half of the drawing, showing the correlation between them. Note that, as will be described in detail later, various methods can be used for external output, and therefore the output processing unit 33 is illustrated without distinguishing between the leader 10A and the follower 10B. As shown in Figure 3, three types of consensus are formed between nodes: external input, message path addition, and message delivery order (task invocation order).

図４は、メッセージ配送の特徴を説明するための図である。図４に示すように、タスク間及び外部入出力とタスク間は、メッセージ１０１によって情報交換及び連係動作をする。なお、本実施形態で扱うメッセージは一方向性であり、例えばタスクＡからタスクＢに向けたメッセージに対する応答は、タスクＢからタスクＡへのメッセージとみなして扱う。 Figure 4 is a diagram explaining the characteristics of message delivery. As shown in Figure 4, information exchange and coordination between tasks and between external input/output and tasks is performed using messages 101. Note that messages handled in this embodiment are unidirectional; for example, a response to a message from task A to task B is treated as a message from task B to task A.

１つのメッセージパス上に流れるメッセージは、そのパス上で順序が入れ替わることはない。それぞれのメッセージ１０１には、メッセージパスごとに、送信順を示す通番が付与される。本説明では、同一のメッセージパス内でのメッセージの通番（メッセージ通番）を、送信順に＃１，＃２，・・・と表記する。メッセージ通番は、メッセージパス管理テーブル３５で管理される。 Messages flowing on a single message path never change order on that path. Each message 101 is assigned a sequence number indicating the order of transmission for each message path. In this explanation, the sequence numbers (message sequence numbers) of messages within the same message path will be represented as #1, #2, ... in the order of transmission. Message sequence numbers are managed in the message path management table 35.

例えば図４に示すように、タスクＢからタスクＣに向かうメッセージパスにおいて、タスクＢから＃１、＃２の順にメッセージが送信された場合、タスクＣは＃１、＃２の順にメッセージを受信する。一方、タスクＢから外部出力に向かうメッセージ「Ｂ→外＃２」とタスクＢからタスクＣに向かうメッセージ「Ｂ→Ｃ＃１」のように、送信元が同じで送信先が異なるメッセージパス上のメッセージでは、どちらのメッセージが先に送信先に届くかはその送信順に関係しない。 For example, as shown in Figure 4, on a message path going from task B to task C, if task B sends messages #1 and #2 in that order, task C will receive the messages in the order #1 and #2. On the other hand, for messages on a message path that have the same sender but different destinations, such as message "B → External #2" going from task B to external output and message "B → C #1" going from task B to task C, the order in which they are sent does not affect which message arrives at the destination first.

また、前述したように、タスク２０は、メッセージ１０１の受信を契機に動作するイベント駆動型タスクである。各タスク２０は決定論的に振る舞うものとし、入力に対して一意の処理結果を出力する。ここで、タスク２０への入力は、外部入力３８や他のタスク２０から受信したメッセージ１０１の内容または自身のローカルステート２１の内容である。また、タスク２０からの出力は、他のタスク２０や外部出力３９に送信するメッセージ１０１または自身のローカルステート２１への更新内容である。各タスク２０は、同時に１つのメッセージ１０１を処理する。複数のメッセージ１０１を受信した場合には、逐次的に１つずつメッセージ１０１を処理する。ローカルステート２１は、タスク２０単位で独立しており、他のタスク２０から直接アクセスされない。あるタスク２０が他のタスク２０のローカルステート２１にアクセスしたい場合には、当該他のタスク２０にアクセス要求を依頼するメッセージ１０１を発行することで実現される。 As mentioned above, task 20 is an event-driven task that operates upon receiving a message 101. Each task 20 behaves deterministically and outputs a unique processing result for each input. Here, input to task 20 is the contents of message 101 received from external input 38 or other tasks 20, or the contents of its own local state 21. Furthermore, output from task 20 is the message 101 sent to other tasks 20 or external output 39, or updates to its own local state 21. Each task 20 processes one message 101 at a time. When multiple messages 101 are received, the messages 101 are processed sequentially one by one. The local state 21 is independent for each task 20 and cannot be directly accessed by other tasks 20. When a task 20 wants to access the local state 21 of another task 20, it does so by issuing a message 101 to the other task 20 requesting access.

図５は、タスク２０の動作例を示すイメージ図である。図５（Ａ）はメッセージ送信元が複数である場合の構成例を示しており、図５（Ｂ）はメッセージ送信元が単一である場合の構成例を示している。図５を参照しながら、本実施形態において各ノード１０によるアプリケーション（具体的にはタスク動作）の決定性が保証される原理について説明する。 Figure 5 is an image diagram showing an example of the operation of task 20. Figure 5(A) shows an example configuration when there are multiple message senders, and Figure 5(B) shows an example configuration when there is a single message sender. With reference to Figure 5, the principle by which the determinism of applications (specifically task operations) by each node 10 is guaranteed in this embodiment will be explained.

図５（Ａ）において、タスクＣは、タスクＡ，Ｂからのメッセージ１０１Ａ，１０１Ｂの内容と、ローカルステート２１Ｃの内容をもとに処理を行い、ローカルステート２１Ｃを更新し、処理結果を外部出力３９に出力する。このとき、ノード１０間でメッセージ配送順（処理順）について合意形成を行わないと、タスクＡからのメッセージ１０１Ａを先に処理した場合と、タスクＢからのメッセージ１０１Ｂを先に処理した場合とで、タスクＣの出力内容や最終的なローカルステート２１Ｃに差異が生じることがある。具体的には、例えばタスクＡからのメッセージ１０１Ａを先に処理する場合、タスクＣはメッセージ１０１ＡによってタスクＣのローカルステート２１Ｃを更新し、その更新後のローカルステート２１Ｃの内容を、タスクＢからのメッセージ１０１Ｂの処理において入力として利用するためである。 In Figure 5 (A), task C performs processing based on the contents of messages 101A and 101B from tasks A and B and the contents of local state 21C, updates local state 21C, and outputs the processing results to external output 39. At this time, if consensus is not reached between nodes 10 regarding the message delivery order (processing order), differences may occur in the output contents of task C and the final local state 21C between when message 101A from task A is processed first and when message 101B from task B is processed first. Specifically, for example, if message 101A from task A is processed first, task C updates local state 21C of task C with message 101A, and the contents of this updated local state 21C are used as input when processing message 101B from task B.

上記のような問題を解消してシステム全体として処理結果の決定性を保証する（言い換えれば、ノード１０間で同じ処理結果を得る）ために、本実施形態では、タスク単体が決定論的に振る舞うことに加えて、複数のメッセージ送信元を持つタスク２０について、どの順番でメッセージをディスパッチするかをノード１０間で合意を形成する。合意形成の詳細は後述する。 To resolve the above problems and guarantee determinism of processing results across the entire system (in other words, to obtain the same processing results across nodes 10), in this embodiment, in addition to tasks behaving deterministically, for tasks 20 with multiple message senders, consensus is reached between nodes 10 regarding the order in which messages should be dispatched. Details of consensus will be described later.

一方、図５（Ｂ）のタスクＣのように、単一のメッセージ送信元（タスクＡ）しか持たない場合には、前述した通り、タスクＣにおいてメッセージ１０１Ａがどの処理順でディスパッチされるかは一意に決定される。具体的には、タスクＣにおいてメッセージ１０１Ａは＃３，＃４の処理順でディスパッチされる。したがって、単一のメッセージ送信元しか持たない単一パスの場合には、図５（Ａ）の場合のような合意形成は不要である。 On the other hand, when there is only a single message source (task A), as in task C in Figure 5(B), the processing order in which message 101A is dispatched in task C is uniquely determined, as mentioned above. Specifically, in task C, message 101A is dispatched in the processing order of #3 and #4. Therefore, in the case of a single path with only a single message source, consensus building as in the case of Figure 5(A) is not necessary.

図６は、メッセージパス管理テーブル３５の一例を示す図である。図６に示したメッセージパス管理テーブル１１０は、図１に示したメッセージパス管理テーブル３５の一例であって、メッセージパスごとに、当該メッセージパスにおいてディスパッチ済みのメッセージ通番を管理する。メッセージパス管理テーブル１１０は、メッセージパス１１１、状態１１２，及びディスパッチ済みメッセージ通番１１３の項目を有して構成される。 Figure 6 shows an example of the message path management table 35. The message path management table 110 shown in Figure 6 is an example of the message path management table 35 shown in Figure 1, and manages, for each message path, the sequence numbers of messages that have been dispatched on that message path. The message path management table 110 is composed of the following items: message path 111, status 112, and dispatched message sequence number 113.

メッセージパス１１１は、当該レコードで管理するメッセージパスの送信元及び送信先を示す。 Message path 111 indicates the source and destination of the message path managed by the record.

状態１１２は、各メッセージパスについて、「ブロック状態」か否かの状態を示す。具体的には例えば、当該レコードのメッセージパスが「ブロック状態」とされている場合に、状態１１２に「ブロック」が設定される。詳細はメッセージ処理部による処理の説明で後述するが、あるメッセージ配送先タスクについて１番目のメッセージパスが存在するときに新たに２番目のメッセージパスが追加されるとき、２番目のメッセージパスに関する合意形成が完了するまで、１番目のメッセージパスは「ブロック状態」とされ、１番目のメッセージパス上のメッセージのディスパッチが保留される。この１番目のメッセージパスのブロック状態は、２番目のメッセージパスの追加に関する「メッセージパス追加」のログエントリについて合意形成が完了し、このログエントリが適用されるタイミングで、解除される。 Status 112 indicates whether each message path is in a "blocked state." Specifically, for example, if the message path of the record in question is in a "blocked state," status 112 is set to "blocked." Details will be provided later in the explanation of processing by the message processing unit, but when a second message path is added when a first message path exists for a message destination task, the first message path is placed in a "blocked state," and dispatching of messages on the first message path is put on hold until consensus is reached on the second message path. The blocked state of the first message path is released when consensus is reached on the "Message Path Added" log entry regarding the addition of the second message path and this log entry is applied.

ディスパッチ済みメッセージ通番１１３は、当該レコードのメッセージパスにおいて、何番目のメッセージまでがディスパッチ済みであるかをメッセージ通番で示す。メッセージ通番は、メッセージパスごとにカウントされ、そのメッセージパス上でメッセージが発生するごとに１ずつカウントアップされて、その値が当該メッセージに付与される。この結果、メッセージパスとメッセージ通番は、各メッセージを一意に識別するために用いることができる。 The dispatched message sequence number 113 indicates, by message sequence number, up to which message has been dispatched on the message path of the record. The message sequence number is counted for each message path, and is incremented by one each time a message occurs on that message path, and that value is assigned to that message. As a result, the message path and message sequence number can be used to uniquely identify each message.

なお、メッセージパス管理テーブル１１０にメッセージ送信先とメッセージ送信元のペアが存在しない場合、新規メッセージパスとして扱い、メッセージパス管理テーブル１１０に新たなレコードを追加登録する。メッセージパス管理テーブル１１０への新規メッセージパスの追加は、新規メッセージパス追加に伴う合意形成後に行われる。すなわち、合意形成中に当該新規メッセージパスへのメッセージが再び発生した場合も、新規メッセージパスとして扱われる。 If a pair of message destination and message source does not exist in the message path management table 110, it is treated as a new message path, and a new record is added to the message path management table 110. A new message path is added to the message path management table 110 after consensus has been reached regarding the addition of the new message path. In other words, if a message to the new message path is generated again during consensus, it is also treated as a new message path.

また、メッセージパス管理テーブル１１０への登録は、ＦＴシステム１の実運用時に、メッセージが発生するごとに各メッセージパスを新規に登録することが想定されるが、変形例として、事前にある程度のメッセージパスを登録しておくようにしてもよい。すなわち、システムのテスト時に構築したメッセージパス管理テーブルの内容をファイル出力し、実運用時にはファイル出力した内容をメッセージパス管理テーブル１１０にロードしておくことで、ある程度のメッセージパスが構築された状態で実運用を開始するようにしてもよい。このようにすることで、実運用時のメッセージパスの登録処理に掛かる負荷を低減することができる。 In addition, when registering in the message path management table 110 during actual operation of the FT system 1, it is expected that each message path will be newly registered each time a message is generated. However, as a variant, a certain number of message paths may be registered in advance. In other words, the contents of the message path management table constructed during system testing may be output to a file, and during actual operation, the contents of the output file may be loaded into the message path management table 110, so that actual operation can begin with a certain number of message paths constructed. In this way, the load on the message path registration process during actual operation can be reduced.

（４）合意形成処理
以下では、本実施形態に係るＦＴシステム１におけるノード１０間の合意形成について説明する。 (4) Consensus Building Process The following describes the consensus building process between the nodes 10 in the FT system 1 according to this embodiment.

ＦＴシステム１において、ノード１０間で各種の合意形成をするにあたり、イベントログ３４（あるいは単にログ）が、その合意形成の過程や結果を管理する。合意形成プロトコルとしてはＲＡＦＴが知られており、本実施形態に係るＦＴシステム１は、基本的な点ではＲＡＦＴを用いて、ノード１０間の合意形成を行う。但し、説明の簡略のため、本実施形態の説明では、ＲＡＦＴを単純化して表現しており、例えばＲＡＦＴのリーダ再選出プロトコル等は記載を省略する。ＲＡＦＴにおけるログエントリの適用は、ログエントリの内容をステートマシンに反映させることを意味する。そして本実施形態におけるログエントリの適用は、ログエントリのタイプによって処理内容が異なる。 In the FT system 1, when various consensuses are reached between the nodes 10, the event log 34 (or simply the log) manages the process and results of the consensus. RAFT is a known consensus protocol, and the FT system 1 according to this embodiment essentially uses RAFT to reach consensus between the nodes 10. However, for ease of explanation, RAFT is expressed in a simplified manner in the description of this embodiment, and details such as the RAFT leader re-election protocol are omitted. Applying a log entry in RAFT means reflecting the contents of the log entry in the state machine. In this embodiment, the processing content of the log entry application differs depending on the type of log entry.

本実施形態で取扱うログエントリのタイプは、「外部入力」、「メッセージ配送」、「メッセージパス追加」の３種類に大別することができる。 The types of log entries handled in this embodiment can be broadly divided into three categories: "external input," "message delivery," and "message path addition."

「外部入力」のログエントリは、リーダが受信した外部からのイベントをフォロワに対して複製し、各ノードで同じ順にディスパッチすることを保証する、ことを目的とする。「外部入力」のログエントリを適用するときの具体的な動作としては、メッセージ処理部３２が、外部入力イベントの内容に応じて、これを処理する適切なタスクに、イベント内容をディスパッチする。外部入力のログエントリのデータ構成例は、図１１に示される。 The purpose of the "External Input" log entry is to ensure that external events received by the leader are replicated to followers and dispatched in the same order at each node. The specific operation when applying the "External Input" log entry is for the message processing unit 32 to dispatch the event contents to the appropriate task for processing, depending on the contents of the external input event. An example of the data structure of an external input log entry is shown in Figure 11.

「メッセージ配送」のログエントリは、複数のメッセージ送信元を持つケースにおいて、どの順番でメッセージをディスパッチするかをノード間で一致化する、ことを目的とする。このようなログエントリの適用により、各メッセージは、メッセージパスとメッセージ通番の組合せによって一意に識別される。「メッセージ配送」のログエントリを適用するときの具体的な動作としては、メッセージ処理部３２が当該メッセージをディスパッチする。メッセージ配送のログエントリの具体例は、図１３に示される。 The purpose of the "Message Delivery" log entry is to ensure consistency between nodes regarding the order in which messages are dispatched when there are multiple message senders. By applying this log entry, each message is uniquely identified by the combination of the message path and message sequence number. The specific operation when applying the "Message Delivery" log entry is for the message processing unit 32 to dispatch the message. A specific example of a message delivery log entry is shown in Figure 13.

「メッセージパス追加」のログエントリは、どのようなメッセージパスが存在するかをノード間で認識合わせをする、ことを目的とする。特に、１番目のメッセージパスが存在するときに新たに２番目のメッセージパスが追加されるときには、１番目のメッセージパスにおいて何番のメッセージ通番までディスパッチした後に２番目のメッセージパスを追加するか、についても併せてノード間で認識合わせする。「メッセージパス追加」のログエントリを適用するときの具体的な動作としては、メッセージ処理部３２が、メッセージパス管理テーブル３５（図５のメッセージパス管理テーブル１１０）に当該メッセージパスを追加する。さらに、１番目のメッセージパスの追加である場合、メッセージ処理部３２は、当該メッセージパスのメッセージ待機キュー３６上のメッセージを順にディスパッチする。また、２番目のメッセージパスの追加である場合、メッセージ処理部３２は、１番目のメッセージパスのメッセージ待機キュー３６上のメッセージを、合意したメッセージ通番まで順にディスパッチした後で、１番目のメッセージパスの「ブロック状態」を解除した上で、２番目のメッセージパスをメッセージパス管理テーブル３５（図５のメッセージパス管理テーブル１１０）に追加する。メッセージ追加パスのログエントリの具体例は、図１７，図１８に示される。 The purpose of the "Message Path Added" log entry is to ensure that nodes agree on the existence of message paths. In particular, when a second message path is added when a first message path already exists, the nodes also agree on the message sequence number up to which messages should be dispatched on the first message path before the second message path is added. Specifically, when the "Message Path Added" log entry is applied, the message processing unit 32 adds the message path to the message path management table 35 (message path management table 110 in Figure 5). Furthermore, when a first message path is added, the message processing unit 32 dispatches messages in the message waiting queue 36 of the message path in order. Furthermore, when a second message path is added, the message processing unit 32 dispatches messages in the message waiting queue 36 of the first message path in order up to the agreed-upon message sequence number, then releases the "blocked state" of the first message path and adds the second message path to the message path management table 35 (message path management table 110 in Figure 5). Specific examples of log entries for message addition paths are shown in Figures 17 and 18.

図７は、ノード間におけるイベントログの合意形成の特徴を説明するための図である。図７には、異なるノード１０（リーダ、フォロワ１、フォロワ２）におけるログエントリの進行例が示されている。 Figure 7 is a diagram illustrating the characteristics of event log consensus building between nodes. Figure 7 shows an example of the progression of log entries at different nodes 10 (leader, follower 1, follower 2).

上記した３種類のログエントリに共通して、本実施形態におけるノード間のイベントログ３４の合意形成は、図７の場合、以下のように行われる。 For all three types of log entries described above, consensus on the event log 34 between nodes in this embodiment is achieved as follows in the case of Figure 7.

図７において、フォロワ１は＃５までログエントリの追記が完了しており、フォロワ２は＃６までログエントリの追記が完了している。このとき、リーダがフォロワ１，２からそれぞれの追記完了の通知を受信している場合、リーダは、自ノードを含む過半数のノードにおいて＃６まで追記が完了していることから、リーダは＃６までログエントリをコミットし、どこまでコミット済みかをフォロワ１，２に通知する。そして各ノードは、コミット済み、かつ未適用のログエントリがあれば、順に適用する。 In Figure 7, follower 1 has completed adding log entries up to #5, and follower 2 has completed adding log entries up to #6. At this time, if the leader receives notifications from followers 1 and 2 that their entries have been added, the leader will commit the log entries up to #6, since the majority of nodes, including the leader itself, have completed adding entries up to #6, and will notify followers 1 and 2 of how many entries have been committed. Then, each node will apply any committed but unapplied log entries in order.

このようなイベントログ３４の合意形成の詳細な処理手順について、図８及び図９を参照しながら説明する。 The detailed processing procedure for reaching consensus on such an event log 34 will be explained with reference to Figures 8 and 9.

図８は、イベントログの合意形成におけるリーダの処理手順の一例を示すフローチャートである。図９は、イベントログの合意形成におけるフォロワの処理手順の一例を示すフローチャートである。なお、図８及び図９に示す処理は、ログエントリが「外部入力」タイプの場合は入力処理部３１によって実行され、「メッセージ配送」タイプや「メッセージパス追加」タイプの場合はメッセージ処理部３２によって実行されるが、以下の説明では簡略化して、メッセージ処理部３２が実行するものとして記載する。 Figure 8 is a flowchart showing an example of a leader's processing procedure for consensus building on an event log. Figure 9 is a flowchart showing an example of a follower's processing procedure for consensus building on an event log. Note that the processing shown in Figures 8 and 9 is performed by the input processing unit 31 if the log entry is of the "external input" type, and by the message processing unit 32 if the log entry is of the "message delivery" or "message path addition" type; however, for simplicity's sake, the following explanation will be described as being performed by the message processing unit 32.

図８によればまず、リーダのメッセージ処理部３２が、ログエントリをログ（イベントログ３４）に追記する（ステップＳ１１）。次に、リーダのメッセージ処理部３２は、フォロワに、ログエントリの追記を要求する（ステップＳ１２）。 As shown in FIG. 8, first, the leader's message processing unit 32 adds a log entry to the log (event log 34) (step S11). Next, the leader's message processing unit 32 requests the follower to add a log entry (step S12).

ここで、図９に示すように、フォロワのメッセージ処理部３２は、図８のステップＳ１２の要求を受け取ると（ステップＳ２１）、当該ログエントリをメッセージパス上の送信順にログ（イベントログ３４）に追記する（ステップＳ２２）。そして、ログエントリの追記成功をリーダに通知する（ステップＳ２３）。 As shown in FIG. 9, when the follower's message processing unit 32 receives the request of step S12 in FIG. 8 (step S21), it adds the log entry to the log (event log 34) in the order of transmission on the message path (step S22). It then notifies the leader that the log entry was successfully added (step S23).

次に図８では、リーダのメッセージ処理部３２が、ＦＴシステム１を構成する複数のフォロワのうちの過半数のフォロワから、ステップＳ２３の追記成功の通知を受け取ったか否かを判定する（ステップＳ１３）。ステップＳ１３では、自身を含めて過半数のノードが追記成功したかどうかを判定する。そして、過半数のフォロワから追記成功の通知を受け取った場合は（ステップＳ１３のＹＥＳ）、ステップＳ１４に進む。一方、過半数のフォロワから追記成功の通知を受け取っていない場合（ステップＳ１３のＮＯ）、リーダは図８の処理を終了し、過半数のフォロワから追記成功の通知を受け取るまで待機する。 Next, in FIG. 8, the message processing unit 32 of the leader determines whether it has received notifications of successful addition in step S23 from a majority of the followers constituting the FT system 1 (step S13). In step S13, it determines whether the addition was successful for a majority of nodes, including the leader itself. If notifications of successful addition have been received from a majority of followers (YES in step S13), the process proceeds to step S14. On the other hand, if notifications of successful addition have not been received from a majority of followers (NO in step S13), the leader ends the processing in FIG. 8 and waits until notifications of successful addition are received from a majority of followers.

ステップＳ１４では、リーダのメッセージ処理部３２は、ステップＳ１３で過半数の通知があったログエントリを追記順が古いほうから順にコミットする。次に、リーダのメッセージ処理部３２は、フォロワに対して、どのログエントリまで（どのログエントリ通番まで）コミットしたかを通知する（ステップＳ１５）。そして最後に、リーダのメッセージ処理部３２は、コミット済みかつ未適用のログエントリを古いものから順に適用し（ステップＳ１６）、合意形成の処理を終了する。 In step S14, the leader's message processing unit 32 commits the log entries for which a majority of notifications were received in step S13, in order of oldest addition. Next, the leader's message processing unit 32 notifies the follower up to which log entries (up to which log entry serial numbers) have been committed (step S15). Finally, the leader's message processing unit 32 applies the committed but unapplied log entries in order of oldest addition (step S16), and the consensus building process ends.

一方、フォロワでは、ステップＳ２３でログエントリの追記成功をリーダに通知した後、メッセージ処理部３２が、リーダからステップＳ１５のコミット通知を受信したか否かを判定する（ステップＳ２４）。コミット通知を受信した場合は（ステップＳ２４のＹＥＳ）、コミット済みかつ未適用のログエントリを古いものから順に適用し（ステップＳ２５）、合意形成の処理を終了する。また、コミット通知を受信していない場合は（ステップＳ２４のＮＯ）、特段の処理を行わずに処理を終了する。 Meanwhile, in the follower, after notifying the leader of the successful addition of the log entry in step S23, the message processing unit 32 determines whether or not a commit notification was received from the leader in step S15 (step S24). If a commit notification was received (YES in step S24), the follower applies the committed but unapplied log entries in order from oldest to newest (step S25), and terminates the consensus building process. On the other hand, if a commit notification was not received (NO in step S24), the follower terminates the process without performing any special processing.

以上の図８及び図９の処理は、説明を簡便にするために、逐次的に処理が行われるフローチャートであったが、本実施形態はこれに限定されるものではなく、例えば、他のノードからの各種メッセージを待機する処理を挟む等、合意形成の処理を非同期に実施するようにしてもよい。また、以降の説明において「ログエントリのイベントログへの追記」が示される場合には、そのバックグラウンドにおいて図８及び図９に示した合意形成処理が動作するものとする（個々の説明を省略する）。 The above processing in Figures 8 and 9 is a flowchart in which processing is performed sequentially for ease of explanation, but this embodiment is not limited to this. For example, the consensus building process may be performed asynchronously, such as by inserting processing to wait for various messages from other nodes. Furthermore, in the following explanation, when "adding a log entry to the event log" is indicated, it is assumed that the consensus building process shown in Figures 8 and 9 is operating in the background (individual explanations will be omitted).

（５）入力処理部３１
以下では、入力処理部３１による処理について詳しく説明する。より具体的には、外部入力されたイベントに対する入力処理部３１の受信処理について説明する。 (5) Input processing unit 31
The following describes in detail the processing performed by the input processing unit 31. More specifically, the reception processing performed by the input processing unit 31 in response to an externally input event will be described.

図１０は、リーダの入力処理部３１による受信処理の処理手順例を示すフローチャートである。図１０に示す処理は、リーダの入力処理部３１によって例えば定期的に実行される。 Figure 10 is a flowchart showing an example of the processing procedure for reception processing by the reader's input processing unit 31. The processing shown in Figure 10 is executed, for example, periodically by the reader's input processing unit 31.

図１０に示すように、リーダの入力処理部３１は、外部入力イベントを受信したか否かを判定する（ステップＳ３１）。外部入力イベントを受信していない場合（ステップＳ３１のＮＯ）、入力処理部３１は次の周期でステップＳ３１の処理を繰り返す。外部入力イベントを受信した場合（ステップＳ３１のＹＥＳ）、入力処理部３１は、受信した外部入力イベントのログエントリをログ（イベントログ３４）に追記する（ステップＳ３２）。 As shown in FIG. 10, the input processing unit 31 of the reader determines whether an external input event has been received (step S31). If an external input event has not been received (NO in step S31), the input processing unit 31 repeats the process of step S31 in the next cycle. If an external input event has been received (YES in step S31), the input processing unit 31 adds a log entry for the received external input event to the log (event log 34) (step S32).

図１１は、イベントログ３４に保持される外部入力タイプのログエントリのデータ構成例を示す図である。図１１に示したように、外部入力タイプのログエントリ１２０は、タイプ１２１及び外部入力１２２を有して構成される。タイプ１２１には、ログエントリの種類が示され、外部入力１２２には、外部入力の内容が示される。すなわち、外部入力タイプのログエントリは、外部から受信したイベントがそのまま、イベントログ３４に保持される。 Figure 11 is a diagram showing an example of the data structure of an external input type log entry stored in the event log 34. As shown in Figure 11, an external input type log entry 120 is composed of a type 121 and an external input 122. Type 121 indicates the type of log entry, and external input 122 indicates the contents of the external input. In other words, an external input type log entry stores an event received from the outside as is in the event log 34.

ステップＳ３２において入力処理部３１が外部入力イベントのログエントリをイベントログ３４に追記することで、図８及び図９を参照しながら説明した通り、合意形成処理が動作し、結果として、当該外部入力イベントの内容がリーダからフォロワに転送される。したがって、フォロワは、図１０のような外部入力イベントを直接受信する処理を持たない。 In step S32, the input processing unit 31 adds a log entry for the external input event to the event log 34, which initiates the consensus building process as described with reference to Figures 8 and 9, resulting in the content of the external input event being transferred from the leader to the follower. Therefore, the follower does not have a process for directly receiving the external input event as shown in Figure 10.

以上のような受信処理を行うことにより、入力処理部３１は、どのノード１０でも同じイベントを同じ順番でディスパッチすることを保証する。 By performing the above-described receiving process, the input processing unit 31 ensures that the same events are dispatched in the same order on every node 10.

その後、入力処理部３１は、外部入力イベントの内容に応じて、それを処理する適切なタスク２０に対してメッセージを行うことにより、当該イベントを処理させる。具体的には、入力処理部３１は、メッセージ処理部２２に対してタスク２０の動作を要求するメッセージ要求を送信し、このメッセージ要求を受信したメッセージ処理部２２が、後述する図１２のメッセージ要求受付処理を行うことにより、タスク２０にメッセージを送信する。 Then, depending on the content of the external input event, the input processing unit 31 sends a message to the appropriate task 20 that will process the event, thereby causing the event to be processed. Specifically, the input processing unit 31 sends a message request to the message processing unit 22, requesting that task 20 operate. Upon receiving this message request, the message processing unit 22 performs the message request reception process shown in FIG. 12, which will be described later, and sends a message to task 20.

なお、後述するように、タスク間のメッセージにおいては、当該メッセージのディスパッチ順をノード間で一致させるための合意形成（メッセージ配送順の合意形成）が行われるが、外部入力イベントについてのメッセージは、当該メッセージを行う時点でノード間におけるディスパッチ順が一致していることから、メッセージ配送順の合意形成の対象外である。 As will be described later, for messages between tasks, consensus is reached between nodes to ensure that the dispatch order of those messages is consistent (consensus on message delivery order). However, messages regarding external input events are not subject to consensus on message delivery order because the dispatch order between nodes is consistent at the time the messages are sent.

（６）メッセージ処理部３２
以下では、メッセージ処理部３２による各種の処理について詳しく説明する。 (6) Message Processing Unit 32
The various processes performed by the message processing unit 32 will be described in detail below.

（６－１）メッセージ要求受付処理
図１２は、メッセージ要求受付処理の処理手順例を示すフローチャートである。図１２に示すメッセージ要求受付処理は、メッセージ処理部２２がタスク２０からのメッセージ要求を受理したときに実行される。 12 is a flowchart showing an example of the processing procedure of the message request acceptance processing. The message request acceptance processing shown in FIG. 12 is executed when the message processing unit 22 accepts a message request from the task 20.

図１２によればまず、メッセージ処理部２２は、受理したメッセージ要求が新規のメッセージパス上のメッセージ配送を要求するものであるか否かを判定する（ステップＳ４１）。具体的には、メッセージ処理部２２は、メッセージ要求が示すメッセージ送信先とメッセージ送信元との組み合わせがメッセージパス管理テーブル１１０に登録されているかを確認し、メッセージパス管理テーブル１１０にメッセージ送信先とメッセージ送信元のペアが存在しない場合に新規メッセージパスと判定することができる。 As shown in FIG. 12, first, the message processing unit 22 determines whether the received message request requests message delivery on a new message path (step S41). Specifically, the message processing unit 22 checks whether the combination of message destination and message source indicated in the message request is registered in the message path management table 110, and can determine that it is a new message path if the message path management table 110 does not contain a pair of message destination and message source.

ステップＳ４１において新規のメッセージパスである場合（ステップＳ４１のＹＥＳ）、メッセージ処理部２２は、新規のメッセージパスを追加するメッセージパス追加処理を行う（ステップＳ４２）。詳細は図１５等を参照しながら後述するが、ステップＳ４２のメッセージパス追加処理では、「メッセージパス追加」タイプのログエントリがログ（イベントログ３４）に追記される。このとき、前述した通り、ログエントリのログへの追記に伴って、図８及び図９に示した合意形成処理が動作する。そしてステップＳ４２の処理後、メッセージ処理部２２は、受理したメッセージ要求をメッセージ待機キュー３６に追加し（ステップＳ４３）、処理を終了する。 If the message path is a new one in step S41 (YES in step S41), the message processing unit 22 performs a message path addition process to add a new message path (step S42). Details will be described later with reference to FIG. 15, etc., but in the message path addition process in step S42, a log entry of type "Message path added" is added to the log (event log 34). At this time, as described above, the consensus building process shown in FIGS. 8 and 9 is executed in conjunction with the addition of the log entry to the log. Then, after processing step S42, the message processing unit 22 adds the accepted message request to the message waiting queue 36 (step S43) and terminates the process.

ステップＳ４１において新規のメッセージパスではない場合（ステップＳ４１のＮＯ）、メッセージ処理部２２は、受理したメッセージ要求によるメッセージが、複数のメッセージ送信元を持つタスク２０に対するメッセージであるか否かを判定する（ステップＳ４４）。 If the message path is not new in step S41 (NO in step S41), the message processing unit 22 determines whether the message from the accepted message request is a message for a task 20 with multiple message senders (step S44).

ステップＳ４４において複数のメッセージ送信元を持つタスク２０に対するメッセージであった場合（ステップＳ４４のＹＥＳ）、メッセージ処理部２２は、「メッセージ配送」タイプのログエントリをログ（イベントログ３４）に追記する（ステップＳ４５）。その後は、ノード間で合意形成するまでメッセージを実行できないため、前述したステップＳ４３に進み、受理したメッセージ要求をメッセージ待機キュー３６に追加して処理を終了する。 If, in step S44, the message is addressed to a task 20 with multiple message senders (YES in step S44), the message processing unit 22 adds a log entry of type "message delivery" to the log (event log 34) (step S45). After that, since the message cannot be executed until consensus is reached between the nodes, the process proceeds to the aforementioned step S43, where the accepted message request is added to the message waiting queue 36 and processing ends.

ステップＳ４４において複数のメッセージ送信元を持たない（すなわち単一パスで繋がれた）タスク２０に対するメッセージであった場合（ステップＳ４４のＮＯ）、メッセージ処理部２２は、当該メッセージパスがブロック状態とされているか否かを確認する（ステップＳ４６）。当該メッセージパスがブロック状態である場合（ステップＳ４６のＹＥＳ）、ノード間で合意形成するまでメッセージを実行できないため、前述したステップＳ４３に進み、受理したメッセージ要求をメッセージ待機キュー３６に追加して処理を終了する。一方、当該メッセージパスがブロック状態ではない場合には（ステップＳ４６のＮＯ）、合意形成は不要であることから、メッセージ処理部２２は、受理したメッセージ要求による当該メッセージを実行し（ステップＳ４７）、処理を終了する。 If, in step S44, the message is for a task 20 that does not have multiple message senders (i.e., is connected by a single path) (NO in step S44), the message processing unit 22 checks whether the message path is blocked (step S46). If the message path is blocked (YES in step S46), the message cannot be executed until consensus is reached between the nodes, so the process proceeds to the aforementioned step S43, adds the accepted message request to the message waiting queue 36, and terminates processing. On the other hand, if the message path is not blocked (NO in step S46), consensus is not required, so the message processing unit 22 executes the message in accordance with the accepted message request (step S47) and terminates processing.

以上のように図１２に示すメッセージ要求受付処理によれば、新規のメッセージパスによるメッセージの場合、及び複数のメッセージ送信元を持つタスクへのメッセージの場合には、ノード間で合意形成するまでは当該メッセージを実行しないため、メッセージ待機キュー３６にメッセージ要求を一時保留させる。また、既存の単一メッセージパスに対する要求については、当該メッセージパスがブロック状態でない限りは、合意形成することなしにメッセージを実行することができる。また、送信先のタスクに対する２番目のメッセージパスの追加中（ノード間で合意形成中）は、１番目のメッセージパスによるメッセージはブロックされる。 As described above, according to the message request acceptance process shown in Figure 12, in the case of a message via a new message path or a message to a task with multiple message senders, the message will not be executed until consensus is reached between the nodes, so the message request is temporarily put on hold in the message waiting queue 36. Furthermore, for requests to an existing single message path, the message can be executed without consensus being reached as long as the message path is not blocked. Furthermore, while a second message path is being added to the destination task (while consensus is being reached between nodes), messages via the first message path are blocked.

図１３は、メッセージ配送タイプのログエントリの具体例を示す図である。図１３に示したように、メッセージ配送タイプのログエントリ１３０は、タイプ１３１及びメッセージ１３２を有して構成される。タイプ１３１には、当該ログエントリが有する情報のタイプとして、メッセージパスとメッセージ通番が示される。そしてメッセージ１３２には、タイプ１３１に示された各タイプの情報の内容が示される。 Figure 13 shows a specific example of a message delivery type log entry. As shown in Figure 13, a message delivery type log entry 130 is composed of a type 131 and a message 132. Type 131 indicates the type of information contained in the log entry, including the message path and message sequence number. Message 132 indicates the content of each type of information indicated in type 131.

複数のメッセージ送信元を持つメッセージ送信先へのメッセージについては、図１３に示したログエントリ１３０のような「メッセージ配送」タイプのログエントリをログ（イベントログ３４）に追記し、どの順番でメッセージをディスパッチするかをノード間で合意形成する。具体的には、図１３に示したログエントリ１３０が適用される場合、タスクＡからタスクＣへのメッセージパス「Ａ→Ｃ」上のメッセージ通番「３」のメッセージが、タスクＣに対してディスパッチされる。 For messages addressed to a message destination with multiple message senders, a "message delivery" type log entry such as log entry 130 shown in Figure 13 is added to the log (event log 34), and the nodes reach a consensus on the order in which to dispatch the messages. Specifically, when log entry 130 shown in Figure 13 is applied, the message with message sequence number "3" on the message path "A → C" from task A to task C is dispatched to task C.

（６－２）メッセージパス追加処理
以下では、新規のメッセージパスが追加されるときの処理について詳しく説明する。 (6-2) Message Path Addition Processing The processing when a new message path is added will be described in detail below.

前述したように、ＦＴシステム１では、複数のメッセージ送信元を持つメッセージ送信先へのメッセージについては、ノード１０間においてメッセージのディスパッチ順の一致化が必要となる。しかし、複数のメッセージ送信元が存在するか否かは、事前に容易に把握できないことがある。そこで、本実施形態では、イベント発生に基づくタスク２０の実行時に、新規のメッセージパスを追加できるようにする。そしてこのときに不整合が起きないよう、各ノード１０が同じ状態（ディスパッチするメッセージ通番を揃えた状態）で新規のメッセージパスを追加するように合意形成する。 As mentioned above, in the FT system 1, for messages to a message destination with multiple message senders, the message dispatch order must be consistent between nodes 10. However, it may not be easy to determine in advance whether multiple message senders exist. Therefore, in this embodiment, a new message path can be added when a task 20 based on an event occurrence is executed. To prevent inconsistencies from occurring at this time, a consensus is reached so that each node 10 adds a new message path in the same state (with the same message sequence numbers to be dispatched).

図１４は、メッセージパスを追加するときのノード間の関係を説明するためのモデル図である。図１４を参照しながら、新規のメッセージパスを追加するときの合意形成の必要性について、より具体的に説明する。 Figure 14 is a model diagram that explains the relationships between nodes when adding a message path. With reference to Figure 14, we will explain in more detail the need for consensus building when adding a new message path.

図１４には、既存のメッセージパス「Ａ→Ｂ」が存在する状況で、新規のメッセージパス「Ｄ→Ｂ」を追加するというケースが例示されている。なお、図１４において着色済みのメッセージ１０１は、ディスパッチ済みのメッセージを表している。具体的には、図１４の場合、ノード１（リーダ）は＃５までディスパッチ済みであり、ノード２（フォロワ１）は＃６までディスパッチ済みである。 Figure 14 illustrates a case where a new message path "D → B" is added when an existing message path "A → B" exists. Note that the colored messages 101 in Figure 14 represent messages that have already been dispatched. Specifically, in the case of Figure 14, node 1 (leader) has already dispatched up to #5, and node 2 (follower 1) has already dispatched up to #6.

図１４において、新規のメッセージパス「Ｄ→Ｂ」を追加するまでは、タスクＢをメッセージ送信先とするメッセージパスは「Ａ→Ｂ」の単一のメッセージパスしか存在しないため、「Ａ→Ｂ」上のメッセージについてノード間の同期が行われない。したがって、「Ｄ→Ｂ」のメッセージパスを追加しようとした時点で、何番までメッセージをディスパッチ済みかはノード間で一致しないことがある。 In Figure 14, until the new message path "D → B" is added, there is only a single message path "A → B" with task B as the message destination, so messages on "A → B" are not synchronized between nodes. Therefore, when an attempt is made to add the message path "D → B," there may be a discrepancy between nodes as to how many messages have already been dispatched.

図１４のケースを用いて具体的に説明すると、仮に、ノード間で合意形成を行わずに、各ノードが各自のタイミングで新規のメッセージパス「Ｄ→Ｂ」を追加してメッセージのディスパッチを行うとした場合、ノード１では、「Ａ→Ｂ」上の＃５のメッセージがディスパッチされた後に、新規のメッセージパス「Ｄ→Ｂ」が追加され、「Ｄ→Ｂ」上の＃１のメッセージがディスパッチされ、その後に「Ａ→Ｂ」上の＃６，＃７のメッセージが順にディスパッチされることになる。このとき、ノード１（リーダ）のイベントログ３４には、「Ａ→Ｂ＃５」、「Ｄ→Ｂ追加」、「Ｄ→Ｂ＃１」、「Ａ→Ｂ＃６」、「Ａ→Ｂ＃７」の順でログエントリが記録される。新規のメッセージパスの追加とその直前のメッセージ（上記の１番目と２番目）は、不可分な１つのログエントリとして記録されてもよい。 To explain this in more detail using the case in Figure 14, suppose each node adds a new message path "D→B" and dispatches messages at its own timing without reaching a consensus between the nodes. After node 1 dispatches message #5 on "A→B," a new message path "D→B" is added, message #1 on "D→B," and then messages #6 and #7 on "A→B" are dispatched in that order. In this case, the event log 34 of node 1 (the leader) records log entries in the following order: "A→B #5," "D→B Added," "D→B #1," "A→B #6," and "A→B #7." The addition of the new message path and the messages immediately preceding it (the first and second messages above) may be recorded as a single, inseparable log entry.

一方、ノード２では、新規のメッセージパス「Ｄ→Ｂ」が追加される前に、「Ａ→Ｂ」上の＃６のメッセージがディスパッチ済みであることから、「Ａ→Ｂ」上の＃６のメッセージがディスパッチされた後に、「Ｄ→Ｂ」上の＃１のメッセージがディスパッチされ、その後に「Ａ→Ｂ」上の＃７のメッセージがディスパッチされることになる。このとき、ノード２（フォロワ）のイベントログ３４には、「Ａ→Ｂ＃６」、「Ｄ→Ｂ追加」、「Ｄ→Ｂ＃１」、「Ａ→Ｂ＃７」の順でログエントリが記録される。ノード１の場合と同様、新規のメッセージパスの追加とその直前のメッセージ（上記の１番目と２番目）は、不可分な１つのログエントリとして記録されてもよい。 On the other hand, at node 2, message #6 on "A→B" has already been dispatched before the new message path "D→B" is added. Therefore, after message #6 on "A→B" is dispatched, message #1 on "D→B" is dispatched, and then message #7 on "A→B" is dispatched. At this time, the event log 34 of node 2 (follower) records log entries in the following order: "A→B #6," "D→B Added," "D→B #1," and "A→B #7." As with node 1, the addition of the new message path and the messages immediately preceding it (the first and second messages above) may be recorded as a single, inseparable log entry.

上記したノード１及びノード２の処理結果を比較すると、イベントログ３４には異なるログエントリが記録され、ノード１とノード２との間では同期を維持できなくなることが分かる。 Comparing the processing results of node 1 and node 2 above, we can see that different log entries are recorded in the event log 34, and that synchronization cannot be maintained between node 1 and node 2.

そこで、本実施形態では、「Ｄ→Ｂ」のメッセージパスを追加した後、「Ａ→Ｂ」上のメッセージと「Ｄ→Ｂ」上のメッセージについて、どのような順にディスパッチするか、ノード間で一致化させる必要がある。すなわち、「Ａ→Ｂのメッセージパス上のメッセージをＸ番までディスパッチしたところで、Ｄ→Ｂのメッセージパスを追加する」ということを、ノード間で合意形成する必要がある。 In this embodiment, after adding the "D → B" message path, it is necessary to reach agreement among the nodes on the order in which the messages on "A → B" and the messages on "D → B" should be dispatched. In other words, it is necessary to reach agreement among the nodes that "after messages on the A → B message path have been dispatched up to message number X, the D → B message path will be added."

具体的には、まず、ＲＡＦＴには規定されていない本実施形態固有のプロトコルとして、各ノードは、新規のメッセージパスを追加した場合に、図１４の右側に示したように、そのメッセージパス追加に関するメッセージパス情報１０２を他のノードにブロードキャストする。例えば、図１４に示された（ノード１，Ａ→Ｂ，５）というメッセージパス情報１０２は、「ノード１」において、「Ａ→Ｂ」のメッセージパス上で「＃５」のメッセージがディスパッチ済みとなったタイミングで、新たなメッセージパスが追加されたことを表している。次に、リーダが、各ノードからブロードキャストされたメッセージパス情報１０２に基づいて、各ノードにおいて既存のメッセージパス「Ａ→Ｂ」上でディスパッチ済みのメッセージ通番の最大値（例えば＃６）を判別する。そして、以上の処理を踏まえて、各ノードが、「「Ａ→Ｂ」上のディスパッチ済みのメッセージ通番の最大値（＃６）の後に、「Ｄ→Ｂ」のメッセージパスを追加すること」をノード間で合意形成することにより、全てのノードで同期をとることができる。 Specifically, as a protocol unique to this embodiment and not specified in RAFT, when a new message path is added, each node broadcasts message path information 102 related to the message path addition to other nodes, as shown on the right side of Figure 14. For example, the message path information 102 (node 1, A → B, 5) shown in Figure 14 indicates that a new message path was added at node 1 when message #5 was dispatched on the A → B message path. Next, based on the message path information 102 broadcast from each node, the leader determines the highest message sequence number (e.g., #6) of messages dispatched on the existing A → B message path at each node. Based on the above process, each node then agrees to add the D → B message path after the highest message sequence number (#6) dispatched on A → B, thereby achieving synchronization among all nodes.

以下に、上記の合意形成の契機となる、メッセージパスを追加する処理（メッセージパス追加処理）について詳しく説明する。 Below, we will explain in detail the process of adding a message path (message path addition process), which triggers the above-mentioned consensus formation.

図１５は、メッセージパス追加処理の処理手順例を示すフローチャートである。図１５に示すメッセージパス追加処理は、図１２のステップＳ４２に相当する処理であって、メッセージ送信要求の受理によって新規のメッセージパスの追加が必要となる場合に、各ノード１０のメッセージ処理部２２によって実行される。 Figure 15 is a flowchart showing an example of the processing procedure for message path addition processing. The message path addition processing shown in Figure 15 corresponds to step S42 in Figure 12, and is executed by the message processing unit 22 of each node 10 when a message transmission request is received and it becomes necessary to add a new message path.

図１５によればまず、メッセージ処理部２２は、新規に追加するメッセージパス（以後、新規メッセージパス）が、メッセージ送信先にとって２番目のパスであるか否かを判定する（ステップＳ５１）。 As shown in FIG. 15, first, the message processing unit 22 determines whether the newly added message path (hereinafter referred to as the new message path) is the second path for the message destination (step S51).

ステップＳ５１において新規メッセージパスが２番目のメッセージパスである場合（ステップＳ５１のＹＥＳ）、メッセージ処理部２２は、１番目のメッセージパスをブロック状態に設定する（ステップＳ５２）。次いで、メッセージ処理部２２は、既存のメッセージパスのメッセージパス情報（図１４及び図１８を参照）を、他ノードにブロードキャストし（ステップＳ５３）、メッセージパス追加処理を終了する。 If the new message path is the second message path in step S51 (YES in step S51), the message processing unit 22 sets the first message path to a blocked state (step S52). Next, the message processing unit 22 broadcasts the message path information of the existing message path (see Figures 14 and 18) to other nodes (step S53), and ends the message path addition process.

なお、ステップＳ５３のブロードキャストを受信した各ノードでは、後述する図１９のメッセージパス情報受信処理が行われることにより、受信したメッセージパス情報がメッセージパス管理テーブル３５に登録され、リーダノードが、ノード間で一致化させるディスパッチ順を決定する。 In addition, each node that receives the broadcast in step S53 performs the message path information reception process shown in Figure 19, which will be described later, and the received message path information is registered in the message path management table 35, and the leader node determines the dispatch order to ensure consistency between nodes.

ステップＳ５１において新規メッセージパスが２番目のメッセージパスではない、すなわち、１番目または３番目以降のメッセージパスである場合（ステップＳ５１のＮＯ）、メッセージ処理部２２は、自ノードがリーダであるか否かを確認する（ステップＳ５４）。自ノードがリーダである場合（ステップＳ５４のＹＥＳ）、メッセージ処理部２２は、「メッセージパス追加」タイプのログエントリをログ（イベントログ３４）に追記し（ステップＳ５５）、メッセージパス追加処理を終了する。自ノードがフォロワである場合（ステップＳ５４のＮＯ）、メッセージ処理部２２は、特段の処理を行うことなくメッセージパス追加処理を終了する。 If the new message path is not the second message path in step S51, i.e., if it is the first or third or subsequent message path (NO in step S51), the message processing unit 22 checks whether the own node is the leader (step S54). If the own node is the leader (YES in step S54), the message processing unit 22 adds a log entry of type "Message path added" to the log (event log 34) (step S55) and ends the message path addition process. If the own node is a follower (NO in step S54), the message processing unit 22 ends the message path addition process without performing any special processing.

図１６は、メッセージパス追加タイプのログエントリの一例を示す図である。図１６に示したログエントリ１４０は、新規メッセージパスが１番目または３番目以降のメッセージパスである場合に、図１５のステップＳ５５でログ（イベントログ３４）に追記されるログエントリの一例である。 Figure 16 shows an example of a message path addition type log entry. Log entry 140 shown in Figure 16 is an example of a log entry that is added to the log (event log 34) in step S55 of Figure 15 when the new message path is the first or third or subsequent message path.

ログエントリ１４０は、新規メッセージパスの追加を単位として管理されるデータであって、既存のメッセージパスを示す既存メッセージパス１４１と、新規メッセージパスを示す追加メッセージパス１４２と、新規メッセージパスの追加時に既存のメッセージパス上でディスパッチ済みのメッセージの通番を示すディスパッチ済み通番１４３とを有する。 Log entry 140 is data managed in units of adding a new message path, and includes an existing message path 141 indicating an existing message path, an added message path 142 indicating a new message path, and a dispatched sequence number 143 indicating the sequence number of a message already dispatched on the existing message path when the new message path was added.

図１６のログエントリ１４０の場合は、新規メッセージパスがメッセージ送信先にとって２番目のメッセージパスではないため、既存メッセージパスのメッセージパス情報をブロードキャストする処理（図１５のステップＳ５３）は行われない。さらに、図１５のステップＳ５３を受けて各ノードで実行される図１９のメッセージパス情報追加処理も実行されないので、ディスパッチ通番を参照する必要もない（図１９のステップＳ６４が行われない）。したがって、図１６のログエントリ１４０では、既存メッセージパス１４１及びディスパッチ済み通番１４３に値を指定しなくてよく、追加メッセージパス１４２に、新規メッセージパスがタスクＡからタスクＢに向けたメッセージパスであることを示す「Ａ→Ｂ」が指定される。 In the case of log entry 140 in Figure 16, the new message path is not the second message path for the message destination, so the process of broadcasting message path information for the existing message path (step S53 in Figure 15) is not performed. Furthermore, the message path information addition process in Figure 19, which is executed at each node in response to step S53 in Figure 15, is not executed, so there is no need to reference the dispatch sequence number (step S64 in Figure 19 is not performed). Therefore, in log entry 140 in Figure 16, it is not necessary to specify values for existing message path 141 and dispatched sequence number 143, and "A → B" is specified for added message path 142, indicating that the new message path is a message path from task A to task B.

図１７は、メッセージパス追加タイプのログエントリの別例を示す図である。図１７に示したログエントリ１５０は、新規メッセージパスが１番目または３番目以降のメッセージパスである場合に、後述する図１９のステップＳ６４でログ（イベントログ３４）に追記されるログエントリの一例である。 Figure 17 shows another example of a message path addition type log entry. Log entry 150 shown in Figure 17 is an example of a log entry that is added to the log (event log 34) in step S64 of Figure 19 (described below) when the new message path is the first or third or subsequent message path.

ログエントリ１５０は、図１６のログエントリ１４０と同様に、新規メッセージパスの追加を単位として管理されるデータであって、既存のメッセージパスを示す既存メッセージパス１５１と、新規メッセージパスを示す追加メッセージパス１５２と、新規メッセージパスの追加時に既存のメッセージパス上でディスパッチ済みのメッセージの通番を示すディスパッチ済み通番１５３とを有する。 Like log entry 140 in Figure 16, log entry 150 is data managed in units of the addition of a new message path, and includes an existing message path 151 indicating an existing message path, an added message path 152 indicating a new message path, and a dispatched sequence number 153 indicating the sequence number of a message already dispatched on the existing message path when the new message path was added.

具体的には、図１７のログエントリ１５０を適用した場合には、既存の「Ａ→Ｂ」のメッセージパス上でメッセージ通番「５」番までをディスパッチした後、「Ｄ→Ｂ」のメッセージパスがメッセージパス管理テーブル３５に追加される。 Specifically, when log entry 150 in Figure 17 is applied, after messages up to message sequence number "5" are dispatched on the existing "A → B" message path, the "D → B" message path is added to the message path management table 35.

図１８は、メッセージパス情報の一例を示す図である。メッセージパス情報は、あるメッセージ送信先に対する２番目のメッセージパスが追加される際に、１番目のメッセージパス上でどこまでメッセージがディスパッチされたかをノード間で共有するためのデータである。また、メッセージパス情報は、どのタイミングでメッセージパスの追加を行うか、リーダが判断するためのヒントとして用いられる。 Figure 18 shows an example of message path information. Message path information is data shared between nodes to indicate how far a message has been dispatched on the first message path when a second message path to a certain message destination is added. The message path information is also used as a hint for the leader to determine when to add a message path.

図１８に示すメッセージパス情報１６０は、ノードＩＤ１６１、既存メッセージパス１６２、追加メッセージパス１６３、及びディスパッチ済み通番１６４を有する。ノードＩＤ１６１は、当該メッセージパス情報１６０の送信元のノード１０の識別子を示す。既存メッセージパス１６２は、既存の１番目のメッセージパスの送信元及び送信先を示す。追加メッセージパス１６３は、追加しようとしているメッセージパス（すなわち、既存メッセージパスと同じメッセージ送信先を持つ２番目のメッセージパス）の送信元及び送信先を示す。ディスパッチ済み通番１６４は、既存メッセージパス１６２に示される１番目のメッセージパスにおいてディスパッチ済みのメッセージ通番の最大値を示す。 The message path information 160 shown in FIG. 18 includes a node ID 161, an existing message path 162, an additional message path 163, and a dispatched sequence number 164. The node ID 161 indicates the identifier of the node 10 that is the sender of the message path information 160. The existing message path 162 indicates the sender and destination of the first existing message path. The additional message path 163 indicates the sender and destination of the message path to be added (i.e., the second message path that has the same message destination as the existing message path). The dispatched sequence number 164 indicates the maximum value of the dispatched message sequence number for the first message path indicated in the existing message path 162.

図１９は、メッセージパス情報受信処理の処理手順例を示すフローチャートである。図１９に示すメッセージパス情報受信処理は、図１５のステップＳ５３で既存のメッセージパスのメッセージパス情報がブロードキャストされたときに、このメッセージパス情報を受信した各ノード１０においてメッセージ処理部２２が実行する処理である。すなわち、メッセージパス情報受信処理は、新規メッセージパスがメッセージ送信先にとって２番目のパスである場合に実行される。 Figure 19 is a flowchart showing an example of the processing procedure for message path information reception processing. The message path information reception processing shown in Figure 19 is processing executed by the message processing unit 22 in each node 10 that receives message path information when message path information for an existing message path is broadcast in step S53 of Figure 15. In other words, the message path information reception processing is executed when the new message path is the second path for the message destination.

図１９によればまず、メッセージ処理部２２は、受信したメッセージパス情報をメッセージパス管理テーブル３５に登録する（ステップＳ６１）。メッセージパス管理テーブル３５の具体例は、後述する図２０に示される。 According to FIG. 19, first, the message processing unit 22 registers the received message path information in the message path management table 35 (step S61). A specific example of the message path management table 35 is shown in FIG. 20, which will be described later.

次に、メッセージ処理部２２は、自ノードがリーダであるか否かを確認する（ステップＳ６２）。自ノードがリーダである場合は（ステップＳ６２のＹＥＳ）、ステップＳ６３に進み、自ノードがリーダではない、すなわちフォロワである場合は（ステップＳ６２のＮＯ）、メッセージパス情報受信処理を終了する。 Next, the message processing unit 22 checks whether the node itself is the leader (step S62). If the node itself is the leader (YES in step S62), the process proceeds to step S63. If the node itself is not the leader, i.e., is a follower (NO in step S62), the message path information reception process ends.

ステップＳ６３では、リーダのメッセージ処理部２２は、他の全ノードから、当該新規メッセージパスの追加に関するメッセージパス情報を受信済みであるか否かを判定する。他の全ノードから上記メッセージパス情報を受信済みである場合（ステップＳ６３のＹＥＳ）、ステップＳ６４に進み、他の全ノードからの上記メッセージパス情報が揃っていない場合は（ステップＳ６３のＮＯ）、メッセージパス情報受信処理を終了する。 In step S63, the message processing unit 22 of the leader determines whether message path information regarding the addition of the new message path has been received from all other nodes. If the message path information has been received from all other nodes (YES in step S63), the process proceeds to step S64. If the message path information has not been received from all other nodes (NO in step S63), the message path information reception process ends.

ステップＳ６４では、リーダのメッセージ処理部２２は、自ノードを含むの全てのノードからのメッセージパス情報に基づいて、既存のメッセージパス上のディスパッチ済みのメッセージ通番の最大値を判別する。そして、判別した最大値のメッセージ通番をディスパッチ済み通番とした「メッセージパス追加」タイプのログエントリをログ（イベントログ３４）に追記し、メッセージパス情報受信処理を終了する。 In step S64, the message processing unit 22 of the leader determines the maximum value of the dispatched message sequence number on the existing message paths based on the message path information from all nodes, including the leader node. It then adds a "Message path added" type log entry to the log (event log 34) with the determined maximum message sequence number as the dispatched sequence number, and terminates the message path information reception process.

図２０は、メッセージパス管理テーブル３５の具体例を示す図である。図２０に示したメッセージパス管理テーブル１７０は、メッセージパス１７１及びディスパッチ済みメッセージ通番１７２を有する。メッセージパス１７１は、既存のメッセージパスの送信元及び送信先を示し、ディスパッチ済みメッセージ通番１７２は、既存メッセージパス上のディスパッチ済みのメッセージ通番を示す。 Figure 20 shows a specific example of the message path management table 35. The message path management table 170 shown in Figure 20 has a message path 171 and a dispatched message sequence number 172. The message path 171 indicates the source and destination of an existing message path, and the dispatched message sequence number 172 indicates the sequence number of a dispatched message on an existing message path.

ここで、図２０のメッセージパス管理テーブル１７０は、「Ａ→Ｂ」のメッセージパス１７１におけるディスパッチ済みメッセージ通番１７２が「５，８」となっている。これは例えば、他ノードからのメッセージパス情報の受信（図１９のステップＳ６１）において、メッセージパス「Ａ→Ｂ」上のディスパッチ済み通番が「５」であることを示すメッセージパス情報（例えば図１８に示したノードＩＤ「２」からのメッセージパス情報１６０）とは別に、メッセージパス「Ａ→Ｂ」上のディスパッチ済み通番が「８」であることを示すメッセージパス情報を受信したことを意味する。このようなメッセージパス管理テーブル１７０がリーダノード１０Ａのメッセージパス管理テーブル３５である場合、図１９のステップＳ６４において、リーダのメッセージ処理部２２は、メッセージパス「Ａ→Ｂ」上のディスパッチ済みのメッセージ通番の最大値が「８」であると判別する。したがって、ステップＳ６４においてリーダのメッセージ処理部２２は、既存メッセージパス１５１を「Ａ→Ｂ」、追加メッセージパス１５２を「Ｄ→Ｂ」、ディスパッチ済み通番１５３を「８」とした「メッセージパス追加」タイプのログエントリを、ログ（イベントログ３４）に追記することになる。 In the message path management table 170 in FIG. 20, the dispatched message sequence number 172 for the message path 171 "A → B" is "5, 8." This means, for example, that when receiving message path information from another node (step S61 in FIG. 19), message path information indicating that the dispatched sequence number for the message path "A → B" is "8" was received in addition to message path information indicating that the dispatched sequence number for the message path "A → B" is "5" (e.g., message path information 160 from node ID "2" shown in FIG. 18). If this message path management table 170 is the message path management table 35 of the leader node 10A, in step S64 in FIG. 19, the message processing unit 22 of the leader determines that the maximum value of the dispatched message sequence numbers for the message path "A → B" is "8." Therefore, in step S64, the message processing unit 22 of the reader adds a log entry of the "Message path added" type to the log (event log 34), with the existing message path 151 set to "A → B", the additional message path 152 set to "D → B", and the dispatched sequence number 153 set to "8".

以上に説明した図１５のメッセージパス追加処理及び図１９のメッセージパス情報受信処理は、以下のような特徴を備える。 The message path addition process in Figure 15 and the message path information reception process in Figure 19 described above have the following features:

メッセージ送信先のタスク等にとって新規メッセージパスが２番目のメッセージパスである場合に、メッセージパス情報受信処理において「メッセージパス追加」タイプのログエントリがログに追記される（図１９のステップＳ６４）ことに伴って、ノード間でメッセージパス追加の合意形成が行われる。このとき、新規メッセージパスの追加に関する合意形成が完了するまで、当該メッセージ送信先へのメッセージのディスパッチが保留される（図１５のステップＳ５２）。 If the new message path is the second message path for the message destination task, etc., a log entry of type "Message path added" is added to the log during the message path information reception process (step S64 in Figure 19), and consensus is reached between nodes on the addition of the message path. At this time, the dispatch of the message to the message destination is put on hold until consensus is reached on the addition of the new message path (step S52 in Figure 15).

また、メッセージパス追加の合意形成が完了し、当該ログエントリがコミット及び適用されることで、当該メッセージパスがメッセージパス管理テーブル３５に追加される（図１５のステップＳ５５，図１９のステップＳ６１）。 Furthermore, once consensus on the message path addition is reached and the log entry is committed and applied, the message path is added to the message path management table 35 (step S55 in FIG. 15, step S61 in FIG. 19).

新規メッセージパスが１番目のメッセージパスである場合、メッセージパス管理テーブル３５に当該メッセージパスが追加されるまで、１番目のメッセージパス上のメッセージのディスパッチを保留する。 If the new message path is the first message path, dispatching of messages on the first message path is suspended until the message path is added to the message path management table 35.

新規メッセージパスが３番目以降のメッセージパスである場合、当該新規メッセージパスの追加以降に発生するメッセージ送信先へのメッセージについては、当該新規メッセージパスの追加に相当する「メッセージパス追加」タイプのログエントリの後に、ログに追記されることから、結果として、新規メッセージパスの追加が完了するまでディスパッチが保留されることになる。 If the new message path is the third or subsequent message path, messages to the message destination that occur after the new message path is added will be added to the log after the "Message path added" type log entry that corresponds to the addition of the new message path. As a result, dispatching will be put on hold until the addition of the new message path is complete.

（６－３）メッセージパス追加時のメッセージ待機時間の改善
前述した通り、２番目のメッセージパスを追記する際には、「メッセージパス追加」タイプのログエントリについてノード間の合意形成が完了し、メッセージパス管理テーブルに当該メッセージパスが追加されるまで、１番目のメッセージパス上のメッセージは、一時的にディスパッチが保留される。このとき、ＦＴシステム１では、ディスパッチが保留されている時間（メッセージ待機時間）だけ、処理が停滞する。 (6-3) Improving message waiting time when adding a message path As mentioned above, when adding a second message path, dispatching of messages on the first message path is temporarily suspended until consensus is reached between nodes on the "message path added" type log entry and the message path is added to the message path management table. At this time, processing in the FT system 1 stagnates for the time that dispatching is suspended (message waiting time).

このような処理の停滞を抑制するために、例えば、システムテストの段階でメッセージパス管理テーブル３５の内容をある程度構築してその内容を退避させ、実運用時に当該内容を復元することが考えられる。この方法によれば、動的なメッセージパスの追加の発生頻度をある程度抑制することができるが、さらなる抑制ができるとより好ましい。 To prevent such processing stalls, one possible solution is to build the contents of the message path management table 35 to a certain extent during system testing, save the contents, and then restore them during actual operation. This method can reduce the frequency of dynamic message path additions to a certain extent, but it would be even better if it could be reduced even further.

そこで、本実施形態では、メッセージ待機時間をさらに改善する方法として、例えば以下の２つの改善方法を採用することができる。 Therefore, in this embodiment, the following two improvement methods can be adopted to further improve message waiting time.

（６－３－１）第１の改善方法
第１の改善方法は、ノードが、２番目のメッセージパスの追加に関するメッセージパス情報を他ノードから受信した時点で、メッセージパス管理テーブル３５へのメッセージパス情報の追記（図１９のステップＳ６１）に先行して、当該メッセージパス情報に記載されたディスパッチ済みメッセージ通番の最大値まで、１番目のメッセージパス上のメッセージをディスパッチするものである。 (6-3-1) First Improvement Method In the first improvement method, when a node receives message path information regarding the addition of a second message path from another node, the node dispatches messages on the first message path up to the maximum value of the dispatched message sequence number written in the message path information, prior to adding the message path information to the message path management table 35 (step S61 in FIG. 19).

第１の改善方法は、ノードの増減が殆どない状況に適しており、最大限、ディスパッチの保留による遅延を抑制することができる。 The first improvement method is suitable for situations where there is little increase or decrease in the number of nodes, and can minimize delays caused by deferred dispatching.

但し、第１の改善方法は、ノードの増減頻度が高い状況では、次のようなリスクを有する。メッセージパス情報でディスパッチ済みメッセージ通番の最大値Ｍを報告したノードが、当該メッセージパス情報をリーダに伝える前に障害停止した場合、リーダはＮ＜ＭとなるＮをディスパッチ済みメッセージ通番として、「メッセージパス追加」タイプのログエントリをログに追記する。このとき既にＭまでディスパッチ済みのノードが存在した場合、そのノードはリーダから指示された通りにログエントリを適用することができず、クラスタから除外する必要がある。したがって、ノードの増減頻度が高いほど、ＦＴシステム１におけるノード１０の多重系を維持できなくなるリスクが上昇する。 However, the first improvement method poses the following risk in situations where the number of nodes increases or decreases frequently. If a node that reports the maximum value M of the dispatched message sequence number in its message path information fails and stops before transmitting that message path information to the leader, the leader will add an "Add message path" type log entry to the log, with N < M as the dispatched message sequence number. If there is a node that has already dispatched up to M at this time, that node will not be able to apply the log entry as instructed by the leader and will need to be excluded from the cluster. Therefore, the more frequently the number of nodes increases or decreases, the greater the risk that the multiplexed system of nodes 10 in the FT system 1 will not be able to be maintained.

（６－３－２）第２の改善方法
第２の改善方法は、メッセージパス追加の合意形成において、上述した方法のようにリーダが全てのノードから２番目のメッセージパスの追加に関するメッセージパス情報を受信してからノード間で合意形成するのではなく、リーダが全てのノードから２番目のメッセージパスの追加に関するメッセージパス情報を受信していく途中で、メッセージ済み通番の最大値が更新された場合に都度、ＲＡＦＴなどの合意形成プロトコルに基づきノード間でメッセージパス追加の合意形成を行い、合意形成できた段階までメッセージをディスパッチするものである。以下に、第２の改善方法を実現するための構成及び処理について詳しく説明する。 (6-3-2) Second Improvement Method In the second improvement method, in reaching a consensus on adding a message path, rather than reaching a consensus between nodes after the leader receives message path information regarding the addition of a second message path from all nodes as in the method described above, whenever the maximum value of the message completion sequence number is updated while the leader is receiving message path information regarding the addition of a second message path from all nodes, a consensus on adding a message path is reached between nodes based on a consensus protocol such as RAFT, and messages are dispatched until a consensus is reached. Below, the configuration and processing for realizing the second improvement method are described in detail.

第２の改善方法を採用する場合、ＦＴシステム１は、新たな種類のログエントリとして、「メッセージパス情報」タイプのログエントリを導入する。 When adopting the second improvement method, the FT system 1 introduces a new type of log entry, a "message path information" type log entry.

「メッセージパス情報」のログエントリは、あるメッセージ送信先に２本目のメッセージパスを追加する際に、既存のメッセージパス上のディスパッチ済みメッセージ通番について、「少なくともＮ番までディスパッチした後に新規メッセージパスを追加する」ことについてノード間で合意をとる、ことを目的とする。合意形成は過半数のノードからの応答があれば可能であるため、このような「メッセージパス情報」のログエントリを利用することにより、リーダが全てのフォロワからメッセージパス情報を受信する前に、先行的に、Ｎ番までのメッセージ要求をディスパッチすることができる。「メッセージパス情報」のログエントリを適用するときの具体的な動作としては、「ブロック状態」が設定された既存のメッセージパス上で滞留しているメッセージ要求について、当該「メッセージパス情報」のログエントリに示されるディスパッチ済みメッセージ通番に対応するメッセージ要求までを、順にディスパッチする。 The purpose of the "Message Path Information" log entry is to reach an agreement among nodes when adding a second message path to a certain message destination that "a new message path will be added after dispatching at least N message sequence numbers on the existing message path." Since consensus can be reached with a response from a majority of nodes, by using this "Message Path Information" log entry, it is possible to preemptively dispatch message requests up to N before the leader receives message path information from all followers. The specific behavior when applying the "Message Path Information" log entry is to dispatch message requests that are stuck on an existing message path with a "blocked state" set, in order, up to the message request that corresponds to the dispatched message sequence number indicated in the "Message Path Information" log entry.

図２１は、メッセージパス情報タイプのログエントリの一例を示す図である。図２１に示したように、メッセージパス情報タイプのログエントリ１８０は、タイプ１８１及びメッセージパス情報１８２を有して構成される。タイプ１８１には、当該ログエントリが有する情報のタイプとして、ノードＩＤ、既存メッセージパス、追加メッセージパス、及びディスパッチ済み通番が示される。そしてメッセージパス情報１８２には、タイプ１８１に示された各タイプの情報の内容が示される。 Figure 21 shows an example of a log entry of the message path information type. As shown in Figure 21, a log entry 180 of the message path information type is composed of a type 181 and message path information 182. Type 181 indicates the type of information contained in the log entry, including the node ID, existing message path, additional message path, and dispatched sequence number. Message path information 182 indicates the content of each type of information indicated in type 181.

図２２は、第２の改善方法によるメッセージパス追加の処理手順例を示すシーケンス図である。図２２では、説明のために簡略化したノード構成として、１つのリーダ（ノードＩＤ＝１）と２つのフォロワ（ノードＩＤ＝２，３）とが示されている。また、後述する各ステップの処理は、対応するノードのメッセージ処理部３２が実行する。 Figure 22 is a sequence diagram showing an example of the processing procedure for adding a message path using the second improvement method. For the sake of explanation, Figure 22 shows a simplified node configuration with one leader (node ID = 1) and two followers (node IDs = 2 and 3). The processing of each step described below is performed by the message processing unit 32 of the corresponding node.

図２２において、ログエントリ１９１，１９３は、ステップＳ７１，Ｓ７３でログ（イベントログ３４）に追記される「メッセージパス情報」タイプのログエントリの一例であり、ログエントリ１９５は、ステップＳ７５でログ（イベントログ３４）に追記される「メッセージパス追加」タイプのログエントリの一例である。また、メッセージパス情報１９２，１９４は、ステップＳ７２，Ｓ７４でブロードキャストされるメッセージパス情報の一例である。メッセージパス情報１９２，１９４は、ＲＡＦＴには規定されていない本実施形態固有のプロトコルによって、あるノードから他ノードにブロードキャストされる通知である。 In FIG. 22, log entries 191 and 193 are examples of "message path information" type log entries added to the log (event log 34) in steps S71 and S73, and log entry 195 is an example of a "message path addition" type log entry added to the log (event log 34) in step S75. Furthermore, message path information 192 and 194 are examples of message path information broadcast in steps S72 and S74. Message path information 192 and 194 are notifications broadcast from one node to other nodes using a protocol specific to this embodiment that is not specified in RAFT.

図２２によればまず、リーダノードであるノード１が、自ノードの「メッセージパス情報」のログエントリ１９１をログに追記する（ステップＳ７１）。このログへの追記の結果、ＲＡＦＴプロトコルに従って、当該ログの複製がフォロワノード（ノード２，３）にブロードキャストされる。そして各ノードは、ログエントリ１９１のコミットを完了し次第、これを適用する。この結果、各ノードは、具体的には、メッセージ通番「５」まで、メッセージパス「Ａ→Ｂ」上でブロックされていたメッセージ要求をディスパッチする。 As shown in Figure 22, first, node 1, the leader node, adds log entry 191 of its own node's "message path information" to the log (step S71). As a result of this addition to the log, a copy of the log is broadcast to follower nodes (nodes 2 and 3) according to the RAFT protocol. Each node then applies log entry 191 as soon as it has committed it. As a result, each node dispatches message requests that were blocked on message path "A → B" up to message sequence number "5."

その後、フォロワノードであるノード３から、メッセージ通番「６」をディスパッチ済みとするメッセージパス情報１９２がブロードキャストされたとする（ステップＳ７２）。 Then, let us assume that follower node Node 3 broadcasts message path information 192 indicating that message sequence number "6" has been dispatched (Step S72).

この場合、メッセージパス情報１９２を受信したノード１は、メッセージパス情報１９２におけるメッセージパス「Ａ→Ｂ」上のディスパッチ済みメッセージ通番が「６」となっており、ステップＳ７１でログに追記したログエントリ１９１に示されるディスパッチ済みメッセージ通番の「５」よりも大きいことから、メッセージパス情報１９２の内容を示す「メッセージパス情報」タイプのログエントリ１９３をログに追記する（ステップＳ７３）。この追記により、「メッセージパス情報」タイプのログエントリにおけるディスパッチ済みメッセージ通番が「６」に更新される。そして、ステップＳ７１におけるログへの追記と同様、ノード１は、ＲＡＦＴプロトコルに従って、当該ログの複製をフォロワノード（ノード２，３）にブロードキャストし、各ノードはコミットが完了し次第、これを適用する。 In this case, node 1, which received message path information 192, notices that the dispatched message sequence number on message path "A → B" in message path information 192 is "6," which is greater than the dispatched message sequence number "5" indicated in log entry 191 added to the log in step S71. Therefore, node 1 adds log entry 193 of type "message path information" indicating the contents of message path information 192 to the log (step S73). This addition updates the dispatched message sequence number in the log entry of type "message path information" to "6." Then, just as with the addition to the log in step S71, node 1 broadcasts a copy of the log to follower nodes (nodes 2 and 3) in accordance with the RAFT protocol, and each node applies it as soon as the commit is complete.

またその後、フォロワノードであるノード２から、メッセージ通番「４」をディスパッチ済みとするメッセージパス情報１９４がブロードキャストされたとする（ステップＳ７４）。この場合、メッセージパス情報１９４を受信したノード１は、メッセージパス情報１９４におけるメッセージパス「Ａ→Ｂ」上のディスパッチ済みメッセージ通番が「４」であり、これまでのディスパッチ済みメッセージ通番の最大値である「６」よりも小さいので、特段の処理を行わない。 Furthermore, suppose that follower node Node 2 subsequently broadcasts message path information 194 indicating that message sequence number "4" has been dispatched (Step S74). In this case, Node 1, which receives message path information 194, does not perform any special processing because the dispatched message sequence number on message path "A → B" in message path information 194 is "4," which is smaller than the maximum value of "6" that has been the largest number of dispatched messages up to that point.

そして、リーダノードであるノード１は、全てのノードからのメッセージパス情報を受信すると、最終的にログに追記された「メッセージパス情報」タイプのログエントリに基づいて、「メッセージパス追加」タイプのログエントリを生成し、これをログ（イベントログ３４）に追記する（ステップＳ７５）。具体的には、メッセージパス「Ａ→Ｂ」上のディスパッチ済みメッセージ通番を「６」とし、追加メッセージパスを「Ｄ→Ｂ」とする「メッセージパス追加」タイプのログエントリ１９５をログに追記する。なお、このログ追記以降の適用処理等は、図１２で説明した処理と同様である。 When node 1, the leader node, receives message path information from all nodes, it generates a log entry of type "Message path addition" based on the log entry of type "Message path information" that was finally added to the log, and adds this to the log (event log 34) (step S75). Specifically, it adds to the log a log entry of type "Message path addition" 195 that sets the dispatched message sequence number on message path "A → B" to "6" and the added message path to "D → B". Note that the application processing after this log addition is the same as the processing described in Figure 12.

上記のように、第２の改善方法では、メッセージパスを追加する際に「メッセージパス情報」のログエントリを利用して合意形成を行うため、図１５に示したメッセージパス追加処理と、図１９に示したメッセージパス情報受信処理の処理手順が、以下のように一部変更される。 As described above, in the second improvement method, consensus is reached using the log entry of "message path information" when adding a message path, so the processing procedures for the message path addition process shown in Figure 15 and the message path information reception process shown in Figure 19 are partially modified as follows:

図２３は、第２の改善方法によるメッセージパス追加処理の処理手順例を示すフローチャートである。図２３のフローチャートについて、図１５に示したメッセージパス追加処理との相違点を中心に説明し、共通する処理の説明は省略する。 Figure 23 is a flowchart showing an example of the processing procedure for message path addition processing using the second improvement method. The flowchart in Figure 23 will be explained, focusing on the differences from the message path addition processing shown in Figure 15, and a description of the common processing will be omitted.

図２３によれば、ステップＳ５２において１番目のメッセージパスをブロック状態に設定した後、メッセージ処理部２２が、自ノードがリーダであるか否かを確認する（ステップＳ８１）。そして、自ノードがリーダである場合に（ステップＳ８１のＹＥＳ）、メッセージ処理部２２は、自ノードの「メッセージパス情報」のログエントリをログに追記する（ステップＳ８２）。このステップＳ８２の処理は、ＲＡＦＴプロトコルにおけるログ追記によって実現され、図２２のステップＳ７１で説明したように、ログへの追記の結果、ＲＡＦＴプロトコルに従って、当該ログの複製がフォロワにブロードキャストされる。一方、自ノードがリーダではない、すなわちフォロワであった場合は（ステップＳ８１のＮＯ）、図２２のステップＳ７２，Ｓ７４で説明したように、メッセージ処理部２２が、ＲＡＦＴプロトコルから独立した本実施形態固有のメッセージ送信によって、既存のメッセージパスのメッセージパス情報を他ノードにブロードキャストする（ステップＳ５３）。 As shown in FIG. 23, after setting the first message path to a blocked state in step S52, the message processing unit 22 checks whether the node itself is the leader (step S81). If the node itself is the leader (YES in step S81), the message processing unit 22 adds a log entry of the node's "message path information" to the log (step S82). This step S82 process is realized by adding a log entry in the RAFT protocol. As explained in step S71 of FIG. 22, as a result of adding the entry to the log, a copy of the log is broadcast to followers in accordance with the RAFT protocol. On the other hand, if the node itself is not the leader, i.e., is a follower (NO in step S81), the message processing unit 22 broadcasts message path information of the existing message path to other nodes by message transmission specific to this embodiment, independent of the RAFT protocol, as explained in steps S72 and S74 of FIG. 22 (step S53).

図２４は、第２の改善方法によるメッセージパス情報受信処理の処理手順例を示すフローチャートである。図２４のフローチャートについて、図１９に示したメッセージパス情報受信処理との相違点を中心に説明し、共通する処理の説明は省略する。 Figure 24 is a flowchart showing an example of the processing procedure for message pass information reception processing using the second improvement method. The flowchart in Figure 24 will be explained, focusing on the differences from the message pass information reception processing shown in Figure 19, and a description of the common processing will be omitted.

図２４によれば、メッセージパス情報を受信したノードのメッセージ処理部２２が当該メッセージパス情報をメッセージパス管理テーブル３５に登録し（ステップＳ６１）、当該ノードがリーダノードであった場合に（ステップＳ６２のＹＥＳ）、新規メッセージパスの追加に関するメッセージパス情報を他の全てのノードから受信済みであるか否かを判定する（ステップＳ６３）。 As shown in FIG. 24, the message processing unit 22 of the node that received the message path information registers the message path information in the message path management table 35 (step S61), and if the node is the leader node (YES in step S62), it determines whether message path information regarding the addition of a new message path has been received from all other nodes (step S63).

そして、ステップＳ６３においてリーダが新規メッセージパスの追加に関するメッセージパス情報を他の全てのノードから受信済みでない場合に（ステップＳ６３のＮＯ）、リーダのメッセージ処理部２２は、ステップＳ６１におけるメッセージパス情報のメッセージパス管理テーブル３５への登録によってディスパッチ済みメッセージ通番の最大値が更新されたか否かを確認する（ステップＳ９１）。 Then, if the leader has not received message path information regarding the addition of a new message path from all other nodes in step S63 (NO in step S63), the message processing unit 22 of the leader checks whether the maximum value of the dispatched message sequence number has been updated by registering the message path information in the message path management table 35 in step S61 (step S91).

ステップＳ９１においてディスパッチ済みメッセージ通番の最大値が更新されていた場合（ステップＳ９１のＹＥＳ）、リーダのメッセージ処理部２２は、ステップＳ６１で受信したメッセージパス情報に基づいて、「メッセージパス情報」タイプのログエントリをログに追記し（ステップＳ９２）、メッセージパス情報受信処理を終了する。ステップＳ９２の処理は、図２２のステップＳ７１の処理に相当する。すなわち、リーダのメッセージ処理部２２は、既存のメッセージパス上のディスパッチ済みメッセージ通番について、現時点で他ノードから受信したメッセージパス情報のなかで自ノード分も含めた「最大値」が更新された場合に、すべてのノードからメッセージパス情報を受信する前の途中でも、「メッセージパス情報」タイプのログエントリをログに追記する。 If the maximum value of the dispatched message sequence number has been updated in step S91 (YES in step S91), the message processing unit 22 of the leader adds a log entry of type "message path information" to the log based on the message path information received in step S61 (step S92), and terminates the message path information reception process. The processing of step S92 corresponds to the processing of step S71 in FIG. 22. In other words, if the "maximum value" of the dispatched message sequence number on an existing message path, including that of its own node, is updated among the message path information currently received from other nodes, the message processing unit 22 of the leader adds a log entry of type "message path information" to the log even if it is in the middle of receiving message path information from all nodes before receiving message path information.

一方、ステップＳ９１においてディスパッチ済みメッセージ通番の最大値が更新されていなかった場合（ステップＳ９１のＮＯ）、リーダのメッセージ処理部２２は特段の処理を行わずにメッセージパス情報受信処理を終了する。この処理手順は、図２２のステップＳ７４においてフォロワ（ノード２）からメッセージパス情報１９４がブロードキャストされた場合に、このメッセージパス情報１９４を受け取ったリーダ（ノード１）が特段の処理を行わないことに対応している。 On the other hand, if the maximum value of the dispatched message sequence number has not been updated in step S91 (NO in step S91), the message processing unit 22 of the leader terminates the message pass information reception process without performing any special processing. This processing procedure corresponds to the case where message pass information 194 is broadcast from the follower (node 2) in step S74 of Figure 22, and the leader (node 1) receives this message pass information 194 and does not perform any special processing.

以上の説明をまとめると、メッセージパス追加時の第２の改善方法は、（６－２）で説明したメッセージパス追加処理の方法と比べたときに、次のような特徴を有する。 To summarize the above explanation, the second improvement method for adding a message path has the following features when compared to the message path addition processing method described in (6-2):

第２の改善方法における（６－２）のメッセージパス追加処理との相違点として、まず、「メッセージパス情報」タイプのログエントリが導入される。そして、リーダノードは、メッセージパス情報をフォロワノードにブロードキャストする代わりに、「メッセージパス情報」タイプのログエントリをログに追記する。その結果、ＲＡＦＴプロトコルに従って、当該ログの複製がフォロワノードにブロードキャストされる。 The difference between the message path addition process in the second improvement method (6-2) is that a log entry of type "message path information" is first introduced. Then, instead of broadcasting the message path information to follower nodes, the leader node adds a log entry of type "message path information" to the log. As a result, a copy of the log is broadcast to follower nodes according to the RAFT protocol.

また、リーダノードがフォロワノードからメッセージパス情報を受信したとき、既存メッセージパスにおけるディスパッチ済みメッセージ通番の最大値が更新される場合には、リーダノードは、受信した上記メッセージパスに対応する「メッセージパス情報」タイプのログエントリをログに追記する。 Also, when a leader node receives message path information from a follower node, if the maximum value of the dispatched message sequence number for an existing message path is updated, the leader node adds a log entry of type "Message Path Information" corresponding to the received message path to the log.

また、「メッセージパス情報」タイプのログエントリがコミットされ、このコミットを受けて各ノードが当該ログエントリの内容を適用する際には、当該ログエントリ（メッセージパス情報）に示されるディスパッチ済みメッセージ通番まで、既存メッセージパス上でブロックされているメッセージ要求がディスパッチされる。 In addition, when a log entry of type "message path information" is committed and each node applies the contents of that log entry in response to this commit, message requests that are blocked on the existing message path are dispatched up to the dispatched message sequence number indicated in that log entry (message path information).

一方、第２の改善方法における（６－２）のメッセージパス追加処理との共通点としては、例えば、フォロワはメッセージパス情報を他のノードにブロードキャストすることが挙げられる。このような構成とする理由は、メッセージパス追加処理の最中にリーダが障害等によって停止し、新しくリーダとなったフォロワが当該メッセージパス追加処理を継続して実施する際には、各フォロワからブロードキャストされた情報が必要になるためである。また、リーダは、全てのノードからメッセージパス情報を受信したとき、「メッセージパス追加」タイプのログエントリをログに追記する。さらに、この「メッセージパス追加」タイプのログエントリを適用する際の処理も、（６－２）のメッセージパス追加処理と共通である。 On the other hand, a common point between the second improvement method and the message path addition process (6-2) is that the follower broadcasts message path information to other nodes. The reason for this configuration is that if the leader stops due to a failure or other reason during the message path addition process and the follower who becomes the new leader continues the message path addition process, the information broadcast from each follower is required. Also, when the leader receives message path information from all nodes, it adds a log entry of type "Message path addition" to the log. Furthermore, the process for applying this "Message path addition" type log entry is also common to the message path addition process (6-2).

第２の改善方法は、上記のような特徴を備えることにより、リーダが全てのノードからメッセージパス情報を受信し終わる前に、ブロックされているメッセージ要求の一部を、安全にディスパッチすることができる。ここでの「安全」とは、処理途中で障害等によって一部のノードが離脱しても、図１４を参照しながら説明したような、ノード間で同期を維持できなくなる状況には陥らない、ことを意味する。第２の改善方法は、ＦＴシステム１において、他のノードに比べてメッセージパス情報の通知が遅いノードが存在し、かつ、ノードの離脱が起こりやすい構成である場合に、特に有効である。 By incorporating the above-mentioned features, the second improvement method allows some of the blocked message requests to be safely dispatched before the leader has finished receiving message passing information from all nodes. "Safely" here means that even if some nodes leave during processing due to a failure or other reason, it will not result in a situation where synchronization cannot be maintained between nodes, as explained with reference to Figure 14. The second improvement method is particularly effective when the FT system 1 is configured such that there is a node that is slower to notify message passing information than other nodes and node departures are more likely to occur.

（７）出力処理部３３
以下では、出力処理部３３による処理について詳しく説明する。 (7) Output processing unit 33
The processing performed by the output processing unit 33 will be described in detail below.

出力処理部３３は、各タスク２０の処理結果として外部出力要求を受け取り、所定のポリシーに従って外部出力する内容を決定し、決定した内容を外部（クライアント等の端末３）に送信する。 The output processing unit 33 receives external output requests as the processing results of each task 20, determines the content to be output externally in accordance with a predetermined policy, and transmits the determined content to the outside (terminal 3 such as a client).

これまで述べてきたように、本実施形態において各ノード１０のタスク群は、決定論的に振る舞うことが保証される。したがって、あるタスク２０に着目したとき、基本的にどのノード１０からも同じ内容の外部出力要求が、同じ順で送信される。 As described above, in this embodiment, the task groups of each node 10 are guaranteed to behave deterministically. Therefore, when focusing on a particular task 20, essentially all nodes 10 will send the same external output requests in the same order.

一方、異なるタスク２０からの外部出力要求については、その要求順序はノード間で一致する補償はなく、実際に外部出力する順序も決定論的に決まらない（図２５（Ａ）参照）。しかし、一般的にはこれらの外部出力内容は互いに独立しており、外部出力の順序はシステムとしての結果に影響を与えないと想定される。もし、異なるタスク２０からの外部出力要求についてその順序性が重要である場合は、それぞれの外部出力内容を中継する共通のタスク（図２５（Ｂ）のタスクＣ）を置くことにより、出力順をノード間で一致化させることが可能である（図２５（Ｂ）参照）。 On the other hand, there is no guarantee that the order of external output requests from different tasks 20 will match between nodes, and the order in which the external outputs are actually made is not deterministically determined (see Figure 25(A)). However, these external output contents are generally independent of each other, and it is assumed that the order of the external outputs does not affect the results of the system. If the order of external output requests from different tasks 20 is important, it is possible to match the output order between nodes by placing a common task (task C in Figure 25(B)) that relays the contents of each external output (see Figure 25(B)).

図２５は、外部出力の順序保証を説明するイメージ図である。図２５（Ａ），（Ｂ）では何れも、異なるタスク２０Ａ，２０Ｂ（タスクＡ，Ｂ）からの外部出力要求が外部出力３９に送信される。外部出力３９は出力処理部３３と読み替えることができる。 Figure 25 is an image diagram explaining the order guarantee of external outputs. In both Figures 25(A) and (B), external output requests from different tasks 20A and 20B (tasks A and B) are sent to the external output 39. The external output 39 can be interpreted as the output processing unit 33.

図２５（Ａ）の構成では、タスクＡから外部出力３９に対して「Ａ→外＃２」，「Ａ→外＃３」の外部出力要求のメッセージ１０１Ａが送信され、タスクＢから外部出力３９に対して「Ｂ→外＃４」の外部出力要求のメッセージ１０１Ｂが送信される。ここで、同一のタスクＡからの外部出力要求については、「Ａ→外＃２」の後に「Ａ→外＃３」が外部出力されることは保証される。一方、異なるタスクＡ，Ｂからの外部出力要求については、「Ａ→外＃２」と「Ｂ→外＃４」の何れが先に外部出力されるかの保証はなく、外部出力３９（出力処理部３３）に到着する順序もノード間で異なる場合がある。 In the configuration of Figure 25 (A), task A sends message 101A requesting external output for "A → External #2" and "A → External #3" to external output 39, and task B sends message 101B requesting external output for "B → External #4" to external output 39. Here, for external output requests from the same task A, it is guaranteed that "A → External #3" will be output after "A → External #2." On the other hand, for external output requests from different tasks A and B, there is no guarantee that "A → External #2" or "B → External #4" will be output first, and the order in which they arrive at external output 39 (output processing unit 33) may differ between nodes.

図２５（Ｂ）の構成は、図２５（Ａ）の構成に、それぞれの外部出力内容を集約して中継する共通のタスクＣを置いたものである。前述した通り、異なるタスクＡ，Ｂからの外部出力要求をタスクＣが受け取ることにより、タスクＣから外部出力３９（出力処理部３３）への外部出力の出力順をノード間で一致化させることができる。 The configuration in Figure 25(B) is the same as the configuration in Figure 25(A), but with the addition of a common task C that aggregates and relays the contents of each external output. As mentioned above, by having task C receive external output requests from different tasks A and B, the output order of external outputs from task C to external output 39 (output processing unit 33) can be made consistent between nodes.

なお、出力処理部３３が外部出力する内容を決定する際に従う「所定のポリシー」は、例えば、先着優先のポリシーや多数決のポリシーが考えられる。 The "predetermined policy" that the output processing unit 33 follows when determining the content to be output externally may be, for example, a first-come, first-served policy or a majority vote policy.

先着優先のポリシーでは、各複製ノード（各ノード１０）から届いた外部出力要求のうち、最初に届いたものを外部出力し、残りは破棄する。多数決のポリシーでは、各複製ノード（各ノード１０）から届いた同一と考えられる外部出力要求に対して多数決を実施し、確からしい内容を選択して外部出力し、残りは破棄する。 Under the first-come, first-served policy, the first external output request received from each replicated node (each node 10) is output externally, and the rest are discarded. Under the majority voting policy, a majority vote is held on external output requests that are thought to be identical and arrive from each replicated node (each node 10), the most likely content is selected and output externally, and the rest are discarded.

また、本実施形態のＦＴシステム１では、ノード１０における処理結果を外部装置（例えば端末３）に出力するために、専用の外部出力装置を備えるようにしてもよいが、以下の図２６に示すように構成することで、専用の外部出力装置がなくても各ノード１０の外部出力を調整して外部装置に出力することができる。 Furthermore, the FT system 1 of this embodiment may be provided with a dedicated external output device to output the processing results of the node 10 to an external device (e.g., terminal 3). However, by configuring it as shown in Figure 26 below, the external output of each node 10 can be adjusted and output to an external device even without a dedicated external output device.

図２６は、複数のノードからの外部出力を調整して出力する構成を説明するイメージ図である。図２６において、各ノードのタスクＡは、外部出力要求を送信する際に、各ノードの外部出力３９（出力処理部３３）にブロードキャストする。そして、外部出力３９（出力処理部３３）は、各ノードのタスクＡから受信した外部出力要求の内容を比較し、例えば多数決によって、正しいと判定されるものを外部装置（端末３）に送信する。図２６の場合は拡大図に示したように、ノード１及びノード２からの外部出力要求の内容が「α」であり、多数派となることから、「α」が外部出力される。 Figure 26 is an illustration of a configuration for coordinating and outputting external outputs from multiple nodes. In Figure 26, when task A of each node sends an external output request, it broadcasts it to the external output 39 (output processing unit 33) of each node. The external output 39 (output processing unit 33) then compares the contents of the external output requests received from task A of each node, and transmits the one that is determined to be correct, for example by majority vote, to an external device (terminal 3). In the case of Figure 26, as shown in the enlarged view, the contents of the external output requests from node 1 and node 2 are "α", which is the majority, and therefore "α" is output externally.

また、図２６において、外部出力と３９と外部装置（端末３）との間の通信に、冪等なメッセージングプロトコルを用いるようにすれば、重複送信されたメッセージは外部装置側で破棄することができる。例えばセッション単位で送信メッセージに通番を付与し、受信側で通番順に１つずつ取り込むようにする。こうすることで、外部送信の前にノードが障害停止した場合であっても、外部装置は滞りなく処理結果を受信することができる。 Furthermore, in Figure 26, if an idempotent messaging protocol is used for communication between the external output 39 and the external device (terminal 3), duplicate messages can be discarded by the external device. For example, a sequence number can be assigned to the transmitted message on a session-by-session basis, and the messages can be retrieved one by one in sequence number order on the receiving side. In this way, even if a node fails and stops before an external transmission, the external device can receive the processing results without delay.

１フォールトトレラント（ＦＴ）システム
２通信ネットワーク
３端末
１０ノード
１０Ａリーダノード（リーダ）
１０Ｂ，１０Ｃフォロワノード（フォロワ）
２０タスク
２１ステート（ローカルステート）
３０イベント処理基盤
３１入力処理部
３２メッセージ処理部
３３出力処理部
３４イベントログ
３５メッセージパス管理テーブル
３６メッセージ待機キュー
３８外部入力
３９外部出力
1 Fault-tolerant (FT) system 2 Communication network 3 Terminal 10 Node 10A Leader node (leader)
10B, 10C Follower node (follower)
20 Task 21 State (local state)
30 Event processing platform 31 Input processing unit 32 Message processing unit 33 Output processing unit 34 Event log 35 Message path management table 36 Message waiting queue 38 External input 39 External output

Claims

A fault-tolerant system in which multiplexed nodes are connected via a communication network,
each of the plurality of nodes has a processor, a memory, and a storage area;
An application that runs by the processor in each of the nodes reading a program from the storage area into the memory and executing it is composed of one or more tasks that deterministically generate an output for an input,
one of the plurality of nodes is a leader node, and the nodes other than the leader node are follower nodes, and the leader node and the follower nodes are configured to perform the same task,
Each task is associated with a local state that indicates the state of the task itself,
The task's inputs include messages received from outside the system or from other tasks, and access to the local state associated with the task;
The outputs of the task include messages that the task sends to the outside of the system or to other tasks, and updates to the local state associated with the task;
a fault-tolerant system characterized in that, when the task sends a message to a destination task having a plurality of message senders, processors of the plurality of nodes reach an agreement among the plurality of nodes on a delivery order of the messages to the destination task, and deliver and process the messages in accordance with the agreed-upon delivery order.

2. The fault-tolerant system according to claim 1, wherein when the task sends a message to a destination task having a plurality of message senders, sending of the message to the destination task is temporarily suspended until an agreement is reached among the plurality of nodes regarding an order of message delivery to the destination task.

3. The fault-tolerant system according to claim 2, wherein when a new second message path is added from a third task different from one or more first tasks to a second task for which an existing first message path has been generated from the first tasks, the processor coordinates recognition among the plurality of nodes regarding the addition of the second message path.

If the second message path to be added is a second message path for the second task,
The fault-tolerant system of claim 3, wherein the processor reaches a consensus among the plurality of nodes regarding up to which messages on the first message path should be dispatched before adding the second message path, and temporarily suspends transmission of messages on the first message path until such consensus is reached.

When the second message path to be added is the second message path for the second task,
5. The fault-tolerant system of claim 4, wherein the processor of the leader node determines the maximum value of the sequence number of messages already dispatched on the first message path at each node, and reaches a consensus among the plurality of nodes to add the second message path after the message with the maximum sequence number has been dispatched on the first message path.

When the processor of each of the nodes intends to add a second message path for the second task as the second message path, the processor of each of the nodes broadcasts message path information regarding the addition of the second message path to other nodes;
the message path information indicates a sequence number of a message that has been dispatched on the first message path at the node;
The fault-tolerant system described in claim 4, characterized in that the processor of the other node that receives the message path information dispatches messages on the first message path in its own node up to the maximum sequence number indicated in the message path information, prior to reaching an agreement among the multiple nodes as to how many messages should be dispatched on the first message path before adding the second message path.

When the processor of the follower node intends to add a second message path for the second task as the second message path, the processor of the follower node broadcasts message path information regarding the addition of the second message path to other nodes;
5. The fault-tolerant system according to claim 4, wherein, while the leader node is receiving message path information regarding the addition of the second message path from all nodes, when the maximum value of the sequence numbers of messages already dispatched on the first message path indicated in the message path information received by the leader node from any node exceeds the maximum value of the sequence numbers of messages already dispatched on the first message path in the leader node, the nodes reach an agreement to dispatch messages on the first message path in their own nodes up to the maximum value of the sequence numbers indicated in the message path information.

A data processing method using a fault-tolerant system in which multiplexed nodes are connected via a communication network, comprising:
each of the plurality of nodes has a processor, a memory, and a storage area;
An application that runs by the processor in each of the nodes reading a program from the storage area into the memory and executing it is composed of one or more tasks that deterministically generate an output for an input,
one of the plurality of nodes is a leader node, and the nodes other than the leader node are follower nodes, and the leader node and the follower nodes are configured to perform the same task,
Each task is associated with a local state that indicates the state of the task itself,
The task's inputs include messages received from outside the system or from other tasks, and access to the local state associated with the task;
The outputs of the task include messages that the task sends to the outside of the system or to other tasks, and updates to the local state associated with the task;
a data processing method characterized in that, when the task sends a message to a destination task having a plurality of message senders, processors of the plurality of nodes reach an agreement among the plurality of nodes on a delivery order of the messages to the destination task, and transmit the messages in accordance with the agreed-upon delivery order.