JP2826466B2 - Performance measurement method of parallel computer system

Info

Publication number: JP2826466B2
Application number: JP6060385A
Authority: JP
Inventors: 真史篠原
Original assignee: NEC Computertechno Ltd
Current assignee: NEC Computertechno Ltd
Priority date: 1994-03-30
Filing date: 1994-03-30
Publication date: 1998-11-18
Anticipated expiration: 2013-11-18
Also published as: JPH07271741A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to a performance measurement method for a parallel computer system, and more particularly to a performance measurement method for memory access contention in a parallel computer system.

[0002]

[Prior Art] First, the configuration of a parallel computer system will be described with reference to FIG. 4.

[0003] The parallel computer system shown in FIG. 4 comprises: an arithmetic processing unit 1 consisting of arithmetic processing units 1-0 to 1-31, which can perform arithmetic processing simultaneously; a processing-unit-side network 2 consisting of processing-unit-side networks 2-0 to 2-31, each of which belongs to one of the arithmetic processing units 1-0 to 1-31 and connects memory access requests issued by that unit to the requested memory port; a memory unit 4 consisting of memory units 4-0 to 4-31, each having a plurality of memory banks that can be accessed simultaneously; and a memory-unit-side network 3 consisting of memory-unit-side networks 3-0 to 3-31, each of which connects memory access requests to the appropriate memory bank of its subordinate memory unit 4-0 to 4-31.

[0004] FIG. 4 shows 32 each of the arithmetic processing units 1, processing-unit-side networks 2, memory-unit-side networks 3, and memory units 4, but configurations with 2ⁿ units are possible in practice. The following description assumes the 32-unit configuration throughout.

[0005] FIG. 4 shows only the memory-bound interfaces used for memory access; in practice there are also interfaces from the memory unit 4 back to the arithmetic processing unit 1.

[0006] Next, the operation of the parallel computer system of FIG. 4 will be described.

[0007] When the arithmetic processing unit 1 issues a memory access instruction, a request is first issued to its subordinate processing-unit-side network 2.

[0008] On receiving a request, the processing-unit-side network 2 determines which of the 32 memory units belonging to the memory unit 4 is being accessed. Each arithmetic processing unit within the arithmetic processing unit 1 contains a plurality of pipelines that issue instructions simultaneously, so each processing-unit-side network of the processing-unit-side network 2 arbitrates among requests that access the same memory unit at the same time. Requests granted access by this arbitration are issued to the memory-unit-side network 3.

[0009] Each memory-unit-side network of the memory-unit-side network 3 may receive requests from different arithmetic processing units at the same time. When different arithmetic processing units simultaneously issue requests to the same memory bank of the same memory unit, contention arbitration is performed within the memory-unit-side network, just as in the processing-unit-side network. A request granted access by this arbitration then accesses the designated bank of the memory unit.

[0010] FIG. 5 is a block diagram of one of the arithmetic processing units 1-0 to 1-31 belonging to the arithmetic processing unit 1 in FIG. 4.

[0011] The arithmetic processing unit of FIG. 5 comprises vector processors 10 to 17, which execute vector instructions simultaneously, and a scalar processor 18, which prefetches and issues instructions and executes scalar instructions.

[0012] FIG. 5 shows eight vector processors in one arithmetic processing unit, but configurations with 2ⁿ vector processors are possible in practice. The following description assumes the eight-processor configuration throughout.

[0013] First, the scalar processor 18 prefetches an instruction. If the prefetched instruction is a scalar instruction, a request is issued to memory directly; in practice, the request is issued to the processing-unit-side network subordinate to this arithmetic processing unit. If the prefetched instruction is a vector instruction, the scalar processor 18 directs each vector processor to execute it.

[0014] FIG. 6 is a block diagram showing the conventional configuration of one of the processing-unit-side networks 2-0 to 2-31 belonging to the processing-unit-side network 2 in FIG. 4.

[0015] The processing-unit-side network of FIG. 6 comprises: a request reception buffer 21, which receives requests from each processor; a contention arbitration unit 22, which determines which memory port each request received by the request reception buffer 21 is directed to and, when it conflicts with a request from another processor, decides the access priority; and a crossbar unit 24, which selects the output port for each request granted memory access by the contention arbitration unit 22.

[0016] A memory access request issued by a processor is first received by the request reception buffer 21. If no preceding requests are queued in the request reception buffer 21, the request is sent on to the contention arbitration unit 22.

[0017] A request is held in the request reception buffer 21 when a request conflict occurs in the contention arbitration unit 22 and subsequent requests cannot be processed.

[0018] The request reception buffer 21 holds requests under the above condition, but its capacity is limited. When the limit is reached, it sends the processor a hold signal indicating that no further requests can be accepted. Requests are then transferred from the request reception buffer 21 to the contention arbitration unit 22.

[0019] The contention arbitration unit 22 determines which memory port each processor's request accesses. In practice, it determines which of the memory-unit-side networks 3-0 to 3-31 is being accessed. This can be recognized from some of the bits of the request address.
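As a rough illustration of recognizing the memory port from address bits, the following Python sketch selects one of 32 ports with a five-bit field. The description does not say which address bits are used, so the bit position (`PORT_SHIFT`) and the function name `memory_port` are assumptions made only for this example.

```python
NUM_MEMORY_UNITS = 32  # 32-unit configuration used throughout the description
PORT_BITS = NUM_MEMORY_UNITS.bit_length() - 1  # 5 bits select one of 32 ports
PORT_SHIFT = 3  # assumption: the low 3 bits are a byte offset within a word


def memory_port(address: int) -> int:
    """Return the memory-unit-side network (0..31) selected by an address.

    Hypothetical decoding: the port field is taken from the PORT_BITS bits
    directly above the assumed byte offset.
    """
    return (address >> PORT_SHIFT) & (NUM_MEMORY_UNITS - 1)
```

With this assumed layout, consecutive words map to consecutive memory units, so a unit-stride access pattern spreads over all 32 ports.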

[0020] When only one request is directed to a given memory port, the request is issued to the crossbar unit 24 as it is. When a request from one processor is directed to the same memory port as a request from another processor, however, contention arbitration is performed here.

[0021] In contention arbitration, the conflicting requests are issued to the memory port serially, in a predetermined priority order. While requests are being issued serially, the request reception buffer 21 is not allowed to issue subsequent requests.
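The serial issue of conflicting requests can be sketched as below. This is a minimal model under stated assumptions, not the circuit of the contention arbitration unit 22: the priority rule (lower processor number first) and the function name `arbitrate` are hypothetical, and one request per memory port is assumed to be granted per cycle.

```python
from collections import defaultdict


def arbitrate(requests):
    """Serialize simultaneous requests that target the same memory port.

    requests: list of (processor, port) tuples issued in the same cycle.
    Returns a list of cycles; each cycle is the list of (processor, port)
    requests granted in that cycle, at most one per port.
    """
    by_port = defaultdict(list)
    for processor, port in requests:
        by_port[port].append(processor)
    for processors in by_port.values():
        processors.sort()  # assumption: lower processor number has priority

    schedule = []
    while any(by_port.values()):
        # Grant the highest-priority pending request on every port.
        granted = [(processors.pop(0), port)
                   for port, processors in sorted(by_port.items())
                   if processors]
        schedule.append(granted)
    return schedule
```

For example, two requests to port 5 and one to port 7 take two cycles: the request to port 7 and the higher-priority request to port 5 go out together, and the remaining request to port 5 follows.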

[0022] FIG. 7 is a block diagram of one of the memory-unit-side networks 3-0 to 3-31 belonging to the memory-unit-side network 3 in FIG. 4.

[0023] The memory-unit-side network of FIG. 7 comprises: request buffer units 300 to 331, which receive requests from the processing-unit-side networks 2-0 to 2-31, respectively; a contention arbitration unit 332, which receives the requests from the request buffer units 300 to 331 and arbitrates contention for each memory bank of the memory units 4-0 to 4-31; and a crossbar unit 333, which selects the output port for each request granted access to a memory bank by the contention arbitration unit 332.

[0024] Requests transferred from the processing-unit-side networks 2-0 to 2-31 are received by the corresponding request buffer units 300 to 331. Because these buffers correspond one-to-one to the processing-unit-side networks 2-0 to 2-31, holds caused by contention are applied on a per-port basis. Requests received here are sent to the contention arbitration unit 332.

[0025] The contention arbitration unit 332 first determines which memory port each received request is directed to. If there is no conflicting request from another processing-unit-side network, the request is output to the crossbar unit 333 as it is. If a request from another processing-unit-side network attempts to access the same memory port, the contention arbitration unit 332 outputs issue permissions serially in a predetermined priority order. A request that has been granted permission is output to the crossbar unit 333, which receives it and issues it to the memory port specified in the request.

[0026] FIG. 8 is a block diagram of one of the memory units 4-0 to 4-31 belonging to the memory unit 4 in FIG. 4.

[0027] The memory unit of FIG. 8 comprises a plurality of memories 400 to 431 that can be accessed simultaneously.

[0028] When a request output from the memory-unit-side network 3 arrives at the designated one of the memories 400 to 431 belonging to the memory unit, the memory access starts immediately, because no contention with other requests occurs at this point.

[0029] As described above, memory access contention occurs in a parallel computer system, and it serves as a measure for evaluating the performance of systems and software.

[0030] In the conventional performance measurement method for memory access contention in a parallel computer system configured as above, a waiting-time measurement function was provided in each processing-unit-side network, to measure the waiting time caused by contention among different processors of the same arithmetic processing unit, and in each memory-unit-side network, to measure the waiting time caused by contention among requests from different arithmetic processing units.

[0031] The conventional performance measurement method for measuring the waiting time caused by contention among different processors of the same arithmetic processing unit will be described with reference to FIG. 9.

[0032] FIG. 9 is a block diagram of the request reception buffer 21 of FIG. 6.

[0033] The request reception buffer 21 of FIG. 9 comprises: registers 2100 to 2108, which receive the requests of the respective processors; an input buffer 2119 with a capacity of W words, each word able to hold as many waiting requests as there are processors; a write address register (WAR) 2117, which holds the write address of the input buffer 2119 and includes a +1 circuit; a read address register (RAR) 2118, which holds the read address of the input buffer 2119 and includes a +1 circuit; a comparator 2120, which compares the values of the WAR 2117 and the RAR 2118; a register 2121, which holds the output of the comparator 2120; a selector 2122, which selects either the outputs of the per-processor request reception registers 2100 to 2108 or the output of the input buffer 2119; and a register 2123, which receives the output of the selector 2122.

[0034] A request output from a processor is received by the registers 2100 to 2108. Requests output by other processors at the same timing are all received at the same time. If there is no preceding request, or if the preceding request has already passed through the contention arbitration unit 22, the input buffer 2119 is bypassed and the request is set in the register 2123. The request is then transferred to the contention arbitration unit 22. If preceding requests are backed up, however, the request is held in the input buffer 2119, and the WAR 2117 is incremented by one.

[0035] When contention arbitration of the preceding request finishes while requests are queued in the input buffer 2119, the next request is read from the input buffer 2119 and transferred to the contention arbitration unit 22, and the RAR 2118 is incremented by one. The request read from the input buffer 2119 is selected by the selector 2122, set in the register 2123, and transferred to the contention arbitration unit 22. The comparator 2120 outputs "1" when (WAR 2117 - RAR 2118) equals the maximum word capacity W of the input buffer 2119. In this state the input buffer 2119 cannot accommodate any more requests (the input buffer busy state), and the "1" output of the comparator 2120 is sent as an input buffer busy signal to the corresponding arithmetic processing unit, which stops issuing requests.
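The WAR/RAR bookkeeping of the input buffer can be modelled with a short Python sketch. It is only an illustration under stated assumptions, not the circuit of FIG. 9: the class and method names are hypothetical, the two address counters are unbounded integers rather than wrapping hardware registers, and each buffer word is treated as an opaque value.

```python
class InputBuffer:
    """Toy model of an input buffer with write/read address registers."""

    def __init__(self, capacity_words):
        self.capacity = capacity_words      # W in the description
        self.words = [None] * capacity_words
        self.war = 0                        # write address register (counts up)
        self.rar = 0                        # read address register (counts up)

    @property
    def busy(self):
        # Comparator: busy when (WAR - RAR) equals the capacity W.
        return self.war - self.rar == self.capacity

    @property
    def empty(self):
        return self.war == self.rar

    def write(self, word):
        if self.busy:
            raise OverflowError("input buffer busy")
        self.words[self.war % self.capacity] = word
        self.war += 1                       # +1 circuit on WAR

    def read(self):
        if self.empty:
            raise IndexError("input buffer empty")
        word = self.words[self.rar % self.capacity]
        self.rar += 1                       # +1 circuit on RAR
        return word
```

In this model the busy flag rises exactly when W words are waiting and falls as soon as one word is read out for arbitration.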

[0036] A performance measurement circuit (not shown) measures the time during which the register 2121 holds the "1" output of the comparator 2120, and this time is taken as the time spent waiting in the processing-unit-side network 2 because of contention with other processors and the like.
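A minimal sketch of this conventional measurement, assuming the busy flag held in the register 2121 is sampled once per clock cycle, is given below. The function name, the trace representation, and the clock period parameter are all assumptions for illustration; the description does not specify how the measurement circuit is built.

```python
def waiting_time(busy_trace, clock_period_ns):
    """Return the total time (in ns) during which the busy flag was 1.

    busy_trace: per-cycle samples of the busy flag (truthy means busy).
    clock_period_ns: assumed clock period in nanoseconds.
    """
    busy_cycles = sum(1 for flag in busy_trace if flag)
    return busy_cycles * clock_period_ns
```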

[0037] The waiting time in the memory-unit-side network, caused by contention among requests from different arithmetic processing units, is measured in the same way as the waiting time in the processing-unit-side network described above. In this case, each of the request buffers 300 to 331 of FIG. 7 is provided with write and read circuits WAR 2117 and RAR 2118, a comparator 2120, and a register 2121, like those provided for the input buffer 2119 of FIG. 9, and a performance measurement circuit (not shown) performs the measurement.

[0038] In the system configuration described above, accompanying information, namely which processor issued the request and, for a vector instruction, the value of the vector register to which the data is to be returned, is sent to the memory-unit-side network together with the request. For a load instruction, when the data returns, the accompanying information also returns from the memory-unit-side network at the same time. The processing-unit-side network returns the load data on the basis of this accompanying information.

[0039] However, this increases the amount of transferred data and hence the number of interface lines between the networks, so the wiring becomes congested. Recently, therefore, as shown in FIG. 10, an ID issuance management unit 23 is provided in each of the processing-unit-side networks 2-0 to 2-31. Instead of sending the accompanying information to the memory-unit-side network, the accompanying information of a request issued by a processor is written into and registered in the ID buffer for the relevant memory unit within the ID issuance management unit 23, and the write address is sent to the memory-unit-side network as an ID number. When the ID number returns from the memory-unit-side network, for both load and store instructions, the contents of the ID buffer are read using the ID number as the read address to obtain the accompanying information of the instruction. This relieves the wiring congestion.

[0040]

[Problems to Be Solved by the Invention] In the conventional performance measurement method described above, when the system is configured with the ID issuance management unit 23 to avoid wiring congestion, a request can be held in the request reception buffer 21 not only when, as considered above, a request conflict occurs in the contention arbitration unit 22 and subsequent requests cannot be processed, but also because of a hold condition arising in the ID issuance management unit 23.

[0041] This is because requests for which no reply has yet returned from the memory-unit-side network remain in the ID buffer for a memory unit, so that a new request sometimes cannot be registered in that ID buffer. In such a case, the conventional performance measurement method measures delay in the memory-unit-side network as delay in the ID issuance management unit 23. The problem is that the delay of the processing-unit-side network cannot be measured accurately with the influence of at least some of the memory-unit-side networks excluded.

[0042] An object of the present invention is to provide a performance measurement method for a parallel computer system that can measure performance with the influence of at least some of the memory-unit-side networks excluded, even in a parallel computer system that avoids wiring congestion by providing an ID issuance management unit, and that can thereby contribute to the analysis and improvement of user programs.

[0043]

[Means for Solving the Problems] The performance measurement method of the first invention applies to a parallel computer system comprising: n (a natural number) arithmetic processing units, each consisting of m (a natural number) processors; n memory units, each consisting of n simultaneously accessible memories; n processing-unit-side networks, one provided for each arithmetic processing unit, each connecting the outputs of the processors of its arithmetic processing unit to the corresponding memory units; and n memory-unit-side networks, one provided for each memory unit, each connecting the outputs of the processing-unit-side networks to the corresponding memories of its memory unit. Each processing-unit-side network has an input buffer that, when two or more processors of the corresponding arithmetic processing unit simultaneously issue memory access requests to the same memory unit, holds the subsequent memory access requests while the contention is arbitrated, and n ID buffer units, one per memory unit, each of which registers the accompanying information of an issued memory access request, issues an ID number in its place, and invalidates the registration when the response to it arrives. The method comprises: input buffer busy signal generating means for generating an input buffer busy signal when the input buffer becomes full; ID buffer busy signal generating means for generating an ID buffer busy signal when the registrations of an ID buffer unit become full; and time measuring means that receives the input buffer busy signal and the ID buffer busy signals and measures the time during which the input buffer is full and the registrations of all n ID buffer units are not full.

[0044] The performance measurement method of the second invention applies to a parallel computer system comprising: n (a natural number) arithmetic processing units, each consisting of m (a natural number) processors; n memory units, each consisting of n simultaneously accessible memories; n processing-unit-side networks, one provided for each arithmetic processing unit, each connecting the outputs of the processors of its arithmetic processing unit to the corresponding memory units; and n memory-unit-side networks, one provided for each memory unit, each connecting the outputs of the processing-unit-side networks to the corresponding memories of its memory unit. Each processing-unit-side network has an input buffer that, when two or more processors of the corresponding arithmetic processing unit simultaneously issue memory access requests to the same memory unit, holds the subsequent memory access requests while the contention is arbitrated, and n ID buffer units, one per memory unit, each of which registers the accompanying information of an issued memory access request, issues an ID number in its place, and invalidates the registration when the response to it arrives. The method comprises: input buffer busy signal generating means for generating an input buffer busy signal when the input buffer becomes full; ID buffer busy signal generating means for generating an ID buffer busy signal when the registrations of an ID buffer unit become full; and time measuring means that receives the input buffer busy signal and the ID buffer busy signals and measures the time during which the input buffer is full and the registrations of one ID buffer unit selected from among the n are not full.
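The two measurement conditions can be compared with a small Python sketch that counts cycles over recorded busy signals. This is an illustrative reading, not the claimed circuit: the function name and the per-cycle trace format are assumptions, and the first invention's condition is interpreted here as "no ID buffer unit reports busy" (the logical OR of the ID buffer busy signals is 0), which is one plausible reading of the wording.

```python
def measured_cycles(input_busy, id_busy, selected=None):
    """Count cycles charged to the processing-unit-side network.

    input_busy: per-cycle input buffer busy flags.
    id_busy: per-cycle lists of ID buffer busy flags, one flag per ID buffer.
    selected: None for the first invention (all ID buffers considered),
              or the index of one ID buffer for the second invention.
    """
    count = 0
    for in_busy, id_flags in zip(input_busy, id_busy):
        if selected is None:
            id_blocked = any(id_flags)       # OR of all ID buffer busy signals
        else:
            id_blocked = id_flags[selected]  # only the selected ID buffer
        if in_busy and not id_blocked:
            count += 1
    return count
```

Cycles in which the input buffer is full only because an ID buffer is waiting for replies from the memory side are left out of the count, which is the point of gating the measurement with the ID buffer busy signals.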

[0045]

[Embodiments] Next, embodiments of the present invention will be described with reference to the drawings.

[0046] Before that, the configuration and operation of the ID issuance management unit 23, which is provided in the processing-unit-side network as shown in FIG. 10 to relieve wiring congestion, will be described.

[0047] FIG. 3 is a block diagram of the ID issuance management unit 23.

[0048] The ID issuance management unit 23 of FIG. 3 comprises: a crossbar unit 230, which distributes the request accompanying information from each processor, sent from the contention arbitration unit 22, according to the destination memory unit; ID buffer units 2300 to 2331, one provided for each memory unit, each of which registers the accompanying information of requests issued to its memory unit, issues an ID number in its place, and controls and manages the issuing of requests; and a logical OR circuit 231, which generates the logical OR of the busy signals from the ID buffer units 2300 to 2331.

[0049] The ID buffer units 2300 to 2331 all have the same configuration. Each comprises: a decoder (WADEC) 23200, which decodes the write address of an ID buffer 23000; a decoder (RADEC) 23300, which decodes the read address of the ID buffer 23000; a v-bit register 23400, which has as many bits as the ID buffer 23000 has words and holds, for each word, a v-bit indicating whether valid request accompanying information is written there; an ALL1 detection circuit 23500, which detects that all bits of the v-bit register 23400 are 1; an encoder (ENC) 23600, which detects a bit that is 0 among the bits of the v-bit register 23400 and generates the corresponding write address of the ID buffer 23000; a register 23700, which receives the accompanying information output from the crossbar unit 230; a register (IDWAR) 23800, which holds the write address of the ID buffer 23000; a register (IDRAR) 23900, which holds the read address of the ID buffer 23000; the ID buffer 23000, which takes the value of the register 23700 as input data, the value of the register 23800 as the write address, and the value of the register 23900 as the read address; and a register 23100, which holds the data read from the ID buffer 23000.

[0050] Next, the operation of the ID issuance management unit 23 will be described.

[0051] The request accompanying information of a request granted issue permission by the contention arbitration unit 22 is input to the crossbar unit 230 and set in the register 23700 of the ID buffer unit corresponding to the destination memory unit. Then, if the ID buffer 23000 has a free word (a word holding data that is not valid), the information is stored there.

【００５２】ＩＤバッフア２３０００の有効データの格
納状態はｖ−ビットレジスタ２３４００に格納されてい
る各ｖ−ビットに示されているので、ＩＤバッフア２３
０００に空きがあり、新たなリクエスト付随情報を受け
取ることのできる状態であると、ｖ−ビットレジスタ２
３４００の出力をもとにエンコーダ２３６００によりエ
ンコードされて空きワードに対応するライトアドレスが
作成されてレジスタ２３８００にセットされ、レジスタ
２３７００のデータはＩＤバッフア２３０００のレジス
タ２３８００の示すアドレスに対応するワードに格納さ
れる。The storage state of valid data in the ID buffer 23000 is indicated by the v-bits held in the v-bit register 23400. When the ID buffer 23000 has free space and can accept new request accompanying information, the encoder 23600 encodes the output of the v-bit register 23400 to create the write address of a free word, and this address is set in register 23800. The data in register 23700 is then stored in the word of the ID buffer 23000 at the address indicated by register 23800.

【００５３】それと同時に、レジスタ２３８００の出力
であるライトアドレスはＩＤ番号としてメモリ部に対し
てリクエストデータと共に出力される。At the same time, the write address output from the register 23800 is output to the memory unit together with the request data as an ID number.

【００５４】更に、ライトアドレスはエンコーダ２３６
００からデコーダ２３２００に送られデコードされ、今
データが格納されたワードに対応するｖ−ビットレジス
タ２３４００のｖビットを１にセットし、有効なデータ
が格納されたことを示す。Furthermore, the write address is sent from the encoder 23600 to the decoder 23200 and decoded, and the v-bit of the v-bit register 23400 corresponding to the word in which the data has just been stored is set to 1, indicating that valid data has been stored.

【００５５】このｖビットはリクエストに対するリプラ
イがメモリ部側ネットワークから戻ってくるまで１にセ
ットされている。This v-bit remains set to 1 until the reply to the request returns from the memory unit side network.

【００５６】リクエストに対するリプライがメモリ部側
ネットワークから戻ってくるときには、同時にＩＤ番号
も図３のＲＹＩＤ端子を介して戻ってくる。メモリロー
ド系の命令の場合はロードデータと共にＩＤ番号が帰っ
てくる。メモリストア系の命令のときは、リプライはＩ
Ｄ番号のみがメモリ部側ネットワークから戻ってくる。
有効なリプライがメモリ部側ネットワークから戻ってく
ると、レジスタ２３９００にて受けとられる。この値を
リードアドレスとしてＩＤバッフア２３０００からデー
タが読み出されてレジスタ２３１００にセットされると
同時に、デコーダ２３３００でこのＩＤ番号がデコード
されて今データが読み出されたワードに対応するｖ−ビ
ットレジスタ２３１００のｖビットを０にリセットし、
そのワードが空きワードになったことを示す。When the reply to a request returns from the memory unit side network, the ID number returns with it via the RYID terminal in FIG. 3. For a memory load instruction, the ID number returns together with the load data. For a memory store instruction, the reply from the memory unit side network consists of the ID number only. When a valid reply returns from the memory unit side network, its ID number is received by register 23900. Using this value as the read address, data is read from the ID buffer 23000 and set in register 23100. At the same time, the decoder 23300 decodes the ID number and resets to 0 the v-bit of the v-bit register 23400 corresponding to the word that has just been read, indicating that the word has become free.
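
The issue-and-reply cycle described above can be sketched as a small software model. This is only an illustration under stated assumptions, not the circuit itself: the class name `IdBufferUnit`, the default word count, the shape of the accompanying information, and the choice of the lowest-numbered free word are all hypothetical (the text does not say which 0 bit the encoder selects).

```python
class IdBufferUnit:
    """Toy model of one ID buffer unit; all names here are illustrative."""

    def __init__(self, words: int = 4):
        self.words = [None] * words   # ID buffer 23000: one slot per word
        self.v_bits = [0] * words     # v-bit register 23400: 1 = word holds valid data

    def busy(self) -> bool:
        # ALL1 detection circuit 23500: busy when every v-bit is 1
        return all(self.v_bits)

    def issue(self, accompanying_info):
        """Register request information; return the ID number, or None when full."""
        if self.busy():
            return None
        # Encoder 23600: address of a word whose v-bit is 0
        # (assumption: the lowest-numbered free word is chosen).
        id_number = self.v_bits.index(0)
        self.words[id_number] = accompanying_info
        self.v_bits[id_number] = 1    # decoder 23200 sets the v-bit
        return id_number

    def reply(self, id_number: int):
        """Handle a reply carrying id_number: read the word out and free it."""
        if not self.v_bits[id_number]:
            raise ValueError(f"no outstanding request for ID {id_number}")
        info = self.words[id_number]
        self.v_bits[id_number] = 0    # decoder 23300 resets the v-bit
        return info


unit = IdBufferUnit(words=2)
first = unit.issue({"dest_register": 3})
second = unit.issue({"dest_register": 5})
blocked = unit.issue({"dest_register": 7})   # buffer is full, so no ID is issued
restored = unit.reply(first)                 # the reply frees the word again
```

In this model the ID number is simply the write address of the word, which matches the statement above that the write address is sent to the memory unit as the ID number.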

【００５７】以上説明したように、ＩＤバッフア２３０
００に空きワードがある場合には、リクエスト付随情報
はＩＤバッフア２３０００に登録され、代りにＩＤ番号
が発行されるが、空きワードがない場合にはＩＤ番号を
発行することができず、後続のリクエスト付随情報は受
け取ることができないので、この場合には競合調停部２
２およびインプットバッフア２１にＩＤ発行管理部ビジ
ー信号を送出し、それらからのリクエスト付随情報の供
給を停止させる。ＩＤ発行管理部ビジー信号はＩＤ発行
管理部２３にあるＩＤバッフア部２３００〜２３３１の
何れがビジーになっても送出できるように、論理和回路
２３１により各ＩＤバッフア部のビジー信号、すなわ
ち、各のＡＬＬ１検出回路２３５００の出力、の論理和
信号を作成している。As described above, when the ID buffer 23000 has a free word, the request accompanying information is registered in the ID buffer 23000 and an ID number is issued in its place. When there is no free word, no ID number can be issued and subsequent request accompanying information cannot be accepted. In this case an ID issue management unit busy signal is sent to the contention arbitration unit 22 and the input buffer 21 to stop them from supplying request accompanying information. So that the ID issue management unit busy signal is sent whenever any of the ID buffer units 2300 to 2331 in the ID issue management unit 23 becomes busy, the OR circuit 231 forms the logical OR of the busy signals of the ID buffer units, that is, of the outputs of their ALL1 detection circuits 23500.
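
The way the busy signal is derived can be written out in a few lines. This is a minimal sketch that assumes each v-bit register is represented as a list of 0/1 values; the function name `id_issue_management_busy` is hypothetical.

```python
def id_issue_management_busy(v_bit_registers):
    """Return True when at least one ID buffer unit is full."""
    # ALL1 detection circuit 23500 of each unit: every v-bit is 1
    unit_busy = [all(bits) for bits in v_bit_registers]
    # OR circuit 231: logical OR over the per-unit busy signals
    return any(unit_busy)


# Three units with two words each; only the second unit is full.
busy_signal = id_issue_management_busy([[1, 0], [1, 1], [0, 0]])
```

A single full unit is enough to raise the signal, even if the other units still have free words.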

【００５８】このようにして、メモリ部側ネットワーク
に起因して生ずるＩＤバッフアビジーによってもインプ
ットバッフアビジーが生ずるので、演算処理部側の性能
を測定するためには、このメモリ部側ネットワークに起
因するフアクタを除去する必要がある。In this way, ID buffer busy caused by the memory unit side network also gives rise to input buffer busy. To measure the performance of the arithmetic processing unit side, the factor attributable to the memory unit side network must therefore be removed.

【００５９】そこで、本発明では、インプットバッフア
ビジーで、かつ、ＩＤバッフアがビジーでない時間を測
定することとした。Therefore, the present invention measures the time during which the input buffer is busy and the ID buffer is not busy.

【００６０】図１は本発明の並列コンピュータシステム
の性能測定方式の第１の実施例のブロック図である（以
下の説明では既述のインプットバッフアビジー信号の作
成手段およびＩＤバッフア部ビジー信号の作成手段につ
いての繰り返し記述は省略する。第２の実施例について
も同じ。）。FIG. 1 is a block diagram of a first embodiment of the performance measurement method for a parallel computer system according to the present invention. (In the following description, the means for generating the input buffer busy signal and the means for generating the ID buffer unit busy signal, both already described, are not described again. The same applies to the second embodiment.)

【００６１】図１の並列コンピュータシステムの性能測
定方式は、ＩＤ発行管理部ビジー信号線により供給され
る第１の信号（２値信号でビジーの場合は１、そうでな
い場合は０）を反転するインバータ２３２と、反転され
た第１の信号を一時格納するレジスタ２３３と、インプ
ットバッフアビジー信号線により供給される第２の信号
（２値信号でビジーの場合は１、そうでない場合は０）
を一時格納するレジスタ２３４と、論理積回路２３５
と、論理積回路２３５の出力が１なる時間を測定する時
間測定回路２３６とから構成されている。The performance measurement method of FIG. 1 consists of: an inverter 232 that inverts a first signal supplied on the ID issue management unit busy signal line (a binary signal, 1 when busy and 0 otherwise); a register 233 that temporarily stores the inverted first signal; a register 234 that temporarily stores a second signal supplied on the input buffer busy signal line (a binary signal, 1 when busy and 0 otherwise); an AND circuit 235; and a time measuring circuit 236 that measures the time during which the output of the AND circuit 235 is 1.

【００６２】時間測定回路２３６は、論理積回路２３５
の出力１をカウントアップ信号としてカウントアップし
その計数値を出力するカウンタ回路である。The time measuring circuit 236 is a counter circuit that counts up using an output of 1 from the AND circuit 235 as its count-up signal and outputs the count value.

【００６３】このように構成することにより、論理積回
路２３５の出力が１である時間を測定することは、レジ
スタ２３４および２３３に格納されているデータがそれ
ぞれ１である時間、すなわち、インプットバッフアビジ
ーで、かつ、ＩＤ発行管理部２３がビジーでない時間を
測定していることになる。With this configuration, measuring the time during which the output of the AND circuit 235 is 1 amounts to measuring the time during which the data stored in registers 234 and 233 are both 1, that is, the time during which the input buffer is busy and the ID issue management unit 23 is not busy.
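
The measurement performed by the first embodiment can be sketched cycle by cycle. This is a simplified model under assumptions not stated in the text: both busy signals are sampled once per clock cycle and given as 0/1 lists, the equal one-cycle delay of registers 233 and 234 is ignored, and the function name `count_first_embodiment` is hypothetical.

```python
def count_first_embodiment(input_busy, id_mgmt_busy):
    """Count cycles in which the input buffer is busy and the ID issue
    management unit is not busy."""
    count = 0
    for in_b, id_b in zip(input_busy, id_mgmt_busy):
        reg_233 = 1 - id_b        # inverter 232, latched in register 233
        reg_234 = in_b            # latched in register 234
        if reg_233 & reg_234:     # AND circuit 235
            count += 1            # time measuring circuit 236 counts up
    return count


input_busy   = [0, 1, 1, 1, 0, 1]
id_mgmt_busy = [0, 0, 1, 1, 0, 0]
cycles = count_first_embodiment(input_busy, id_mgmt_busy)  # counts cycles 1 and 5
```

Cycles 2 and 3 are excluded because the ID issue management unit is busy there, so the input buffer busy in those cycles may stem from the memory unit side network.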

【００６４】このようにして第１の実施例では、同一演
算処理部内で発生した競合による遅れ時間を測定でき、
この測定結果を元にしてユーザプログラムのチューンア
ップを行ない、同一演算処理部内で発生する競合時間を
最小にすることにより、並列コンピュータシステムの演
算処理部側ネットワークの持つデータ転送能力を最大限
に発揮せしめることができるという効果を有する。Thus, in the first embodiment, the delay time due to contention occurring within the same arithmetic processing unit can be measured. By tuning the user program on the basis of this measurement so as to minimize the contention time arising within the same arithmetic processing unit, the data transfer capability of the arithmetic processing unit side network of the parallel computer system can be exploited to the fullest.

【００６５】図２は本発明の並列コンピュータシステム
の性能測定方式の第２の実施例のブロック図である。FIG. 2 is a block diagram of a second embodiment of the performance measuring method of the parallel computer system according to the present invention.

【００６６】図２の並列コンピュータシステムの性能測
定方式は、ＩＤ発行管理部２３にある各ＩＤバッフア部
のビジー信号線の中から任意の１つのビジー信号線を選
択する選択回路２３７と、選択されたビジー信号線によ
り供給される第３の信号（２値信号でビジーの場合は
１、そうでない場合は０）を反転するインバータ２３２
と、反転された第３の信号を一時格納するレジスタ２３
３と、インプットバッフアビジー信号線により供給され
る第４の信号（２値信号でビジーの場合は１、そうでな
い場合は０）を一時格納するレジスタ２３４と、論理積
回路２３５と、論理積回路２３５の出力が１なる時間を
測定する時間測定回路２３６とから構成されている。The performance measurement method of FIG. 2 consists of: a selection circuit 237 that selects any one of the busy signal lines of the ID buffer units in the ID issue management unit 23; an inverter 232 that inverts a third signal supplied on the selected busy signal line (a binary signal, 1 when busy and 0 otherwise); a register 233 that temporarily stores the inverted third signal; a register 234 that temporarily stores a fourth signal supplied on the input buffer busy signal line (a binary signal, 1 when busy and 0 otherwise); an AND circuit 235; and a time measuring circuit 236 that measures the time during which the output of the AND circuit 235 is 1.

【００６７】時間測定回路２３６は、論理積回路２３５
の出力１をカウントアップ信号としてカウントアップし
その計数値を出力するカウンタ回路である。The time measuring circuit 236 is a counter circuit that counts up using an output of 1 from the AND circuit 235 as its count-up signal and outputs the count value.

【００６８】このように構成することにより、論理積回
路２３５の出力が１である時間を測定することは、レジ
スタ２３４および２３３に格納されているデータがそれ
ぞれ１である時間、すなわち、インプットバッフアビジ
ーで、かつ、選択されたビジー信号線に対応するＩＤバ
ッフア部がビジーでない時間を測定していることにな
る。With this configuration, measuring the time during which the output of the AND circuit 235 is 1 amounts to measuring the time during which the data stored in registers 234 and 233 are both 1, that is, the time during which the input buffer is busy and the ID buffer unit corresponding to the selected busy signal line is not busy.
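
The second embodiment differs from the first only in which busy signal feeds the inverter. A minimal sketch, assuming per-cycle 0/1 samples and a hypothetical function name `count_second_embodiment`, with `unit_busy[k]` holding the busy signal of ID buffer unit k:

```python
def count_second_embodiment(input_busy, unit_busy, selected):
    """Count cycles in which the input buffer is busy and the selected
    ID buffer unit is not busy."""
    chosen = unit_busy[selected]          # selection circuit 237
    count = 0
    for in_b, id_b in zip(input_busy, chosen):
        if in_b and not id_b:             # inverter 232 and AND circuit 235
            count += 1                    # time measuring circuit 236 counts up
    return count


input_busy = [1, 1, 1, 0, 1]
unit_busy = [
    [0, 1, 1, 0, 0],   # busy signal of ID buffer unit 0
    [0, 0, 0, 0, 1],   # busy signal of ID buffer unit 1
]
counts = [count_second_embodiment(input_busy, unit_busy, k) for k in range(2)]
```

Running the count once per selectable unit, as in the last line, gives one measurement for each ID buffer unit.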

【００６９】このようにして、第２の実施例では、選択
回路２３７により選択されたＩＤバッフア部に対応する
メモリ部側ネットワークにおける演算処理部同志による
競合時間を除いたそれ以外の競合時間、すなわち、選択
回路２３７により選択されたＩＤバッフア部に対応する
メモリ部側ネットワーク以外の他のメモリ部側ネットワ
ークにおける演算処理部同志による競合と同一演算処理
部内で発生した競合とによる遅れ時間を測定することが
できる。Thus, in the second embodiment, it is possible to measure the contention time that remains after excluding contention among arithmetic processing units in the memory unit side network corresponding to the ID buffer unit selected by the selection circuit 237. In other words, it measures the delay time due to contention among arithmetic processing units in the memory unit side networks other than the one corresponding to the selected ID buffer unit, together with contention occurring within the same arithmetic processing unit.

【００７０】第１の実施例による測定時間と第２の実施
例による測定時間との差は、選択されたＩＤバッフア部
に対応するメモリ部側ネットワークにおける演算処理部
同志による競合に起因する遅れ時間であり、これを各Ｉ
Ｄバッフア部について測定することによりシステム全体
のスループットのためのユーザプログラムの改善解析に
寄与することができるという効果を有する。The difference between the time measured by the first embodiment and the time measured by the second embodiment is the delay time caused by contention among arithmetic processing units in the memory unit side network corresponding to the selected ID buffer unit. Measuring this for each ID buffer unit contributes to the analysis for improving the user program with a view to the throughput of the whole system.

【００７１】[0071]

【発明の効果】以上説明したように、本発明の並列コン
ピュータシステムの性能測定方式は、インプットバッフ
アビジーで、かつ、ＩＤバッフアがビジーでない時間を
測定することにより、少なくとも一部のメモリ部側ネッ
トワークの影響を除外して性能を測定することができ、
ユーザプログラムの改善解析に寄与できるという効果を
有している。As described above, the performance measurement method for a parallel computer system according to the present invention measures the time during which the input buffer is busy and the ID buffer is not busy. Performance can thereby be measured with the influence of at least some of the memory unit side networks excluded, which contributes to the analysis for improving the user program.

[Brief description of the drawings]

【図１】本発明の並列コンピュータシステムの性能測定
方式の第１の実施例を示すブロック図である。FIG. 1 is a block diagram showing a first embodiment of a performance measurement method for a parallel computer system according to the present invention.

【図２】本発明の並列コンピュータシステムの性能測定
方式の第２の実施例を示すブロック図である。FIG. 2 is a block diagram showing a second embodiment of the performance measuring method of the parallel computer system of the present invention.

【図３】本発明の並列コンピュータシステムの性能測定
方式のＩＤバッフアビジーを発生するＩＤ発行管理部の
ブロック図である。FIG. 3 is a block diagram of the ID issue management unit that generates the ID buffer busy used by the performance measurement method of the parallel computer system of the present invention.

【図４】本発明の並列コンピュータシステムの性能測定
方式を適用する並列コンピュータシステムの構成を示す
ブロック図である。FIG. 4 is a block diagram showing a configuration of a parallel computer system to which the performance measurement method of the parallel computer system of the present invention is applied.

【図５】図４の並列コンピュータシステムの演算処理部
の構成を示すブロック図である。FIG. 5 is a block diagram showing the configuration of the arithmetic processing unit of the parallel computer system of FIG. 4.

【図６】図４の並列コンピュータシステムの従来の演算
処理部側ネットワークの構成を示すブロック図である。FIG. 6 is a block diagram showing the configuration of a conventional arithmetic processing unit side network of the parallel computer system of FIG. 4.

【図７】図４の並列コンピュータシステムのメモリ部側
ネットワークの構成を示すブロック図である。FIG. 7 is a block diagram showing the configuration of the memory unit side network of the parallel computer system of FIG. 4.

【図８】図４の並列コンピュータシステムのメモリ部の
構成を示すブロック図である。FIG. 8 is a block diagram showing the configuration of the memory unit of the parallel computer system of FIG. 4.

【図９】図６および図１０の演算処理部側ネットワーク
にあり、本発明の並列コンピュータシステムの性能測定
方式のインプットバッフアビジーを発生するリクエスト
受け付けバッフアの構成を示すブロック図である。FIG. 9 is a block diagram showing the configuration of the request receiving buffer that is located in the arithmetic processing unit side network of FIGS. 6 and 10 and generates the input buffer busy used by the performance measurement method of the parallel computer system of the present invention.

【図１０】図４の並列コンピュータシステムのＩＤ発行
管理部を有する演算処理部側ネットワークの構成を示す
ブロック図である。FIG. 10 is a block diagram showing the configuration of an arithmetic processing unit side network, having an ID issue management unit, of the parallel computer system of FIG. 4.

[Explanation of symbols]

１、１−０〜１−３１演算処理部２、２−０〜２−３１演算処理部側ネットワーク３、３−０〜３−３１メモリ部側ネットワーク４、４−０〜４−３１メモリ部１０〜１７ベクトルプロセッサ１８スカラプロセッサ２１リクエスト受け付けバッフア２２、３３２競合調停部２３ＩＤ発行管理部２４、２３０、３３３クロスバ部２３１論理和回路２３２インバータ２３３、２３４、２１００〜２１０８レジスタ２３５論理積回路２３６時間測定回路２３７選択回路３００〜３３１リクエストバッフア４００〜４３１メモリ２１１７レジスタ（ＷＡＲ）２１１８レジスタ（ＲＡＲ）２１１９インプットバッフア２１２０，２３４００〜２３１３１コンパレータ２１２２セレクタ２３００〜２３３１ＩＤバッフア部２３０００ＩＤバッフア２３２００デコーダ（ＷＡＤＥＣ）２３３００デコーダ（ＲＡＤＥＣ）２３４００ｖ−ビットレジスタ（ＩＤＢＶ）２３５００ＡＬＬ１検出回路２３６００エンコーダ（ＥＮＣ）２３８００レジスタ（ＩＤＷＡＲ）２３９００レジスタ（ＩＤＲＡＲ）
1, 1-0 to 1-31: Arithmetic processing unit
2, 2-0 to 2-31: Arithmetic processing unit side network
3, 3-0 to 3-31: Memory unit side network
4, 4-0 to 4-31: Memory unit
10 to 17: Vector processor
18: Scalar processor
21: Request receiving buffer
22, 332: Contention arbitration unit
23: ID issue management unit
24, 230, 333: Crossbar unit
231: OR circuit
232: Inverter
233, 234, 2100 to 2108: Register
235: AND circuit
236: Time measuring circuit
237: Selection circuit
300 to 331: Request buffer
400 to 431: Memory
2117: Register (WAR)
2118: Register (RAR)
2119: Input buffer
2120, 23400 to 23131: Comparator
2122: Selector
2300 to 2331: ID buffer unit
23000: ID buffer
23200: Decoder (WADEC)
23300: Decoder (RADEC)
23400: v-bit register (IDBV)
23500: ALL1 detection circuit
23600: Encoder (ENC)
23800: Register (IDWAR)
23900: Register (IDRAR)

Claims

(57) [Claims]

1. A performance measurement method for a parallel computer system, the system comprising: n (a natural number) arithmetic processing units each including m (a natural number) processors; n memory units each including n simultaneously accessible memories; n arithmetic processing unit side networks, one provided for each arithmetic processing unit, each connecting the outputs of the processors of its arithmetic processing unit to the corresponding memory units; n memory unit side networks, one provided for each memory unit, each connecting the outputs of the arithmetic processing unit side networks to the corresponding memories of its memory unit; an input buffer that, when memory access requests to the same memory unit are issued simultaneously from a plurality of the processors, holds the subsequent memory access requests for contention arbitration; and n ID buffer units, one provided for each memory unit, each of which registers the accompanying information of a memory access request, issues an ID number in its place, and invalidates the registration when a reply to that ID number is received; the method being characterized by having: input buffer busy signal generating means for generating an input buffer busy signal when the input buffer becomes full; ID buffer busy signal generating means for generating an ID buffer busy signal when the registration of an ID buffer unit becomes full; and time measuring means that is supplied with the input buffer busy signal and the ID buffer busy signal and measures the time during which the input buffer is full and the registration of none of the n ID buffer units is full.

2. A performance measurement method for a parallel computer system, the system comprising: n (a natural number) arithmetic processing units each including m (a natural number) processors; n memory units each including n simultaneously accessible memories; n arithmetic processing unit side networks, one provided for each arithmetic processing unit, each connecting the outputs of the processors of its arithmetic processing unit to the corresponding memory units; n memory unit side networks, one provided for each memory unit, each connecting the outputs of the arithmetic processing unit side networks to the corresponding memories of its memory unit; an input buffer that, when memory access requests to the same memory unit are issued simultaneously from a plurality of the processors, holds the subsequent memory access requests for contention arbitration; and n ID buffer units, one provided for each memory unit, each of which registers the accompanying information of a memory access request, issues an ID number in its place, and invalidates the registration when a reply to that ID number is received; the method being characterized by having: input buffer busy signal generating means for generating an input buffer busy signal when the input buffer becomes full; ID buffer busy signal generating means for generating an ID buffer busy signal when the registration of an ID buffer unit becomes full; and time measuring means that is supplied with the input buffer busy signal and the ID buffer busy signal and measures the time during which the input buffer is full and the registration of one of the n ID buffer units is not full.