JP3083582B2

JP3083582B2 - Parallel processing unit

Info

Publication number: JP3083582B2
Application number: JP03098615A
Authority: JP
Inventors: 篤浩鈴木; 良雄吉岡
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1991-04-30
Filing date: 1991-04-30
Publication date: 2000-09-04
Anticipated expiration: 2015-09-04
Also published as: JPH05181817A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to a parallel processing apparatus for high-speed information processing. More particularly, it relates to a data-flow parallel processing apparatus suited to scientific and engineering computations that repeatedly process large amounts of data, and to the network connecting its multiple processing means.

[0002]

[Description of the Related Art] Known schemes for connecting the multiple processing means of a massively parallel computer (hereinafter called unit processing elements, or PEs) include the bus, ring bus, hypercube, tree, lattice, star and network-connected types.

[0003] Improvements to the ring bus type are known from "On the Loop Structured Computer," Computer Architecture research meeting materials (Information Processing), 56-1; "Traffic Characteristics of the Loop Structured Computer," IEICE Transactions, Vol. J72-D-I, No. 3 (March 1989), pp. 149-156; and "Characteristic Analysis of the Loop Structured Computer," Parallel Processing Symposium JSPP '89, pp. 321-328.

[0004] FIG. 8 shows the Loop Structured Computer (hereinafter LSC) disclosed in the above papers. In FIG. 8, 710, 720, 730 and 740 are unit processing elements (PEs), 711, 721, 731 and 741 are shift registers, and 750 is a pipelined ring bus formed by connecting the inputs and outputs of the shift registers in the PEs one after another. In particular, PE 710, which forms the control unit that exchanges packets with the host computer, is called the CU (Control Unit). Because PEs 720, 730 and 740 are not directly connected to the storage device, they access it by exchanging packets with PE 710. The pipelined ring bus 750 is packed with empty packets, data packets and result packets, and each PE exchanges the data packets and result packets addressed to it that flow on the ring bus 750 for empty packets or for result packets addressed to other PEs. Each PE's processing advances by processing the data packets addressed to it and producing result packets addressed to other PEs. In the prior art, the LSC uses one of the following three processing methods.
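The exchange rule of the prior-art LSC can be illustrated with a toy, clock-level model. This is a minimal sketch, not the circuit of FIG. 8: it assumes one packet slot per shift register, represents an empty packet as `None`, and the names `Packet`, `LscPE` and `clock` are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Packet:
    dest: int      # address of the PE the packet is addressed to
    value: float   # data or result carried by the packet


@dataclass
class LscPE:
    addr: int
    inbox: List[Packet] = field(default_factory=list)  # packets taken off the ring
    outbox: deque = field(default_factory=deque)       # result packets waiting to go out

    def exchange(self, slot: Optional[Packet]) -> Optional[Packet]:
        """Act on the slot in this PE's shift register and return what stays in it."""
        if slot is not None and slot.dest == self.addr:
            self.inbox.append(slot)  # take the packet addressed to this PE
            # swap in a waiting result packet, otherwise leave an empty packet (None)
            return self.outbox.popleft() if self.outbox else None
        if slot is None and self.outbox:
            return self.outbox.popleft()  # fill an empty slot with a result packet
        return slot  # packet for another PE, or empty with nothing to send: pass on


def clock(ring: List[LscPE], slots: List[Optional[Packet]]) -> List[Optional[Packet]]:
    """One clock: every PE acts on its slot, then each slot shifts to the next PE."""
    after = [pe.exchange(s) for pe, s in zip(ring, slots)]
    return after[-1:] + after[:-1]  # slot at PE i moves to PE i + 1; the last wraps to PE 0


# Four PEs; PE 0 stands in for the CU and sends one packet to PE 2.
ring = [LscPE(a) for a in range(4)]
ring[0].outbox.append(Packet(dest=2, value=1.5))
slots: List[Optional[Packet]] = [None] * 4
for _ in range(3):
    slots = clock(ring, slots)
print(len(ring[2].inbox))  # 1
```

The model only captures the swap-on-arrival rule; the processing that turns received data packets into new result packets is left out.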

[0005] (1) A method in which the processing function of each PE is fixed first and the data to be processed is then streamed through in a pipelined manner.

[0006] (2) A method in which processing packets carrying both a processing function and the data to be processed are streamed through in a pipelined manner.

[0007] (3) A method in which methods (1) and (2) are mixed.

[0008] A processing configuration in which multiple pipelined ring buses are connected via the CU is also known from the above papers.

[0009]

[Problems to Be Solved by the Invention] In the above prior art, adding PEs to a single pipelined ring bus to raise processing performance enlarges the ring, which lengthens the time a packet takes to reach its destination PE. Likewise, as PEs are added, the traffic of data packets and result packets on the ring bus grows until a PE can no longer output its result packets onto the bus; the PE then also cannot take in packets addressed to it, and processing deadlocks. There is also a method of connecting mutually independent pipelined ring buses via the CU, but distributing packets between PEs on different ring buses places a heavy load on the CU, so packet transfer time between ring buses grows and system performance drops markedly.
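The first problem can be made concrete with a back-of-the-envelope calculation. Assuming (this is not stated in the text) that a packet's destination is equally likely to be any of the other N - 1 PEs on a one-way ring, the mean number of shift-register stages it crosses is (1 + 2 + ... + (N - 1)) / (N - 1) = N / 2, so the transfer distance grows linearly with the size of the ring. The helper name is hypothetical, and the figure ignores queuing and the traffic effects also described above.

```python
from fractions import Fraction


def mean_hops(n: int) -> Fraction:
    """Mean stages crossed on a one-way ring of n PEs, destination uniform over the other n - 1."""
    if n < 2:
        raise ValueError("a ring needs at least two PEs")
    return Fraction(sum(range(1, n)), n - 1)


# Twelve PEs on one ring versus the same twelve PEs split into three rings of four
print(mean_hops(12), mean_hops(4))  # 6 2
```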

[0010] An object of the present invention is to provide a network configuration and a processing method that make it easy to optimize the data-packet and result-packet traffic on the pipelined ring bus, so that enough PEs can be provided to support a multiprogramming environment.

[0011]

[Means for Solving the Problems] To achieve the above object, the parallel processing apparatus according to the present invention provides a plurality of layers, each layer being a pipelined ring bus formed by connecting a plurality of shift registers in a ring. For each shift register in each layer it provides a processing means that takes in and processes packets flowing on the ring bus. It further provides a packet control means that exchanges packets with at least one processing means in each layer, and it gives each processing means the function of taking the output of the processing means in the layer above onto the pipelined ring bus of its own layer.

[0012] The packets consist, for example, of program packets, which contain function information designating the function to be assigned to each processing means, and data packets, which contain the data to be processed by the function designated by a program packet.

[0013] Each processing means preferably has a means for determining whether a packet flowing on the pipelined ring bus is addressed to that processing means, and a means for switching, according to the output of the determining means, whether to take the packet in. In this case, each processing means desirably also has a means that, when it takes a packet off the ring bus, places in its stead a packet from the processing means of the layer above, or an empty packet, onto the ring bus.

[0014] The determining means may also be given the function of detecting packets assigned to processing means in another layer, with a path provided for passing such a packet through to the pipelined ring bus of the layer below when it is detected.

[0015] Each processing means may have a means for determining whether the packet flowing on the pipelined ring bus is an empty packet, and a means for switching, according to that determination, whether to place a packet from the processing means of the layer above onto the ring bus.

[0016] The packet control means preferably has a mapping management function: when a first packet designating a first operation and a second packet designating a second operation that uses the result of the first operation are assigned to separate processing means, it assigns the second packet to a processing means in a layer below that of the processing means to which the first packet is assigned.

[0017] Other configurations, functions and effects of the present invention will become apparent from the following description.

[0018]

[Operation] The present invention has a plurality of layers, each consisting of a plurality of processing means, and the packet control means can send packets into at least one processing means of each layer. Because each processing means is connected to the pipelined ring buses above and below it, the processing means themselves can be used to transfer packets between ring buses. The data-packet and result-packet traffic on each layer's ring bus can therefore be kept in an optimal state.

[0019]

[Embodiments] Embodiments of the present invention are described in detail below.

[0020] FIG. 1 shows an example of the overall configuration of a parallel processing apparatus according to the present invention. In FIG. 1, 101 is a host computer, 102 is a packet control device, 103 is a high-speed packet processing device, and 104 is a storage device. The host computer 101 and the packet control device 102 share the storage device 104. Two signals pass between them: one by which the host computer 101 instructs the packet control device 102 to start, and one by which the packet control device 102 informs the host computer 101 that processing has finished. The high-speed packet processing device 103 is connected to the packet control device 102 by a plurality of data paths and a packet output request signal. All data transfers between the host computer 101 and the packet control device 102, and between the packet control device 102 and the high-speed packet processing device 103, are carried out with packets. The packet control device 102 of FIG. 1, together with the first-column PEs (described later) of the high-speed packet processing device 103, corresponds to the conventional CU. Packets come in a CU format and a PE format, each with a program packet and a data packet. The CU program packet and CU data packet format is the form in which a program written in a high-level language or the like is compiled in the host computer 101 and stored in the storage device 104. The PE program packet and PE data packet format is the form into which the packet control device 102 converts the CU program packets and CU data packets stored in the storage device 104 in order to send them to the high-speed packet processing device 103. The structure of the latter format is described in detail later. An outline of the processing flow follows.

[0021] The host computer 101 instructs the packet control device 102 to execute packet processing. In response, the packet control device 102 reads the CU program packets stored in the storage device 104, converts them into PE program packets for the high-speed packet processing device 103, and starts streaming the PE program packets into the high-speed packet processing device 103 in a pipelined manner. Inside the high-speed packet processing device 103, mapping is performed according to the contents of the PE program packets. Here, mapping means assigning each program packet, which specifies an arithmetic function, to a specific PE. A PE that receives a program packet appends an assignment-complete report to it and returns it to the packet control device 102. The packet control device 102 counts the assignment-complete reports and thereby recognizes that all mapping has finished. At that point, the packet control device 102 reads the CU data packets stored in the storage device 104, converts them into PE data packets, and streams them into the high-speed packet processing device 103 in a pipelined manner. When all data packets have been processed, the packet control device 102 reports completion to the host computer 101 and, at the same time, sends a program erase packet to release the mapping of the program in the high-speed packet processing device 103, which ends processing of that program. This is the general flow of processing in this apparatus.
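The flow above amounts to a two-phase protocol: stream the program packets, wait until every one has come back with an assignment-complete report, stream the data packets, and finally send a program erase packet. The sketch below models only that ordering; the class name, method names and event tuples are hypothetical, and the actual packet transport and the way completion of processing is detected are abstracted away.

```python
from typing import List, Tuple


class PacketController:
    """Phase ordering of packet control device 102 for one task (hypothetical model)."""

    def __init__(self, program_packets: List[int], data_packets: List[str]):
        self.program = program_packets
        self.data = data_packets
        self.reports = 0              # assignment-complete reports received so far
        self.host_notified = False    # end report sent to host computer 101
        self.sent: List[Tuple[str, object]] = []  # packets streamed to device 103, in order

    def start(self) -> None:
        """Host computer 101 ordered execution: stream the PE program packets."""
        for pn in self.program:
            self.sent.append(("program", pn))

    def on_assignment_report(self) -> None:
        """A PE returned a program packet carrying an assignment-complete report."""
        self.reports += 1
        if self.reports == len(self.program):  # all mapping done: begin the data phase
            for item in self.data:
                self.sent.append(("data", item))

    def on_processing_done(self) -> None:
        """All data packets processed (how this is detected is not modelled here)."""
        self.host_notified = True
        self.sent.append(("erase", None))  # program erase packet releases the mapping


pc = PacketController([1, 2, 3, 4, 5], ["A(1)", "B(1)", "C(1)"])
pc.start()
for _ in range(5):
    pc.on_assignment_report()
pc.on_processing_done()
print([kind for kind, _ in pc.sent])
```

Gating the data phase on the report count is what guarantees that no data packet reaches a PE before its function has been assigned.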

[0022] Next, the internal operation of the high-speed packet processing device 103 is described. FIG. 2 shows an example configuration of a high-speed packet processing device 103 with three rows and four columns (called a torus-type high-speed packet processing device), together with the packet control device 102 and the storage device 104. 211, 212, 213, 214, 221, 222, 223, 224, 231, 232, 233 and 234 are PEs. In FIG. 2, each row connected by one of the pipelined ring buses 10, 20 and 30 is called a layer: PE(1,1), PE(1,2), PE(1,3) and PE(1,4) form the first layer; PE(2,1), PE(2,2), PE(2,3) and PE(2,4) form the second layer; and PE(3,1), PE(3,2), PE(3,3) and PE(3,4) form the third layer. The first-column PEs of the layers, PE(1,1), PE(2,1) and PE(3,1), are each connected to the packet control device 102 and can also send it a packet take-in request signal. Every PE other than PE(1,1), PE(2,1) and PE(3,1) can take in a packet from the ring bus of the layer above, process it, and send the processed packet out to the ring bus of the layer below; it can also take in a packet from the ring bus of the layer above and pass it straight through to the ring bus of the layer below. The upper and lower connection buses of PE(1,1), PE(2,1) and PE(3,1) are connected to the packet control device 102, and packets are exchanged with the packet control device 102 over these buses.

[0023] The configuration can also be extended to a torus-type high-speed packet processing device with m rows and n columns. A non-torus embodiment, in which the connection bus from the third layer to the first layer is removed, is also conceivable; in that case, packets are transferred from the third layer to the first layer via the packet control device 102.
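The connectivity of FIG. 2, generalised to m layers and n columns, can be expressed with two neighbour functions. This is a sketch with hypothetical names, using 1-based (layer, column) coordinates as in PE(i, j). The direction of flow along each ring is an assumption. Where the neighbour is the packet control device 102 rather than a PE (always for column 1, and at the bottom layer in the non-torus variant), the function returns `None`.

```python
from typing import Optional, Tuple

Coord = Tuple[int, int]  # (layer, column), 1-based as in PE(i, j)


def ring_next(pe: Coord, n_cols: int) -> Coord:
    """Next PE on the same layer's pipelined ring bus (flow direction assumed)."""
    layer, col = pe
    return (layer, col % n_cols + 1)


def layer_below(pe: Coord, m_layers: int, torus: bool = True) -> Optional[Coord]:
    """PE in the same column one layer down, fed via terminals 371 -> 381.

    Returns None where the neighbour is the packet control device 102 instead.
    """
    layer, col = pe
    if col == 1:
        return None  # first-column PEs connect up and down to device 102
    if layer < m_layers:
        return (layer + 1, col)
    return (1, col) if torus else None  # torus wrap from layer m back to layer 1


print(ring_next((1, 4), 4), layer_below((3, 2), 3), layer_below((3, 2), 3, torus=False))
# (1, 1) (1, 2) None
```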

[0024] FIG. 3 shows an example configuration of a PE. In FIG. 3, 301, 302 and 303 are latches forming a shift register; 310 is an empty packet determination circuit; 320 is a processing packet determination circuit; 311, 321 and 322 are selection circuits (selectors); 323 is a through path; 330 is an empty packet generator; 340 is a processing-wait packet queue; 341 is operation input latch A; 342 is operation input latch B; 350 is a function determination information register; 360 is an operation/processing unit; 370 is a data packet generation circuit; 380 is an output-wait packet queue; and 372 is a packet send request signal. The packet send request signal 372 is needed only by the first-column PEs 211, 221 and 231 of FIG. 2, but it may be provided in every PE so that all PEs share the same configuration. 300 is a pipelined ring bus input terminal feeding latch 301, and 304 is a pipelined ring bus output terminal receiving the output of latch 303. A pipelined ring bus is formed by connecting the ring bus input terminal 300 and the ring bus output terminal 304 of adjacent PEs to each other. 371 is a PE data packet output terminal, and 381 is a PE program/data packet input terminal. PEs in the same column are connected in a ring by cascading the PE data packet output terminal 371 and the PE program/data packet input terminal 381 of vertically adjacent PEs in FIG. 2. That is, a data packet taken off (exchanged from) the ring bus inside a PE passes from that PE's processing-wait packet queue 340 to the operation/processing unit 360; the operation result is made into a data packet by the data packet generation circuit 370, queued through the PE data packet output terminal 371 into the output-wait packet queue 380 of the adjacent PE in the layer below, and transferred (by packet exchange) onto the pipelined ring bus of that lower layer. The output-wait packet queue 380 could instead be placed after the data packet generation circuit 370 in the same PE, but bringing the pipelined ring bus inside the PE as in FIG. 3 shortens the physical distance between the output-wait packet queue 380 and the ring bus, and between the processing-wait packet queue 340 and the ring bus, allowing the pipelined ring bus to run faster.

[0025] FIGS. 4 and 5 show an embodiment of the PE program packet and PE data packet formats. 400 and 420 are task numbers (TN), and 401 and 421 are packet numbers (PN). 402 and 422 are processing-destination PE addresses (LNPE) designating the PE to which the packet is assigned. 403 is an operation code (FC) indicating the function of the program packet, 404 is a firing condition (EC), 405 is an output data type (DT), 424 is the type (DT) of the RT data described below, 406 is the number of operation results to output (OC), and 407, 408, 409 and 410 are output destination PE addresses (LNPE). 423 is a condition code (CC), 425 is a data serial number (DN) used by the CU to manage data, and 426 is the operation data or result data (RT). The output destination PE addresses 407, 408, 409 and 410 also indicate which of the arithmetic unit's input ports (the A side or the B side) the result goes to. When the left bit of the firing condition 404 is '1', operation input A is ready; when the right bit is '1', operation input B is ready. When the operation code 403 is 'DATA' and the firing condition 404 is '10', fields 407 and 408 hold a constant for operation input A; when the operation code 403 is 'DATA' and the firing condition 404 is '01', fields 409 and 410 hold a constant for operation input B. A packet is empty when its task number 400 or 420 is '0'. A PE data packet is a program erase packet when its task number 420 is nonzero and its packet number 421 is '0'. A condition code 423 other than '0' indicates an operation error in the operation input data 426 of that PE data packet. The output data type 405 indicates whether the output data is a real number ('F') or an integer ('I').
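The field conventions just described can be collected into a small decoder. This is a sketch only: field widths and encodings are not given in the text, so the `PEDataPacket` class, its field names and the two-character string used for the firing condition are assumptions. Only the rules themselves come from the description: empty when TN is 0, program erase when TN is nonzero and PN is 0, left/right firing bits for inputs A/B, and a nonzero CC meaning an operation error.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class PEDataPacket:
    tn: int            # task number (420); 0 marks an empty packet
    pn: int            # packet number (421)
    cc: int = 0        # condition code (423); nonzero flags an operation error
    dt: str = "F"      # RT data type (424): "F" real, "I" integer
    rt: float = 0.0    # operation or result data (426)


def is_empty(p: PEDataPacket) -> bool:
    return p.tn == 0


def is_program_erase(p: PEDataPacket) -> bool:
    return p.tn != 0 and p.pn == 0


def has_operation_error(p: PEDataPacket) -> bool:
    return p.cc != 0


def inputs_ready(ec: str) -> Tuple[bool, bool]:
    """Firing condition (404) as two bits: left = input A ready, right = input B ready."""
    return (ec[0] == "1", ec[1] == "1")


def constant_fields(fc: str, ec: str) -> Optional[Tuple[int, int]]:
    """For a 'DATA' program packet, the output-address fields that hold a constant."""
    if fc == "DATA" and ec == "10":
        return (407, 408)  # constant for operation input A
    if fc == "DATA" and ec == "01":
        return (409, 410)  # constant for operation input B
    return None
```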

[0026] FIG. 6 shows an example program used to explain the operation of the PEs. FIG. 6(a) is a program, written in the high-level language FORTRAN, that multiplies and adds one-dimensional arrays. FIG. 6(b) shows the PE program packets obtained when the host computer compiles the program of FIG. 6(a) into CU packets and the packet control device 102 then converts those CU packets. FIG. 6(c) shows the PE data packets corresponding to the PE program packets of FIG. 6(b).

[0027] In FIG. 6(b), PE program packet 501 has task number '1', packet number '1' and operation code multiply '*' (here, A(i) * B(i)). It outputs its result, as a real number, to input A of the arithmetic unit at PE address '23', and it is mapped to PE address '12'. Packet 502 has task number '1', packet number '2' and operation code multiply '*' (here, the result of packet 501 times S). It outputs its result, as a real number, to input A of the arithmetic unit at PE address '34', and it is mapped to PE address '23'. Packet 503 has task number '1' and packet number '3'; it maps to PE address '23' the constant S to be stored on the A side of the arithmetic unit at PE address '23'. Packet 504 has task number '1', packet number '4' and operation code add '+' (here, the result of packet 502 plus C(i)). It outputs its result, as a real number, to input A of the arithmetic unit at PE address '11', and it is mapped to PE address '34'. Packet 505 has task number '1' and packet number '5'; when the result of packet 504 arrives at the B-side input of the arithmetic unit, it transfers that result to the packet control device 102. It is mapped to PE address '11'.

[0028] In FIG. 6(c), packet 510 has task number '1', packet number '6' and destination PE address '12', A side, and indicates that the RT data for input A of that PE's arithmetic unit is a real number. Packet 511 has task number '1', packet number '7' and destination PE address '12', B side, and indicates that the RT data for input B is a real number. Packet 512 has task number '1', packet number '8' and destination PE address '34', B side, and indicates that the RT data for input B is a real number. The packet control device 102 generates PE data packets in the format of 510, 511 and 512 for A(1), B(1), C(1) through A(100), B(100), C(100). Finally, program erase packet 513 is generated.
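Read together, packets 501 to 505 describe a dataflow graph that computes ((A(i) * B(i)) * S) + C(i) for each i and returns the result to the packet control device 102. The sketch below reconstructs that graph from the packet descriptions only. The table layout, the `run_element` helper and the sample values are hypothetical, and the A/B port of each operand is not modelled, since both operations are commutative.

```python
import operator

# PE address -> (operation performed there, PE address that receives the result)
GRAPH = {
    "12": (operator.mul, "23"),  # packet 501: A(i) * B(i)
    "23": (operator.mul, "34"),  # packet 502: (result of 501) * S, S placed by packet 503
    "34": (operator.add, "11"),  # packet 504: (result of 502) + C(i)
    "11": (None, None),          # packet 505: hand the result to packet control device 102
}


def run_element(a: float, b: float, c: float, s: float) -> float:
    """Push one (A(i), B(i), C(i)) triple through the mapped graph."""
    # Operands delivered before firing: data packets 510-512 and constant packet 503
    inputs = {"12": [a, b], "23": [s], "34": [c], "11": []}
    node = "12"
    while True:
        op, dest = GRAPH[node]
        if op is None:
            return inputs[node][0]     # the value returned to device 102
        x, y = inputs[node]            # both operands present, so the node fires
        inputs[dest].append(op(x, y))  # result packet travels to the next PE
        node = dest


A = [1.0, 2.0, 3.0]
B = [4.0, 5.0, 6.0]
C = [0.5, 0.5, 0.5]
S = 2.0
D = [run_element(a, b, c, S) for a, b, c in zip(A, B, C)]
print(D)  # [8.5, 20.5, 36.5]
```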

[0029] FIG. 7 shows the PE program packets of FIG. 6(b) mapped onto the high-speed packet processing device 103: PE program packets 501, 502, 503, 504 and 505 are assigned to PE addresses '12', '23', '23', '34' and '11', respectively. The packet control device 102 normally sends each packet onto a pipelined ring bus via the first-column PE of the layer that contains the destination PE. It can, however, send the packet in from another layer, for example when the output-wait packet queue 380 of that first-column PE is congested. PE data packets sent from the packet control device 102 flow mainly along the solid lines. Packets that could not be taken in because a PE's processing-wait packet queue 340 was full keep circulating on the pipelined ring bus, shown by the dotted lines, until that queue has room.
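This mapping can be checked against the rule of [0016]. The check below assumes, as the two-digit addresses suggest, that the first digit of a PE address is the layer and the second is the column. Under that assumption, each producer/consumer pair in this example sits exactly one layer apart, with the step from the third layer back to the first going over the torus connection. The helper names are hypothetical.

```python
MAPPING = {501: "12", 502: "23", 503: "23", 504: "34", 505: "11"}
# (producer packet, consumer packet): the result of the first feeds the second
EDGES = [(501, 502), (502, 504), (504, 505)]


def layer(addr: str) -> int:
    return int(addr[0])  # assumption: first digit of the PE address = layer (row)


def layers_down(src: str, dst: str, m_layers: int = 3) -> int:
    """How many layers below src the PE dst lies, wrapping around the torus."""
    return (layer(dst) - layer(src)) % m_layers


steps = [layers_down(MAPPING[p], MAPPING[c]) for p, c in EDGES]
print(steps)  # [1, 1, 1]
```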

[0030] The mapping operation is described in detail below with reference to the PE configuration of FIG. 3, the PE program packet format of FIG. 4, and the flowchart of FIG. 9.

[0031] (1) First, the packet control device 102 reads a CU program packet from the storage device 104 and converts it into a PE program packet (S1). It then transfers the PE program packet, according to its processing-destination PE address 402, to one of the PEs connected to the packet control device 102 (S2).

[0032] (2) The PE program packet is stored in that PE's output-wait packet queue 380.

[0033] (3) The empty packet determination circuit 310 continuously monitors the pipelined ring bus (S3). When it detects a task number of '0', that is, an empty packet, it switches the selector 311 from the ring bus flow to the output of the output-wait packet queue 380 (S15), placing the PE program packet stored in that queue onto the pipelined ring bus. If the output-wait packet queue 380 is empty (S14), the selector 311 is not switched.

[0034] (4) The processing packet determination circuit 320 constantly monitors the pipeline-type ring bus (S4). When the PE address of a passing packet equals the PE address stored in the function determination information register 350, the circuit simultaneously switches the selector 321 from the ring-bus flow to the processing-waiting packet queue 340 and the selector 322 from the ring-bus flow to the empty packet generator 330 (S8). The PE program packet is thus stored in the processing-waiting packet queue 340 while an empty packet is placed on the pipeline-type ring bus in its place. The PE address field of the function determination information register 350 holds that PE's own PE address, which is stored at system start-up or set in a fixed manner. When an empty packet is to be placed on the ring bus and the output-waiting packet queue 380 holds a packet (S9), the selector 311 is switched from the ring-bus flow to the output-waiting packet queue 380 (S12), and the packet from queue 380 is placed on the ring bus instead of the empty packet. The processing packet determination circuit 320 also determines whether the layer of the PE to which a packet on the ring bus (the one latched by the latch 301) is assigned differs from its own layer (S5). If it differs, the selector 321 is switched from its own layer's ring-bus flow to the through path 323 (S7), and the packet flows into the lower layer's pipeline-type ring bus via the lower layer's output-waiting packet queue 380.
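The decision made by the processing packet determination circuit 320 can be summarized as a three-way branch: pass the packet down the through path 323 if it belongs to another layer, capture it if its address matches the PE's own address, and otherwise let it keep circulating. The sketch below assumes a (layer, column) address encoding and deques for the queues 340 and 380. What the PE puts back on its own ring after a through-path transfer is not stated in the text; emitting an empty packet there is an assumption.

```python
from collections import deque
from dataclasses import dataclass
from typing import Tuple

PEAddress = Tuple[int, int]  # hypothetical (layer, column) encoding


@dataclass
class Packet:
    task_number: int                 # 0 marks an empty packet
    dest_pe: PEAddress = (0, 0)


def empty_packet() -> Packet:
    """Stand-in for the empty packet generator 330."""
    return Packet(task_number=0)


def judge_stage(slot: Packet, own_addr: PEAddress, processing_queue: deque,
                output_queue: deque, lower_output_queue: deque) -> Packet:
    """Return the packet this PE places back on its own layer's ring bus."""
    if slot.task_number == 0:
        return slot                              # empty slot: nothing to judge
    if slot.dest_pe[0] != own_addr[0]:           # S5: assigned to another layer
        lower_output_queue.append(slot)          # S7: through path 323 to the lower layer
        return empty_packet()                    # assumption: vacated slot becomes empty
    if slot.dest_pe == own_addr:                 # address matches register 350 (S8)
        processing_queue.append(slot)            # selector 321 -> queue 340
        if output_queue:                         # S9
            return output_queue.popleft()        # S12: selector 311 -> queue 380
        return empty_packet()                    # selector 322 -> generator 330
    return slot                                  # same layer, other PE: keep circulating
```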

[0035] (5) If the operation code 403 of the PE program packet stored in the processing-waiting packet queue 340 in step (4) is other than 'DATA' (S10), the PE program packet is stored in the function determination information register 350 (S11). If the operation code 403 is 'DATA', a constant is stored in the operation input latch A 341 or the operation input latch B 342 according to the firing condition 404 of this PE program packet (S13).
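Step (5) distinguishes two kinds of PE program packet: one that defines the PE's function, and a 'DATA' packet that preloads a constant operand. A minimal sketch follows. The encoding of the firing condition 404 (0b01 selects latch A, 0b10 selects latch B) and the `value` field carrying the constant are assumptions; the patent states only that the firing condition decides which latch receives the constant.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProgramPacket:
    op_code: str                     # 403
    firing_condition: int            # 404 (assumed: 0b01 -> latch A, 0b10 -> latch B)
    value: Optional[float] = None    # constant carried by a 'DATA' packet (assumed field)


@dataclass
class PEState:
    function_register: Optional[ProgramPacket] = None   # register 350
    latch_a: Optional[float] = None                      # operation input latch A 341
    latch_b: Optional[float] = None                      # operation input latch B 342


def load_program_packet(pe: PEState, pkt: ProgramPacket) -> None:
    if pkt.op_code != "DATA":                # S10
        pe.function_register = pkt           # S11: packet defines the PE's function
    elif pkt.firing_condition == 0b01:       # S13: constant for input A
        pe.latch_a = pkt.value
    elif pkt.firing_condition == 0b10:       # S13: constant for input B
        pe.latch_b = pkt.value
    else:
        raise ValueError("firing condition must select exactly one latch")
```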

[0036] (6) Under the control of the packet control device 102, steps (1) to (5) are repeated in parallel until no CU program packets remain (S6), and all program packets are allocated to the PEs of the respective layers.

[0037] This completes the mapping operation. Next, the specific arithmetic processing operation will be described with reference to the PE configuration of FIG. 3, the PE program packet format of FIG. 4, and the flowchart of FIG. 10.

[0038] (1) The packet control device 102 reads a CU data packet from the storage device 104 and converts it into a PE data packet (S21). Then, according to the processing-destination PE address 422, it transfers the PE data packet to one of the PEs connected to the packet control device 102 (S22).
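The PE data packet carries the fields 420 to 426. The record below is a sketch with assumed Python types. Folding the operation-input port into the destination address follows step (5) of the arithmetic processing, where the PE address 422 designates the input side that receives the RT data, but the (layer, column, port) tuple itself is a hypothetical encoding.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical encoding: (layer, column, input port).
DataDest = Tuple[int, int, str]


@dataclass
class PEDataPacket:
    task_number: int      # 420
    packet_number: int    # 421 ('0' re-initializes the receiving PE's register 350)
    dest: DataDest        # 422: processing-destination PE address and input side
    condition_code: int   # 423 (non-zero: an earlier operation reported an error)
    rt_data_type: str     # 424
    data_number: int      # 425
    rt_data: float        # 426: operation data or result data

    @property
    def input_port(self) -> str:
        return self.dest[2]

    @property
    def has_error(self) -> bool:
        return self.condition_code != 0
```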

[0039] (2) This PE data packet is stored in the output-waiting packet queue 380 of that PE.

[0040] (3) The empty packet determination circuit 310 constantly monitors the pipeline-type ring bus (S23). When it detects a task number of '0', that is, an empty packet, it switches the selector 311 from the ring-bus flow to the output of the output-waiting packet queue 380 (S39), and the PE data packet stored in the output-waiting packet queue 380 is placed on the pipeline-type ring bus. When the output-waiting packet queue 380 is empty (S38), the selector 311 is not switched.

[0041] (4) The processing packet determination circuit 320 constantly monitors the pipeline-type ring bus (S24). When the PE address of a passing packet equals the PE address stored in the function determination information register 350, the circuit simultaneously switches the selector 321 from the ring-bus flow to the processing-waiting packet queue 340 and the selector 322 from the ring-bus flow to the empty packet generator 330 (S25); the PE data packet is stored in the processing-waiting packet queue 340 while an empty packet is placed on the pipeline-type ring bus. At this time, if the output-waiting packet queue 380 holds a packet (S26), the selector 311 is switched from the ring-bus flow to the output-waiting packet queue 380 (S35), and the packet from queue 380 is placed on the ring bus instead. When the layer to which the assigned PE belongs differs from its own layer (S36), the processing packet determination circuit 320 switches the selector 321 from its own layer's ring-bus flow to the through path 323 (S37), and the packet flows into the lower layer's pipeline-type ring bus via the lower layer's output-waiting packet queue 380.

[0042] (5) When the packet number 421 of the PE data packet stored in the processing-waiting packet queue 340 in step (4) is '0' (S27), the function determination information register 350 is reset to its initial state (S34). If the condition code 423 of the PE data packet is '0' (S29), the RT data 426 is fed into the operation input side of the operation/processing unit 360 designated by the PE address 422. If the condition code 423 is other than '0', that is, if an operation error occurred in a preceding process, the exception handling defined for that condition code 423 is performed (S33). The corresponding '0' bit of the firing condition in the function determination information register 350 is changed to '1' when the operation data has been taken into the designated input side.

[0043] (6) When the firing condition in the function determination information register 350 becomes '11' (S28), that is, when all operation data are available, the PE data packet is processed by the operation/processing unit 360 (S30), and the operation result is sent to the data packet generation circuit 370, which generates a PE data packet (S31).
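Steps (5) and (6) describe dataflow firing: each operand arrival turns one bit of the firing condition from '0' to '1', and the operation runs once the condition reads '11'. The sketch below assumes one bit per input (A = bit 0, B = bit 1), that inputs preloaded with 'DATA' constants keep their bits set between firings, that a packet with packet number '0' only re-initializes the state, and that a Python exception stands in for the condition-code exception handling. None of these details is fixed by the text.

```python
from dataclasses import dataclass
from typing import Callable, Optional

PORT_BIT = {"A": 0b01, "B": 0b10}   # assumed bit assignment per operation input


@dataclass
class FiringUnit:
    op: Callable[[float, float], float]   # function taken from register 350
    const_mask: int = 0b00                # inputs preloaded by 'DATA' constants
    latch_a: Optional[float] = None       # operation input latch A
    latch_b: Optional[float] = None       # operation input latch B
    firing: int = 0b00                    # current firing condition

    def __post_init__(self) -> None:
        self.firing = self.const_mask

    def accept(self, port: str, value: float, packet_number: int = 1,
               condition_code: int = 0) -> Optional[float]:
        """Take one operand; return the result when the unit fires, else None."""
        if packet_number == 0:            # S27 -> S34: re-initialize (assumed: no operand)
            self.firing = self.const_mask
            return None
        if condition_code != 0:           # S33: exception handling for an upstream error
            raise RuntimeError(f"upstream error, condition code {condition_code}")
        if port == "A":
            self.latch_a = value
        else:
            self.latch_b = value
        self.firing |= PORT_BIT[port]     # the port's '0' bit becomes '1'
        if self.firing == 0b11:           # S28: all operands present
            self.firing = self.const_mask # re-arm for the next data set (assumption)
            return self.op(self.latch_a, self.latch_b)   # S30
        return None
```

Preloaded constants are what let a one-operand operation such as "multiply by 4" fire on a single arriving data packet.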

[0044] (7) The data packet generation circuit 370 generates as many PE data packets as the number of outputs, according to the output count 406 of the operation result and the output-destination PE addresses 407, 408, 409, and 410 held in the function determination information register 350. These PE data packets are stored in the output-waiting packet queue 380 of the lower-layer PE.
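Step (7) is a fan-out: one result becomes as many data packets as the output count 406 specifies, one per destination field (407 to 410), and they are queued for the lower layer. A minimal sketch, with the (layer, column, port) destination tuple and the `deque` standing in for the lower-layer output-waiting packet queue 380 as assumptions.

```python
from collections import deque
from dataclasses import dataclass
from typing import List, Tuple

Dest = Tuple[int, int, str]   # hypothetical (layer, column, input port)
MAX_DESTS = 4                 # fields 407, 408, 409 and 410


@dataclass
class ResultPacket:
    dest: Dest
    value: float


def emit_results(result: float, output_count: int, dests: List[Dest],
                 lower_output_queue: deque) -> List[ResultPacket]:
    """Model of data packet generation circuit 370: build one packet per
    destination and enqueue them for the lower-layer PE."""
    if not 0 <= output_count <= min(MAX_DESTS, len(dests)):
        raise ValueError("output count 406 does not match the destination fields")
    packets = [ResultPacket(d, result) for d in dests[:output_count]]
    lower_output_queue.extend(packets)
    return packets
```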

[0045] (8) Operations (3) to (7) above are performed in parallel in all PEs. However, in a PE whose operation code 403 in the function determination information register 350 is the OUT function, when the firing condition in the register 350 is '11', the data packet generation circuit 370 sends a packet transmission request 372 to the packet control device 102 and feeds the PE data packet it generated into the packet control device 102. The packet control device 102 then stores the result in the storage device 104.

[0046] (9) Under the control of the packet control device 102, steps (1) to (8) are repeated in parallel until no CU data packets remain (S32). When the packet control device 102 has received all processed data packets, it sends the program erase packet 513, which releases the PEs used by the program for other tasks.
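Steps (8) and (9) leave the packet control device with simple bookkeeping: feed data packets, store each result an OUT-function PE hands back, and release the task's PEs with the program erase packet once every result has arrived. The class below is a hypothetical sketch; knowing the expected number of results in advance, numbering data packets from 1 (because packet number '0' re-initializes a PE), and representing packets as tuples are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class PacketController:
    """Per-task bookkeeping of packet control device 102 (hypothetical)."""
    task_number: int
    expected_results: int                                      # OUT results the program yields
    storage: Dict[int, float] = field(default_factory=dict)    # stands in for storage device 104
    sent: List[Tuple] = field(default_factory=list)            # packets placed on the ring buses

    def feed(self, cu_data: List[float]) -> None:
        # S21/S22: one data packet per input value (CU-to-PE conversion not modeled)
        for number, value in enumerate(cu_data, start=1):
            self.sent.append(("DATA", self.task_number, number, value))

    def on_out_request(self, data_number: int, value: float) -> None:
        # Packet transmission request 372 from an OUT-function PE: store the result
        self.storage[data_number] = value
        if len(self.storage) == self.expected_results:
            # All processed packets received: program erase packet 513 frees the PEs
            self.sent.append(("ERASE", self.task_number))
```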

[0047]

[Effects of the Invention] According to the present invention, a structure in which a plurality of PEs are connected to a pipeline-type ring bus forms one layer, and the layers are interconnected through the PEs. Packets therefore flow from one pipeline-type ring bus to another without passing through the packet control device, which reduces the load on the packet control device. As a result, the packet control device can feed more data packets into the high-speed packet processing device, and many tasks can be multiprocessed efficiently in parallel.

[0048] Furthermore, a data packet enters a PE from the pipeline-type ring bus of the layer above it, is processed by the arithmetic unit in the PE, and flows on to the pipeline-type ring bus of the layer below it. This reduces the risk of deadlock caused by result packets in a PE being unable to get onto the pipeline-type ring bus.

[0049] Providing a through path between layers allows a packet destined for a different layer to be transferred quickly to its target layer. In addition, because the PEs of each layer are connected to the CU, data can be fed into the target layer via the CU even when PEs in consecutive layers cannot be allocated, which makes it easy to raise the utilization efficiency of the packet control device (and of the PEs).

[0050] Further, building the registers that form part of the pipeline-type ring bus into the PEs shortens the physical distances between the ring bus and the output-waiting packet queue, and between the ring bus and the processing-waiting packet queue, so the pipeline-type ring bus can be made faster.

[Brief description of the drawings]

FIG. 1 is a block diagram showing a configuration example of the present invention.

FIG. 2 is a block diagram showing the configuration of an embodiment of a torus-type high-speed packet processing device with three rows and four columns according to the present invention.

FIG. 3 is a block diagram showing a configuration example of the unit processing element (PE) of FIG. 2.

FIG. 4 is an explanatory diagram of an example of the PE program packet format used in the device of the embodiment.

FIG. 5 is an explanatory diagram of an example of the PE data packet format used in the device of the embodiment.

FIG. 6 is an explanatory diagram of an example program for explaining the PE operation of the embodiment.

FIG. 7 is an explanatory diagram of an example of program mapping in the embodiment.

FIG. 8 is a block diagram of the configuration of a conventional high-speed packet processing device.

FIG. 9 is a flowchart of program mapping in the embodiment.

FIG. 10 is a flowchart of the arithmetic processing in the embodiment.

[Explanation of symbols]

101: host computer; 102: packet control device; 103: high-speed packet processing device; 104: storage device; 211, 212, 213, 214, 221, 222, 223, 224, 231, 232, 233, 234: unit processing elements; 300: pipeline-type ring bus input terminal; 301, 302, 303: latches; 304: pipeline-type ring bus output terminal; 310: empty packet determination circuit; 311, 321, 322: selectors; 320: processing packet determination circuit; 323: through path; 330: empty packet generator; 340: processing-waiting packet queue; 341: operation input latch A; 342: operation input latch B; 350: function determination information register; 360: operation/processing unit; 370: data packet generation circuit; 371: PE data packet output terminal; 372: packet transmission request; 380: output-waiting packet queue; 381: PE program/data packet input terminal; 400, 420: task numbers; 401, 421: packet numbers; 402, 422: processing-destination PE addresses; 403: operation code; 404: firing condition; 405: output data type; 406: output count; 407, 408, 409, 410: output-destination PE addresses and arithmetic-unit input port designations; 423: condition code; 424: RT data type; 425: data number; 426: operation data and result data; 501, 502, 503, 504, 505: PE program packets; 510, 511, 512: PE data packets; 513: program erase packet; 710: control unit; 720, 730, 740: processing elements; 711, 712, 713, 714: shift registers.

Continuation of the front page. (56) References cited: Yoshio Yoshioka, "Characteristic Analysis of Loop Structured Computer," Proceedings of the Parallel Processing Symposium JSPP'89, Feb. 2-4, 1989, pp. 321-328. Yoshio Yoshioka, "Traffic Characteristics of Loop Structured Computer," Transactions of the Institute of Electronics, Information and Communication Engineers D-I, Vol. J72, No. 3, March 1989, pp. 149-156. (58) Fields searched (Int. Cl.⁷, DB name): G06F 15/16 610; G06F 13/36 530; G06F 15/82; JICST file (JOIS).

Claims

(57) [Claims]

1. A parallel processing device comprising: pipeline-type ring buses provided in a plurality of layers, each layer having one pipeline-type ring bus in which a plurality of shift registers are connected in a ring; processing means, provided in each layer in correspondence with the shift registers, for taking in and processing packets flowing on the pipeline-type ring bus; and packet control means for transmitting and receiving packets to and from at least one processing means of each layer; wherein each processing means additionally has a function of taking the output of the processing means of the layer above the layer to which it belongs onto the pipeline-type ring bus of its own layer.

2. The parallel processing device according to claim 1, wherein the packets comprise a program packet including function information designating a function to be assigned to each processing means, and a data packet including data processed based on the function designated by the program packet.

3. The parallel processing device according to claim 1, wherein each processing means comprises means for determining whether a packet flowing on the pipeline-type ring bus is addressed to that processing means, and means for switching, according to the output of the determining means, whether to take in the packet.

4. The parallel processing device according to claim 3, wherein each processing means further comprises means for placing, when it takes in a packet from the pipeline-type ring bus, a packet from the processing means of the upper layer or an empty packet on the pipeline-type ring bus in place of the taken-in packet.

5. The parallel processing device according to claim 3, wherein the determining means further has a function of detecting a packet assigned to processing means of another layer, and a path is provided through which, upon such detection, the packet is passed through to the pipeline-type ring bus of the lower layer.

6. The parallel processing device according to claim 1 or 4, wherein each processing means further comprises means for determining whether a packet flowing on the pipeline-type ring bus is an empty packet, and means for switching, in response to the output of that determining means, whether to place a packet from the processing means of the upper layer on the pipeline-type ring bus.

7. The parallel processing device according to claim 1, wherein the packet control means has a mapping management function that, when allocating a first packet designating a first operation and a second packet designating a second operation performed using the result of the first operation, allocates the second packet to processing means in a layer lower than the layer of the processing means to which the first packet is allocated.

8. A parallel processing device comprising: m × n processing means arranged in m rows and n columns; pipeline-type ring buses, one for each row, each formed by connecting in a ring n shift registers that are connected one-to-one to the processing means of that row; transfer means for transferring the results of the processing means of the i-th row to the pipeline-type ring bus connected to the processing means of the (i + 1)-th row; and packet control means having a function of placing packets to be processed onto the pipeline-type ring bus of each row and a function of receiving processed packets from the pipeline-type ring bus of each row.

9. The parallel processing device according to claim 8, wherein the transfer means transfers the results of the processing means in the m-th row to the pipeline-type ring bus connected to the processing means in the first row.

10. The parallel processing device according to claim 8, wherein the transfer by the transfer means is performed between processing means in the same column.

11. The parallel processing device according to claim 1, wherein each processing means comprises: a shift register forming part of the pipeline-type ring bus; a first packet queue for storing packets taken in from the shift register; and a second packet queue for storing packets to be placed into the shift register.
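Claim 7 asks that an operation consuming another operation's result be placed on a lower layer, and claims 8 and 9 close the rows into a cycle (row m feeds row 1). One way to satisfy both, sketched below, is longest-path layering of the dataflow graph followed by a wrap onto the m physical rows. The layering method, the dictionary graph format, and the operation names are assumptions of this sketch, not the mapping procedure of the embodiment.

```python
from typing import Dict, List


def assign_layers(deps: Dict[str, List[str]]) -> Dict[str, int]:
    """Longest-path layering: each operation goes one layer below the deepest
    operation whose result it consumes (layer 1 is the top)."""
    layer: Dict[str, int] = {}

    def visit(op: str, path: frozenset) -> int:
        if op in layer:
            return layer[op]
        if op in path:
            raise ValueError(f"dependency cycle through {op}")
        producers = deps.get(op, [])
        layer[op] = 1 + max((visit(p, path | {op}) for p in producers), default=0)
        return layer[op]

    for op in deps:
        visit(op, frozenset())
    return layer


def physical_row(logical_layer: int, m: int) -> int:
    """Wrap a logical layer onto m rows: row i feeds row i + 1 and row m feeds row 1."""
    return (logical_layer - 1) % m + 1
```

For example, with `deps = {"add": [], "mul": [], "sub": ["add", "mul"], "out": ["sub"]}` the operations land on layers 1, 1, 2, and 3. With the three-row arrangement of FIG. 2, a chain deeper than three operations wraps back to the first row; after the wrap, "lower" means downstream along the row cycle rather than a larger row index.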