JP4398965B2

JP4398965B2 - Data setting device in SIMD processor

Info

Publication number: JP4398965B2
Application number: JP2006258862A
Authority: JP
Inventors: 貴雄片山; 慎一山浦; 正展福島; 和彦原; 圭治中村; 和彦岩永; 浩資高藤
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2006-09-25
Filing date: 2006-09-25
Publication date: 2010-01-13
Anticipated expiration: 2020-09-28
Also published as: JP2006338696A

Description

The present invention relates to a SIMD (Single Instruction-stream Multiple Data-stream) microprocessor that applies the same operation to multiple data items with a single instruction in order to process image data and similar data at high speed.

In recent years, image quality in image processing for digital copiers, facsimile machines, and the like has been improved by increasing pixel counts and diversifying the processing applied. Such image processing often applies the same operation to many data items simultaneously. For higher speed, a SIMD (Single Instruction-stream Multiple Data-stream) microprocessor, which processes multiple data items at once with one instruction, is therefore often used in preference to a SISD (Single Instruction-stream Single Data-stream) microprocessor, which processes one data item per instruction.

FIG. 1 is a block diagram showing the schematic configuration of a general SIMD microprocessor 2. The SIMD microprocessor 2 broadly consists of a global processor (hereinafter, GP) 4 and processor elements 3, with a plurality of processor elements 3 provided so that multiple data items can be processed at once. Each processor element 3 includes a register file 6 and an operation array 8. The GP 4 controls the processor 2 as a whole, while the processor elements 3 receive data from an external input/output device, process it, and output it to the external input/output device.

The SIMD microprocessor 2 normally processes one instruction per clock cycle, but with that one instruction it can process as many data items as there are processor elements 3. When describing the performance of the SIMD microprocessor 2, emphasis is placed on its operating frequency and on the number of processor elements 3, that is, the number of data items processed per instruction; the number of instruction cycles is a further important factor. In other words, for the same image processing, performance is considered better for every instruction cycle saved. However, designing and using complex circuitry so that one instruction can perform complex processing inevitably increases cost.

An object of the present invention is to reduce the number of instruction execution cycles required for image data processing of the kind described above by providing effective instructions together with simple means for implementing them.

The present invention has been made to achieve the above object. The SIMD microprocessor according to claim 1 of the present invention is
a SIMD microprocessor including a global processor that performs overall control and a plurality of processor elements, wherein
each processor element is assigned, in order, an integer number for identification, and
each processor element has:
a connection line on which the integer identification number assigned to that processor element is input from outside; and
a mask circuit formed by inserting, into the connection line, a logic circuit that takes in a mask control signal from the global processor.

Use of the present invention reduces the number of instruction execution cycles, particularly for instructions involved in image data processing. The additional circuitry required for this is simple.

Preferred embodiments of the present invention are described below with reference to the drawings.

FIG. 1 is a block diagram showing the schematic configuration of a general SIMD microprocessor 2, which also applies to the present invention. It consists of a global processor (hereinafter, GP) 4, which mainly controls the processor 2 as a whole, and processor elements 3, which mainly receive data from an external input/output device, process it, and output data to the external input/output device. A plurality of processor elements 3 are provided so that multiple data items can be processed simultaneously. In FIG. 1, the SIMD microprocessor 2 is made up of one GP 4 and 256 processor elements 3.

FIG. 2 is a block diagram showing the configuration of the SIMD microprocessor 2 according to the present invention in more detail. As shown in the figure, the GP 4 includes:
・a program RAM 10 for storing a program composed of instruction codes;
・a data RAM 12 for storing operation data used in the GP 4;
・a sequential unit (SCU) 9 that decodes the program and sends control signals to the various blocks;
・a plurality of general-purpose registers (G0 to G3) for storing data;
・a program counter (PC) 14 that holds the program address used to send instruction codes to the SCU 9;
・a stack pointer (SP) 24 that stores a data memory address in order to form a stack in the data memory;
・a plurality of link registers (LS, LI, LN) that store the pre-branch address when a branch occurs for subroutine processing partway through a program;
・an arithmetic logic unit (ALU) 11 that performs arithmetic and logical operations on any combination of data in the data memory, numeric (immediate) data written in the instruction code, and data stored in the general-purpose registers;
・a processor status register (not shown) that holds the state of the processor;
・an interrupt control circuit (not shown) that controls hardware and software interrupts; and
・an external I/O control circuit (not shown) that is directly connected to the external input/output and controls data input from and output to the outside.

Although not shown in FIG. 2, the SCU 9 consists of a GP instruction decoder, which decodes GP instructions and generates control signals mainly for the blocks within the GP, and a processor element instruction decoder, which decodes processor element instructions and generates control signals mainly for the blocks within the processor elements. That is, the instruction codes of this processor are classified into GP instructions, which mainly control the blocks within the GP 4, determine the program sequence, and have the ALU 11 in the GP 4 process common data to be transferred to the processor elements 3, and processor element instructions, which cause each processor element 3 to process the data input at one time from the external input/output.

As shown in FIG. 1, each processor element 3 includes a register file 6, which temporarily holds data input from and output to the outside, and an operation array 8, which performs arithmetic, logical, and bit operations within the processor element 3. As further shown in FIG. 2, the register file 6 provides, for example, thirty-two 8-bit registers 34, R0 to R31. Data is transferred from these registers 34 to the operation array 8 and, conversely, from the operation array 8 back into the registers 34. The bus between the registers 34 and the operation array 8 is an 8-bit bidirectional bus.

As further shown in FIG. 2, each operation array 8 is an operation unit that includes:
・a shift/extension unit 44 that shifts data from the register file 6 and sign-extends or zero-extends it into 16-bit data;
・a plurality of general-purpose registers, for example an A register 38 and an F register 40;
・an arithmetic logic unit (ALU) 36 that takes as one input the data from the register file 6 after processing by the shift/extension unit 44 and as the other input the contents of the A register 38; and
・a selection circuit 35 that receives the outputs of the PE number mask circuit, the fixed value selection circuit, and the every-n bit pattern data output circuit according to the present invention (described later) and passes its own output to the A register 38 and the T register 54.
The output of the ALU 36 is temporarily stored in the A register 38 or the F register 40; data can also be transferred from the A register 38 to a specified register 34 of the register file 6.

As described later, the operation array 8 also includes an operation control register 54 called the "T register". The T register 54 controls whether the output of the ALU 36 is written to the A register 38 or the F register 40. For example, depending on the state of a given bit in the T register 54, the write to the A register 38 or the F register 40 is performed if the bit is "1" and is not performed if it is "0".
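The T-register gating described above can be sketched in software as follows. This is only an illustrative model, not the patent's circuit: the function name `gated_write` and the list-per-PE representation are assumptions made for the example.

```python
# Hypothetical software model: one list entry per processor element (PE).
NUM_PE = 256

def gated_write(a_regs, alu_out, t_bits):
    """Return new A-register contents.

    A PE takes the ALU result only if its T bit is 1; otherwise it keeps
    its old A-register value.
    """
    return [new if t == 1 else old
            for old, new, t in zip(a_regs, alu_out, t_bits)]

a_regs = [0] * NUM_PE                                  # initial A registers
alu_out = [i + 100 for i in range(NUM_PE)]             # toy ALU results
t_bits = [1 if i % 2 == 0 else 0 for i in range(NUM_PE)]  # enable even PEs only
a_regs = gated_write(a_regs, alu_out, t_bits)
```

In this toy run only the even-numbered PEs pick up the ALU result, while the odd-numbered PEs keep their previous contents.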

FIG. 3 is a block diagram showing the function of the multiplexer that connects the registers 34 of the register file 6 to the operation array 8. The multiplexer in processor element PEi (i = 0, 1, 2, ..., 255) is a 7-to-1 multiplexer, arranged so that data can be exchanged with the register files 6 of seven processor elements 3: PEi-3 (three to the left of PEi), PEi-2 (two to the left), PEi-1 (one to the left), PEi itself, PEi+1 (one to the right), PEi+2 (two to the right), and PEi+3 (three to the right). This is called the PE shift function. The data selected by the multiplexer is transferred to the shift/extension unit 44 of the operation array 8.
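As a rough illustration of the PE shift function, the sketch below lets every PE read the register value of a neighbour up to three positions away. The function name `pe_shift_select` is hypothetical, and returning `None` at the ends of the array is purely an assumption, since the text does not say what the hardware does for PEs without a neighbour on one side.

```python
NUM_PE = 256

def pe_shift_select(regs, offset):
    """For each PE i, pick the register value of PE i+offset (offset in -3..3).

    Out-of-range neighbours yield None here; edge behaviour is not given
    in the source, so this is an assumption of the sketch.
    """
    if not -3 <= offset <= 3:
        raise ValueError("the 7-to-1 multiplexer only reaches PEi-3 .. PEi+3")
    return [regs[i + offset] if 0 <= i + offset < len(regs) else None
            for i in range(len(regs))]

regs = list(range(NUM_PE))            # toy data: PE i holds the value i
left_one = pe_shift_select(regs, -1)  # every PE reads its left neighbour
```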

The naming of the processor elements 3 by number is defined here. As shown in FIG. 2, the SIMD microprocessor 2 according to the present invention has 256 processor elements 3, and each is assigned a processor element number (PE number): PE0, PE1, PE2, PE3, ..., PE254, PE255, from the left in the figure.

<< First Embodiment >>
FIG. 4 is a circuit diagram of the PE (processor element) number mask circuit according to the first embodiment of the present invention. The PE number mask circuit is formed by inserting, into the (prior-art) connection lines that input the PE number to a given general-purpose register, a logic circuit that takes in a mask control signal (the PE number mask signal) from the GP 4. As described later, the circuit configuration of FIG. 4 masks with "0".

In each PEi (i = 0, 1, 2, ..., 255), a connection line section 50 that forms the PE number is combined with a logic circuit (AND circuit) section 51 that takes in the PE number mask signal. As described later, their output is connected to the selection circuit 35 (see FIG. 2) provided in each processor element 3, and is stored via the selection circuit 35 in, for example, the A register 38.

Setting the PE number in the A register 38 using the (prior-art) connection line section 50 that forms each PE number was already possible. Suppose, for example, that this function was provided by an instruction called "LDPN" (Load PE Number). The circuit according to the first embodiment can be configured so that the "LDPN" instruction provides exactly the same function, namely sending each PE number to the corresponding selection circuit 35. As is clear from FIG. 4, all 8 bits of the PE number mask signal are "1" when the LDPN instruction is used. With this setting, data representing the PE number as a pattern of VCC and GND connections is input to the selection circuit 35 of each processor element 3.

Specifically, the data input to the selection circuits 35 of PE0, PE1, PE2, ..., PE255 takes, in that order, the values
・0, 1, 2, ..., 255

Furthermore, in the circuit according to the first embodiment, the input data can be masked by setting bits specified in the instruction to the SIMD microprocessor 2 to "0". The "LDNM" (Load PE Number Masking) instruction is provided for this purpose. The LDNM instruction masks the output data according to a mask pattern given as an immediate value, which is taken into the circuit of FIG. 4 as the PE number mask signal. The LDNM instruction is written as follows.

LDNM/0 #00000011b

In this notation, the trailing "b" of "00000011b" indicates binary. This instruction masks bits 2 through 7 (the upper six bits) of the PE number output, leaving only bits 0 and 1. The "/0" is an option specifying masking with "0".

The data input to the selection circuits 35 of PE0, PE1, PE2, PE3, PE4, PE5, PE6, ..., PE254, PE255 therefore takes, in that order, the values
・0, 1, 2, 3, 0, 1, 2, ..., 2, 3
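The AND-type masking can be mimicked with a bitwise AND over the PE numbers. The following is a minimal sketch under the assumption that each PE simply outputs its 8-bit PE number ANDed with the immediate mask; the function name `ldnm_and` is made up for the example and is not an identifier from the patent.

```python
NUM_PE = 256

def ldnm_and(mask):
    """Model of the AND-type PE number mask circuit.

    Each PE outputs (its PE number AND the 8-bit mask), so bits whose
    mask bit is 0 are forced to 0.
    """
    return [pe & mask & 0xFF for pe in range(NUM_PE)]

ldpn = ldnm_and(0b11111111)  # all-ones mask: plain PE numbers, as with LDPN
low2 = ldnm_and(0b00000011)  # corresponds to LDNM/0 #00000011b
print(low2[:7], low2[254], low2[255])  # [0, 1, 2, 3, 0, 1, 2] 2 3
```

With the all-ones mask the model reproduces the plain PE numbers, and with `00000011b` it reproduces the repeating 0, 1, 2, 3 sequence listed above.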

The circuit of FIG. 4 according to the first embodiment uses AND circuits, but these could also be replaced with OR circuits. In that case, in each PEi (i = 0, 1, 2, ..., 255), the connection line section 50 that forms the PE number is combined with a logic circuit (OR circuit) section (not shown) that takes in the PE number mask signal. Whereas the circuit configuration of FIG. 4 masks with "0", the OR-circuit version masks with "1". The instruction is written as follows.

LDNM/1 #11111100b

With this instruction, bits 2 through 7 (the upper six bits) of the PE number are masked with "1". The "/1" is an option specifying masking with "1".

In this case, the data input to the selection circuits 35 of PE0, PE1, PE2, PE3, PE4, PE5, PE6, ..., PE254, PE255 takes, in that order, the values
・252 (11111100b), 253 (11111101b), 254 (11111110b), 255 (11111111b), 252, 253, 254, ..., 254, 255
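The OR-circuit variant can be modelled the same way with a bitwise OR. This is a sketch only; `ldnm_or` is a hypothetical name, and the model assumes each PE outputs its 8-bit PE number ORed with the immediate mask.

```python
NUM_PE = 256

def ldnm_or(mask):
    """Model of the OR-type variant: bits whose mask bit is 1 are forced to 1."""
    return [(pe | mask) & 0xFF for pe in range(NUM_PE)]

high = ldnm_or(0b11111100)  # corresponds to LDNM/1 #11111100b
print(high[:7], high[254], high[255])  # [252, 253, 254, 255, 252, 253, 254] 254 255
```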

With the circuit of FIG. 4 according to the first embodiment, the lines corresponding to bits that are "0" in the mask control signal (PE number mask signal) are masked, so data that is expressed in only the chosen bits and repeats regularly can be set with a single instruction. For example, masking all of the upper 6 bits leaves only the lower 2 bits valid, so the regularly repeating values
・0, 1, 2, 3, 0, 1, 2, 3, ...
can be output with one instruction to the selection circuits 35 of processor elements PE0, PE1, PE2, PE3, PE4, PE5, PE6, .... Replacing all AND circuits in FIG. 4 with OR circuits likewise allows regularly repeating values to be output with one instruction to the selection circuits 35 of PE0, PE1, PE2, PE3, PE4, PE5, PE6, ...; in that case, however, masking is with "1".

With the prior art, the same result as FIG. 4 can be obtained by first setting the PE number in the A register 38 with the LDPN instruction and then storing in the A register 38 the remainder of dividing its value by 4. However, this takes
・1 instruction cycle for the LDPN instruction, and
・1 instruction cycle for the division instruction (assuming a multiplier/divider is available),
for a total of 2 instruction cycles. Moreover, placing a multiplier/divider in every processor element 3 of the SIMD microprocessor 2 would require an enormous amount of circuitry. An ordinary SIMD microprocessor 2 has no multiplier/divider, so division is carried out with the ALU 36 by performing one subtraction per bit of the dividend. If the A register 38, a general-purpose register, is 16 bits wide, for example, at least 16 subtractions are needed (in practice, setup before the division takes several more instruction cycles). The division thus takes 16 instruction cycles, for a minimum of 18 instruction cycles overall.
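To make the cycle comparison concrete, the sketch below implements a bit-serial restoring division, which is one common way to divide with only an ALU subtractor. The source says only that one subtraction per dividend bit is needed, so the choice of this particular algorithm, the function name `restoring_divmod`, and the step counter are assumptions of the example.

```python
def restoring_divmod(dividend, divisor, width=16):
    """Bit-serial restoring division: one trial subtraction per dividend bit.

    Returns (quotient, remainder, steps), where steps counts the trial
    subtractions and therefore always equals width.
    """
    remainder, quotient, steps = 0, 0, 0
    for bit in range(width - 1, -1, -1):
        # Shift the next dividend bit into the partial remainder.
        remainder = (remainder << 1) | ((dividend >> bit) & 1)
        steps += 1  # one ALU subtraction/compare per bit
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1 << bit
    return quotient, remainder, steps

q, r, steps = restoring_divmod(6, 4)  # PE6: quotient 1, remainder 2, 16 steps
same_as_mask = all(restoring_divmod(pe, 4)[1] == (pe & 0b11) for pe in range(256))
```

For every PE number the remainder modulo 4 equals the PE number ANDed with `00000011b`, which is why the mask circuit can stand in for the 16-step division.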

As is clear from the above, the circuit according to the first embodiment saves a considerable number of instruction cycles.

<< Second Embodiment >>
FIG. 5 is a circuit diagram of the fixed value selection circuit according to the second embodiment of the present invention. Under selection control by an input value selection signal, the fixed value selection circuit outputs to the selection circuit 35 (described later) a value that repeats with period n (n being a natural number) and, within each period, increases monotonically from 0 in steps of 1. The circuit shown in FIG. 5 assumes n = 3, 5, and 7.

As shown in FIG. 5, each processor element 3 has its own connection line section 50' and multiplexer section 51'. The connection line section 50' generates a 9-bit signal, wired so that the left 2 bits repeat with period 3, the middle 3 bits repeat with period 5, and the right 4 bits repeat with period 7. As is clear from FIG. 5, the period-3 values for PE0, PE1, PE2, PE3, PE4, PE5, PE6, ... are, in that order,
・0, 1, 2, 0, 1, 2, 0, ...
The period-5 values for PE0, PE1, PE2, PE3, PE4, PE5, PE6, ... are, in that order,
・0, 1, 2, 3, 4, 0, 1, ...
The period-7 values for PE0, PE1, PE2, PE3, PE4, PE5, PE6, PE7, PE8, ... are, in that order,
・0, 1, 2, 3, 4, 5, 6, 0, 1, ...

Which periodic output is selected is determined by the input value selection signal, which instructs the multiplexer section 51' in each processor element 3. The following instruction, "LDRN" (Load Repeating Number), is provided for extracting these periodic values.

LDRN #3

With this instruction, a value that repeats with period 3 and increases monotonically from 0 to 2 in steps of 1 within each period is output to the selection circuit 35 of each processor element 3, in order of PE number.
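The fixed value selection circuit can be pictured as a hard-wired 9-bit word per PE plus a multiplexer that picks one field. The sketch below assumes that "left", "middle", and "right" mean bits 8..7, 6..4, and 3..0 of that word; this bit placement and the names `wired_word` and `ldrn` are illustrative assumptions, not details confirmed by the text.

```python
NUM_PE = 256

def wired_word(pe):
    """Assumed 9-bit hard-wired word: [pe mod 3 | pe mod 5 | pe mod 7]."""
    return ((pe % 3) << 7) | ((pe % 5) << 4) | (pe % 7)

# Field position and width for each selectable period (assumed layout).
FIELDS = {3: (7, 0b11), 5: (4, 0b111), 7: (0, 0b1111)}

def ldrn(n):
    """Model of LDRN #n: the multiplexer selects the period-n field in every PE."""
    shift, field_mask = FIELDS[n]
    return [(wired_word(pe) >> shift) & field_mask for pe in range(NUM_PE)]

print(ldrn(3)[:7])  # [0, 1, 2, 0, 1, 2, 0]
print(ldrn(7)[:9])  # [0, 1, 2, 3, 4, 5, 6, 0, 1]
```

Because the values are wired in per PE, selecting a field needs no arithmetic at run time, which matches the one-instruction behaviour described for LDRN.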

With the fixed value selection circuit of FIG. 5 according to the second embodiment, such values can be input to the selection circuit 35 (A register 38) in one instruction cycle. The numbers 3, 5, and 7 used in the example above are known to be values frequently used in image processing.

<< Third Embodiment >>
FIG. 6 is a circuit diagram of the every-n bit pattern data output circuit according to the third embodiment of the present invention. This circuit is formed by inserting a logic circuit that takes in a bit mask signal from the GP 4 into connection lines that, for each individual bit, set "1" at every n-th PE (in order of PE number).

In each PEi (i = 0, 1, 2, ..., 255), a connection line section 50" that sets "1" at every n-th PE for each bit is combined with a logic circuit (AND circuit) section 51" that takes in the bit mask signal. As with the circuits of the first and second embodiments, their output is connected to the selection circuit 35 (see FIG. 2) provided in each processor element 3, and is stored via the selection circuit 35 in, for example, the A register 38.

The connection line section 50" of each PEi has 8 individual bits. For example, bit 0 has a period of 2, bit 1 a period of 3, bit 2 a period of 5, bit 3 a period of 7, bit 4 a period of 11, bit 5 a period of 13, bit 6 a period of 17, and bit 7 a period of 23. The connection line section 50" of each PEi is wired so that a bit is "1" (that is, connected to VCC) when the PE number falls on that bit's period (FIG. 6). That is,
・bit 0, with period 2, is connected to VCC in PE0, PE2, PE4, PE6, PE8, ...;
・bit 1, with period 3, is connected to VCC in PE0, PE3, PE6, PE9, PE12, ...; and
・bit 2, with period 5, is connected to VCC in PE0, PE5, PE10, PE15, PE20, ....

The following instruction, "LDRB" (Load Repeating Bit), is provided for reading out the contents wired into the connection line section 50" of each PEi.

LDRB

With this instruction, the full contents of the connection line section 50" of each PEi are input to, for example, the selection circuit 35 (and hence the A register 38). The values for PE0, PE1, PE2, PE3, PE4, PE5, PE6, PE7, PE8, PE9, PE10, PE11, ... are, in that order,
・11111111b (binary notation; likewise below), 00000000b, 00000001b, 00000010b, 00000001b, 00000100b, 00000011b, 00001000b, 00000001b, 00000010b, 00000101b, 00010000b, ...
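The wired bit pattern follows directly from the stated periods: bit k of a PE's word is 1 exactly when the PE number is a multiple of that bit's period. The sketch below computes the pattern under that reading; the names `PERIODS` and `ldrb` are hypothetical.

```python
NUM_PE = 256
PERIODS = [2, 3, 5, 7, 11, 13, 17, 23]  # periods for bit 0 .. bit 7, per the text

def ldrb():
    """Model of unmasked LDRB: each PE outputs its hard-wired 8-bit pattern."""
    words = []
    for pe in range(NUM_PE):
        word = 0
        for bit, period in enumerate(PERIODS):
            if pe % period == 0:      # this PE falls on the bit's period
                word |= 1 << bit
        words.append(word)
    return words

pattern = ldrb()
print([format(w, "08b") for w in pattern[:4]])
# ['11111111', '00000000', '00000001', '00000010']
```

PE0 is a multiple of every period, so all eight of its bits are set, while PE1 is a multiple of none and outputs all zeros.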

However, as noted above, a logic circuit that takes in the bit mask signal from the GP 4 is inserted into these connection lines, so the data input to the selection circuit 35 (and hence the A register 38) can be masked by setting the corresponding bits specified in the instruction code to "0". The mask pattern is given as an immediate value, taken into the circuit of FIG. 6 as the bit mask signal, and masks the data input to the selection circuit 35 (and hence the A register 38). When masking is applied, the instruction is written as follows.

LDRB/0 #00000100b

With this instruction, the output data has the value "00000100b" only for PEs in which bit 2 is set to "1". "00000100b" is set in, for example, the A register 38 for
・PE0, PE5, PE10, PE15, PE20, ....
For all other PEs the value is "00000000b".
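The masked form can be sketched by ANDing each PE's wired pattern with the immediate mask. As before, this is a software model under the assumption that bit k is wired to 1 when the PE number is a multiple of the k-th period; `ldrb_masked` is an illustrative name.

```python
NUM_PE = 256
PERIODS = [2, 3, 5, 7, 11, 13, 17, 23]  # periods for bit 0 .. bit 7, per the text

def ldrb_masked(mask):
    """Model of LDRB/0 #mask: wired pattern ANDed with the bit mask signal."""
    return [
        sum(1 << bit for bit, period in enumerate(PERIODS) if pe % period == 0) & mask
        for pe in range(NUM_PE)
    ]

out = ldrb_masked(0b00000100)  # corresponds to LDRB/0 #00000100b
hits = [pe for pe, value in enumerate(out) if value == 0b00000100]
print(hits[:5])  # [0, 5, 10, 15, 20]
```

Only the period-5 bit survives the mask, so the nonzero outputs mark every fifth PE.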

With the every-n bit pattern data output circuit according to the third embodiment, such values can be input to the selection circuit 35 (A register 38) in one instruction cycle.

<< Fourth Embodiment >>
A SIMD microprocessor 2 according to the fourth embodiment of the present invention is described next.

When the immediate value (PE number mask signal) of the "LDNM" instruction of the first embodiment is set to a value that masks nothing (#11111111b), the PE number is input unchanged to the selection circuit 35 (and hence the A register 38). As described earlier, the PE number is also input unchanged to the selection circuit 35 (and hence the A register 38) when the connection lines that input the PE number to a given general-purpose register are used as they are, without an inserted logic circuit that takes in the PE number mask signal from the GP 4.

Accordingly, the selection circuits 35 of PE0, PE1, PE2, ..., PE255 receive, in that order, the values
・0, 1, 2, ..., 255

In each processor element 3, the value input to the selection circuit 35 is compared, by the comparator 80 (FIG. 7) provided in the selection circuit 35, with the immediate value specified by the instruction (issued to the SIMD type microprocessor 2), and the result is stored in a bit at a predetermined position of the T register (operation control register) 54. Once the result is stored in the T register 54, operations from the next instruction onward can be set to execute in only one specific processor element 3 out of the 256 processor elements 3, and not in the others.

FIG. 7 is a schematic circuit block diagram of the selection circuit 35 according to the present invention. The output data of the PE number mask circuit, the fixed value selection circuit, and the every-nth bit pattern data output circuit of FIGS. 4, 5, and 6 are connected to the selection circuit 35 in the operation array 8 shown in FIG. 2. The multiplexer 78 selects one of the three according to the input data selection signal 2, which is a control signal from the GP 4, and the selected data is input to the A register 38. The selected data is also compared with the immediate data IMM2 in the instruction code by the comparator 80, which outputs "1" on a match and "0" otherwise.
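The multiplexer-plus-comparator behaviour described for FIG. 7 can be sketched as follows. This is a behavioural model only: the function name `selection_circuit`, the use of an index 0..2 to stand in for input data selection signal 2, and the example immediate `target` are assumptions made for illustration.

```python
NUM_PE = 256  # number of processor elements stated in the text

def selection_circuit(sources, select, imm2):
    """Return (value routed to the A register, comparator output).

    sources: outputs of the three circuits of FIGS. 4, 5 and 6.
    select:  index 0..2 standing in for input data selection signal 2
             (this encoding is an assumption).
    """
    selected = sources[select]               # multiplexer 78
    return selected, int(selected == imm2)   # comparator 80

# Fourth embodiment: the unmasked PE number is compared with an immediate.
target = 37  # arbitrary example immediate
t_flags = [selection_circuit((pe, 0, 0), 0, target)[1] for pe in range(NUM_PE)]
print(sum(t_flags), t_flags.index(1))  # 1 37
```

Exactly one PE ends up with its flag set, which is the single-PE selection the fourth embodiment describes.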

The T register 54 according to the present invention is now described. As shown in FIG. 2, each processor element 3 is equipped with an operation control register (T register) 54 for specifying execution conditions. FIG. 8 is an example of a circuit block diagram of the T register 54. In FIG. 8, the T register 54 consists of eight 1-bit registers (T0, T1, ..., T7), each of which is controlled separately. One processor element 3 can therefore hold eight control patterns.

Each bit of the T register 54 enables or disables operation execution in its processor element 3, which makes it possible to select only specific processor elements 3 as operation targets. For example, consider the following instruction.

ADD/T1 #12

This is an addition instruction: the value of the A register 38 and the immediate value "12" are added, and the result is stored in the A register 38. Writing the execution control option "/T1" in this instruction causes the data from the ALU 36 to be stored in the A register 38 only in the processor elements 3 whose T1 (bit) flag in the T register 54 is "1". In processor elements 3 whose T1 flag is "0", no data is stored in the A register 38.
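The effect of the "/T1" execution control option can be illustrated with a small Python sketch. It assumes, for illustration only, that each T register is held as an integer whose bit 1 is the T1 flag; the function name `add_predicated` and the example register contents are hypothetical, and register width and overflow are not modeled.

```python
def add_predicated(a_regs, t_regs, t_bit, imm):
    """Model of ADD/T<t_bit> #imm across all PEs.

    The sum is written back only in PEs whose selected T flag is 1;
    the other PEs keep their previous A register value.
    """
    flag = 1 << t_bit
    return [a + imm if t & flag else a for a, t in zip(a_regs, t_regs)]

a_regs = [10, 20, 30, 40]            # A register 38 of four example PEs
t_regs = [0b10, 0b00, 0b10, 0b01]    # T register 54; bit 1 is the T1 flag
print(add_predicated(a_regs, t_regs, 1, 12))  # [22, 20, 42, 40]
```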

In the example of FIG. 8, the inputs to the T register 54 include:
・the overflow flag (V) and the carry flag (C) generated in the ALU 36 of the arithmetic unit 6 in the PE 3,
・the result of an OR operation over the result of masking all T flags with the immediate value IMM2,
・the input from the storage means 2 shown in FIG. 2, and
・the output of the comparator 80 in the selection circuit 35 of FIG. 7.

With the SIMD type microprocessor 2 according to the fourth embodiment described above, the process of having only the processor element 3 with a specific PE number perform an operation can be carried out with a small number of instructions. With the conventional technique, the same function is achieved by first setting "0" in the T registers 54 of all processor elements 3 and then setting "1" in the T register 54 of the processor element 3 that is to perform the operation, but this inevitably requires more instructions than the fourth embodiment.

<< Fifth Embodiment >>
A SIMD type microprocessor 2 according to a fifth embodiment of the present invention will be described.

In each processor element 3 of the SIMD type microprocessor 2 according to the fifth embodiment, the (data) value passed from the PE number mask circuit according to the first embodiment to the selection circuit 35 is compared, by the comparator 80 (FIG. 7) provided in the selection circuit 35, with immediate data specified in the instruction code, and the result is stored in a bit at a predetermined position of the T register 54. Storing the result in the T register 54 makes it possible, from the next instruction onward, to execute operations only in the processor elements 3 whose PE numbers are spaced at a power-of-two interval.

The output of the PE number mask circuit shown in FIG. 4 is input to the selection circuit 35 of FIG. 7. Executing the "LDTM" (Load to T register Masked PE Number) instruction, which is provided in advance, causes the input data selection signal 2 to select the output data of the PE number mask circuit in the selection circuit 35. In addition, when the LDTM instruction is executed, one of the immediate data values is input as IMM2, and the comparator 80 compares the output data of the PE number mask circuit with IMM2. The comparator 80 outputs "1" in each processor element 3 where they match and "0" in each processor element 3 where they do not. The following is an example of such an instruction.

LDTM/T2 #00000011b, #1

In the above instruction, the first immediate operand is the mask pattern and the second is the comparison value. The mask pattern is written in binary and the comparison value in decimal.
・Masking the PE numbers of PE0, PE1, PE2, PE3, PE4, PE5, PE6, PE7, ... yields
・0, 1, 2, 3, 0, 1, 2, 3, ...
as a repeating sequence of values. In the PEs whose value matches the comparison value, namely
・PE1, PE5, PE9, ...
the T2 flag of the T register 54 is set to "1". In the other PEs, the T2 flag of the T register 54 is set to "0". If subsequent instructions use the T2 flag of the T register 54 for execution control, operations can be executed only in specific PEs, namely the processor elements 3 whose PE numbers start at PE1 and recur every 2^2 (every fourth PE).

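The LDTM example above amounts to a mask-and-compare on the PE number. The following Python sketch models that behaviour under assumptions: the function name `ldtm` and the representation of each T register as an integer (bit 2 being the T2 flag) are hypothetical and chosen only for illustration.

```python
def ldtm(t_regs, t_bit, mask, compare):
    """Model of LDTM/T<t_bit> #mask, #compare.

    For each PE, set the selected T flag to 1 when (PE number & mask)
    equals the comparison value, and clear it to 0 otherwise.
    """
    flag = 1 << t_bit
    return [
        (t | flag) if (pe & mask) == compare else (t & ~flag)
        for pe, t in enumerate(t_regs)
    ]

t_regs = ldtm([0] * 256, t_bit=2, mask=0b00000011, compare=1)
active = [pe for pe, t in enumerate(t_regs) if t & 0b100]
print(active[:4])  # [1, 5, 9, 13]
```

Changing the comparison value shifts the starting PE, and widening the mask (for example to #00000111b) widens the interval to the next power of two.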
<< Sixth Embodiment >>
A SIMD type microprocessor 2 according to a sixth embodiment of the present invention will be described.

In each processor element 3 of the SIMD type microprocessor 2 according to the sixth embodiment, the (data) value passed from the fixed value selection circuit according to the second embodiment to the selection circuit 35 is compared, by the comparator 80 (FIG. 7) provided in the selection circuit 35, with immediate data specified in the instruction code, and the result is stored in a bit at a predetermined position of the T register 54. Storing the result in the T register 54 makes it possible, from the next instruction onward, to execute operations only in the processor elements 3 whose PE numbers are spaced at a specific interval such as 3, 5, or 7.

The output of the fixed value selection circuit shown in FIG. 5 is input to the selection circuit 35 of FIG. 7. Executing the "LDTN" (Load to T register Predefined Number) instruction, which is provided in advance, causes the input data selection signal 2 to select the output data of the fixed value selection circuit in the selection circuit 35. In addition, when the LDTN instruction is executed, one of the immediate data values is input as IMM2, and the comparator 80 compares the output data of the fixed value selection circuit with IMM2. The comparator 80 outputs "1" in each processor element 3 where they match and "0" in each processor element 3 where they do not. The following is an example of such an instruction.

LDTN/T2 #3, #2

In the above instruction, the first immediate operand selects the fixed value and the second is the comparison value.
・The resulting values for PE0, PE1, PE2, PE3, PE4, PE5, PE6, PE7, ... are
・0, 1, 2, 0, 1, 2, 0, 1, ...
as a repeating sequence of values. In the PEs whose value matches the comparison value, namely
・PE2, PE5, PE8, ...
the T2 flag of the T register 54 is set to "1". In the other PEs, the T2 flag of the T register 54 is set to "0". If subsequent instructions use the T2 flag of the T register 54 for execution control, operations can be executed only in specific PEs, namely the processor elements 3 whose PE numbers start at PE2 and recur every 3.

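The LDTN example above can be modeled in the same way. The sketch below assumes that the fixed value selected with operand n equals the PE number modulo n, which matches the sequence 0, 1, 2, 0, 1, 2, ... listed in the text but is not stated as the circuit's definition here; the function name `ldtn` and the integer representation of the T register are likewise hypothetical.

```python
def ldtn(t_regs, t_bit, n, compare):
    """Model of LDTN/T<t_bit> #n, #compare.

    Assumes the fixed value wired into each PE for selection n is
    (PE number mod n); the selected T flag is set on a match.
    """
    flag = 1 << t_bit
    return [
        (t | flag) if pe % n == compare else (t & ~flag)
        for pe, t in enumerate(t_regs)
    ]

t_regs = ldtn([0] * 256, t_bit=2, n=3, compare=2)
active = [pe for pe, t in enumerate(t_regs) if t & 0b100]
print(active[:4])  # [2, 5, 8, 11]
```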
<< Seventh Embodiment >>
A SIMD type microprocessor 2 according to a seventh embodiment of the present invention will be described.

In each processor element 3 of the SIMD type microprocessor 2 according to the seventh embodiment, the (data) value passed from the every-nth bit pattern data output circuit according to the third embodiment to the selection circuit 35 is compared, by the comparator 80 (FIG. 7) provided in the selection circuit 35, with immediate data specified in the instruction code, and the result is stored in a bit at a predetermined position of the T register 54. Storing the result in the T register 54 makes it possible, from the next instruction onward, to execute operations only in the processor elements 3 whose PE numbers are spaced at a specific interval such as 2, 3, 5, 7, or 11.

The output of the every-nth bit pattern data output circuit shown in FIG. 6 is output to the selection circuit 35 of FIG. 7. Executing the "LDTB" (Load to T register Bit Number) instruction, which is provided in advance, causes the input data selection signal 2 to select the output data of the every-nth bit pattern data output circuit in the selection circuit 35. In addition, when the LDTB instruction is executed, one of the immediate data values is input as IMM2, and the comparator 80 compares the output data of the every-nth bit pattern data output circuit with IMM2. The comparator 80 outputs "1" in each processor element 3 where they match and "0" in each processor element 3 where they do not. The following is an example of such an instruction.

LDTB/T2 #00000010b, #1

In the above instruction, the first immediate operand is the every-nth bit designation and the second is the comparison value.
・The resulting values for PE0, PE1, PE2, PE3, PE4, PE5, PE6, PE7, ... are
・1, 0, 0, 1, 0, 0, 1, 0, ...
as a repeating sequence of values. In the PEs whose value matches the comparison value, namely
・PE0, PE3, PE6, ...
the T2 flag of the T register 54 is set to "1". In the other PEs, the T2 flag of the T register 54 is set to "0". If subsequent instructions use the T2 flag of the T register 54 for execution control, operations can be executed only in specific PEs, namely the processor elements 3 whose PE numbers start at PE0 and recur every 3.

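The LDTB example above can be sketched as follows, with several assumptions stated up front. The text lists the per-PE values as 1, 0, 0, 1, ... without spelling out in this section how the 8-bit pattern is reduced to that value, so the sketch simply treats the compared value as 1 when the PE number is a multiple of the selected interval and 0 otherwise. The interval table (bit 0 to bit 4 mapped to 2, 3, 5, 7, 11), the single-bit designation, and the name `ldtb` are hypothetical.

```python
INTERVALS = [2, 3, 5, 7, 11]  # assumed interval for designation bits 0..4

def ldtb(t_regs, t_bit, bit_select, compare):
    """Model of LDTB/T<t_bit> #bit_select, #compare (one bit selected)."""
    k = bit_select.bit_length() - 1   # index of the single selected bit
    n = INTERVALS[k]
    flag = 1 << t_bit
    result = []
    for pe, t in enumerate(t_regs):
        value = 1 if pe % n == 0 else 0   # assumed per-PE value
        result.append((t | flag) if value == compare else (t & ~flag))
    return result

t_regs = ldtb([0] * 256, t_bit=2, bit_select=0b00000010, compare=1)
active = [pe for pe, t in enumerate(t_regs) if t & 0b100]
print(active[:4])  # [0, 3, 6, 9]
```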
FIG. 1 is a block diagram showing a schematic configuration of a general SIMD type microprocessor.
FIG. 2 is a block diagram showing a more detailed configuration of the SIMD type microprocessor according to the present invention.
FIG. 3 is a block diagram showing the function of the multiplexer that connects the registers of the register file to the operation array.
FIG. 4 is a circuit diagram of the PE number mask circuit according to the first embodiment of the present invention.
FIG. 5 is a circuit diagram of the fixed value selection circuit according to the second embodiment of the present invention.
FIG. 6 is a circuit diagram of the every-nth bit pattern data output circuit according to the third embodiment of the present invention.
FIG. 7 is a schematic circuit block diagram of the selection circuit according to the present invention.
FIG. 8 is an example of a circuit block diagram of the operation control register (T register).

Explanation of symbols

2 ... SIMD type microprocessor, 3 ... processor element, 4 ... global processor, 6 ... register file, 8 ... operation array, 34 ... register, 35 ... arithmetic circuit, 36 ... ALU (arithmetic logic unit), 38 ... A register, 50, 50', 50'' ... connection line section, 51, 51'' ... logic circuit section, 51' ... multiplexer section, 54 ... T register (operation control register), 78 ... multiplexer, 80 ... comparator.

Claims

A SIMD type microprocessor including a global processor for overall control and a plurality of processor elements,
Each processor element is given an integer number for identification, in order,
Each processor element
A connection line to which an integer number for identification attached to the processor element is input from the outside;
A mask circuit in which a logic circuit for receiving a mask control signal from the global processor is inserted into the connection line;
SIMD type microprocessor having