JP3996355B2 - Multiprocessor system

Info

Publication number: JP3996355B2
Application number: JP2001083506A
Authority: JP
Inventors: 拓若林; 隆喜中村; 直伸助川
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2001-03-22
Filing date: 2001-03-22
Publication date: 2007-10-24
Anticipated expiration: 2021-03-22
Also published as: JP2002278947A

Description

【０００１】
【発明の属する技術分野】
本発明は、マルチプロセッサシステムにおける通信制御技術に関し、特にロック／アンロック処理を高速化する通信制御技術等に適用して有効な技術に関する。
【０００２】
【従来の技術】
マルチプロセッサシステムにおいて効率よいマルチプロセス制御または並列プロセス制御を行うためには、プロセッサ間の高速な同期、排他、通信制御が必要である。そのために通信レジスタと呼ばれる主記憶より高速な共有のレジスタを使用する場合がある。
【０００３】
特開２０００−１８７６５２号公報においては、図７に示すマルチプロセッサシステムにおいて、図８の構成をとる通信レジスタを用いたプロセッサ間の高速な同期の例が開示されている。
【０００４】
Ｎ個の通信レジスタモジュール４００〜４０２はＮ個のプロセッサ１０〜１２と信号線Ｌ８０〜Ｌ８２を介して１対１に接続される。各プロセッサ１０〜１２は、信号線Ｌ８０〜Ｌ８２にロードストア要求および主記憶の実アドレスを出力する。通信レジスタモジュール４００〜４０２は複数のレジスタから構成される通信レジスタ群５００〜５０２を有し、通信レジスタモジュール間は通信モジュール間バス７００で結合される。通信レジスタ群５００〜５０２の各々内の相対応する番号のレジスタは通信レジスタモジュール間で同一の値を保持するよう制御されている（これを「通信レジスタがミラーリングされている」と呼ぶ）。信号線Ｌ８０〜Ｌ８２上のロードストア要求は、通信レジスタモジュール内のデコーダ４３０〜４３２でデコードされ、要求が主記憶３０へのアクセス要求である場合は、相互結合網２０を介して主記憶３０をアクセスするよう制御され、要求が通信レジスタへのアクセス要求である場合は、通信モジュール間バス７００を介して通信レジスタ群５００〜５０２をアクセスするよう制御される。
【０００５】
具体的には、プロセッサから書き込み要求が来た場合は、全通信レジスタモジュールに対して同一の値を書き込むよう相互結合網２０、または通信モジュール間バス７００が制御し、プロセッサ１０〜１２からの読み出し要求に対しては、各プロセッサ１０〜１２に対応する通信レジスタモジュール４００〜４０２内のレジスタから値を読み出す。
【０００６】
各通信レジスタはメモリマップドレジスタ方式（ＭＭＲ方式）により、メモリアドレス空間上にマッピングされる。本従来例においては、ロック要求時、通信レジスタに非ロック状態を示す固定値が設定されていた場合、該通信レジスタには、ロック要求を発行したプロセッサのプロセッサ番号が設定され、リプライデータにも同じプロセッサ番号が返される。一方、通信レジスタに非ロック状態を示す固定値以外の値が設定されていたら、通信レジスタの内容（この場合はロックをかけている他のプロセッサ番号）をリプライデータとして返す。プロセッサはリプライデータの値が、自プロセッサ番号ならロック成功、他プロセッサ番号ならロック失敗を検知することができ、従来のプロセッサアーキテクチャを変えることなく、ロード／ストア要求によって、ロック／アンロック処理が可能になる。
【０００７】
また、ロック値をプロセッサ番号ではなく、個々のプロセッサ内にて並行動作する複数のプロセスの各々を識別するためにユニークに付与されたプロセス番号とし、マルチプロセス時にプロセス単位でロック／アンロック処理を実現することも可能である。
【０００８】
【発明が解決しようとする課題】
上述の特開２０００−１８７６５２号公報に示されるマルチプロセッサシステムではプロセス番号をロック値として使用する場合、通信レジスタモジュール内の通信レジスタ本数を、ロック要求元であるプロセスの数に応じて増加させる必要がある。また、通信レジスタがミラーリングされているため、通信レジスタの増加がシステム全体の物量に与える影響も大きい。
【０００９】
一方、ミラーリングせずに、通信レジスタが通信レジスタモジュール間で個別の内容を持つように制御する案も考えられる（これを「通信レジスタをフラットに置く」と呼ぶ）。こうすれば、物量の増大は抑えられるが、異なる通信レジスタモジュールに存在する通信レジスタへのアクセスやロック成功／失敗の報告信号が、通信モジュール間バスを中継しなければならず、アクセス速度が低下する。
【００１０】
本発明の目的は、複数のプロセッサにて主記憶を共有するマルチプロセッサシステムにおいて、回路や論理物量等を増大させることなく、プロセッサ間の同期、排他、通信制御のための多数のロック／アンロック要求の処理を的確に実現することが可能な技術を提供することにある。
【００１１】
本発明の他の目的は、複数のプロセッサにて主記憶を共有するマルチプロセッサシステムにおいて、主記憶に対するアクセス速度を低下させることなく、プロセッサ間の同期、排他、通信制御のための多数のロック／アンロック要求の処理を的確に実現することが可能な技術を提供することにある。
【００１２】
本発明の他の目的は、複数のプロセッサにて主記憶を共有するマルチプロセッサシステムにおいて、回路や論理物量等の増大や主記憶に対するアクセス速度の低下を生じることなく、個々のプロセッサにて並行動作する複数のプロセス間での同期、排他、通信制御のための多数のロック／アンロック要求の処理を的確に実現することが可能な技術を提供することにある。
【００１３】
本発明の他の目的は、複数のプロセッサにて主記憶を共有するマルチプロセッサシステムにおいて、プロセッサアーキテクチャ等に変更を加えることなく、プロセッサ間あるいはプロセス間の同期、排他、通信制御のための多数のロック／アンロック要求の処理を的確に実現することが可能な技術を提供することにある。
【００１４】
【課題を解決するための手段】
本発明は、主記憶と、この主記憶を共有する複数のプロセッサの各々との間に、通信レジスタを介在させた構成のマルチプロセッサシステムにおいて、前記通信レジスタは、ロック／アンロック状態を保持する第１のレジスタとロック／アンロック情報を保持する第２のレジスタを持つ構成としたものである。また、ロック／アンロック状態を保持する第１のレジスタは、通信レジスタモジュール間でミラーリングされている。つまり、通信レジスタモジュール間で同一の値を保持するよう制御される。一方、ロック／アンロック情報を保持する第２のレジスタは、各通信レジスタモジュール間でフラットに置かれる。つまり、通信レジスタモジュール毎に個別の内容を保持する。
【００１５】
【発明の実施の形態】
以下、本発明の実施の形態を図面を参照しながら詳細に説明する。
【００１６】
図１は、本発明の一実施の形態であるマルチプロセッサシステムの全体構成の一例を示す概念図であり、図２、図３および図４は、その構成の一例をより詳細に例示した概念図である。
【００１７】
本実施の形態のマルチプロセッサシステムは、複数のプロセッサ１０（＃１〜＃ＮのＮ個）が、通信レジスタ４０および相互結合網２０を介して主記憶３０を共有する構成となっている。通信レジスタ４０は複数の通信レジスタモジュール４００（＃１〜＃ＮのＮ個）を含んでいる。
【００１８】
複数の通信レジスタモジュール４００の各々は、複数のプロセッサ１０の各々と信号線Ｌ８０を介して１対１に接続される。各プロセッサ１０は、信号線Ｌ８０にロードストア要求および主記憶３０の実アドレスを出力する。
【００１９】
図２に例示されるように、個々の通信レジスタモジュール４００は複数のレジスタから構成される通信レジスタ群５００（＃１〜＃Ｎ）を有し、通信レジスタモジュール４００の間は通信モジュール間バス７００で結合される。信号線Ｌ８０上のロードストア要求は、通信レジスタモジュール４００内のデコーダ４３０でデコードされ、要求が主記憶３０へのアクセス要求である場合は、信号線Ｌ８０および相互結合網２０を介して主記憶３０をアクセスするよう制御され、要求が通信レジスタ４０へのアクセス要求である場合は、通信モジュール間バス７００を介して通信レジスタ群５００をアクセスするよう制御される。
【００２０】
なお、通信モジュール間バス７００は、全通信レジスタモジュール４００へのブロードキャスト書き込み要求が、同時に全通信レジスタモジュールに到達することが保証できれば、バスではなく結合網で構成してもよい。
【００２１】
図３に、本実施の形態における通信レジスタ群５００の構成の一例を示す。本実施の形態の場合、通信レジスタ群５００には、ロック情報レジスタ５１０とロック状態レジスタ５２０の２種類の通信レジスタが存在する。
【００２２】
ロック情報レジスタ５１０は、複数のビットから構成されるレジスタであり、通信レジスタ群５００の全てにわたって合計Ｎ×Ｍ本存在する。各レジスタには固有の番号（１からＮ×Ｍまで）がレジスタ番号として割り当てられている。各レジスタの実体は、Ｎ個の通信レジスタモジュールそれぞれにＭ本ずつ分配されて存在する。具体的には、通信レジスタモジュール＃１内にはレジスタ番号１からＭまで、通信レジスタモジュール＃２内にはレジスタ番号Ｍ＋１から２Ｍまで、通信レジスタモジュール＃Ｎ内にはレジスタ番号（Ｎ−１）Ｍ＋１からＮ×Ｍまで、といった形でレジスタが各通信レジスタモジュールに割り振られる。
【００２３】
ロック状態レジスタ５２０は、１ビットのフラグレジスタであり、各通信レジスタ群毎にＮ×Ｍ本ずつ存在する。レジスタ番号は各通信レジスタ群５００毎に１からＮ×Ｍまで割り当てられる。異なる通信レジスタモジュール４００間で対応するレジスタ番号をもつレジスタは同一の値を保持する（一つの通信レジスタモジュール４００内でのロック状態レジスタ５２０の変更が、常に他の通信レジスタモジュール４００内でのロック状態レジスタ５２０に反映される）ように制御される。
【００２４】
つまり、ロック情報レジスタ５１０は各通信レジスタモジュール間でフラットに置かれ、ロック状態レジスタ５２０は各通信レジスタモジュール間でミラーリングされる。これらはいずれも実アドレスにマッピングされ、レジスタへのアクセスは、メモリアクセスを行うロード／ストア等の命令で行う。
【００２５】
ロック状態レジスタ５２０のフラグがセットされている場合はその通信レジスタはロック状態にあることを示し、対応するレジスタ番号をもつロック情報レジスタ５１０には、ロック処理を要求したプロセスのプロセス番号が設定される。また、ロック状態レジスタ５２０のフラグがリセットされている場合はその通信レジスタはアンロック状態であることを示し、ロック情報レジスタ５１０の内容はＤｏｎ’ｔｃａｒｅである。
【００２６】
プロセス番号を設定するレジスタ（ロック情報レジスタ５１０）はフラットに存在し、通信レジスタ全体をミラーリングする場合に比べ物量の増大を抑えることができる。また、フラグレジスタ（ロック状態レジスタ５２０）のみミラーリングするため、ロック要求に対するロック成功／失敗の報告を通信モジュール間バス７００を介さずにプロセッサ１０に返すことが可能になる。
【００２７】
ロック情報レジスタ５１０とロック状態レジスタ５２０へアクセスして、実際にロック／アンロック処理を行うには、従来アーキテクチャでロックフラグが主記憶３０上にある場合のロック／アンロック処理時に使用する命令を使用する。本実施例では、一例として、ＰｏｗｅｒＰＣアーキテクチャにおけるロックフラグ操作用命令を用いる。
【００２８】
なお、ＰｏｗｅｒＰＣアーキテクチャにおけるメモリ同期命令等については、たとえば、ソフトバンク株式会社１９９４年１２月１０日発行「ＩｎｓｉｄｅＰｏｗｅｒＰＣ」Ｐ１８１以降、等の文献に記載されている。
【００２９】
ＰｏｗｅｒＰＣアーキテクチャにおいて、主記憶上にロックフラグが存在する場合のロック／アンロック処理は図５に示す命令列によって実現する。
【００３０】
図５の各命令の意味を以下に示す。尚、ｒ３はロックフラグの存在する主記憶上のアドレスが入っているレジスタ、ｒ５は読み出したロックフラグ値を格納するレジスタ、ｒ４は新たに設定するロック値が入っているレジスタである。
【００３１】
ｌｗａｒｘｒ５，０，ｒ３：ロックフラグの値を読み出し、レジスタｒ５に格納する。同時にリザベーション（後述）を確立する。
【００３２】
ｃｍｐｒ５，０：ｒ５の値をゼロと比較する。
【００３３】
ｂｃｌｏｏｐ：ｒ５の値がゼロでないならば、ｌｏｏｐへ戻る。
【００３４】
ｓｔｗｃｘ．ｒ４，０，ｒ３：リザベーションが残っていたらロックフラグにｒ４の値を書き込みに成功する。リザベーションが破棄されていたら、ロックフラグ書き込みに失敗する。
【００３５】
ｂｃｌｏｏｐ：ロックフラグ書き込みが失敗していたらｌｏｏｐへ戻る。成功していたら、次行以降のクリティカル・リージョン内の命令列を実行する。
【００３６】
ｓｙｎｃ：クリティカル・リージョン内の命令実行がすべて完了したことを保証する。
【００３７】
ｓｔｗ０，０（ｒ３）：ロックフラグにゼロを書き込む。
【００３８】
前半５行に渡る命令列がロック処理、最後の１行がアンロック処理、それ以外はクリティカル・リージョンを示す。
【００３９】
ｌｗａｒｘ命令、ｓｔｗｃｘ．命令はそれぞれロード・アンド・リザーブ命令、ストア・コンディショナル命令と呼ばれ、ＰｏｗｅｒＰＣアーキテクチャにおけるアトミックな処理を実現する際に、常に対となって使用される命令である。
【００４０】
ロード・アンド・リザーブ命令は、指定アドレスに対して、内容をリードする動作を行うとともに、リザベーションと呼ばれる保護領域を確立する。ストア・コンディショナル命令は、先行するロード・アンド・リザーブ命令によって確立されたリザベーションが破棄されていなかったら、指定アドレスへのライト動作を実行し、プロセッサにストア動作の「成功」を報告する。もし、リザベーションが存在する時に、他のプロセッサがストア命令を発行して、該アドレスの内容を更新すると、リザベーションは破棄される。この状態で、最初にロード・アンド・リザーブ命令を発行したプロセッサが、ストア・コンディショナル命令を発行すると、ストア動作は行われず、「失敗」したことを示すリプライをプロセッサに報告する。
【００４１】
ロック処理を実現する場合、プロセッサはまず主記憶上のロックフラグに対してロード・アンド・リザーブ命令を発行する。読み出されたフラグ値がゼロでない場合は、他のプロセッサがロック中なので、自プロセッサはｌｏｏｐに戻り、フラグ値がゼロになるまで、ロード・アンド・リザーブ命令を繰り返す。フラグ値がゼロになったら、ストア・コンディショナル命令によって、ロックフラグを設定する。このとき、自プロセッサよりも他のプロセッサのロックフラグ設定が早いと、リザベーションが破棄されるので、フラグ値設定に失敗し、再びｌｏｏｐに戻ってロック値がゼロになるのを待つ。フラグ値設定に成功した場合、クリティカル・リージョンを実行する。クリティカル・リージョン実行後は、ロックフラグにゼロをストアしてアンロック処理を行う。
【００４２】
図６に、本実施の形態におけるロック情報レジスタ５１０およびロック状態レジスタ５２０を使用してロック／アンロック処理を行う場合の命令列を例示する。
【００４３】
図６の各命令の意味を以下に示す。尚、ｒ３はロック情報レジスタＲ０のマッピングされている実アドレスが格納されているレジスタ、ｒ４は新たに設定するロック値が入っているレジスタである。
【００４４】
ｓｔｗｃｘ．ｒ４，０，ｒ３：ロック状態レジスタＦ０の値がゼロの場合、ロック状態レジスタＦ０を１にセットし、ロック情報レジスタＲ０にｒ４の値を書き込むことに成功する。ロック状態レジスタＦ０の値が１の場合、書き込み動作に失敗する。
【００４５】
ｂｃｌｏｏｐ：フラグ書き込みが「失敗」ならばｌｏｏｐへ戻る。「成功」ならば、次行以降のクリティカル・リージョンに移る。
【００４６】
ｓｙｎｃ：クリティカル・リージョン内の命令実行がすべて完了したことを保証する。
【００４７】
ｓｔｗ０，０（ｒ３）：ロック情報レジスタＲ０にゼロを書き込み、同時にロック状態レジスタＦ０にゼロをセットする。
【００４８】
前半２行分の命令列がロック処理、最後の１行がアンロック処理、それ以外がクリティカル・リージョンである。
【００４９】
ここでのストア・コンディショナル命令の動作は、図５の例の動作とは全く異なる。オペランドにはロック情報レジスタ５１０しか明示しないが、発行されると、対応するロック状態レジスタ５２０のフラグ値も参照・更新する。フラグ値がゼロならば、フラグをセットし、ロック情報レジスタ５１０にロック値を設定し、プロセッサに「成功」報告を行う。フラグ値が１ならば、フラグもロック値も設定せず、プロセッサに「失敗」報告を行う。
【００５０】
ロック処理を実現する場合、まず、ロック情報レジスタ５１０に対してストア・コンディショナル命令を発行する。すると、対応するロック状態レジスタ５２０のフラグ値を参照し、フラグ値がゼロならば非ロック状態であるので、フラグをセットし、ロック情報レジスタ５１０にロック値を設定する。フラグ値が１ならば、他のプロセスによってロックされている状態なので、ｌｏｏｐに戻り、再びストア・コンディショナル命令を繰り返す。ロック値設定に成功したら、クリティカル・リージョンを実行する。クリティカル・リージョン実行後は、ロック情報レジスタ５１０に対してストア命令を発行する。すると、対応するロック状態レジスタ５２０のフラグ値を無条件にゼロとし、ロック情報レジスタ５１０に値を書き込んで、アンロック処理を行う。
【００５１】
このように、本実施の形態では、主記憶３０をアクセスする場合と通信レジスタ４０をアクセスする場合とで、既存のロックフラグを操作する命令の動作を切り替えている。切り替え制御を通信レジスタモジュール４００内で行えば、従来のプロセッサアーキテクチャを変更することなく、通信レジスタ４０を用いた高速ロック処理を実現することができる。また、その高速性は、図５と図６との命令列数の差に示されるように、ロック処理に要する命令数を削減して実行時間を短縮できることからも明白である。
【００５２】
図４に本実施の形態のマルチプロセッサシステムにおける個々の通信レジスタモジュール４００の構成の一例を示す。
【００５３】
信号線Ｌ８０から入力された要求はデコーダ４３０により解析され、ロード／ストア要求であった場合には、アクセスアドレスが通信レジスタ４０にマッピングされた範囲外であれば、要求を信号線Ｌ８０、相互結合網２０を介して、主記憶３０に送る。アクセスアドレスが通信レジスタ４０にマッピングされた範囲内であれば、要求はリクエスト制御回路４１０に送られる。以下、要求ごとの動作を説明する。
【００５４】
尚、本実施の形態の説明においては、要求を出したプロセッサ１０と信号線Ｌ８０を介して直接接続されている通信レジスタモジュール４００を「マスタモジュール」、それ以外の通信レジスタモジュール４００を「スレーブモジュール」とそれぞれ定義する。
【００５５】
（ａ）ストア・コンディショナル要求（ロック要求）
ストア・コンディショナル要求が入力された場合、リクエスト制御回路４１０は、要求をセレクタ４８０に送り、それに基づきバス出力制御回路４９０は、通信モジュール間バス７００に全通信レジスタモジュールにブロードキャスト出力するためのバス権を確保する。バス権確保が完了しブロードキャストが実行されると、自モジュールを含む全通信レジスタモジュール４００のバス入力制御回路４２０に同時に要求が積まれる。
【００５６】
バス入力制御回路４２０は要求を解析し、ロック情報レジスタ５１０が自モジュールにあるか調べる。そして、要求をレジスタ入力回路４４０に送る。それに基づき、書き込み対象のロック状態レジスタ５２０の内容が読み出されて、レジスタ出力回路４５０から出力される。その際、レジスタ入力回路４４０にはロック情報レジスタ５１０への書き込み値（この場合はロック要求元のプロセス番号値）とロック状態レジスタ５２０への書き込み値（この場合はロック成功時に書き込む値１）が、信号線Ｌ７０からの指示信号を受け取るまで保持される。もし、書き込み対象となるロック情報レジスタ５１０が自モジュールに存在しない場合は、ロック状態レジスタ５２０への書き込み値のみが保持される。レジスタ出力回路４５０の出力は比較器４６０へ送られて、読み出された値がチェックされる。
【００５７】
以降の動作は、自モジュールが、マスタモジュールかスレーブモジュールかによって、以下のように異なる。
【００５８】
（ａ）−１：マスタモジュールの場合
読み出された値が１の場合は、他のプロセスによってロックされた状態であるのでロック失敗となり、リプライ制御回路４７０に、プロセッサへロックの「失敗」を示すリプライ報告を信号線Ｌ８０を介して返すよう指示する。同時にレジスタ入力回路４４０に保持された書き込み値は破棄される。
【００５９】
また、値がゼロの場合、非ロック状態なので、リプライ制御回路４７０に、プロセッサ１０へロックの「成功」を示すリプライ報告を信号線Ｌ８０を介して返すよう指示する。それと並行して、比較器４６０は信号線Ｌ７０を通じてレジスタ入力回路４４０に、ロック情報レジスタ５１０への書き込み値とロック状態レジスタ５２０への書き込み値をレジスタに書き込むよう指示する。自モジュールに書き込み対象となるロック情報レジスタ５１０が存在しない場合は、ロック状態レジスタ５２０へのみ書き込みを行う。
【００６０】
ロック状態レジスタ５２０はミラーリングされているため、プロセッサ１０へのロック成功／失敗の報告は、通信モジュール間バス７００を介さずに行うことができ、ロック処理の高速化が図れる。また、ロック情報レジスタ５１０はフラットに置かれるため、プロセス番号をロック値とする際に必要となる通信レジスタ数の増大によるシステムへの物量の影響を抑えることができる。
【００６１】
（ａ）−２：スレーブモジュールの場合
読み出された値が１の場合は、他のプロセスによってロックされた状態であるのでロック失敗となり、レジスタ入力回路４４０内の値を破棄して処理を終わる。
【００６２】
また、値がゼロの場合、非ロック状態なので、比較器４６０は信号線Ｌ７０を通じてレジスタ入力回路４４０に、ロック情報レジスタ５１０への書き込み値とロック状態レジスタ５２０への書き込み値をレジスタに書き込むよう指示する。自モジュールに書き込み対象となるロック情報レジスタ５１０が存在しない場合は、ロック状態レジスタ５２０へのみ書き込みを行う。
【００６３】
通信レジスタ４０へのアクセス要求は、いったん通信モジュール間バス７００経由でブロードキャストされ、バス入力制御回路４２０を介してからアクセスが行われる。そのため、複数のプロセスがほぼ同時に同じ通信レジスタ４０に対してロック要求を発しても、要求は通信モジュール間バス７００上でシリアライズされ、ロック要求のすり抜けは発生しない。
【００６４】
（ｂ）ストア要求（アンロック要求）
ストア要求が入力された場合、リクエスト制御回路４１０は、要求をセレクタ４８０に送り、それに基づきバス出力制御回路４９０は、通信モジュール間バス７００にロック情報レジスタ５１０への書き込み値（この場合ストア命令で指定されたストアデータ）とロック状態レジスタ５２０への書き込み値（この場合フラグリセット値）を出力するためのバス権を確保する。バス権を確保し書き込み値をブロードキャストすると、自モジュールを含む全通信レジスタモジュールのバス入力制御回路４２０に同時に書き込み要求が積まれる。バス入力制御回路４２０は要求を解析し、自モジュールにロック情報レジスタ５１０があるか調べる。ある場合には、ロック情報レジスタ５１０とロック状態レジスタ５２０の両方への書き込み値をレジスタ入力回路４４０に送り、ない場合にはロック状態レジスタ５２０への書き込み値だけをレジスタ入力回路４４０に送る。それに基づき、通信レジスタ群５００に書き込みが行われる。
【００６５】
（ｃ）ロード要求
ロード要求が入力された場合、リクエスト制御回路４１０は、要求をセレクタ４８０に送り、それに基づきバス出力制御回路４９０は、通信モジュール間バス７００にロック情報レジスタ５１０またはロック状態レジスタ５２０への読み出し要求を出力するためのバス権を確保する。バス権を確保し要求をブロードキャストすると、自モジュールを含む全通信レジスタモジュール４００のバス入力制御回路４２０に同時に読み出し要求が積まれる。
【００６６】
バス入力制御回路４２０は要求を解析し、自モジュールにアクセス先のレジスタが存在する場合、要求をレジスタ入力回路４４０に送る。それに基づき、通信レジスタ群５００内のレジスタの値が読み出され、レジスタ出力回路４５０から出力される。読み出された値はセレクタ４８０に送られ、バス出力制御回路４９０は、通信モジュール間バス７００に読み出し値を出力するためにバス権を確保する。バス権を確保すると、値はバスを通じてロード要求元の通信レジスタモジュール４００のバス入力制御回路４２０に送られる。
【００６７】
ロード要求元の通信レジスタモジュール４００のバス入力制御回路４２０は信号線Ｌ９０を通じて、リプライ制御回路４７０に、値をロード要求発行元のプロセッサにリプライするよう指示し、リプライ値が信号線Ｌ８０を通じてプロセッサ１０に返される。
【００６８】
以上説明したように、本実施の形態のマルチプロセッサシステムによれば、以下のことが可能となる。
【００６９】
通信レジスタ４０をロック情報レジスタ５１０とロック状態レジスタ５２０に分化し、通信レジスタモジュール４００間でロック情報レジスタ５１０をフラットに持ち、ロック状態レジスタ５２０をミラーリングして持つことで、たとえば各プロセッサ１０で稼働するプロセス単位でロック処理を行うマルチプロセッサシステムの物量増大を抑えることが可能になり、また、ロック処理時におけるプロセッサ１０へのロック成功／失敗報告を通信モジュール間バス７００を介さずにできるため、ロック処理の高速化が可能になる。
【００７０】
通信レジスタ４０へのアクセス要求は、通信モジュール間バス７００経由でブロードキャストされ、シリアライズされるので、複数のプロセスがほぼ同時に同じ通信レジスタ４０に対してロック要求を発しても、ロック要求のすり抜けを防ぐことができる。
【００７１】
従来ロックフラグ操作命令を用い、ロックフラグが主記憶３０に存在する場合と通信レジスタ４０に存在する場合とで命令の動作を通信レジスタモジュール４００内で切り替える制御を行うことにより、従来アーキテクチャのプロセッサ１０を使用して高速なロック／アンロック処理を実現することができる。
【００７２】
以上本発明者によってなされた発明を実施の形態に基づき具体的に説明したが、本発明は前記実施の形態に限定されるものではなく、その要旨を逸脱しない範囲で種々変更可能であることはいうまでもない。
【００７３】
たとえば、プロセッサとしては、ＰｏｗｅｒＰＣアーキテクチャに限らず、一般のプロセッサに広く適用することができる。
【００７４】
【発明の効果】
本発明によれば、複数のプロセッサにて主記憶を共有するマルチプロセッサシステムにおいて、回路や論理物量等を増大させることなく、プロセッサ間の同期、排他、通信制御のための多数のロック／アンロック要求の処理を的確に実現することができる、という効果が得られる。
【００７５】
本発明によれば、複数のプロセッサにて主記憶を共有するマルチプロセッサシステムにおいて、主記憶に対するアクセス速度を低下させることなく、プロセッサ間の同期、排他、通信制御のための多数のロック／アンロック要求の処理を的確に実現することができる、という効果が得られる。
【００７６】
本発明によれば、複数のプロセッサにて主記憶を共有するマルチプロセッサシステムにおいて、回路や論理物量等の増大や主記憶に対するアクセス速度の低下を生じることなく、個々のプロセッサにて並行動作する複数のプロセス間での同期、排他、通信制御のための多数のロック／アンロック要求の処理を的確に実現することができる、という効果が得られる。
【００７７】
本発明によれば、複数のプロセッサにて主記憶を共有するマルチプロセッサシステムにおいて、プロセッサアーキテクチャ等に変更を加えることなく、プロセッサ間あるいはプロセス間の同期、排他、通信制御のための多数のロック／アンロック要求の処理を的確に実現することができる、という効果が得られる。
【図面の簡単な説明】
【図１】本発明の一実施の形態であるマルチプロセッサシステムの全体構成の一例を示す概念図である。
【図２】本発明の一実施の形態であるマルチプロセッサシステムの構成の一部をより詳細に例示した概念図である。
【図３】本発明の一実施の形態であるマルチプロセッサシステムの構成の一部をより詳細に例示した概念図である。
【図４】本発明の一実施の形態であるマルチプロセッサシステムの構成の一部をより詳細に例示した概念図である。
【図５】ロックフラグを主記憶上に置く場合の、プロセッサによるロック処理の命令列の一例を示す説明図である。
【図６】本発明の一実施の形態であるマルチプロセッサシステムにおけるプロセッサによるロック処理の命令列の一例を示す説明図である。
【図７】従来のマルチプロセッサシステムの構成を示す概念図である。
【図８】従来のマルチプロセッサシステムにおける構成の一部をより詳細に示す概念図である。
【符号の説明】
１０…プロセッサ、２０…相互結合網、３０…主記憶、４０…通信レジスタ、４００…通信レジスタモジュール、４１０…リクエスト制御回路、４２０…バス入力制御回路、４３０…デコーダ、４４０…レジスタ入力回路、４５０…レジスタ出力回路、４６０…比較器、４７０…リプライ制御回路、４８０…セレクタ、４９０…バス出力制御回路、５００…通信レジスタ群、５１０…ロック情報レジスタ（第２のレジスタ）、５２０…ロック状態レジスタ（第１のレジスタ）、７００…通信モジュール間バス、Ｌ７０…信号線、Ｌ８０…信号線、Ｌ９０…信号線。
[0001]
[Technical Field of the Invention]
The present invention relates to communication control techniques in a multiprocessor system, and in particular to a technique that is effective when applied to communication control for speeding up lock/unlock processing.
[0002]
[Prior art]
Efficient multi-process or parallel-process control in a multiprocessor system requires high-speed synchronization, mutual exclusion, and communication control between processors. For this purpose, a shared register that is faster than the main memory, called a communication register, is sometimes used.
[0003]
Japanese Patent Application Laid-Open No. 2000-187652 discloses an example of high-speed synchronization between processors using communication registers configured as shown in FIG. 8, in the multiprocessor system shown in FIG. 7.
[0004]
The N communication register modules 400 to 402 are connected one-to-one to the N processors 10 to 12 via signal lines L80 to L82. Each of the processors 10 to 12 outputs load/store requests and real addresses of the main memory to the signal lines L80 to L82. The communication register modules 400 to 402 have communication register groups 500 to 502, each consisting of a plurality of registers, and the communication register modules are coupled to one another by an inter-communication module bus 700. Registers with the same number in each of the communication register groups 500 to 502 are controlled so as to hold the same value across the communication register modules (this is referred to as “the communication registers are mirrored”). A load/store request on the signal lines L80 to L82 is decoded by the decoders 430 to 432 in the communication register modules. When the request is an access request to the main memory 30, control is performed so that the main memory 30 is accessed via the interconnection network 20; when the request is an access request to a communication register, control is performed so that the communication register groups 500 to 502 are accessed via the inter-communication module bus 700.
[0005]
Specifically, when a write request arrives from a processor, the interconnection network 20 or the inter-communication module bus 700 performs control so that the same value is written to all the communication register modules; for a read request from one of the processors 10 to 12, the value is read from the register in the communication register module 400 to 402 corresponding to that processor.
[0006]
Each communication register is mapped into the memory address space by a memory-mapped register method (MMR method). In this conventional example, if a fixed value indicating the unlocked state is set in the communication register at the time of a lock request, the processor number of the processor that issued the lock request is set in the communication register, and the same processor number is also returned as the reply data. On the other hand, if a value other than the fixed value indicating the unlocked state is set in the communication register, the contents of the communication register (in this case, the number of the other processor that holds the lock) are returned as the reply data. The processor can detect lock success if the reply data is its own processor number and lock failure if it is another processor number, so lock/unlock processing becomes possible through load/store requests without changing the conventional processor architecture.
[0007]
It is also possible to use as the lock value, instead of the processor number, a process number uniquely assigned to identify each of the plurality of processes operating in parallel within each processor, and thereby to realize lock/unlock processing on a per-process basis in multi-process operation.
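The conventional lock request described in [0006] can be illustrated with a small model. This is only a sketch under stated assumptions: the class name, the method names, and the choice of 0 as the fixed "unlocked" value are hypothetical and do not come from the cited publication; requester numbers are assumed to be nonzero.

```python
UNLOCKED = 0  # assumed fixed value meaning "not locked"; requester ids must be nonzero


class ConventionalLockRegister:
    """Toy model of one prior-art communication register used as a lock."""

    def __init__(self):
        self.value = UNLOCKED

    def lock_request(self, requester_id):
        """Return the reply data for a lock request from requester_id."""
        if self.value == UNLOCKED:
            self.value = requester_id  # lock is taken by the requester
        # Reply is the register contents: own id means success, another id means failure.
        return self.value

    def unlock(self):
        self.value = UNLOCKED


reg = ConventionalLockRegister()
first = reg.lock_request(3)   # processor 3 takes the lock
second = reg.lock_request(5)  # processor 5 is told that 3 holds it
reg.unlock()
third = reg.lock_request(5)   # processor 5 succeeds after the unlock
```

The requester decides success or failure by comparing the reply with its own number, which is why the register must be wide enough to hold a processor or process number.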
[0008]
[Problems to be solved by the invention]
In the multiprocessor system disclosed in the above Japanese Patent Application Laid-Open No. 2000-187652, when process numbers are used as lock values, the number of communication registers in each communication register module must be increased in accordance with the number of processes that can issue lock requests. Moreover, because the communication registers are mirrored, an increase in the number of communication registers has a large effect on the hardware amount of the entire system.
[0009]
Alternatively, the communication registers could be controlled so that, without mirroring, they hold individual contents in each communication register module (this is referred to as “placing the communication registers flat”). This suppresses the increase in hardware amount, but accesses to a communication register located in a different communication register module, and the signals reporting lock success/failure, must be relayed over the inter-communication module bus, so the access speed decreases.
[0010]
An object of the present invention is to provide a technique that, in a multiprocessor system in which a plurality of processors share a main memory, can accurately process a large number of lock/unlock requests for synchronization, mutual exclusion, and communication control between processors without increasing the amount of circuitry or logic.
[0011]
Another object of the present invention is to provide a technique that, in a multiprocessor system in which a plurality of processors share a main memory, can accurately process a large number of lock/unlock requests for synchronization, mutual exclusion, and communication control between processors without reducing the access speed to the main memory.
[0012]
Another object of the present invention is to provide a technique that, in a multiprocessor system in which a plurality of processors share a main memory, can accurately process a large number of lock/unlock requests for synchronization, mutual exclusion, and communication control among a plurality of processes operating in parallel on the individual processors, without increasing the amount of circuitry or logic and without reducing the access speed to the main memory.
[0013]
Another object of the present invention is to provide a technique that, in a multiprocessor system in which a plurality of processors share a main memory, can accurately process a large number of lock/unlock requests for synchronization, mutual exclusion, and communication control between processors or between processes without changing the processor architecture or the like.
[0014]
[Means for Solving the Problems]
The present invention provides a multiprocessor system in which communication registers are interposed between a main memory and each of a plurality of processors sharing the main memory, wherein the communication registers comprise a first register that holds a lock/unlock state and a second register that holds lock/unlock information. The first register, which holds the lock/unlock state, is mirrored across the communication register modules; that is, it is controlled so as to hold the same value in every communication register module. The second register, which holds the lock/unlock information, is placed flat across the communication register modules; that is, each communication register module holds its own individual contents.
[0015]
[Embodiments of the Invention]
Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.
[0016]
FIG. 1 is a conceptual diagram showing an example of the overall configuration of a multiprocessor system according to an embodiment of the present invention, and FIGS. 2, 3, and 4 are conceptual diagrams illustrating examples of the configuration in more detail.
[0017]
In the multiprocessor system according to the present embodiment, a plurality of processors 10 (N processors, #1 to #N) share the main memory 30 via the communication register 40 and the interconnection network 20. The communication register 40 includes a plurality of communication register modules 400 (N modules, #1 to #N).
[0018]
The plurality of communication register modules 400 are connected one-to-one to the plurality of processors 10 via signal lines L80. Each processor 10 outputs load/store requests and real addresses of the main memory 30 to its signal line L80.
[0019]
As illustrated in FIG. 2, each communication register module 400 has a communication register group 500 (#1 to #N) consisting of a plurality of registers, and the communication register modules 400 are coupled to one another by the inter-communication module bus 700. A load/store request on the signal line L80 is decoded by the decoder 430 in the communication register module 400. When the request is an access request to the main memory 30, control is performed so that the main memory 30 is accessed via the signal line L80 and the interconnection network 20; when the request is an access request to the communication register 40, control is performed so that the communication register group 500 is accessed via the inter-communication module bus 700.
[0020]
Note that the inter-communication module bus 700 may be implemented as an interconnection network rather than a bus, provided it can be guaranteed that a broadcast write request to all the communication register modules 400 reaches all of them at the same time.
[0021]
FIG. 3 shows an example of the configuration of the communication register group 500 in the present embodiment. In the present embodiment, the communication register group 500 includes two types of communication registers, a lock information register 510 and a lock state register 520.
[0022]
The lock information register 510 is a multi-bit register, and a total of N × M such registers exist across all the communication register groups 500. Each register is assigned a unique number (1 to N × M) as its register number. The registers are physically distributed M at a time among the N communication register modules. Specifically, registers are allocated to the communication register modules such that communication register module #1 holds register numbers 1 to M, communication register module #2 holds register numbers M + 1 to 2M, and communication register module #N holds register numbers (N-1)M + 1 to N × M.
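The numbering scheme in [0022] amounts to a simple division of the register number by M. The following sketch shows that mapping; the function name and the 1-based "slot" within a module are illustrative assumptions, not terms used in the specification.

```python
def locate_lock_info_register(reg_no, n_modules, m_per_module):
    """Map a 1-based lock information register number to (module, slot), both 1-based."""
    if not 1 <= reg_no <= n_modules * m_per_module:
        raise ValueError("register number out of range")
    module = (reg_no - 1) // m_per_module + 1  # module #1 holds 1..M, #2 holds M+1..2M, ...
    slot = (reg_no - 1) % m_per_module + 1     # position within that module
    return module, slot


# Example with N = 4 modules and M = 8 registers per module.
first_of_last_module = locate_lock_info_register((4 - 1) * 8 + 1, 4, 8)
```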
[0023]
The lock state register 520 is a 1-bit flag register, and N × M such registers exist in each communication register group. Register numbers 1 to N × M are assigned within each communication register group 500. Registers with the same register number in different communication register modules 400 are controlled so as to hold the same value (a change to a lock state register 520 in one communication register module 400 is always reflected in the lock state registers 520 in the other communication register modules 400).
[0024]
That is, the lock information registers 510 are placed flat across the communication register modules, and the lock state registers 520 are mirrored across the communication register modules. Both are mapped to real addresses, and the registers are accessed with instructions that perform memory access, such as load/store.
[0025]
When the flag of a lock state register 520 is set, the communication register is in the locked state, and the process number of the process that requested the lock is set in the lock information register 510 with the corresponding register number. When the flag of the lock state register 520 is reset, the communication register is in the unlocked state, and the contents of the lock information register 510 are don't-care.
[0026]
Because the registers in which process numbers are set (lock information registers 510) are flat, the increase in hardware amount can be suppressed compared with mirroring the entire communication register. In addition, because only the flag registers (lock state registers 520) are mirrored, a lock success/failure report for a lock request can be returned to the processor 10 without going through the inter-communication module bus 700.
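A rough storage count makes the saving concrete. The sketch below compares mirroring all N × M registers in every module with the split arrangement (flat lock information registers plus mirrored 1-bit flags). The register width and the assumption that storage bits are a fair proxy for hardware amount are mine; the specification gives no figures.

```python
def mirrored_bits(n_modules, m_per_module, width):
    """Storage bits if every module holds a full copy of all N*M registers."""
    return n_modules * (n_modules * m_per_module) * width


def split_bits(n_modules, m_per_module, width):
    """Storage bits with flat lock information registers and mirrored 1-bit flags."""
    info_bits = n_modules * m_per_module * width        # each register is stored once
    flag_bits = n_modules * (n_modules * m_per_module)  # every module holds all N*M flags
    return info_bits + flag_bits


# Hypothetical sizes: N = 4 modules, M = 8 registers per module, 32-bit lock values.
full = mirrored_bits(4, 8, 32)
split = split_bits(4, 8, 32)
```

With these hypothetical sizes the fully mirrored layout needs 4096 bits and the split layout 1152 bits; the gap widens as N grows, because only the 1-bit flags are replicated N times.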
[0027]
To access the lock information register 510 and the lock state register 520 and actually perform lock/unlock processing, the instructions used for lock/unlock processing when the lock flag resides in the main memory 30 in the conventional architecture are used. In this embodiment, the lock flag manipulation instructions of the PowerPC architecture are used as an example.
[0028]
The memory synchronization instructions and related features of the PowerPC architecture are described in, for example, “Inside PowerPC”, published by SoftBank Corp. on December 10, 1994, p. 181 onward.
[0029]
In the PowerPC architecture, lock/unlock processing when the lock flag resides in the main memory is realized by the instruction sequence shown in FIG. 5.
[0030]
The meaning of each instruction in FIG. 5 is as follows. Here, r3 is the register holding the main memory address of the lock flag, r5 is the register that receives the lock flag value that is read, and r4 is the register holding the lock value to be newly set.
[0031]
lwarx r5, 0, r3: Reads the value of the lock flag and stores it in register r5. At the same time, establishes a reservation (described later).
[0032]
cmp r5, 0: Compares the value of r5 with zero.
[0033]
bc loop: If the value of r5 is not zero, return to loop.
[0034]
stwcx. r4, 0, r3: If the reservation is still in place, the value of r4 is successfully written to the lock flag. If the reservation has been discarded, the write to the lock flag fails.
[0035]
bc loop: Return to loop if the lock flag writing has failed. If successful, execute the instruction sequence in the critical region after the next line.
[0036]
sync: guarantees that all instruction execution in the critical region is complete.
[0037]
stw 0,0 (r3): Write zero to the lock flag.
[0038]
The instruction sequence over the first five lines indicates the lock process, the last line indicates the unlock process, and the rest indicates the critical region.
[0039]
The lwarx and stwcx. instructions are called the load-and-reserve instruction and the store-conditional instruction, respectively, and are always used as a pair when implementing atomic operations in the PowerPC architecture.
[0040]
The load-and-reserve instruction reads the contents of the specified address and establishes a protected area called a reservation. The store-conditional instruction, if the reservation established by the preceding load-and-reserve instruction has not been discarded, performs the write to the specified address and reports “success” of the store operation to the processor. If another processor issues a store instruction and updates the contents of that address while the reservation exists, the reservation is discarded. If the processor that originally issued the load-and-reserve instruction then issues a store-conditional instruction, the store is not performed and a reply indicating “failure” is reported to the processor.
[0041]
To perform lock processing, the processor first issues a load-and-reserve instruction to the lock flag in the main memory. If the flag value read is not zero, another processor holds the lock, so the processor returns to loop and repeats the load-and-reserve instruction until the flag value becomes zero. When the flag value becomes zero, the processor sets the lock flag with a store-conditional instruction. If another processor sets the lock flag first, the reservation is discarded, so setting the flag value fails and the processor returns to loop again and waits for the lock value to become zero. If setting the flag value succeeds, the processor executes the critical region. After executing the critical region, it stores zero to the lock flag to perform unlock processing.
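The reservation behavior described in [0040] and [0041] can be modeled in a few lines. This is a deliberately simplified sketch, not a description of real PowerPC hardware: it assumes a single memory word, tracks reservations as a set of processor names, and discards every reservation whenever any store to the word succeeds. The class and function names are hypothetical.

```python
class ReservedMemory:
    """Toy single-word memory with load-and-reserve / store-conditional semantics."""

    def __init__(self):
        self.word = 0
        self.reservations = set()  # processors currently holding a reservation

    def lwarx(self, cpu):
        """Read the word and establish a reservation for cpu."""
        self.reservations.add(cpu)
        return self.word

    def stwcx(self, cpu, value):
        """Store only if cpu still holds its reservation; return success."""
        if cpu not in self.reservations:
            return False           # reservation was discarded: store fails
        self.word = value
        self.reservations.clear()  # a successful store discards all reservations
        return True

    def stw(self, value):
        """Ordinary store, used for unlocking."""
        self.word = value
        self.reservations.clear()


def try_lock(mem, cpu, lock_value):
    """One pass through the FIG. 5 style loop body; True if the lock was taken."""
    if mem.lwarx(cpu) != 0:
        return False               # someone else holds the lock
    return mem.stwcx(cpu, lock_value)


mem = ReservedMemory()
a_sees = mem.lwarx("A")   # both processors read zero and reserve
b_sees = mem.lwarx("B")
b_ok = mem.stwcx("B", 2)  # B stores first and takes the lock
a_ok = mem.stwcx("A", 1)  # A's reservation is gone, so its store fails
```

The interleaving at the end reproduces the race in [0041]: both processors observe an unlocked flag, but only the first store-conditional succeeds.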
[0042]
FIG. 6 shows an example of an instruction sequence when lock / unlock processing is performed using the lock information register 510 and the lock state register 520 in the present embodiment.
[0043]
The meaning of each instruction in FIG. 6 is as follows. Here, r3 is the register holding the real address to which lock information register R0 is mapped, and r4 is the register holding the lock value to be newly set.
[0044]
stwcx. r4, 0, r3: When the value of the lock state register F0 is zero, the lock state register F0 is set to 1, and the value of r4 is successfully written to the lock information register R0. When the value of the lock state register F0 is 1, the write operation fails.
[0045]
bc loop: If flag writing is “failure”, return to loop. If “success”, move to the critical region after the next line.
[0046]
sync: guarantees that all instruction execution in the critical region is complete.
[0047]
stw 0,0 (r3): Zero is written to the lock information register R0 and simultaneously zero is set to the lock state register F0.
[0048]
The instruction sequence for the first two lines is the lock process, the last one is the unlock process, and the other is the critical region.
[0049]
The operation of the store-conditional instruction here is completely different from that in the example of FIG. 5. Only the lock information register 510 is specified explicitly as the operand, but when the instruction is issued, the flag value of the corresponding lock state register 520 is also referenced and updated. If the flag value is zero, the flag is set, the lock value is set in the lock information register 510, and “success” is reported to the processor. If the flag value is 1, neither the flag nor the lock value is set, and “failure” is reported to the processor.
[0050]
To perform lock processing, a store-conditional instruction is first issued to the lock information register 510. The flag value of the corresponding lock state register 520 is then referenced; if the flag value is zero, the register is in the unlocked state, so the flag is set and the lock value is set in the lock information register 510. If the flag value is 1, the register is locked by another process, so execution returns to loop and the store-conditional instruction is repeated. Once the lock value has been set successfully, the critical region is executed. After the critical region has been executed, a store instruction is issued to the lock information register 510. The flag value of the corresponding lock state register 520 is then unconditionally set to zero and the value is written to the lock information register 510, completing the unlock processing.
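The redefined behavior of the store-conditional and store requests on a communication register, as described in [0049] and [0050], can be sketched as a pair of fields updated together. The class and method names below are hypothetical, and the model ignores the distribution of the registers across modules.

```python
class LockRegisterPair:
    """Toy model of one lock state flag and its lock information register."""

    def __init__(self):
        self.flag = 0  # lock state register 520 (1 bit)
        self.info = 0  # lock information register 510 (don't-care while flag == 0)

    def store_conditional(self, lock_value):
        """Lock request: succeeds only if the flag is currently zero."""
        if self.flag == 1:
            return False        # already locked: neither flag nor value changes
        self.flag = 1
        self.info = lock_value  # e.g. the requesting process number
        return True

    def store(self, value):
        """Unlock request: clears the flag unconditionally and writes the value."""
        self.flag = 0
        self.info = value


pair = LockRegisterPair()
got_first = pair.store_conditional(7)   # process 7 takes the lock
got_second = pair.store_conditional(9)  # process 9 is refused
holder = pair.info                      # still identifies process 7
pair.store(0)                           # unlock
got_third = pair.store_conditional(9)   # process 9 now succeeds
```

Because the test and the update happen inside one request, no preceding load-and-reserve is needed, which is why the lock sequence in FIG. 6 is shorter than that in FIG. 5.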
[0051]
As described above, in the present embodiment, the operation of the existing instruction that operates on the lock flag is switched depending on whether the main memory 30 or the communication register 40 is accessed. Because this switching is controlled inside the communication register module 400, high-speed lock processing using the communication register 40 can be realized without changing the conventional processor architecture. The speedup is also evident from the difference in the number of instructions between FIG. 5 and FIG. 6: fewer instructions are needed for lock processing, so the execution time is shorter.
[0052]
FIG. 4 shows an example of the configuration of each communication register module 400 in the multiprocessor system of the present embodiment.
[0053]
The request input from the signal line L80 is analyzed by the decoder 430. When the request is a load/store request and the access address is outside the range mapped to the communication register 40, the request is sent to the main memory 30 via the interconnection network 20. When the access address is within the range mapped to the communication register 40, the request is sent to the request control circuit 410. The operation for each type of request is described below.
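The routing decision made by the decoder 430 can be sketched as follows. The address window (`COMM_REG_BASE`, `COMM_REG_SIZE`), the function name `route_request`, and the returned labels are hypothetical values chosen for illustration; the specification does not give concrete addresses.

```python
# Hypothetical address window for the memory-mapped communication registers.
COMM_REG_BASE = 0xF000_0000
COMM_REG_SIZE = 0x1000

REQUEST_KINDS = ("load", "store", "store_conditional")


def route_request(kind, address):
    """Return the destination a decoder like 430 would choose for a request."""
    if kind not in REQUEST_KINDS:
        raise ValueError(f"unsupported request kind: {kind}")
    in_window = COMM_REG_BASE <= address < COMM_REG_BASE + COMM_REG_SIZE
    if in_window:
        return "request_control_circuit_410"  # handled inside the module
    return "main_memory_30"                   # forwarded over the interconnection network 20
```

Because the decision depends only on the address, the processor issues the same load, store, and store conditional instructions in both cases.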
[0054]
In the description of the present embodiment, the communication register module 400 directly connected to the requesting processor 10 via the signal line L80 is referred to as the “master module”, and the other communication register modules 400 are referred to as “slave modules”.
[0055]
(A) Store conditional request (lock request)
When a store conditional request is input, the request control circuit 410 sends the request to the selector 480, and based on this, the bus output control circuit 490 secures the bus right for broadcasting the request to all communication register modules on the inter-communication module bus 700. When the bus right is secured and the broadcast is executed, the request is delivered simultaneously to the bus input control circuits 420 of all the communication register modules 400, including the own module.
[0056]
The bus input control circuit 420 analyzes the request, checks whether the target lock information register 510 exists in its own module, and sends the request to the register input circuit 440. Based on this, the contents of the target lock state register 520 are read and output from the register output circuit 450. Meanwhile, the register input circuit 440 holds the write value for the lock information register 510 (in this case, the process number of the lock request source) and the write value for the lock state register 520 (in this case, the value 1 to be written when the lock succeeds) until an instruction signal is received from the signal line L70. If the target lock information register 510 does not exist in its own module, only the write value for the lock state register 520 is held. The output of the register output circuit 450 is sent to the comparator 460, which checks the read value.
[0057]
Subsequent operations differ as follows depending on whether the own module is a master module or a slave module.
[0058]
(A) -1: Master module
When the read value is 1, the register is locked by another process, so the lock fails, and the reply control circuit 470 is instructed to return a reply report indicating “failure” of the lock to the processor 10 via the signal line L80. At the same time, the write values held in the register input circuit 440 are discarded.
[0059]
If the read value is zero, the register is in the unlocked state, and the reply control circuit 470 is instructed to return a reply report indicating “success” to the processor 10 via the signal line L80. At the same time, the comparator 460 instructs the register input circuit 440, through the signal line L70, to write the held write values to the lock information register 510 and the lock state register 520. When the target lock information register 510 does not exist in the own module, only the lock state register 520 is written.
[0060]
Since the lock state register 520 is mirrored, lock success or failure can be reported to the processor 10 without going through the inter-communication module bus 700, which speeds up the lock process. In addition, since the lock information register 510 is held flat (each module holds its own registers rather than a mirrored copy), the growth in hardware caused by the additional communication registers needed when process numbers are used as lock values is kept small.
[0061]
(A) -2: Slave module
When the read value is 1, the register is locked by another process and the lock fails, so the write values held in the register input circuit 440 are discarded and the processing ends.
[0062]
When the read value is zero, the register is in the unlocked state, so the comparator 460 instructs the register input circuit 440, through the signal line L70, to write the held write values to the lock information register 510 and the lock state register 520. When the target lock information register 510 does not exist in the own module, only the lock state register 520 is written.
[0063]
An access request to the communication register 40 is first broadcast via the inter-communication module bus 700 and then performed via the bus input control circuit 420. For this reason, even if a plurality of processes issue lock requests to the same communication register 40 almost simultaneously, the requests are serialized on the inter-communication module bus 700, and no lock request can overtake another.
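The handling of a broadcast lock request by master and slave modules can be sketched as follows. This is a simplified software model, not the circuit itself: the classes `Module` and `Bus`, the use of list position as module number, and processing one broadcast per method call to stand in for bus serialization are all assumptions made for illustration.

```python
class Module:
    """Hypothetical model of one communication register module 400."""

    def __init__(self, owned_info_regs):
        self.flags = {}                               # mirrored lock state registers 520
        self.info = {r: 0 for r in owned_info_regs}   # lock information registers 510 held here only

    def on_store_conditional(self, reg, value):
        """Handle one broadcast lock request; return True if the local flag was 0."""
        if self.flags.get(reg, 0) == 1:
            return False             # locked: the held write values are discarded
        self.flags[reg] = 1          # the flag is written in every module
        if reg in self.info:         # the lock value is written only where the register exists
            self.info[reg] = value
        return True


class Bus:
    """Delivers each broadcast to all modules, one request at a time."""

    def __init__(self, modules):
        self.modules = modules

    def lock(self, master_index, reg, value):
        results = [m.on_store_conditional(reg, value) for m in self.modules]
        # Only the master module reports success or failure to its processor.
        return results[master_index]
```

Because every module sees the requests in the same order, the mirrored flags stay identical, and the second of two competing lock requests observes the flag already set.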
[0064]
(B) Store request (unlock request)
When a store request is input, the request control circuit 410 sends the request to the selector 480, and based on this, the bus output control circuit 490 secures the bus right for outputting to the inter-communication module bus 700 the write value for the lock information register 510 (in this case, the store data specified by the store instruction) and the write value for the lock state register 520 (in this case, the flag reset value). When the bus right is secured and the write values are broadcast, the write request is delivered simultaneously to the bus input control circuits 420 of all the communication register modules, including the own module. The bus input control circuit 420 analyzes the request and checks whether the target lock information register 510 exists in the own module. If it does, the write values for both the lock information register 510 and the lock state register 520 are sent to the register input circuit 440. Otherwise, only the write value for the lock state register 520 is sent to the register input circuit 440. Based on this, writing to the communication register group 500 is performed.
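The effect of a broadcast unlock request on each module can be sketched as below. The function `broadcast_unlock` and the dictionary layout of a module are hypothetical and serve only to illustrate which registers are written where.

```python
def broadcast_unlock(modules, reg, store_data=0):
    """Apply one broadcast store (unlock) request to every module.

    `modules` is a hypothetical list of dicts: "flags" models the mirrored
    lock state registers 520, and "info" models the lock information
    registers 510 that physically exist in that module.
    """
    for module in modules:
        module["flags"][reg] = 0          # flag reset value, written in every module
        if reg in module["info"]:         # store data is written only where the register exists
            module["info"][reg] = store_data
```

Unlike the lock request, this write is unconditional, so no comparison of the current flag value is needed.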
[0065]
(C) Load request
When a load request is input, the request control circuit 410 sends the request to the selector 480, and based on this, the bus output control circuit 490 secures the bus right for outputting a read request for the lock information register 510 or the lock state register 520 to the inter-communication module bus 700. When the bus right is secured and the request is broadcast, the read request is delivered simultaneously to the bus input control circuits 420 of all the communication register modules 400, including the own module.
[0066]
The bus input control circuit 420 analyzes the request, and when an access destination register exists in the own module, the bus input control circuit 420 sends the request to the register input circuit 440. Based on this, the value of the register in the communication register group 500 is read and output from the register output circuit 450. The read value is sent to the selector 480, and the bus output control circuit 490 secures the bus right to output the read value to the inter-communication module bus 700. When the bus right is secured, the value is sent to the bus input control circuit 420 of the communication register module 400 of the load request source through the bus.
[0067]
The bus input control circuit 420 of the communication register module 400 of the load request source instructs the reply control circuit 470, via the signal line L90, to return the value to the processor that issued the load request, and the reply value is returned to the processor 10 via the signal line L80.
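The two bus phases of a load from a lock information register can be sketched as follows. This sketch covers only the non-mirrored lock information register 510; the function `broadcast_load`, the dictionary layout, and the `"reply"` slot standing in for the reply control circuit 470 are hypothetical.

```python
def broadcast_load(modules, requester_index, reg):
    """Model a load of a lock information register 510 over the bus.

    Each module is a hypothetical dict with "info" (registers held in that
    module) and "reply" (the value last returned to its processor).
    """
    # Phase 1: the read request is broadcast; only the module holding `reg` reads it.
    owners = [m for m in modules if reg in m["info"]]
    if len(owners) != 1:
        raise LookupError(f"register {reg} must exist in exactly one module")
    value = owners[0]["info"][reg]
    # Phase 2: the owner sends the value over the bus to the requester's module,
    # which replies to its processor.
    modules[requester_index]["reply"] = value
    return value
```

The second bus transfer is what distinguishes this path from the lock path, where the mirrored flag lets the master module answer locally.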
[0068]
As described above, according to the multiprocessor system of the present embodiment, the following is possible.
[0069]
The communication register 40 is divided into a lock information register 510 and a lock state register 520. The lock information register 510 is held flat among the communication register modules 400, and the lock state register 520 is mirrored. This suppresses the increase in hardware of a multiprocessor system that performs lock processing in units of processes. In addition, since lock success or failure can be reported to the processor 10 during lock processing without going through the inter-communication module bus 700, the lock process can be speeded up.
[0070]
Since an access request to the communication register 40 is broadcast and serialized via the inter-communication module bus 700, even if a plurality of processes issue lock requests to the same communication register 40 at the same time, one lock request can be prevented from overtaking another.
[0071]
By using a conventional lock flag operation instruction and controlling the operation of that instruction in the communication register module 400 depending on whether the lock flag exists in the main memory 30 or in the communication register 40, high-speed lock/unlock processing can be realized using the processor 10 of a conventional architecture.
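The hardware saving from holding the lock information register 510 flat while mirroring only the lock state register 520 can be estimated with a simple count. The function `register_bits`, the 1-bit flag width, and the example figures (4 modules, 64 locks, 32-bit lock values) are assumptions for illustration and are not taken from the specification.

```python
def register_bits(modules, locks, info_bits, mirror_info):
    """Total register bits across all modules for `locks` lock registers.

    Each lock has a 1-bit flag mirrored in every module. The lock information
    register is either mirrored in every module or held in exactly one.
    """
    flag_bits = modules * locks          # mirrored lock state flags
    info_copies = modules if mirror_info else 1
    return flag_bits + info_copies * locks * info_bits


# Hypothetical configuration: 4 modules, 64 locks, 32-bit lock values.
mirrored_total = register_bits(4, 64, 32, mirror_info=True)
flat_total = register_bits(4, 64, 32, mirror_info=False)
```

In this model the flat arrangement grows with the number of locks rather than with the product of locks and modules, which is the effect the embodiment relies on.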
[0072]
Although the invention made by the present inventor has been specifically described based on the embodiments, it goes without saying that the present invention is not limited to the above-described embodiments and that various modifications can be made without departing from the scope of the invention.
[0073]
For example, the processor is not limited to the PowerPC architecture and can be widely applied to general processors.
[0074]
[Effects of the invention]
According to the present invention, in a multiprocessor system in which a main memory is shared by a plurality of processors, a large number of lock/unlock requests for synchronization, exclusion, and communication control between processors can be processed accurately without increasing the amount of circuitry or logic.
[0075]
According to the present invention, in a multiprocessor system in which a main memory is shared by a plurality of processors, a large number of lock/unlock requests for synchronization, exclusion, and communication control between processors can be processed accurately without reducing the access speed to the main memory.
[0076]
According to the present invention, in a multiprocessor system in which a main memory is shared by a plurality of processors, a large number of lock/unlock requests for synchronization, exclusion, and communication control between a plurality of processes operating in parallel within each processor can be processed accurately without increasing the amount of circuitry or logic and without reducing the access speed to the main memory.
[0077]
According to the present invention, in a multiprocessor system in which a main memory is shared by a plurality of processors, a large number of lock/unlock requests for synchronization, exclusion, and communication control between processors or between processes can be processed accurately without changing the processor architecture.
[Brief description of the drawings]
FIG. 1 is a conceptual diagram showing an example of the overall configuration of a multiprocessor system according to an embodiment of the present invention.
FIG. 2 is a conceptual diagram illustrating a part of a configuration of a multiprocessor system according to an embodiment of the present invention in more detail.
FIG. 3 is a conceptual diagram illustrating a part of the configuration of a multiprocessor system according to an embodiment of the present invention in more detail.
FIG. 4 is a conceptual diagram illustrating a part of a configuration of a multiprocessor system according to an embodiment of the present invention in more detail.
FIG. 5 is an explanatory diagram showing an example of an instruction sequence for lock processing by a processor when a lock flag is placed in the main memory.
FIG. 6 is an explanatory diagram showing an example of an instruction sequence for lock processing by a processor in a multiprocessor system according to an embodiment of the present invention.
FIG. 7 is a conceptual diagram showing a configuration of a conventional multiprocessor system.
FIG. 8 is a conceptual diagram showing a part of a configuration in a conventional multiprocessor system in more detail.
[Explanation of symbols]
10 ... processor, 20 ... interconnection network, 30 ... main memory, 40 ... communication register, 400 ... communication register module, 410 ... request control circuit, 420 ... bus input control circuit, 430 ... decoder, 440 ... register input circuit, 450 ... register output circuit, 460 ... comparator, 470 ... reply control circuit, 480 ... selector, 490 ... bus output control circuit, 500 ... communication register group, 510 ... lock information register (second register), 520 ... lock state register (first register), 700 ... inter-communication module bus, L70 ... signal line, L80 ... signal line, L90 ... signal line.

Claims

1. A multiprocessor system comprising a plurality of processors, a main memory shared by the processors, an interconnection network coupling the processors and the main memory, a plurality of communication register modules each corresponding to one of the plurality of processors, and an interface coupling the plurality of communication register modules, each of the communication register modules having a plurality of communication registers, wherein:
the communication registers include a first register that holds a lock/unlock state and a second register that holds lock/unlock information;
the first register holding the lock/unlock state holds the same contents as the corresponding registers in the other communication register modules;
the second register holding the lock/unlock information holds individual contents in each of the communication register modules; and
in response to a lock processing request from a processor, the request is broadcast to all the communication register modules via the interface, success or failure of the lock is reported to the processor by referring to the value of the first register holding the lock/unlock state among the communication registers in the communication register module corresponding to that processor, and, when the lock is possible, the same lock flag is set in the first registers holding the lock/unlock state in all the communication register modules and, at the same time, the process number within the processor that issued the lock processing request is set in the corresponding second register holding the lock/unlock information.

2. The multiprocessor system according to claim 1, wherein, when an instruction for operating a lock flag that exists in a conventional processor architecture is issued to execute lock processing, the operation of the instruction is switched depending on whether the lock flag is in the main memory or in the communication register.

3. The multiprocessor system according to claim 2, wherein, when the instruction accesses the second register holding the lock/unlock information, the first register holding the lock/unlock state is accessed at the same time.