JP6495208B2

JP6495208B2 - Method and apparatus for a general purpose multi-core system for implementing stream-based operations

Info

Publication number: JP6495208B2
Application number: JP2016159234A
Authority: JP
Inventors: Master, Paul L.; Furtek, Frederick
Original assignee: Sviral, Inc.
Priority date: 2010-01-21
Filing date: 2016-08-15
Publication date: 2019-04-03
Anticipated expiration: 2031-01-21
Also published as: US20110179252A1; US10073700B2; US8843928B2; KR20130009746A; KR101814221B1; JP2017004550A; WO2011091323A1; US20190004813A1; JP2013518327A; EP2526494A1; US20150012725A1; EP2526494A4; US11055103B2; JP5990466B2; EP2526494B1

Description

Cross-reference to related applications
This application claims priority to U.S. Provisional Patent Application Ser. No. 61/297,139, filed January 21, 2010. This application is related to U.S. Patent Application Ser. No. 09/815,122, entitled "ADAPTIVE INTEGRATED CIRCUITRY WITH HETEROGENEOUS AND RECONFIGURABLE MATRICES OF DIVERSE AND ADAPTIVE COMPUTATIONAL UNITS HAVING FIXED, APPLICATION SPECIFIC COMPUTATIONAL ELEMENTS," filed March 22, 2001, now U.S. Patent No. 6,836,839; to U.S. Patent Application Ser. No. 10/384,486, entitled "HIERARCHICAL INTERCONNECT FOR CONFIGURING SEPARATE INTERCONNECTS FOR EACH GROUP OF FIXED AND DIVERSE COMPUTATIONAL ELEMENTS," now U.S. Patent No. 7,325,123; and to U.S. Patent Application Ser. No. 10/443,501, entitled "HARDWARE TASK MANAGER," now U.S. Patent No. 7,609,297. All of these patent applications are incorporated herein by reference.

The present invention relates generally to programming multi-processor systems and, more particularly, to a hardware task manager that makes effective use of the constructs of a parallel programming language incorporating both streams and threads.

In general, what limits processing performance in a digital system is the efficiency and speed with which instructions, data, and other information are transmitted among the system's components and subsystems. For example, the bus transfer rate in a general-purpose von Neumann architecture dictates the rate of data transfer between processor and memory and, as a result, limits computational performance (e.g., million instructions per second (MIPS), floating-point operations per second (FLOPS), etc.).

In other types of computer architecture, such as multiprocessor or parallel-processor designs, complex communication (that is, interconnection) capabilities are required so that each of the different processors can communicate with other processors, with multiple memory devices, with input/output (I/O) ports, and so on. As processor system designs have grown more complex, the importance of an efficient and fast interconnection facility has risen dramatically.

However, such a facility is difficult to design in a way that optimizes the goals of speed, design flexibility, and simplicity.

Currently, parallel programming is based on threads as the central, organizing principle of computation. However, threads are seriously flawed as a computation model because they are wildly nondeterministic and rely on programming style to constrain that nondeterminism and achieve deterministic aims. Testing and verification become difficult in the presence of such wild nondeterminism. One solution proposed by GPU (graphics processing unit) vendors is to narrow the forms of parallelism that can be expressed in the programming model. That focus on data parallelism, however, ties the hands of programmers and prevents them from exploiting the full potential of multi-core processors.

Furthermore, threads are not necessarily executed on a bank of identical cores. Modern computers (supercomputers, workstations, desktops, and laptops) contain a bewildering array of different heterogeneous cores, all of which require separate programming models. For example, a motherboard may have one to four main CPUs (central processing units, e.g., a Pentium processor). Each CPU has one to six on-die CPU cores together with an on-die or on-package GPU (graphics processing unit, e.g., an NVIDIA GPU), and the GPU itself contains 16 to 256 GPU cores along with several discrete video and audio encode and decode cores (for encoding and decoding a multitude of video standards, e.g., MPEG2, MPEG4, VC-1, H.264, etc.). Also on the motherboard are one to four discrete, high-end, configurable video/audio encode and decode cores (configurable meaning that the cores can be selected to encode or decode a variety of pre-existing standards), which handle a multitude of video standards, e.g., MPEG2, MPEG4, VC-1, and H.264, at high resolutions and with multiple audio channels. Additional subsystems composed of processing cores are added to the motherboard in the form of communications cores. Examples are cores that offload TCP/IP functions, which are typically built from one or more CPU cores and one or more packet-processing cores, and WiFi, Bluetooth, WiMAX, 3G, and 4G cores, which are built from one or more broadband/baseband processing cores.

At the high end of the spectrum of modern devices, such as supercomputers, one or four FPGAs (field-programmable gate arrays) are added to each motherboard. Each FPGA is made up of 100,000 to 10 million very simple CLB processing cores, along with multiple hard-IP or soft-IP CPU cores and multiple DSP cores. These motherboards are themselves then replicated and interconnected in quantities of 100 to 1,000 to produce a modern supercomputer. These systems (desktops, workstations, laptops, and/or supercomputers) are in turn interconnected via the Internet to provide national or global computing capability.

"Managing" and "programming" such a diverse set of cores is extremely difficult. Most programmers do not even attempt it; they settle for programming a single core and ignore the rest. There are a certain number of algorithms known in the art as "embarrassingly parallel problems" (for example, the Google search algorithm is easy to distribute across multiple CPUs because there is little or no interaction between the parallel threads). However, the great majority of problems do not have these characteristics and require a high degree of interaction and synchronization among threads.

It would therefore be desirable to incorporate multithreading, unrestricted parallelism, and deterministic behavior, as found in the streams of modern programming languages. Streams date back at least to the introduction of the C programming language in 1978 and have been incorporated into languages such as C++, Java (registered trademark), Visual Basic, and F#. In these languages, however, streams are relegated to rather narrow roles, such as a framework for I/O and file access. It is therefore desirable to expand the role of streams in parallel programming to that of first-class objects, a status roughly comparable to that of variables.

According to one example, a computing device based on programmable cores is disclosed. The computing device includes a plurality of interconnected processing cores. A memory stores stream-domain code that includes a stream defining a stream destination module and a stream source module. The stream source module places data values in the stream, and the stream conveys the data values from the stream source module to the stream destination module. A runtime system detects when data values are available to the stream destination module and schedules the stream destination module for execution on one of the plurality of processing cores.
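
The relationship between a stream source module, a stream, a stream destination module, and the runtime system can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the names `Stream`, `Runtime`, `source`, and `square` are hypothetical, the "cores" are simply integer labels, and the round-robin choice of core is an assumption made only to keep the example small.

```python
from collections import deque


class Stream:
    """FIFO that carries values from a source module to a destination module."""

    def __init__(self):
        self.values = deque()

    def put(self, value):
        self.values.append(value)

    def available(self):
        return len(self.values) > 0

    def take(self):
        return self.values.popleft()


class Runtime:
    """Toy runtime: schedules a destination module whenever its stream has data."""

    def __init__(self, num_cores):
        self.num_cores = num_cores
        self.bindings = []   # (stream, destination callable) pairs
        self.next_core = 0   # assumption: cores are picked round-robin
        self.log = []        # (core, result) pairs, kept for inspection

    def connect(self, stream, destination):
        self.bindings.append((stream, destination))

    def run(self):
        progressed = True
        while progressed:
            progressed = False
            for stream, destination in self.bindings:
                if stream.available():
                    core = self.next_core
                    self.next_core = (self.next_core + 1) % self.num_cores
                    self.log.append((core, destination(stream.take())))
                    progressed = True


def source(stream, data):
    """Stream source module: places data values in the stream."""
    for x in data:
        stream.put(x)


def square(x):
    """Stream destination module: consumes one value per activation."""
    return x * x


s = Stream()
rt = Runtime(num_cores=2)
rt.connect(s, square)
source(s, [1, 2, 3])
rt.run()
```

After `rt.run()` returns, `rt.log` records which core label each activation of the destination module was assigned to, together with the value it computed.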

Additional aspects of the invention will be apparent to those of ordinary skill in the art in view of the detailed description of various embodiments, which is made with reference to the drawings. A brief description of the drawings is provided below.

A schematic diagram of an adaptive computing engine compatible with the disclosed stream-based programming model.
A block diagram of an adaptive computing machine compatible with the programming model.
A diagram of a network word in the network of the adaptive computing machine shown in FIG. 2.
A diagram of the node wrapper interface between heterogeneous nodes and the homogeneous network in the ACE architecture of FIG. 1 or the ACM architecture of FIG. 2.
A diagram of the basic components of the hardware task manager used in the node wrapper of FIG. 4.
A diagram of a point-to-point channel used for streaming data in the ACM architecture shown in FIG. 2.
A diagram of a point-to-point network word used by the point-to-point channel of FIG. 6.
Four diagrams, each showing modules on nodes for different stream flows.
Two diagrams, each showing the assignment of values to streams.
A diagram of a five-tap FIR filter that can be modeled using the module and stream concepts.
Three diagrams, each showing modules having FIFOs in different configurations.
A flowchart of a thread used in the example programming language.
Two diagrams, each showing a form of the join operation of the example programming language.

Adaptive computing engine and adaptive computing machine

FIG. 1 is a block diagram illustrating an example of a multi-processor system that uses an example computational model. The apparatus 100, referred to herein as an adaptive computing engine (ACE) 100, is preferably embodied as an integrated circuit, or as a portion of an integrated circuit having other, additional components. In the exemplary embodiment, and as discussed in greater detail below, the ACE 100 includes one or more reconfigurable matrices (or nodes) 150, such as matrices 150A through 150N as illustrated, and a matrix interconnection network 110. Also in the exemplary embodiment, and as discussed in greater detail below, one or more of the matrices 150, such as matrices 150A and 150B, are configured to function as a controller 120, while other matrices, such as matrices 150C and 150D, are configured to function as a memory 140. The various matrices 150 and the matrix interconnection network 110 may also be implemented together as fractal subunits, which may be scaled from a few nodes to a thousand nodes.

In a preferred embodiment, the ACE 100 does not use conventional (and typically separate) data, DMA, random-access, configuration, and instruction buses for signaling and other transmission among the reconfigurable matrices 150, the controller 120, and the memory 140, or for other input/output ("I/O") functionality. Rather, data, control, and configuration information are transmitted among these matrices 150 over the matrix interconnection network 110, which may be configured and reconfigured in real time to provide any given connection among the reconfigurable matrices 150, including those matrices 150 configured as the controller 120 and the memory 140.

The matrices 150 configured to function as the memory 140 may be implemented in any desired or exemplary way, utilizing computational elements (discussed below) of fixed memory elements, and may be included within the ACE 100 or incorporated within another IC or portion of an IC. In the exemplary embodiment, the memory 140 is included within the ACE 100 and is preferably composed of computational elements that are low-power random access memory (RAM), but it may also be composed of computational elements of any other form of memory, such as flash, DRAM, SRAM, MRAM, ROM, EPROM, or E2PROM. In the exemplary embodiment, the memory 140 preferably includes direct memory access (DMA) engines, not separately illustrated.

The controller 120 is preferably implemented, using matrices 150A and 150B configured as adaptive finite state machines (FSMs), as a reduced instruction set ("RISC") processor, controller, or other device or IC capable of performing the two types of functionality discussed below. Alternatively, these functions may be implemented using a conventional RISC or other processor. The first control functionality, referred to as "kernel" control, is illustrated as the kernel controller ("KARC") of matrix 150A, and the second control functionality, referred to as "matrix" control, is illustrated as the matrix controller ("MARC") of matrix 150B. The kernel and matrix control functions of the controller 120 are explained in greater detail below, with reference to the configurability and reconfigurability of the various matrices 150, and with reference to the exemplary form of combined data, configuration, and control information referred to herein as a "silverware" module.

The matrix interconnection network 110 of FIG. 1 includes subset interconnection networks (not shown). These may include a Boolean interconnection network, a data interconnection network, and other networks or interconnection schemes, referred to herein collectively and generally as "interconnect," "interconnection(s)," "interconnection network(s)," or "networks." They may be implemented in a variety of ways generally known in the art, such as by utilizing FPGA interconnection networks or switching fabrics. In the exemplary embodiment, the various interconnection networks are implemented as described, for example, in U.S. Pat. No. 5,218,240, U.S. Pat. No. 5,336,950, U.S. Pat. No. 5,245,227, and U.S. Pat. No. 5,144,166. These various interconnection networks provide selectable (switchable) connections among the controller 120, the memory 140, the various matrices 150, the computational units (or "nodes"), and the computational elements, thereby providing the physical basis for the configuration and reconfiguration described herein, in response to and under the control of configuration signaling generally referred to herein as "configuration information." In addition, the various interconnection networks (110, 210, 240, and 220) provide selectable or switchable data, input, output, control, and configuration paths among the controller 120, the memory 140, the various matrices 150, and the computational units, components, and elements, in lieu of any form of traditional or separate input/output buses, data buses, DMA, RAM, configuration, and instruction buses.

It should be pointed out, however, that while any given switching or selecting operation of, or within, the various interconnection networks may be implemented in ways known in the art, the design and layout of the interconnection networks of the present invention are new and novel, as discussed in greater detail below. For example, varying levels of interconnection are provided to correspond to the varying levels of the matrices, computational units, and elements. At the level of the matrices 150, in comparison with prior-art FPGA interconnect, the matrix interconnection network 110 is considerably more limited and less "rich," with less connection capability in a given area, which reduces capacitance and increases speed of operation. Within a particular matrix or computational unit, however, the interconnection network may be considerably more dense and rich, providing greater adaptation and reconfiguration capability within a narrow or close locality of reference.

The various matrices or nodes 150 are reconfigurable and heterogeneous. That is, in general, and depending on the desired configuration, reconfigurable matrix 150A is generally different from reconfigurable matrices 150B through 150N; reconfigurable matrix 150B is generally different from reconfigurable matrices 150A and 150C through 150N; reconfigurable matrix 150C is generally different from reconfigurable matrices 150A, 150B, and 150D through 150N; and so on. Each of the various reconfigurable matrices 150 generally contains a different or varied mix of adaptive and reconfigurable nodes, or computational units. The nodes, in turn, generally contain a different or varied mix of fixed, application-specific computational components and elements that may be adaptively connected, configured, and reconfigured in various ways to perform varied functions through the various interconnection networks. In addition to this varied internal configuration and reconfiguration, the various matrices 150 may be connected, configured, and reconfigured at a higher level with respect to each of the other matrices 150 through the matrix interconnection network 110. Details of the ACE architecture can be found in the related patent applications referenced above.

Another example of an adaptive computing machine 160 that may use the parallel computational model is shown in FIG. 2. The adaptive computing machine 160 in this example has thirty-two heterogeneous leaf nodes 180 connected together via a network 162. The network 162 has a single root 164 that is connected to a group of network input ports 164, a group of network output ports 168, an optional system interface port 170, an external memory interface 172, and an internal memory interface 174. A supervisor node, or K-node, 178 is also connected to the root 164.

The nodes 180 are grouped into quadtrees, such as the quadtree 182. Each quadtree is implemented using 5-port switch elements 184, each of which is connected to a single parent node and up to four child nodes 180. The switch elements implement a fair round-robin arbitration scheme and provide pipelining with multi-level look-ahead for enhanced performance. In this example, the width of all paths is constant (51 bits), but there is an option to widen the paths as the tree is ascended, in the style of Leiserson's fat trees, in order to increase network bandwidth.
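
The text states only that each 5-port switch element arbitrates in a fair round-robin manner. As a minimal sketch of what that could mean, the Python class below rotates priority so that the port granted most recently is considered last on the next request. The class name `RoundRobinArbiter`, the port numbering, and the rotation rule are assumptions for illustration; pipelining and multi-level look-ahead are not modeled.

```python
class RoundRobinArbiter:
    """Grants one requesting port per cycle, rotating priority after each grant."""

    def __init__(self, num_ports=5):
        self.num_ports = num_ports
        # Start so that port 0 has the highest priority on the first cycle.
        self.last_granted = num_ports - 1

    def grant(self, requests):
        """Return the granted port from the set `requests`, or None if it is empty."""
        for offset in range(1, self.num_ports + 1):
            port = (self.last_granted + offset) % self.num_ports
            if port in requests:
                self.last_granted = port
                return port
        return None


arbiter = RoundRobinArbiter()
# Ports 0, 2, and 4 request on every cycle; each is served in turn.
grants = [arbiter.grant({0, 2, 4}) for _ in range(4)]
```

Because priority moves past the port that was just served, no continuously requesting port can be locked out by the others, which is the sense of "fair" assumed here.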

In this example, all traffic on the network 162 takes the form of 51-bit network words, as shown by the network word 188 in FIG. 3. The network word 188 has a route field 190, a security bit 192, a service field 194, an auxiliary field 196, and a payload field 198. The route field 190 is the destination address of the network word 188; its two high-order bits are the chip ID. The security bit 192 permits peeks (reads) and pokes (writes) of configuration memory and is set only for words sent by the K-node 178. The service field 194 defines the type of service, and the auxiliary field 196 depends on the service type. The service field 194 defines one of sixteen service types, including point-to-point (PTP), which relates to streaming data, and PTP acknowledgement, which supports flow control for the data and causes a consumer count or producer count at the destination node to be incremented or decremented.

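
A 51-bit word with these five fields can be packed and unpacked with ordinary shifts and masks, as the Python sketch below shows. Only the total width (51 bits), the single security bit, the sixteen service types (hence 4 bits), and the two-bit chip ID at the top of the route field come from the text. The remaining widths (8-bit route, 6-bit auxiliary, 32-bit payload), the field order, and the function names are assumptions chosen so that the widths sum to 51.

```python
# Field widths, most significant field first. ROUTE, AUX, and PAYLOAD widths
# and the field ordering are assumptions; they are not given in the text.
ROUTE_BITS, SECURITY_BITS, SERVICE_BITS, AUX_BITS, PAYLOAD_BITS = 8, 1, 4, 6, 32
WIDTHS = [ROUTE_BITS, SECURITY_BITS, SERVICE_BITS, AUX_BITS, PAYLOAD_BITS]
NAMES = ["route", "security", "service", "aux", "payload"]
assert sum(WIDTHS) == 51


def pack_word(route, security, service, aux, payload):
    """Pack the five fields into one integer of at most 51 bits."""
    word = 0
    for value, width in zip([route, security, service, aux, payload], WIDTHS):
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack_word(word):
    """Split a packed word back into a dict of its fields."""
    fields = {}
    for name, width in zip(reversed(NAMES), reversed(WIDTHS)):
        fields[name] = word & ((1 << width) - 1)
        word >>= width
    return fields


def chip_id(route):
    """The two high-order bits of the route field are the chip ID."""
    return route >> (ROUTE_BITS - 2)


# The service code 3 is arbitrary; the text does not give the code values.
word = pack_word(route=0b10110001, security=1, service=3, aux=5, payload=0xDEADBEEF)
fields = unpack_word(word)
```
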
Node wrapper

FIG. 4 shows the interface between heterogeneous nodes and the homogeneous network in the ACE architecture of FIG. 1 or the ACM architecture of FIG. 2. This interface is referred to as a "node wrapper" because it is used to provide a common input and output mechanism for each node. A node's execution units and memory are interfaced to the network and to control software via the node wrapper, providing a uniform, consistent system-level programming model. In this example, the node 180 includes a memory 210 and an execution unit 212. Details of the node wrapper can be found in the related patent applications referenced above.

In a preferred embodiment, each node wrapper includes a hardware task manager (HTM) 200. The node wrapper also includes a data distributor 202, an optional direct memory access (DMA) engine 204, and a data aggregator 206. The HTM coordinates execution of the node's processors and use of its resources. It does so by processing a task list and producing a ready-to-run queue. The HTM is configured and managed by a specialized node or control node (not shown), referred to as the K-node 178 in FIG. 2, although other HTM control approaches may also be used.

The node wrapper in FIG. 4 makes the node 180 identical in outward appearance to all other nodes in the adaptive computing machine 160 of FIG. 2 or the adaptive computing engine 100 of FIG. 1, regardless of its internal structure or functionality. The node wrapper also relieves the execution unit 212 of having to deal with the myriad activities associated with task management and network interaction. Among other things, the node wrapper is responsible for disposing of each incoming network word, such as the network word 188 of FIG. 2, in an appropriate fashion within one clock cycle.

The execution unit 212 in FIG. 4 is responsible for executing tasks (a task is equivalent to a module instance). The execution unit 212 may include a digital signal processor (DSP), a reduced instruction set (RISC) processor, a domain-specific processor, an application-specific integrated circuit (ASIC), or a reconfigurable (FPGA) fabric. Regardless of its form, the execution unit 212 interacts with the node wrapper through a standard interface.

The node memory 210 is accessible to both the node wrapper and the execution unit 212. It is where the node wrapper deposits incoming streaming data and where the execution unit 212 accesses that data. A node's own memory 210, however, is typically not where the execution unit 212 sends output data. To minimize memory accesses, output data is usually sent directly to the node or nodes that require it, that is, the consumer node(s). The node memory 210 is also used to store task parameters and is available to tasks as temporary (scratch) storage.

In a multi-node system such as the ACM 160 of FIG. 2, where the nodes 180 are both consumers and producers of streaming data, it is desirable for production and consumption rates to match. A producer task on one node may produce data at a rate that is either higher or lower than the rate a consumer task on another node can handle. If the producer sends data faster than the consumer can process it, data is eventually lost. If the producer sends data more slowly than the consumer can handle, the consumer may be starved for data and forced to sit idle waiting for more.

The ACM 160 provides a uniform and consistent mechanism for task management, flow control, and load balancing through a point-to-point protocol and the node wrapper of FIG. 4. Task management ensures that a task is placed in execution only when it has sufficient input data and there is sufficient space at the consumer node(s) to hold the data the task produces. Flow control guarantees that a producer task never overwhelms a consumer task with too much data in too short a time. Load balancing allows a producer task to distribute data among several alternative consumer nodes, which in turn allows the producer task to operate at a higher rate.

Streaming data is transferred between nodes 180 (points) over a point-to-point channel (point-to-point stream) 250, shown in FIG. 5. Each point-to-point (PTP) channel, such as the channel 250, includes a producer node 252, a producer task 254, an output port 256, a consumer node 258, an input port 260, an input buffer 262, and a consumer task 264. The producer task 254 runs on the execution unit of the producer node 252 and produces a finite-sized block of PTP data on each task activation. The block of data is sent over the PTP channel 250 as a sequence of PTP words. The sending of a block is shown as a task in FIG. 5. The output port 256 on the producer node 252 is associated with the producer task 254.

The consumer task 264 receives PTP data from the PTP channel 250 through the input port 260 on the consumer node 258. A circular input buffer 262 in the node memory of the consumer node 258 stores the incoming PTP data. A consumer task such as the consumer task 264 runs on the execution unit of the consumer node 258 and, on each task activation (task 2 in FIG. 5), consumes a finite amount of the PTP data held in the circular input buffer 262.

Data is conveyed over the PTP channel 250 when the producer task 254 transmits a 50-bit point-to-point word 270, shown in FIG. 6, to the node wrapper of the producer node 252. The point-to-point word 270 has the same fields as the network word 188 of FIG. 3, and like elements and fields carry the same element numbers. The point-to-point word 270 includes a node word 272 in the route field 190, a port word 274 in the auxiliary field 196, and a data word 276 in the payload field 198. In this example, a 51st bit, the security bit 192, is added later by the network 162 of FIG. 2. A node wrapper, such as the node wrapper of FIG. 4, then hands the PTP word to the packet-switched network for delivery to the consumer node 258 of FIG. 5. The 8-bit route field 190 of the PTP word 270 supplies the address, in the form of the node word 272, of the consumer node, such as the node 258 of FIG. 5. The port word 274 occupies the low five bits of the auxiliary field 196 and indicates which of the consumer node's input ports the data is destined for. When the PTP word arrives at the consumer node, the node wrapper there stores the 32-bit data word 276 from the payload field 198 in the circular input buffer associated with the indicated input port. The transfer is then complete.
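As a rough illustration of the transfer just described, the following C sketch models the logical contents of a PTP word and the consumer-side store into a circular input buffer. It is not a bit-exact rendering of FIG. 6. The type and function names (ptp_word, input_buffer, ptp_make, ptp_deliver), the buffer size BUF_WORDS, and the choice to keep each field in its own struct member are assumptions made for the example; only the 8-bit node word, the 5-bit port word, and the 32-bit data word come from the description.

```c
#include <assert.h>
#include <stdint.h>

#define PORT_BITS 5u   /* port word: low five bits of the auxiliary field */
#define NUM_PORTS (1u << PORT_BITS)
#define BUF_WORDS 8u   /* hypothetical input-buffer size, in 32-bit words */

/* Logical contents of a PTP word (assumed layout, one member per field). */
typedef struct {
    uint8_t  node;     /* node word 272, carried in the 8-bit route field */
    uint8_t  port;     /* port word 274 */
    uint32_t data;     /* data word 276, the 32-bit payload */
} ptp_word;

/* Circular input buffer associated with one input port. */
typedef struct {
    uint32_t words[BUF_WORDS];
    unsigned write_index;          /* next slot the node wrapper fills */
} input_buffer;

/* Producer side: build a PTP word addressed to (node, port). */
ptp_word ptp_make(uint8_t node, uint8_t port, uint32_t data)
{
    ptp_word w = { node, (uint8_t)(port & (NUM_PORTS - 1u)), data };
    return w;
}

/* Consumer side: store the payload in the buffer of the indicated input
 * port and advance the write index with wrap-around.
 * ports must point to an array of NUM_PORTS buffers. */
void ptp_deliver(input_buffer *ports, ptp_word w)
{
    assert(w.port < NUM_PORTS);
    input_buffer *b = &ports[w.port];
    b->words[b->write_index] = w.data;
    b->write_index = (b->write_index + 1u) % BUF_WORDS;
}
```

The sketch does nothing to stop a producer from overwriting unread words. In the description that protection comes from the producer and consumer counts, not from the buffer itself.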

The ACM 160 includes mechanisms for task management, flow control, and load balancing. An input buffer is associated with each input port. A two's-complement signed count is also associated with each port, both input and output.

For an input port, the count is called a consumer count because it reflects the amount of data in that port's input buffer that is available to be consumed by the associated task. A consumer count is enabled when its value is non-negative, that is, when its sign bit is 0. An enabled consumer count indicates that the associated input buffer holds at least the minimum amount of data required for one activation of the associated task. At system initialization or upon reconfiguration, a consumer count is typically reset to -C, where C is the minimum number of 32-bit words required per task activation.

For an output port, the count is called a producer count because it reflects the amount of space available in the downstream input buffer to accept the data produced by the associated task. A producer count is enabled when its value is negative, that is, when its sign bit is 1. An enabled producer count indicates that the downstream input buffer has space available to hold the maximum amount of data produced per activation of the associated task. At system initialization or upon reconfiguration, a producer count is typically reset to P - S - 1, where P is the maximum number of 32-bit words produced per task activation and S is the size of the downstream input buffer in 32-bit words.

Both consumer counts and producer counts are typically initialized to negative values, so consumer counts start out disabled while producer counts start out enabled. This initial state reflects the fact that input buffers are usually empty at system initialization or reconfiguration.
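The reset values can be checked with a little arithmetic. Assume that every word written to the downstream buffer adds 1 to both counts and every word consumed subtracts 1, which is how the acknowledgments described later behave. A buffer holding n words then gives a consumer count of n - C and a producer count of n + P - S - 1. The consumer count is non-negative exactly when n >= C, and the producer count is negative exactly when n + P <= S. The C sketch below encodes the reset values and the two enable tests. The function names are hypothetical, and the check that P does not exceed S is an added assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Reset value of a consumer count: -C, where C is the minimum number of
 * 32-bit words needed per task activation. */
int32_t consumer_count_reset(int32_t c_min_words)
{
    return -c_min_words;
}

/* Reset value of a producer count: P - S - 1, where P is the maximum number
 * of words produced per activation and S is the downstream buffer size. */
int32_t producer_count_reset(int32_t p_max_words, int32_t s_buffer_words)
{
    /* Assumption: the buffer can hold at least one full activation. */
    assert(p_max_words <= s_buffer_words);
    return p_max_words - s_buffer_words - 1;
}

/* A consumer count is enabled when non-negative (sign bit 0). */
bool consumer_enabled(int32_t count) { return count >= 0; }

/* A producer count is enabled when negative (sign bit 1). */
bool producer_enabled(int32_t count) { return count < 0; }
```

With C = 4, P = 8, and S = 32, the counts reset to -4 and -25. The consumer count becomes enabled once 4 words have arrived, and the producer count stays enabled until more than 24 words are waiting in the buffer.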

Consumer and producer counts are updated by a system of credits and debits in the form of forward acknowledgments and backward acknowledgments. Both kinds of acknowledgment are network words, such as the acknowledgment network word 280 shown in FIG. 7. The acknowledgment network word 280 has the same fields as the network word 188 of FIG. 3, and like elements and fields carry the same element numbers. Acknowledgment network words 280 are sent by a task as the final step of a task activation. In both cases, the payload field 198 contains four subfields: an acknowledgment-type subfield 282 (one bit), a port subfield 284, a task subfield 286, and an Ack-value subfield 288.

The acknowledgments a task issues at the end of each activation are as follows. For each output port of the task, a forward acknowledgment specifying the consumer input port and the consumer task is sent to the consumer node. Its Ack value is the number of PTP words the task sent to that consumer input port. In addition, a backward acknowledgment (a self-ack) specifying the output port and the task is sent to the node on which the task itself resides. Its Ack value is the number of PTP words the task sent through the output port.

For each input port of the task, a backward acknowledgment specifying the producer output port and the producer task is sent to the producer node. Its Ack value is the negative of the number of 32-bit words the task consumed from the input port's buffer. In addition, a forward acknowledgment (a self-ack) specifying the input port and the task is sent to the node on which the task itself resides. Its Ack value is likewise the negative of the number of 32-bit words the task consumed from the input port's buffer.

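The four end-of-activation acknowledgments can be summarized in a short C sketch. It builds the pair of acknowledgments for one output port and the pair for one input port. The struct layouts and names (ack_word, endpoint, acks_for_output, acks_for_input) are hypothetical and do not reproduce the bit layout of FIG. 7; only the acknowledgment types, destinations, and Ack values follow the description.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { ACK_FORWARD, ACK_BACKWARD } ack_type;

/* Logical contents of an acknowledgment (assumed layout). */
typedef struct {
    ack_type type;   /* acknowledgment-type subfield */
    uint8_t  node;   /* destination node */
    uint8_t  port;   /* port subfield */
    uint8_t  task;   /* task subfield */
    int32_t  value;  /* Ack-value subfield */
} ack_word;

/* One end of a PTP channel: a node, a port on it, and the task that owns it. */
typedef struct { uint8_t node, port, task; } endpoint;

/* Output port: out[0] is the forward ack to the consumer,
 * out[1] is the backward self-ack to the task's own node. */
void acks_for_output(endpoint self, endpoint consumer,
                     int32_t words_sent, ack_word out[2])
{
    assert(words_sent >= 0);
    out[0] = (ack_word){ ACK_FORWARD, consumer.node, consumer.port,
                         consumer.task, words_sent };
    out[1] = (ack_word){ ACK_BACKWARD, self.node, self.port,
                         self.task, words_sent };
}

/* Input port: out[0] is the backward ack to the producer,
 * out[1] is the forward self-ack; both carry a negated count. */
void acks_for_input(endpoint self, endpoint producer,
                    int32_t words_consumed, ack_word out[2])
{
    assert(words_consumed >= 0);
    out[0] = (ack_word){ ACK_BACKWARD, producer.node, producer.port,
                         producer.task, -words_consumed };
    out[1] = (ack_word){ ACK_FORWARD, self.node, self.port,
                         self.task, -words_consumed };
}
```

Read together, the two functions show the symmetry of the scheme: sending words raises the counts at both ends of a channel, and consuming words lowers them by the same amount.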
Hardware task manager

The hardware task manager 200 shown in FIG. 4 is the part of the node wrapper responsible for updating consumer counts and producer counts in response to incoming acknowledgments. The hardware task manager 200 also monitors the sign bits of those counts and launches a task when an appropriate set of counts is enabled. This last duty is carried out with two signed counts that are associated with tasks rather than ports: a task input count and a task output count. A task's input (output) count reflects the number of the task's consumer (producer) counts that are enabled. A task count is enabled when its value is non-negative. A task is enabled, and available for execution, when both its input count and its output count are enabled.

Incoming acknowledgments update the various counts and cause tasks to be launched as follows. When a forward acknowledgment is received, the specified port is interpreted as an input port and the Ack value is added to the corresponding consumer count. If the consumer count makes a transition from disabled to enabled (enabled to disabled), the input count of the specified task is incremented (decremented) by 1. When a backward acknowledgment is received, the specified port is interpreted as an output port and the Ack value is added to the corresponding producer count. If the producer count makes a transition from disabled to enabled (enabled to disabled), the output count of the specified task is incremented (decremented) by 1. If, after a forward or backward acknowledgment is received, both the input count and the output count of the specified task are enabled, the task is placed on the ready-to-run queue if it is not already there. The task is launched when it reaches the head of the queue.
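The update procedure above can be modelled in software as follows. This is only a behavioural sketch in C of logic that the description places in hardware. The task_state structure, the MAX_PORTS limit, and the use of a boolean flag in place of a real ready-to-run queue are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_PORTS 4   /* hypothetical per-task port limit */

typedef struct {
    int32_t consumer_count[MAX_PORTS];  /* one per input port  */
    int32_t producer_count[MAX_PORTS];  /* one per output port */
    int32_t input_count;                /* task input count    */
    int32_t output_count;               /* task output count   */
    bool    on_ready_queue;             /* stands in for the ready-to-run queue */
} task_state;

/* +1 for disabled -> enabled, -1 for enabled -> disabled, 0 otherwise. */
static int transition(bool was_enabled, bool is_enabled)
{
    return (int)is_enabled - (int)was_enabled;
}

/* Queue the task when both task counts are enabled (non-negative).
 * Setting the flag again is harmless, so a task is never queued twice. */
static void enqueue_if_enabled(task_state *t)
{
    if (t->input_count >= 0 && t->output_count >= 0)
        t->on_ready_queue = true;
}

/* Forward ack: the port is an input port; consumer counts are enabled when >= 0. */
void on_forward_ack(task_state *t, unsigned port, int32_t ack_value)
{
    assert(port < MAX_PORTS);
    bool was = t->consumer_count[port] >= 0;
    t->consumer_count[port] += ack_value;
    t->input_count += transition(was, t->consumer_count[port] >= 0);
    enqueue_if_enabled(t);
}

/* Backward ack: the port is an output port; producer counts are enabled when < 0. */
void on_backward_ack(task_state *t, unsigned port, int32_t ack_value)
{
    assert(port < MAX_PORTS);
    bool was = t->producer_count[port] < 0;
    t->producer_count[port] += ack_value;
    t->output_count += transition(was, t->producer_count[port] < 0);
    enqueue_if_enabled(t);
}
```

Removing a task from the queue and launching it are not modelled here, so the flag is never cleared in this sketch.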

These actions embody the firing rule for tasks. They cause a task to be placed on the ready-to-run queue, and ultimately executed, when a sufficient number of consumer counts and a sufficient number of producer counts are enabled. What those sufficient numbers are is determined by the initial values of the task's input count and output count. If I (O) is the number of input (output) ports associated with the task, IC_Initial (OC_Initial) is the initial value of the task's input (output) count, and, as described above, all consumer counts are initially disabled and all producer counts are initially enabled, then the task fires when
-IC_Initial of the I consumer counts are enabled, and
(O - OC_Initial) of the O producer counts are enabled.
For example, for I = 4:
If IC_Initial = -1, then 1 of the 4 consumer counts must be enabled.
If IC_Initial = -2, then 2 of the 4 consumer counts must be enabled.
If IC_Initial = -3, then 3 of the 4 consumer counts must be enabled.
If IC_Initial = -4, then 4 of the 4 consumer counts must be enabled.
For O = 4:
If OC_Initial = 3, then 1 of the 4 producer counts must be enabled.
If OC_Initial = 2, then 2 of the 4 producer counts must be enabled.
If OC_Initial = 1, then 3 of the 4 producer counts must be enabled.
If OC_Initial = 0, then 4 of the 4 producer counts must be enabled.
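The thresholds in the example follow directly from how the task counts move. Each consumer count that becomes enabled adds 1 to the input count, and each producer count that becomes disabled subtracts 1 from the output count. The C sketch below works this out under the stated starting conditions (all consumer counts disabled, all producer counts enabled); the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Number of consumer counts that must be enabled for the task to fire. */
int required_consumers(int ic_initial)
{
    return -ic_initial;
}

/* Number of producer counts that must be enabled for the task to fire. */
int required_producers(int num_outputs, int oc_initial)
{
    return num_outputs - oc_initial;
}

/* Evaluate the task counts for a given number of enabled port counts.
 * The input count starts at ic_initial with no consumer counts enabled;
 * the output count starts at oc_initial with all producer counts enabled. */
bool task_fires(int num_outputs, int ic_initial, int oc_initial,
                int enabled_consumers, int enabled_producers)
{
    assert(enabled_consumers >= 0);
    assert(enabled_producers >= 0 && enabled_producers <= num_outputs);
    int input_count  = ic_initial + enabled_consumers;
    int output_count = oc_initial - (num_outputs - enabled_producers);
    return input_count >= 0 && output_count >= 0;
}
```

For instance, with O = 4, IC_Initial = -2, and OC_Initial = 2, task_fires returns true once two consumer counts and two producer counts are enabled, and false if either number drops to one.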

A multiprocessor system such as the ACE 100 of FIG. 1 or the ACM 160 of FIG. 2 may be programmed using what may be termed the Stream C programming language.

Stream C module

In a Stream C program there is only one mechanism for expressing concurrency: the concurrent operation of the program's modules (and module-like stream expressions). Syntactically, modules are very similar to C functions, but semantically they are different. A C function (subroutine) begins to operate only when it is called. On a call, control passes to the C function, usually together with input arguments. The function then performs its task or computation and, when finished, returns control together with any output result. Unlike a C function, a module is never called, and control is neither passed to it nor returned from it. Instead, a module carries on ongoing interactions with other modules and the outside world through its input and output ports. Through these ports, the module receives streams of input values and emits streams of output values.

The syntax of a module prototype is the same as that of a C function prototype, with three exceptions. First, the keyword stream precedes the module prototype. This keyword tells the compiler/linker that each module input and output is associated with a stream of values of the specified type rather than with an individual value. Second, to allow a module to have multiple output streams, the module's return type may be replaced by a parenthesized list that is syntactically identical to the input parameter list. Third, to extend the notion of arrays to modules, a square-bracketed list of array indices may be inserted immediately after the module name and immediately before the input argument list. Module arrays are discussed below.

The following
stream int moduleA(int, int);
stream (int, int) moduleB(int, int);
are two examples of module declarations. Parameter names are omitted here because they are not required in a module declaration (in contrast to a module definition or a module instantiation). They may, however, be included at the programmer's discretion, usually as a mnemonic aid, for inputs and, when there are multiple outputs, for outputs as well. For example, the two declarations may be written as:
stream int moduleA(int a, int b);
stream (int x, int y) moduleB(int a, int b);

The first declaration indicates that moduleA has two input streams, both of integer type, and a single output stream, also of integer type. The second declaration indicates that moduleB has two input streams, both of integer type, and two output streams, also of integer type.

Like a C function definition, a module definition has a body enclosed in braces ({ and }). Also as in a C function definition, each module input (and each module output, when there are multiple outputs) is assigned an identifier. The following are two examples of module definitions.
stream int moduleA(int a, int b)
{
    // module body
}

stream (int x, int y) moduleB(int a, int b)
{
    // module body
}

A module instantiation is the module counterpart of a C function call. Like a function call, a module instantiation is where a module gets used. The syntax of the two kinds of expression is similar, but the semantics differ. A fragment of C code might read as follows.
int x, y;
int F(int, int);
.
.
.
int z = F(4, x + 5*y);

The first statement declares x and y to be integers, while the second declares F to be a function with two integer parameters and an integer result. The last statement is an assignment containing the function call F(4, x + 5*y), which has two arguments, the expressions 4 and x + 5*y, corresponding to the two parameters of F.

The stream version of this code fragment is as follows.
stream int x, y;
stream int F(int, int);
.
.
.
stream int z = F(4, x + 5*y);

In the stream version, each of the statements is prefixed with the keyword stream. This change in syntax brings a dramatic change in semantics: instead of individual values, streams of values are involved. Thus, the first statement declares x and y to be integer streams, while the second declares F to be a module with two integer-stream inputs and an integer-stream output. The last statement is an assignment containing the module instantiation F(4, x + 5*y), which has two arguments, the stream expressions 4 and x + 5*y, corresponding to the two parameters of F.

In the case of the function call, each time the assignment z = F(4, x + 5*y) is executed, the expressions 4 and x + 5*y are evaluated and the two resulting values are supplied as parameters in a call to the function F. Some time later, F returns a value. In the case of the module instantiation, the assignment z = F(4, x + 5*y) is never executed and F is never called. Instead, at system initialization, just before the Stream C program begins execution, an instance of F is created (instantiated), ready to receive streams of integers on its two input ports and to produce a stream of integers on its output port. Once program execution begins, the instance of F remains active until the program terminates (that is, the instance of F is persistent).
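The contrast can be mimicked in ordinary C. The sketch below is only an analogy, not Stream C. It assumes that F consumes one value from each input per output value, that the constant 4 behaves as a stream that always yields 4, and that x + 5*y is formed element by element. The body of F (a plain sum) and the names call_once and run_instance are invented for the example, since the text does not say what F computes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for F; its real behaviour is not given in the text. */
int F(int a, int b)
{
    return a + b;
}

/* Function-call semantics: one evaluation each time the statement executes. */
int call_once(int x, int y)
{
    return F(4, x + 5 * y);
}

/* Stream semantics, modelled on finite prefixes of the streams x and y.
 * A persistent instance would keep doing this for as long as values arrive;
 * here the loop simply stops after n values. */
void run_instance(const int *x, const int *y, int *z, size_t n)
{
    assert(x != NULL && y != NULL && z != NULL);
    for (size_t i = 0; i < n; i++)
        z[i] = F(4, x[i] + 5 * y[i]);
}
```

In the function version the caller decides when F runs. In the stream version nothing calls F; the arrival of input values drives it.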

This simple example illustrates the general mechanism used in Stream C to create a community of interacting modules. Each module instantiation causes a separate module instance to be created at system initialization. Once created (instantiated), a module instance is ready to receive streams of values on its input ports and to produce streams of values on its output ports. Furthermore, once program execution begins, the module instance remains active until the program terminates.

The general form of the instantiation of a module with multiple output ports is:
(<identifier list>) <module identifier>(<expression list>)
Whereas the input arguments are expressions, the output arguments are identifiers. These identifiers serve to give names to otherwise nameless output streams. The stream assignment statement above plays the same role by assigning the name z to the nameless output stream of F(4, x + 5*y). For example,
stream int w, x, y, z;
stream (int, int) F(int, int);
.
.
.
(w, z) = F(4, x + 5*y);

As before, F has two integer-stream inputs, but in contrast to the earlier example, this F has two integer-stream outputs. The two output streams appear in the instantiation of F as the list of identifiers (w, z), which serves to give the two output streams the names w and z.

Statements within a module body fall into two categories: stream statements, which involve only streams, and thread statements, which include the full range of C statements as well as statements that allow a thread to read from and write to streams. Because each module instantiation causes a separate module instance to be created at system initialization, a Stream C module is not allowed to contain an instantiation of itself, either within its own body or within the body of a submodule. In other words, circularity in module references is not permitted. This prohibition helps avoid the difficult task of instantiating an infinite number of module instances.

In a Stream C module there is no notion of returning control, and so the return statement is inappropriate. In a module, output values are simply inserted into the module's output streams. To do that, however, an output stream must have a name. That is no problem for a module whose prototype has a parenthesized list of named output streams. It is a problem, however, when the module prototype supplies only the type of the module's output stream. In that case, code within the body of the module, in either the stream domain or the thread domain, can use the keyword out as the name of the default module output stream. This usage is illustrated in the following code fragment.
stream int moduleA(int a, int b)
{
.
.
.
out = a + b;
.
.
.
}

Whereas functions provide the computational building blocks of a program, modules and their associated streams provide the framework for the web of interactions and concurrent activity typical of a Stream C program. Although modules deal with streams of values, that does not prevent a module from accessing individual values within a stream and supplying those values to a function. Likewise, a module can access the output value of a function and insert that value into a stream. A function, on the other hand, cannot reference a module, because there is no mechanism within a function for such an interaction. Because of this asymmetry in capabilities, modules occupy a higher level in the program hierarchy and functions a lower level.

While the differences between modules and functions are substantial, there is one area in which they are alike: both support side effects. That is, both modules and functions may manipulate external data structures independently of their input and output ports. This follows from the fact that modules may contain threads, and threads may have side effects.

FIG. 8A shows a general module, including a module 300, some number (zero to N) of input streams 302 that provide data and control to the module 300, and some number (zero to N) of output streams 304 that provide data and control to the next module or function. A module with no output streams is a "sink," and a module with no input streams is a "source."

FIG. 8B shows two modules, module A 300 and module B 310, having input streams 302 and 312 and output streams 304 and 314, respectively. The output stream 304 of module A 300 is connected to the input stream 312 of module B 310. Module A 300 is mapped to run on a CPU core 308, and module B 310 is mapped to run on a second CPU core 318. The cores 308, 318, and 328 are similar to the nodes 180 of FIG. 2.

FIG. 8C shows module A 300 and module B 310 mapped to the same CPU core, such as the CPU core 308. In this case, the modules 300 and 310 behave like any other separate threads of control. An operating system running on the second core 318 may schedule the modules 300 and 310 using preemptive multitasking or may run each to completion or release. Because both modules 300 and 310 and their input and output streams 302, 312 and 304, 314 are "persistent" (that is, they remain ready to process), additional information must be supplied to a conventional operating system about when to schedule a module, based on the availability of both "enough" input stream data to perform a computation and "enough" space for the output stream to carry the computed data.

A variety of algorithms may be used to map modules to cores. These include cache proximity, in which the modules sharing the greatest number of streams are placed on cores that share an L1 cache, followed by a shared L2 cache, followed by a shared L3 cache, followed by shared DRAM. They also include physical proximity, in which the modules sharing the greatest number of streams are placed on cores that are physically close to one another. For example, the algorithm may start with the die, then move to integrated circuits on a motherboard, then to motherboards in a rack, then to racks on the same floor of a building, and then to geographically nearby buildings. Another algorithm is next available free, in which a module is assigned to the next "free" core based on CPU utilization (either current utilization or a weighted average over a period of time), or to the next sequentially available core. Another is predictive load, which selects modules and cores based on estimated statistical sampling; a running average of core utilization may be used to load a module onto the most lightly loaded core. Another is user specified, in which user-specified virtual core IDs are used to place all modules onto physical core IDs. When the number of virtual core IDs exceeds the number of physically available cores, multiple modules are loaded evenly across the available physical cores.
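The last two mapping policies can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and wrapping virtual core IDs with a modulo is just one simple way to obtain the "even" spread described above; the text does not say which method is actually used.

```python
def map_virtual_to_physical(virtual_core_ids, num_physical_cores):
    """Assign each user-specified virtual core ID to a physical core ID.

    Assumption: modulo wrapping is used as one simple way to spread modules
    evenly once virtual IDs outnumber the physical cores.
    """
    if num_physical_cores <= 0:
        raise ValueError("at least one physical core is required")
    return {vid: vid % num_physical_cores for vid in virtual_core_ids}


def pick_least_loaded(running_avg_utilization):
    """Return the core ID whose running-average utilization is lowest."""
    return min(running_avg_utilization, key=running_avg_utilization.get)


# Six virtual cores on four physical cores: IDs 4 and 5 wrap onto cores 0 and 1.
placement = map_virtual_to_physical(range(6), 4)

# Core 1 has the lowest running average, so it receives the next module.
target_core = pick_least_loaded({0: 0.90, 1: 0.20, 2: 0.55})
```

A cache-proximity or physical-proximity policy would additionally need a description of the machine topology, which is outside the scope of this sketch.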

FIG. 8D shows various data structures 330, 332, and 334 that reside in module A 300 and may be used by the input stream 302 and the output stream 304. The data structures 330, 332, and 334, which may reside in memory/cache or in a TLB, contain the key information that a single-core or multi-core system needs in order to schedule the transport of data from an input stream such as the input stream 302 to an output stream such as the output stream 304, from the input stream 302 into a module such as module A 300, and from module A 300 into the output stream 304. For each module there is information that uniquely identifies the module, uniquely identifies all of its input streams, uniquely identifies all of its output streams, identifies how the input and output streams are "connected," identifies the core, and holds state information so that the module can be relocated from one core to another, that is, swapped out via virtual memory. Streams may be dynamically added to or removed from modules, and modules may be dynamically added to or removed from cores.

Streams

In the Stream C programming language, the term stream refers to a sequence of data values, all of the same data type, that typically become available over a period of time. In Stream C, however, streams provide far more than a framework for input and output. Streams are elevated to first-class objects, a status roughly comparable to that of variables. This means that a stream may be bound to an identifier (that is, it may be named), to an input parameter of a function (that is, an input parameter of a module), to the output of a function (that is, an output parameter of a module), to a parameter within an expression, and to the output of an expression.

A stream conveys values of a single data type from one or more stream sources to one or more stream destinations. Exactly how this transport is accomplished is implementation dependent. Among other things, it depends on whether the stream is confined to a single semiconductor die or spans several meters or even thousands of kilometers. Except when dealing with performance issues, the programmer need not be concerned with these details and need consider only those aspects of a stream that relate to four stream attributes: the stream type, the stream name, the stream sources, and the stream destinations.

The stream type indicates the type of the values conveyed. The stream type may be any legal C data type, including pointers and types defined via typedef. It may be specified implicitly by context, for example by appearing as a module input or output parameter, or explicitly using the stream declaration described below.

A stream source is a point at which values are placed into the stream. Possible stream sources include an input parameter of a module definition, an output of a module instantiation, the output of a stream expression, and a thread (described below). A stream destination is a point to which the stream conveys its values. Possible stream destinations include an output parameter of a module definition, an input argument of a module instantiation, an input of a stream expression, and a thread. The optional stream name is a name/identifier assigned to a stream when the stream appears as a module input or module output, or when it is introduced in a stream declaration. One example of an unnamed stream is the output stream of a stream expression that has not been given a name by a stream assignment.

The notion of stream attributes is illustrated by the following code fragment, which contains a declaration of a function F and a partial definition of a module M:
stream int F(int, int);
stream (int zStrm) M(int xStrm, int yStrm)
{
・
・
・
zStrm = xStrm * yStrm + F(xStrm, yStrm);
・
・
・
}

Here there are three named streams, xStrm, yStrm, and zStrm, all of type int. xStrm and yStrm each have a single source, which is an input parameter of module M. The destinations of xStrm and yStrm are the two instances of each that appear in the assignment expression in the body of M (recall that in C an assignment is also an expression). These instances represent inputs to the assignment expression. Thus xStrm and yStrm each have a single source and two destinations.

A stream expression is the same as a C expression, except that input streams appear in place of variables. A stream expression also has an output stream, which conveys the results of evaluating the expression. By default the output stream is unnamed, but a name can be assigned with a stream assignment, as was done in the assignment above. Thus the output stream of the stream expression
xStrm * yStrm + F(xStrm, yStrm)
is given the name zStrm by the stream assignment
zStrm = xStrm * yStrm + F(xStrm, yStrm)
Either of these two expressions may be regarded as the source of zStrm. The destination of zStrm is the output stream of module M, indicated by the output parameter zStrm of module M:
stream (int zStrm) M(int xStrm, int yStrm)
Thus zStrm has a single source and a single destination.

The most important properties of a stream concern its role as a conveyor of values. There are four such properties: a) values do not enter a stream except at a stream source or at system initialization using the initialize() function; b) values entering a stream at a single source are totally ordered in time; c) once in a stream, a value is eventually delivered to all stream destinations, and if there are multiple destinations, a separate copy of the value is delivered to each; and d) values from a single source are received at each stream destination in the same order in which they entered the stream (that is, values do not overtake one another within a stream). These four properties are the only guarantees that a stream provides about the transport of values. Any other property that does not follow logically from these four is not a general stream property.
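Properties c) and d) can be illustrated with a small Python model of a single-source stream. This is a sketch only: the class name `Stream` and its methods are hypothetical, and modeling each destination as its own `deque` is an assumption about one possible implementation, not a description of the actual hardware or runtime.

```python
import copy
from collections import deque


class Stream:
    """Toy single-source stream with one FIFO per destination (illustrative only)."""

    def __init__(self, num_destinations):
        self.queues = [deque() for _ in range(num_destinations)]

    def put(self, value):
        # Property c): every destination receives its own separate copy.
        for queue in self.queues:
            queue.append(copy.deepcopy(value))

    def take(self, destination):
        # Property d): values leave in the order in which they entered.
        return self.queues[destination].popleft()


stream = Stream(num_destinations=2)
for value in ([1], [2], [3]):
    stream.put(value)

first = [stream.take(0) for _ in range(3)]
second = [stream.take(1) for _ in range(3)]
first[0].append(99)  # mutating one destination's copy leaves the other untouched
```

Nothing in this model bounds how long a value may wait in a queue, which matches the fact that the four properties promise only eventual delivery.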

A stream is obligated only to deliver values eventually. The latency of a stream, that is, the time a value takes to travel from a stream source to a stream destination, is therefore indeterminate. In fact, latency may vary over time and between different source-destination pairs of the same stream. Constant, or at least bounded, latency can nevertheless be achieved by relying on guarantees provided by the system implementation rather than by the programming model. A source-destination pair confined to a single semiconductor die, for example, usually has bounded latency.

The four properties above also have implications for stream determinism and nondeterminism. For a stream with a single source, the four properties guarantee deterministic stream behavior: the order in which values are placed into a single-source stream completely determines the order in which they are delivered to all stream destinations. For a stream with multiple sources, however, the situation is quite different. To illustrate the issues raised by multiple stream sources, consider the following adaptation of the code fragment from the previous section (out is the default output stream of a single-output module):
int F(int);
stream int M(int xStrm, int xStrm)
{
・
・
・
out = xStrm * xStrm + F(xStrm);
・
・
・
}

The two input parameters of module M are both xStrm. By the four properties, values entering xStrm through the first input parameter of M are received at each of the three destinations of xStrm in the order in which they entered the stream. Likewise, values entering xStrm through the second input parameter of M are received at each of the three destinations of xStrm in the order in which they entered the stream. This means that the values of the two streams are merged, or interleaved, before reaching each destination of xStrm.

How the interleaving is performed is generally governed by the structure of the program. For example, the elided portions of the program above could be written so that values from parameter 1 and values from parameter 2 are interleaved exactly. For example, if the integers arriving on the two input parameters (streams) xStrm of module M have the following order:

then the order of arrival at each of the three destinations of xStrm in the expression
out = xStrm * xStrm + F(xStrm)
could be:

Such program-imposed determinism does not always hold, however, and values from multiple stream sources may be interleaved nondeterministically. Moreover, depending on the target system, these nondeterministic interleavings may differ from one stream destination to another. Thus, for example, if the values arriving on the two input parameters (streams) of module M are the same as above, the order of arrival at the three destinations of xStrm might begin:

The nondeterminism of the arrival order at the destinations of a multiple-source stream contrasts with the consistency of the arrival order across all destinations of a single-source stream. When the arrival order is consistent, the following useful notation can be applied. For a single-source stream ssStrm and a non-negative integer i,
ssStrm(i)
denotes the i-th value appearing at all destinations of ssStrm. By convention, ssStrm(0) denotes the first value appearing at all destinations.
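The contrast between single-source and multiple-source streams can be made concrete with a short Python sketch. It is an illustration under assumptions: the function name is hypothetical, a pseudo-random choice stands in for whatever timing effects cause nondeterminism on a real target, and the input values are made up. Each source's own order is preserved, but the merged order is not fixed.

```python
import random


def merge_nondeterministic(src_a, src_b, rng):
    """Interleave two sources arbitrarily while keeping each source's own order."""
    a, b = list(src_a), list(src_b)
    merged = []
    while a or b:
        # Take from A if B is exhausted, otherwise choose at random.
        take_a = bool(a) and (not b or rng.random() < 0.5)
        merged.append(("A", a.pop(0)) if take_a else ("B", b.pop(0)))
    return merged


# Two destinations of the same two-source stream may see different interleavings.
at_destination_1 = merge_nondeterministic([1, 2, 3], [10, 20, 30], random.Random(1))
at_destination_2 = merge_nondeterministic([1, 2, 3], [10, 20, 30], random.Random(2))
```

Filtering either result by its source tag always yields the original per-source sequence, which is all that property d) guarantees in the multiple-source case.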

What happens when a value arrives at a stream destination depends on the kind of destination. If the destination is an output parameter of a module definition or an input argument of a module instantiation, the value is passed to the stream on the other side of the module boundary; the value thus remains in transit. If the destination is an input of a stream expression or a thread, the value is placed in a FIFO queue.

To illustrate values remaining in transit, consider the following code fragment:
stream int module1(int);
stream int module2(int xStrm)
{
・
・
・
out = module1(xStrm);
・
・
・
}
This fragment contains two modules, module1 and module2, each with a single input stream and a single output stream, and two named streams, xStrm and yStrm, both within the definition (body) of module2. The sole destination of xStrm, module1(xStrm), is an input argument of an instantiation of module1. Values arriving at that destination simply pass across the boundary of module1 into an internal stream of module1. The same holds for values arriving at the sole destination of yStrm,
stream (int yStrm) module2(int xStrm)
Because that destination is an output parameter of module2, arriving values simply pass across the boundary of module2 into a stream outside module2.

Another case arises when the stream destination is an input of a stream expression, as in the following code fragment:
stream int F(int, int);
stream int M(int xStrm, int yStrm)
{
・
・
・
out = xStrm * yStrm + F(xStrm, yStrm);
・
・
・
}
The stream expression
xStrm * yStrm + F(xStrm, yStrm)
is in the body of module M and contains two destinations of xStrm and two destinations of yStrm. The expression also contains two operators, * and +, and a function F, all of which are ordinary C constructs. This means that, in order to evaluate the expression, the two operators and the function F must be supplied with individual values.

Queues are inserted automatically by the Stream C linker/loader and are managed by the Stream C runtime. The runtime's duties include signaling when a queue is empty and ensuring that no queue overflows. Each queue is guaranteed a capacity of at least two values of the associated data type, although the programmer may request a specific capacity with a pragma command, as described below. In the example above there are four queues, one for each of the four stream destinations (stream expression inputs). These queues are largely invisible to the programmer.
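The queue guarantees described above (a minimum capacity of two values, an empty signal, and no overflow) can be sketched as a small bounded FIFO in Python. The class and method names are hypothetical, and refusing a push when the queue is full is an assumption about how overflow might be prevented; the text does not say which mechanism the runtime uses.

```python
from collections import deque


class BoundedFifo:
    """Toy destination queue with a guaranteed minimum capacity (illustrative only)."""

    MIN_CAPACITY = 2  # every queue holds at least two values

    def __init__(self, requested_capacity=MIN_CAPACITY):
        # A request below the guaranteed minimum is raised to the minimum.
        self.capacity = max(requested_capacity, self.MIN_CAPACITY)
        self.items = deque()

    def is_empty(self):
        return not self.items

    def is_full(self):
        return len(self.items) >= self.capacity

    def try_push(self, value):
        # Refuse rather than overflow; the producer must retry later.
        if self.is_full():
            return False
        self.items.append(value)
        return True

    def pop(self):
        return self.items.popleft()


queue = BoundedFifo(requested_capacity=1)  # capacity is still 2
accepted = [queue.try_push(v) for v in (10, 20, 30)]
```

In this run the third push is refused because the queue already holds two values, so the producer would have to wait until the consumer pops one.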

Once a Stream C program begins execution, the only way for a value to enter a stream is through a stream source. One or more streams, however, may form a directed cycle, which requires that a value already be present in the stream. The simplest such cycle arises when a stream appears on both sides of a stream assignment, as in
xStrm += yStrm
which is equivalent to
xStrm = xStrm + yStrm

FIG. 9A shows a first graphical representation 400 of this assignment, in which the directed cycle consists of a feedback path from the output of the + operator to one of the two inputs of the same + operator. Because no value is present on this path, the + operator cannot consume a value from each input stream and produce a value on the output stream. Hence, as shown in the second graphical representation 402, the + operator never fires unless a value 404 is placed on the feedback path before execution begins.
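The deadlock on the feedback path can be reproduced with a few lines of Python. This is a sketch under assumptions: the function name is hypothetical, the yStrm values are made up, and only the feedback path of the + operator is modeled, not any other destination of the stream.

```python
from collections import deque


def run_accumulator(y_values, feedback_init=()):
    """Simulate xStrm += yStrm as a '+' node fed by yStrm and by its own output."""
    feedback = deque(feedback_init)  # values already on the feedback path
    y_queue = deque(y_values)
    outputs = []
    # The '+' operator fires only when both of its inputs hold a value.
    while feedback and y_queue:
        result = feedback.popleft() + y_queue.popleft()
        outputs.append(result)
        feedback.append(result)  # the result travels back along the feedback path
    return outputs


stalled = run_accumulator([1, 2, 3])       # [] : the operator never fires
running = run_accumulator([1, 2, 3], [0])  # [1, 3, 6] : running sum of yStrm
```

With an empty feedback path the loop body never executes, which corresponds to representation 400; preloading a single 0 corresponds to the value 404 in representation 402.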

Another issue concerns changing the offset of one single-source stream relative to another. Suppose, for example, that aStrm and bStrm are both inputs to the same module or stream expression, such as
aStrm + bStrm
which is depicted graphically in FIG. 9B, and that the module or expression consumes values from these streams in pairs, one value from aStrm and one value from bStrm. Suppose further that aStrm(n) (that is, the n-th value arriving on aStrm) is to be paired with bStrm(n+2) (that is, the (n+2)-th value arriving on bStrm). Then aStrm(0) is paired with bStrm(2), aStrm(1) with bStrm(3), and so on.

The solution to both problems is provided by the stream initialization statement, which takes the form
<stream identifier>.initialize(<value list>);

When the Stream C compiler/linker/loader encounters this statement, it interprets it as a directive to do three things: insert a FIFO queue at each destination of <stream identifier> (whether the destination is an output parameter of a module definition or an input argument of a stream statement or thread); size each queue to hold at least n+1 values of type T, where n is the number of values in <value list> and T is the data type of <stream identifier>; and place the values of <value list> into the queue in order, with the first value in <value list> at the front (head) of the queue.
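The effect of a stream initialization statement on a destination queue, and its use for offsetting one stream against another, can be sketched in Python as follows. The helper names are hypothetical, the string labels "a0", "b0", and so on are placeholders for arriving values, and the preloaded values 1 and 2 are chosen only for illustration.

```python
from collections import deque


def make_initialized_queue(value_list):
    """Build a destination queue preloaded as an initialize statement directs."""
    capacity = len(value_list) + 1  # room for at least n + 1 values
    queue = deque(value_list)       # first listed value sits at the head
    return queue, capacity


def pair_streams(a_values, b_values, a_init=()):
    """Consume aStrm and bStrm in pairs, with aStrm's queue preloaded by a_init."""
    a_queue, _ = make_initialized_queue(list(a_init))
    a_queue.extend(a_values)  # arriving values line up behind the preloaded ones
    b_queue = deque(b_values)
    pairs = []
    while a_queue and b_queue:
        pairs.append((a_queue.popleft(), b_queue.popleft()))
    return pairs


pairs = pair_streams(["a0", "a1", "a2"], ["b0", "b1", "b2", "b3", "b4"], a_init=[1, 2])
# pairs[2] == ("a0", "b2"): the 0th value on aStrm meets the 2nd value on bStrm
```

Preloading two values delays every arriving aStrm value by two positions, which is exactly the aStrm(n) with bStrm(n+2) pairing described earlier.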

For example, in FIG. 9A, to prevent deadlock, the value 404 in the graphical representation 402 is inserted on the feedback path, and on the expression output as well, by
xStrm.initialize(0);
This statement causes two FIFO queues to be created, one for each destination of xStrm (the queue at the destination on the feedback path has already been inserted, as described in the previous section). Assuming that xStrm is of type int, the size of each queue is at least 2 × sizeof(int), and the int value 0 is at the head of each queue at system initialization. This is illustrated in the flowchart 402 of FIG. 9A. With xStrm initialized in this way, the output of the assignment xStrm += yStrm is:

Changing the offset of aStrm relative to bStrm, shown in the second graphical representation 412 of FIG. 9B, is handled in a similar way. Here, however, two values are inserted into the FIFO queue of aStrm, because the aim is to offset aStrm by two values relative to bStrm. This is accomplished with the stream initialization statement
aStrm.initialize(1, 2);
in which 1 and 2 have been chosen as the two values 414 inserted into the queue of aStrm at system initialization. The result of this initialization is illustrated in the representation 412 of FIG. 9B. With xStrm initialized in this way, the values appearing at the output of the assignment xStrm += yStrm are:

As with C variables, every stream must be declared before it is used, although some stream declarations are made implicitly by context, for example by appearing as a module input or output parameter. The syntax of an explicit stream declaration follows that of a C variable declaration, except that the declaration begins with the keyword stream:
stream <storage class specifier>_optional <type> <identifier list>;
Some examples of stream declarations without a storage class specifier are:
stream int xStrm, yStrm;
stream char cStrm;
stream double dStrm;
Of the five C storage class specifiers, auto, register, static, extern, and typedef, only static is permitted in a stream declaration, as in
stream static int xStrm, yStrm;

Static and non-static stream declarations are governed by the context in which the declaration appears. There are three such contexts, each with its own scope rule. In each case, the scope rule for the stream declaration is the same as that for the corresponding variable declaration. For a stream declaration with no storage class specifier that appears inside a module, the scope of the declaration extends from the declaration to the end of the module. For a stream declaration with no storage class specifier that appears outside all modules (and functions), the scope is global, that is, visible to the entire program. For a stream declaration with the static storage class specifier that appears outside all modules (and functions), the scope extends from the declaration to the end of the source file in which the declaration appears.

Several forms of declaration that involve storage class specifiers and pertain to variables but not to streams are absent from this list. In C, variables declared with the auto storage class specifier, or with no specifier at all, lose their values between function invocations. Because streams exist only within modules, and modules are never invoked (they are always active), an automatic stream is a meaningless concept. The auto storage class specifier is therefore not applicable to stream declarations.

A variable declaration that uses the static specifier and appears inside a function indicates that the declared variable retains its value between function calls. For modules, however, there is no notion of a call, so the static specifier is meaningless inside a module. The static specifier is therefore not used within module scope.

For variable declarations, the extern storage class specifier helps distinguish declarations of global variables that serve as both declaration and definition from those that serve merely as declarations. For streams, however, a declaration is never a definition, because no storage is ever set aside by a stream declaration. Storage is allocated only at stream definition, as described in the Stream FIFOs section below. The register and typedef storage class specifiers have no relevance to streams and do not appear in stream declarations.

Stream expressions are the stream counterpart of ordinary C expressions. Apart from input streams replacing all variables and an output stream replacing the result, the two kinds of expression are very similar. In a C expression, variables and constants are combined to produce a new value, whereas in a stream expression, streams and constants are combined to produce a new stream. The structure of C expressions and stream expressions is nearly identical. Every C operator is a valid operator in a stream expression, and the same operator precedence applies to both. C function calls are permitted in stream expressions just as they are in C expressions. An instantiation of a module with a single output stream is also permitted in a stream expression and is treated like a function call.

The differences between C expressions and stream expressions lie primarily in when and how they are evaluated. A C expression is evaluated when the thread of control reaches the statement containing it. The evaluation proceeds by first replacing each variable with its current value and then performing the required operations according to operator precedence. The value returned by the final operation is then supplied as the result of the evaluation.

Unlike the evaluation of C expressions, the evaluation of stream expressions in the Stream C programming language is not tied to a thread of control. Instead, stream expressions are evaluated opportunistically. As before, evaluation is carried out by performing the required operations according to operator precedence. Rather than substituting values for variables, however, a value is consumed (popped) from the FIFO queue belonging to each expression input. A FIFO queue is inserted at every stream destination that is an input to a stream expression. Evaluation is opportunistic because it takes place whenever at least one value is present in every input FIFO queue of the expression. As before, the result produced by the evaluation is the value returned by the last operation. The result, however, is handled differently than for a C expression. For a C expression, the use to which the result is put is determined by the context of the expression. For a stream expression, the result is simply placed on the output stream of the expression (which may or may not have a name, depending on whether the expression is an assignment).
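The firing rule described above can be sketched in Python, used here only as a simulation language because Stream C itself is not available on a standard toolchain. The function name evaluate_opportunistically, the stream names a, b, and c, and the sample values are all hypothetical; the sketch assumes unbounded FIFO queues and ignores scheduling.

```python
from collections import deque

def evaluate_opportunistically(inputs, combine):
    """Fire `combine` once for every complete set of input values.

    `inputs` is a list of deques standing in for the FIFO queues at the
    expression's stream destinations. The expression fires whenever every
    queue holds at least one value; each firing pops one value per queue.
    """
    output = []
    while inputs and all(inputs):      # every input FIFO is non-empty
        operands = [fifo.popleft() for fifo in inputs]
        output.append(combine(*operands))
    return output

# Hypothetical sample values: a and b hold three values, c holds only two.
a = deque([1, 2, 3])
b = deque([4, 5, 6])
c = deque([7, 8])

# The expression a * b + 5 * c fires twice, then waits for more data on c.
results = evaluate_opportunistically([a, b, c], lambda a, b, c: a * b + 5 * c)
```

After the call, one value remains queued on a and on b; the expression fires again only once another value arrives on c.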

One example of a stream expression is the following, where xStrm, yStrm, and zStrm are all streams of type int:
xStrm * yStrm + 5 * zStrm
The values arriving on these three streams begin as follows:

Then the first three values placed on the (unnamed) output stream of xStrm * yStrm + 5 * zStrm are as follows:

Among stream expressions, stream assignments are of particular interest. There are two types of stream assignment. The first has the form
<stream identifier> = <stream expression>

Like its C counterpart, the assignment to a variable, this type of stream assignment has a side effect. In addition to supplying values to its output stream, a stream assignment makes the output of the right-hand-side (RHS) expression the source of the left-hand-side (LHS) stream and, in the process, makes the output stream of the RHS expression the output stream of the assignment. A stream assignment also gives a name to the output stream of the RHS expression if that stream does not already have one. The output stream of a subexpression of a larger expression does not need a name, but a name becomes essential when the output stream must be directed to a destination outside any enclosing superexpression.

The assignment statement in the following code fragment is one example. A stream expression, whether an assignment or not, becomes a stream statement when a semicolon is appended.
int F(int, int);
int G(int);
stream int M(int xStrm, int yStrm)
{
    .
    .
    .
    out = F(xStrm, G(yStrm));
    .
    .
    .
}
As stream expressions, F(xStrm, G(yStrm)) and its subexpression G(yStrm) each have an output stream. For G(yStrm), the output stream has no name, because the destination of the stream is clear from the context of the expression: it is the second input argument of function F in the superexpression F(xStrm, G(yStrm)). The output stream of F(xStrm, G(yStrm)), however, needs a name, because its destination lies outside the expression. That name is assigned in the assignment expression
out = F(xStrm, G(yStrm))
With this assignment, the output of F(xStrm, G(yStrm)) becomes the source of zStrm, and zStrm has a single destination, the output parameter of module M.
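To make the role of the unnamed intermediate stream concrete, the following Python sketch simulates the dataflow of F(xStrm, G(yStrm)). The bodies of F and G are hypothetical, since the source only declares them, and the step function and queue names are illustrative assumptions rather than part of Stream C.

```python
from collections import deque

def G(y):
    return y + 1      # hypothetical body; the source only declares G

def F(x, g):
    return x * g      # hypothetical body; the source only declares F

def step(x_fifo, y_fifo, g_out, out):
    """Fire each call at most once, if its input FIFOs hold values."""
    fired = False
    if y_fifo:                    # G needs one value on yStrm
        g_out.append(G(y_fifo.popleft()))
        fired = True
    if x_fifo and g_out:          # F needs xStrm and G's unnamed output
        out.append(F(x_fifo.popleft(), g_out.popleft()))
        fired = True
    return fired

x_strm, y_strm = deque([2, 3]), deque([10, 20])
g_out = deque()   # unnamed stream: its only destination is F's second argument
out = deque()     # named stream: reachable from outside the expression

while step(x_strm, y_strm, g_out, out):
    pass
```

Only out needs a name in this sketch, because it is the only queue that anything outside the expression reads.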

The second type of stream assignment has the form
(<comma-separated list of stream identifiers>) = <module instantiation>
This type arises when the outputs of a multiple-output module are to be made the sources of several named streams. To illustrate, consider the following multiple-output module
stream (int, int) tap(int, int, int);
where the first output of tap is to be the source of int stream x and the second output of tap is to be the source of int stream y. This is accomplished with the stream assignment
(int x, int y) = tap(arg1, arg2, arg3);
With this assignment, the i-th output of the module becomes the source of the i-th stream, and each module output stream that does not have a name is given one.

The statements within a module body fall into two categories: thread statements and stream statements. Stream statements deal with streams but not variables. Thread statements deal with variables and, in some cases, also with streams. Statements in the thread domain are mostly C statements and, like C statements, are imperative (procedural) in nature, defining a step-by-step procedure. A sequential flow of control, often called a thread, is associated with such a procedure and governs the order in which statements are executed. Stream statements, by contrast, are declarative. Each such statement makes a declaration about the streams that appear in it. There is no notion of a step-by-step procedure as there is in the thread domain, so the order of stream statements within a module body is, with one exception, immaterial. Just as variables must be declared before use, streams must also be declared before use.

Because of the nature of the stream domain, there are no counterparts to the C statements that deal with control flow, specifically if-else, else-if, switch, for, while, do-while, break, continue, goto, and return. In fact, the only statement type in the stream domain is the stream counterpart of the C expression statement and, as in C, the most common expression statement is the assignment statement. A stream expression statement has one of the following two forms:
<stream expression>;
stream <stream expression>;
A stream assignment statement has one of the following forms:
<stream identifier> = <stream expression>;
stream <stream identifier> = <stream expression>;
(<comma-separated list of stream identifiers>) = <module instantiation>;

An example application that uses modules, stream instantiations, stream declarations, stream expressions, and stream statements is the finite-impulse-response (FIR) filter, a construct commonly used in digital signal processing. A FIR filter transforms a discrete-time input signal into a discrete-time output signal. FIG. 10 is a diagram of a 5-tap FIR filter 500, in which X(z) denotes the discrete-time input 502 and Y(z) denotes the discrete-time output 504. The filter includes a series of unit delays 506, labeled z⁻¹, each of which delays the discrete-time signal it receives by one clock cycle; a series of multipliers 508, each of which multiplies the discrete-time signal it receives by a constant coefficient h(i); and a series of adders 510, labeled Σ, each of which adds two received signals. Filter 500 is called a 5-tap filter because each of five delayed versions of the incoming discrete-time signal is multiplied by a separate coefficient and the five resulting products are summed.

Discrete-time signals are represented as streams of samples. Each of the multipliers 508 and adders 510 is represented as a stream expression. The unit delays are represented as stream instantiations. Initializing a stream with one or more values offsets (delays) the values in that stream relative to the values in a second stream. This is the principle underlying the operation of the UnitDelay module.
stream int UnitDelay(int X)
{
    out = X;
    out.initialize(0);
}
In the body of UnitDelay, the stream assignment statement
out = X;
makes X, the input stream of UnitDelay, the source of out, the default output stream of UnitDelay. The stream initialization statement
out.initialize(0);
inserts the value 0 into out at system initialization. This initial value in out has the effect of offsetting (delaying) all subsequent values in out by one value.
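The delay-by-initialization principle can be sketched in Python as follows. The Stream class, its method names, and unit_delay are illustrative assumptions (only initialize echoes a name from the source), and the FIFO is assumed to be unbounded.

```python
from collections import deque

class Stream:
    """A stream modeled as a FIFO queue (assumed unbounded for simplicity)."""

    def __init__(self):
        self.fifo = deque()

    def initialize(self, value):
        # Mirrors out.initialize(0): a value is queued before any data flows.
        self.fifo.append(value)

    def push(self, value):
        self.fifo.append(value)

    def pop(self):
        return self.fifo.popleft()

def unit_delay(samples, initial=0):
    out = Stream()
    out.initialize(initial)
    delayed = []
    for sample in samples:        # out = X: each input value is forwarded to out
        out.push(sample)
        delayed.append(out.pop()) # the consumer always sees the previous value
    return delayed

delayed = unit_delay([7, 8, 9])
```

Because one value is already queued when the first sample arrives, every value read from out lags the input by exactly one position.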

The following is a Stream C implementation of a 5-tap FIR filter such as filter 500 in FIG. 10, with 10, 20, 30, 40, and 50 as five arbitrarily chosen filter coefficients.
stream int UnitDelay(int X)
{
    out = X;
    out.initialize(0);
}

stream (int xOut, int yOut) tap(int xIn, int yIn, int h)
{
    xOut = UnitDelay(xIn);
    yOut = yIn + h * xOut;
}

stream int FIR5(int X)
{
    (int x2, int y2) = tap(X, 10 * X, 20);
    (int x3, int y3) = tap(x2, y2, 30);
    (int x4, int y4) = tap(x3, y3, 40);
    (int, out) = tap(x4, y4, 50);
}

This implementation exhibits parallelism, but does so without any explicit parallel constructs. The parallelism emerges from code that, apart from the multiple named outputs of tap, resembles ordinary sequential code. In place of variables there are now streams.

Each of the four instantiations of tap within the body of FIR5 computes its own copy of the expression
yIn + h * xOut
in parallel with the other three instantiations of tap. This is made possible by the opportunistic nature of stream expressions and by the continuous arrival of new input values at each instantiation of tap. These new values are supplied by the five internal streams of FIR5.
X conveys values from the input of FIR5 to the inputs of the first tap.
x2 and y2 convey values from the outputs of the first tap to the inputs of the second tap.
x3 and y3 convey values from the outputs of the second tap to the inputs of the third tap.
x4 and y4 convey values from the outputs of the third tap to the inputs of the fourth tap.
The input h of each instantiation of tap is replaced by a constant. This causes the Stream C compiler to replace every instance of h within that tap instantiation with the constant. Together, the operations performed by the tap instantiations transform FIR5 input values into FIR5 output values.
These final output values are supplied by the default output stream of FIR5.
out conveys values from the output of the fourth tap to the output of FIR5.
This implementation is one example of how many digital signal processing functions can be handled in Stream C.
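As a check on the arithmetic of the tap chain, the following Python sketch mirrors UnitDelay, tap, and FIR5 over finite lists of samples. It is a batch simulation under stated assumptions (whole lists instead of concurrently running taps, the last delayed sample discarded), not a model of the parallel execution itself; the lowercase function names are illustrative.

```python
def unit_delay(xs, initial=0):
    """Output lags the input by one sample, starting from the initial value."""
    return [initial] + list(xs[:-1])

def tap(x_in, y_in, h):
    x_out = unit_delay(x_in)                           # xOut = UnitDelay(xIn)
    y_out = [y + h * x for y, x in zip(y_in, x_out)]   # yOut = yIn + h * xOut
    return x_out, y_out

def fir5(X):
    x2, y2 = tap(X, [10 * x for x in X], 20)
    x3, y3 = tap(x2, y2, 30)
    x4, y4 = tap(x3, y3, 40)
    _, out = tap(x4, y4, 50)
    return out

# A unit impulse reads the coefficients back out one per sample.
impulse_response = fir5([1, 0, 0, 0, 0, 0])
```

Feeding a unit impulse through the chain returns the five coefficients in order, which is the expected behavior of a 5-tap FIR filter.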

In the FIR filter example above, the five coefficients 10, 20, 30, 40, and 50 are known at compile time. If the FIR5 coefficients are not known at compile time, however, or if they remain constant over long periods but are changed from time to time, another technique is needed. In such cases these pseudo-constants are not true constants, because they change, and they are not true streams, because they are not consumed (popped from a FIFO queue) by a stream expression or thread.

Pseudo-constant streams resemble ordinary streams in some respects. A pseudo-constant stream has a type, one or more sources, one or more destinations, and a name, and it conveys values of the specified type from the specified sources to the specified destinations. In other respects, however, pseudo-constant streams differ from ordinary streams. Where an ordinary stream has a FIFO queue, a pseudo-constant stream has storage for a single value of the specified type, much like the storage associated with a variable. The value residing in that storage is neither popped nor consumed when it is accessed by a stream expression or thread; it simply remains in place. The stored value is updated when a new value enters the stream from one of the stream sources, at which point the new value simply overwrites the old one. Because such updates are typically asynchronous with respect to system operation, the time at which an update is recognized at a stream destination is, in general, nondeterministic. The declaration of a pseudo-constant stream must specify an initial value to be stored at each stream storage location at system initialization.
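The contrast with a FIFO-backed stream can be sketched in a few lines of Python. The class name PseudoConstantStream and its read and write methods are hypothetical; the sketch is single-threaded, so it does not model the asynchronous, nondeterministic timing of updates.

```python
class PseudoConstantStream:
    """Single-value storage: reads do not consume, writes overwrite."""

    def __init__(self, initial):
        self._value = initial     # an initial value is mandatory

    def write(self, value):
        self._value = value       # the new value replaces the old one

    def read(self):
        return self._value        # nothing is popped

h = PseudoConstantStream(10)
first, second = h.read(), h.read()   # the same value is read twice
h.write(25)
third = h.read()                     # the update is now visible
```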

A pseudo-constant stream is declared, whether in a stand-alone declaration or in the input or output parameter list of a module, using the following syntax:
const <stream type> <stream identifier> = <initial value>
The existing C keyword const, which normally applies only to variables, indicates that the stream being declared is a pseudo-constant stream (reusing const avoids introducing a new keyword).

These ideas are illustrated in the following modification of the FIR5 module, in which the five coefficients of the original example, 10, 20, 30, 40, and 50, are replaced by five pseudo-constant streams h0, h1, h2, h3, and h4. Because the initial values inserted into these streams at system initialization are the same as the original coefficients, the new FIR5 starts operating with the same coefficients as the original. With the new FIR5, however, the coefficients can be updated as circumstances warrant.
stream int FIR5(int X, const int h0 = 10,
                const int h1 = 20,
                const int h2 = 30,
                const int h3 = 40,
                const int h4 = 50)
{
    (int x2, int y2) = tap(X, h0 * X, h1);
    (int x3, int y3) = tap(x2, y2, h2);
    (int x4, int y4) = tap(x3, y3, h3);
    (int, out) = tap(x4, y4, h4);
}
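The effect of updating a coefficient while the filter is running can be sketched in Python as follows. The class Fir5 and its update and push methods are illustrative assumptions; the sketch processes one sample at a time and applies updates deterministically between samples, whereas the text describes updates as generally asynchronous.

```python
class Fir5:
    """Sample-by-sample 5-tap FIR with coefficients that can be overwritten."""

    def __init__(self, coeffs=(10, 20, 30, 40, 50)):
        self.coeffs = list(coeffs)     # one storage slot per pseudo-constant
        self.history = [0, 0, 0, 0]    # contents of the four unit delays

    def update(self, index, value):
        self.coeffs[index] = value     # overwrite, as for a pseudo-constant

    def push(self, x):
        window = [x] + self.history    # x[n], x[n-1], ..., x[n-4]
        y = sum(h * v for h, v in zip(self.coeffs, window))
        self.history = window[:4]
        return y

fir = Fir5()
before = [fir.push(x) for x in (1, 0, 0)]   # impulse enters the filter
fir.update(3, 0)                            # h3 is changed mid-stream
after = [fir.push(x) for x in (0, 0, 0)]
```

The filter starts with the original coefficients, and only the output sample that depends on h3 changes after the update.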

FIG. 11A shows a module 600 with a series of FIFO buffers 602 on its input streams 604 highlighted. FIGS. 11B and 11C show two alternative implementations using FIFO buffers 602 and module 600. In FIG. 11B, FIFO buffers 602 are used only on a series of output streams 606. In FIG. 11C, FIFO buffers 602 are used on both the input streams 604 and the output streams 606. From the programmer's point of view, the three arrangements shown in FIGS. 11A to 11C are equivalent. From a performance point of view, having buffers on both inputs and outputs, as in FIG. 11C, allows module 600 to be scheduled without regard to the space available at the module 600 that receives the stream. This comes at the cost of additional memory and an additional scheduling step. Depending on the implementation, FIFO buffers 602 may reside in virtual memory space, physical memory space, or register file space.

One example of a high-level scheduling algorithm for input stream FIFOs, as in FIG. 11A, is the following.
a. Schedule the module to run if {
    data is present in the input FIFO(s) of the input stream(s)
    and
    space is available in the input stream FIFO(s) of the module(s) connected to the output stream(s) of the current module
}

One example of a high-level scheduling algorithm for output stream FIFOs, as in FIG. 11B, is the following.
b. Schedule the module to run if {
    data is present in the output stream FIFO(s) of the module(s) connected to the input stream(s) of the current module
    and
    space is available in the FIFO(s) of the output stream(s)
}

One example of a high-level scheduling algorithm for input and output stream FIFOs, as in FIG. 11C, is the following.
c. Schedule the module to run if {
    data is present in the input FIFO(s) of the input stream(s)
    and
    (space is available in the input stream FIFO(s) of the module(s) connected to the output stream(s) of the current module
    or
    space is available in the FIFO(s) of the output stream(s))
}
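The three scheduling conditions a, b, and c can be written as predicates, sketched here in Python. The Fifo class and the function names are hypothetical, and the sketch assumes that a condition on "stream(s)" must hold for every FIFO involved, which the text does not state explicitly.

```python
from collections import deque

class Fifo:
    """Bounded FIFO buffer with the two queries the scheduler needs."""

    def __init__(self, capacity, items=()):
        self.capacity = capacity
        self.items = deque(items)

    def has_data(self):
        return len(self.items) > 0

    def has_space(self):
        return len(self.items) < self.capacity

def ready_input_fifos(own_inputs, downstream_inputs):
    """Rule a: FIFOs sit on module inputs only."""
    return (all(f.has_data() for f in own_inputs)
            and all(f.has_space() for f in downstream_inputs))

def ready_output_fifos(upstream_outputs, own_outputs):
    """Rule b: FIFOs sit on module outputs only."""
    return (all(f.has_data() for f in upstream_outputs)
            and all(f.has_space() for f in own_outputs))

def ready_both(own_inputs, downstream_inputs, own_outputs):
    """Rule c: FIFOs sit on both inputs and outputs."""
    return (all(f.has_data() for f in own_inputs)
            and (all(f.has_space() for f in downstream_inputs)
                 or all(f.has_space() for f in own_outputs)))

full = Fifo(1, [5])    # holds data, no space left
empty = Fifo(1)        # no data, space available
```

With buffers on both sides (rule c), a module with pending input can still be scheduled when its consumer is full, as long as its own output FIFO has room.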
Threads

Threads provide capabilities that are essential for Stream C to be a complete and general-purpose language. Threads can appear either in the body of a C function (that is, a function whose inputs are individual values and whose output is a single value) or in the body of a module (that is, a function whose inputs and outputs are streams of values). The two kinds of thread are the same, except that a thread in a module body can access Stream C streams (and usually does) and for that reason usually does not terminate, whereas a thread in the body of a C function does not access Stream C streams and, like every well-behaved C thread, terminates.

A notable property of Stream C threads is their complete detachment from concurrency concerns. There are no parallel constructs, no direct interaction with other threads, and no spawning of new threads. A Stream C thread therefore need not be aware that it is operating in a multithreaded environment, and a programmer working in the thread domain can concentrate strictly on sequential concerns.

Function declarations and function definitions in Stream C have the same syntax and semantics as their counterparts in C. For function calls in Stream C, the syntax and semantics depend on whether the call appears (a) in the body of a function or (b) in a stream expression. A Stream C function call in the body of the same function (a recursive call) or of another function has the same syntax and semantics as an ordinary C function call. A Stream C function call in a stream expression has the same syntax as a C function call, except that streams take the place of variables in the call's arguments. The semantics of such a call are similar, but not identical, to those of an ordinary function call. The differences concern how each evaluation (call) of the function is carried out. Specifically, they concern (1) how values are obtained for the parameters (streams) that appear in the call's arguments, (2) the destination of the call's output, and (3) how control is handled.

In C, the parameters appearing in the arguments of a function call are all variables, and the value assigned to each function input is the current value of the corresponding variable. In Stream C, the parameters appearing in the arguments of a stream-expression function call are all streams, and the value assigned to each function input is either (a) for an ordinary stream, the value popped (consumed) from the FIFO queue at the stream destination, or (b) for a pseudo-constant stream, the current value at that stream destination.

In C, the value returned by a function call is passed back to the caller. In Stream C, the value returned by a stream-expression function call is placed on the output stream of the function call, which may or may not have a name. Being a stream expression itself, a stream-expression function call always has an output stream. The destination of the output value is determined by the destinations of that stream.

In C, a function is called when the thread of control encounters a call to that function. In Stream C, a stream-expression function call is evaluated (that is, the function is called) independently of any thread of control. Instead, the function is called opportunistically whenever at least one value is present in the FIFO queue of each ordinary input stream of the call. Pseudo-constant input streams are always ready to supply a value and therefore never hold up the evaluation of a function call or stream expression.

Apart from these three differences, the semantics of ordinary C function calls and stream-expression function calls are the same. This means that, in both cases, thread-based semantics apply to the execution of the function.

An example of threads in Stream C is given by the following definitions of function GCD and module GCD4.
int GCD(int a, int b)    // recursive function
{
    if ((a >= b) && (a % b) == 0)    // beginning of thread
    {
        return (b);
    }
    if (a < b)
    {
        return GCD(b, a);    // function call
    }
    return GCD(b, (a % b));    // function call
}

stream int GCD4(int w, int x, int y, int z)    // module
{
    out = GCD(GCD(w, x), GCD(y, z));    // stream expression with three function calls
}

GCD, a classic example of a recursive function, returns the greatest common divisor of two integers. It has two integer inputs, a and b, and returns one integer result. GCD4 has four integer stream inputs, w, x, y, and z, and one integer stream output. The stream expression statement
out = GCD(GCD(w, x), GCD(y, z));
resides in the body of GCD4, and within that statement is the stream expression
GCD(GCD(w, x), GCD(y, z))

Because this expression contains destinations of streams w, x, y, and z, there is a FIFO queue at each of those four destinations. These queues allow the function calls GCD(w, x) and GCD(y, z) to be evaluated (executed) opportunistically and in parallel, as described above. Like those two calls, the third call to GCD executes opportunistically, using input values obtained from the FIFO queues of its two input streams. Those input streams are the output streams of the other two calls to GCD, and the FIFO queues on them allow the third call to execute in parallel with the first two. The output stream of the third function call is directed to the output stream of GCD4 by the stream assignment to out. This arrangement of calls to GCD, depicted in the data flow diagram of FIG. 12, allows data from the four input streams w, x, y, and z to stream through three function calls operating in parallel, producing a stream of output values, each of which is the greatest common divisor of w(i), x(i), y(i), and z(i) for some integer i >= 0.
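The data flow through the three GCD calls can be simulated with a short Python sketch. The gcd function follows the listing above; gcd4, the intermediate queues wx and yz, and the sample inputs are illustrative assumptions, and the three calls are interleaved in a single loop rather than run truly in parallel. All inputs are assumed to be positive integers.

```python
from collections import deque

def gcd(a, b):
    # Same recursion as the GCD listing; assumes positive integers.
    if a >= b and a % b == 0:
        return b
    if a < b:
        return gcd(b, a)
    return gcd(b, a % b)

def gcd4(w, x, y, z):
    w, x, y, z = (deque(s) for s in (w, x, y, z))
    wx, yz = deque(), deque()      # unnamed streams between the calls
    out = []
    progress = True
    while progress:
        progress = False
        if w and x:                # GCD(w, x) fires when both inputs hold data
            wx.append(gcd(w.popleft(), x.popleft()))
            progress = True
        if y and z:                # GCD(y, z) fires independently
            yz.append(gcd(y.popleft(), z.popleft()))
            progress = True
        if wx and yz:              # the outer call consumes both results
            out.append(gcd(wx.popleft(), yz.popleft()))
            progress = True
    return out

result = gcd4([12, 100], [18, 75], [24, 50], [36, 125])
```

Each output value is the greatest common divisor of the i-th values of the four input streams.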

From the stream point of view, how a module transforms input stream values into output stream values is immaterial. All that matters are the transformations from input to output (and any side effects). In the examples so far, these transformations have been expressed in terms of stream expressions, that is, expressions that may be implemented using application-specific hardware, reconfigurable hardware (such as that in FIGS. 1 and 2), a processor executing sequential code, or some other mechanism.

These transformations can also be expressed explicitly as sequential code residing in the body of a module. Such code can execute on a stored-program sequential processor and resides in what may be called the thread domain. A module body will typically contain statements exclusively in either the stream domain or the thread domain, but this does not preclude both types of statement within the same module body. In that case, the two domains operate side by side (that is, in parallel).

The syntax and semantics of the thread domain are a superset of the C language as defined informally by Brian W. Kernighan and Dennis M. Ritchie in "The C Programming Language" (1978) and formally by the ISO C standard, ISO/IEC 9899. The additions to standard C concern operations that allow a thread to access the streams visible to it, whether module input streams, module output streams, streams internal to the module body, or global streams. These stream access operations fall into two categories: blocking and non-blocking. To understand these operations, it is important to understand the mechanism used to regulate the flow of values in streams and the mechanism for managing tasks (a task is equivalent to a module instance), as described with reference to the node wrapper in FIG. 4.

Flow control and task management are key services provided by the Stream C runtime support system. Flow control prevents FIFO queue overflow (writing data to a queue that is already full) and FIFO queue underflow (reading data from an empty queue). Task management controls when a task is placed into execution and, in some cases, when task execution is terminated. The Stream C flow control and task management systems have three key elements: consumer counts, producer counts, and the task manager.

An integer consumer count is associated with each FIFO queue of a normal (non-pseudo-constant) stream. All reads of a particular stream by a particular thread access the same FIFO queue and therefore the same consumer count. The sign bit of the consumer count indicates whether the FIFO queue is empty. A sign bit of 1 (the consumer count is negative) indicates that the queue is empty. A sign bit of 0 (the consumer count is non-negative) indicates that the queue is non-empty.

An integer producer count is associated with each source of each normal (non-pseudo-constant) stream. The sign bit of the producer count indicates whether the downstream FIFO queues have space available to accept a value inserted at this stream source. A sign bit of 0 (the producer count is non-negative) indicates that not all downstream queues have space to accept a value on this output stream. A sign bit of 1 (the producer count is negative) indicates that all downstream queues have space to accept a value on this output stream.

Each processing core, such as node 180 in FIG. 2, has a first-in, first-out ready-to-run queue of tasks that have all the resources, including input data, needed to begin execution. Each processing core also has a task manager that manages the execution of tasks and provides the necessary coordination signals between tasks. The task manager automatically increments the consumer count when data is pushed onto (written to) a FIFO queue, decrements the consumer count when data is popped (consumed) from a FIFO queue, and sends a backward acknowledgement to the stream source to indicate that space has become available in the destination FIFO queue (by default, a backward acknowledgement is sent after each value is consumed from each FIFO queue). The task manager also increments the producer count of a module's output stream when data is written to that stream, decrements that producer count when a backward acknowledgement for the output stream is received, and places a task on the processing core's ready-to-run task queue when the task has input data and any other resources it requires to proceed. The task manager places a task into execution when the task is at the head of the ready-to-run task queue and an execution unit is available, and it stops the execution of a task when the task lacks the input data it needs to proceed or when the task times out.
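The counter bookkeeping performed by the task manager can be illustrated with a small single-queue model. The following is a minimal sketch in Python, not Stream C and not the hardware task manager itself. The class name `FlowControlledQueue`, the initial counter values (-1 for the consumer count, minus the queue capacity for the producer count), and the single-destination simplification are assumptions chosen only so that the sign-bit conventions stated in the text hold.

```python
from collections import deque


class FlowControlledQueue:
    """Toy model of one stream source feeding one destination FIFO queue."""

    def __init__(self, capacity):
        self.items = deque()
        # Assumed initial values: negative consumer count means "empty",
        # negative producer count means "downstream queue has space".
        self.consumer_count = -1
        self.producer_count = -capacity

    def is_empty(self):
        return self.consumer_count < 0        # sign bit 1: queue is empty

    def has_space(self):
        return self.producer_count < 0        # sign bit 1: space available

    def push(self, value):
        if not self.has_space():
            raise RuntimeError("write would overflow; a real task would block")
        self.items.append(value)
        self.producer_count += 1              # data written to the output stream
        self.consumer_count += 1              # data pushed onto the FIFO queue

    def pop(self):
        if self.is_empty():
            raise RuntimeError("read would underflow; a real task would block")
        value = self.items.popleft()
        self.consumer_count -= 1              # data popped from the FIFO queue
        self.producer_count -= 1              # backward acknowledgement received
        return value
```

With a capacity of 2, two pushes drive the producer count from -2 to 0, at which point `has_space()` is false; each pop models the default behavior of acknowledging every consumed value.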

Blocking stream access operations allow a thread appearing in a module body to access the streams visible to it, such as module input streams, module output streams, streams internal to the module body, and global streams. They are the preferred way to access streams because, unlike non-blocking stream access operations, they do not introduce non-determinism. The blocking and unblocking of such operations is handled automatically by the task manager of the processing core.

There are three such operations, each modeled after a similar operation in C++. The >> operator pops (consumes) a single value from a stream's FIFO queue and assigns that value to a variable. It is used in statements of the form
<stream identifier> >> <variable identifier>;
This statement pops a single value from the stream on the left and assigns it to the variable on the right. If, however, the stream's FIFO queue is empty, as indicated by the sign bit of the stream's consumer count, the statement blocks (stalls) and remains blocked until the queue becomes non-empty again, as indicated by the same sign bit.

The << operator places the current value of a variable on a stream. It is used in statements of the form
<stream identifier> << <variable identifier>;
This statement places the value of the variable on the right onto the stream on the left. If, however, one or more downstream queues lack the space to accept the value, as indicated by the sign bit of the producer count at the stream source, the statement blocks (stalls) and remains blocked until all downstream queues again have space to accept a value, as indicated by the same sign bit.

The peek operator obtains the value at the head of a stream's FIFO queue without popping (consuming) it. It is used in expressions of the form
<stream identifier>.peek()
This expression returns the current value at the head of the FIFO queue of <stream identifier> but does not pop (consume) the value from the queue. If, however, the stream's FIFO queue is empty, as indicated by the sign bit of the stream's consumer count, the statement blocks (stalls) and remains blocked until the queue becomes non-empty again, as indicated by the same sign bit.
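The three blocking operations can be mimicked in software to make their behavior concrete. The following is a minimal sketch in Python rather than Stream C: `BlockingStream` and its methods `write`, `read`, and `peek` are hypothetical names standing in for `<<`, `>>`, and `peek()`, and a condition variable stands in for the task manager and the sign-bit tests, which the text describes as hardware mechanisms.

```python
import threading
from collections import deque


class BlockingStream:
    """Bounded FIFO whose operations block instead of over- or underflowing."""

    def __init__(self, capacity):
        self._items = deque()
        self._capacity = capacity
        self._cond = threading.Condition()

    def write(self, value):
        """Models `strm << var;`: blocks while the queue has no space."""
        with self._cond:
            self._cond.wait_for(lambda: len(self._items) < self._capacity)
            self._items.append(value)
            self._cond.notify_all()

    def read(self):
        """Models `strm >> var;`: blocks while the queue is empty."""
        with self._cond:
            self._cond.wait_for(lambda: len(self._items) > 0)
            value = self._items.popleft()
            self._cond.notify_all()
            return value

    def peek(self):
        """Models `strm.peek()`: blocks while empty, does not consume."""
        with self._cond:
            self._cond.wait_for(lambda: len(self._items) > 0)
            return self._items[0]
```

Because every operation waits until it can complete, a consumer always sees the producer's values in order regardless of thread timing, which is the determinism property the text attributes to the blocking operations.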

Like the blocking stream access operations, the non-blocking stream access operations allow a thread appearing in a module body to access the streams visible to it, such as module input streams, module output streams, streams internal to the module body, and global streams. Unlike the blocking operations, however, the non-blocking operations typically involve race conditions that affect the outcome of the operation and therefore introduce non-determinism. There are two such operations.

An expression of the form
<stream identifier>.consumerCount()
returns the consumer count of <stream identifier>, where <stream identifier> is a stream read by the thread via the >> or peek operation. This expression is used primarily to test the sign bit of the consumer count of <stream identifier> in order to avoid a >> or peek operation when the FIFO queue of <stream identifier> is empty.

An expression of the form
<stream identifier>.producerCount()
returns the producer count of <stream identifier>, where <stream identifier> is a stream written by the thread via the << operation. This expression is used primarily to test the sign bit of the producer count of <stream identifier> in order to avoid a << operation when one or more downstream queues lack the space to accept a new value.
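The typical use of the consumer count, testing its sign before attempting a read, can be sketched as follows. This is an illustrative Python model, not Stream C: `PolledStream`, `consumer_count`, and `try_read` are hypothetical names, and the relation "consumer count equals number of queued values minus one" is an assumption made only so that the count is negative exactly when the queue is empty.

```python
from collections import deque


class PolledStream:
    """Toy stream exposing a consumer count for non-blocking polling."""

    def __init__(self):
        self._items = deque()

    def push(self, value):
        self._items.append(value)

    def pop(self):
        return self._items.popleft()

    def consumer_count(self):
        # Assumed encoding: negative if and only if the queue is empty.
        return len(self._items) - 1


def try_read(stream, default=None):
    """Read only if the consumer count is non-negative; never block."""
    if stream.consumer_count() < 0:
        return default          # queue empty: skip the read
    return stream.pop()
```

Whether `try_read` returns a value or the default depends on whether the producer has already written when the poll happens, which is the kind of timing dependence that makes the non-blocking operations non-deterministic.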

Threads within a module body can take many different forms, but many of them will be variations of the following typical form.
stream int moduleA(int strm1, ..., int strmN)
{
    int var1, ..., varN, result;   // declare variables
    while (true)                   // loop forever
    {
        strm1 >> var1;             // read values from the input streams
          .
          .
          .
        strmN >> varN;
          .                        // compute result
          .
          .
        out << result;             // place result on the output stream
    }
}
Here, moduleA is a module with one or more input streams and a single output stream. The data types of the input and output streams are arbitrarily chosen to be integers. The first thing the thread in the body of moduleA does is declare a variable for each input stream and one for the result written to the single output stream.
The thread then enters an infinite loop in which each iteration (a) reads (consumes) a value from each input stream, (b) computes a result, and (c) places the result on the output stream.
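The read, compute, write loop of the typical module thread can be mimicked with ordinary threads and bounded queues. The following is a minimal Python sketch, not Stream C: `module_a`, `start_module`, and the `compute` callback are hypothetical, `queue.Queue` stands in for the stream FIFO queues, and its blocking `get` and `put` stand in for the `>>` and `<<` operators.

```python
import queue
import threading


def module_a(inputs, out, compute):
    """Loop forever: read one value per input, compute, write the result."""
    while True:
        values = [strm.get() for strm in inputs]   # strm1 >> var1; ... strmN >> varN;
        result = compute(values)                   # compute result
        out.put(result)                            # out << result;


def start_module(inputs, out, compute):
    """Run the module body on a daemon thread so the program can still exit."""
    thread = threading.Thread(
        target=module_a, args=(inputs, out, compute), daemon=True
    )
    thread.start()
    return thread
```

For example, with three input queues and `compute=sum`, each group of three input values yields one output value, and the module simply waits whenever any input queue is empty.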
Array

As in other languages, arrays play an important role in Stream C: not only arrays of data elements, but also arrays of streams and arrays of modules. With stream arrays, arrays of actual data values (rather than pointers to arrays of data values) are conveyed in parallel over multiple streams. Stream arrays are especially useful when used together with module arrays.

Stream C inherits its syntax and semantics for data arrays from C. This means that when the name of an array is used as a (function) argument, the value passed to the function is the location, or address, of the start of the array; the array elements are not copied. The same holds for the stream inputs (arguments) and outputs of a module. To illustrate, consider the GCD4 module described above.
stream int GCD4(int w, int x, int y, int z)   // module with four integer arguments
{
    out = GCD(GCD(w, x), GCD(y, z));
}
Instead of supplying GCD4 with four separate integer stream arguments, a single stream argument can be supplied in which each value is an array of four integers. GCD4 would then be transformed as follows.
stream int GCD4(int *wxyz)   // module with one array argument
{
    out = GCD(GCD(wxyz[0], wxyz[1]),
              GCD(wxyz[2], wxyz[3]));
}

Under the rules of the C language, the single argument of GCD4 is of type int *, that is, a pointer to an integer, in this case the first integer in an array of four integers. Within the body of GCD4, these four integers are accessed using the standard C [] operator. Supplying a C-style data array to a module is one way of handling arrays in the context of streams.
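The equivalence of the two GCD4 variants can be checked with a short worked example. This is a Python sketch of the arithmetic only, not of the stream semantics: `gcd4` and `gcd4_array` are hypothetical names, and the standard-library `math.gcd` stands in for the GCD module.

```python
from math import gcd


def gcd4(w, x, y, z):
    """Four separate arguments, as in the first GCD4 form."""
    return gcd(gcd(w, x), gcd(y, z))


def gcd4_array(wxyz):
    """One array argument indexed with [], as in the second GCD4 form."""
    return gcd(gcd(wxyz[0], wxyz[1]), gcd(wxyz[2], wxyz[3]))
```

Both forms reduce the four values pairwise and then combine the two partial results, so they agree for any four integers.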

For some applications, supplying a module with a stream of array pointers is not enough to fully exploit the parallelism inherent in the application. An array of streams, rather than a stream of arrays, therefore allows arrays of actual data values, not pointers to arrays of data values, to be conveyed in parallel over multiple streams. A stream array declaration is the same as an ordinary C array declaration except for two differences: the keyword stream precedes the declaration, and the size of the array must be known at compile time. This restriction is necessary because all streams in an application, like all modules, are instantiated at compile time.

Examples of stream array declarations follow.
stream int array1D[4];
stream int array2D[4][16];
stream int array3D[4][16][9];
The first declaration declares array1D to be a one-dimensional array of four integer streams. Similarly, array2D is declared to be a two-dimensional array of 64 integer streams, and array3D a three-dimensional array of 576 integer streams. The individual streams of a stream array are accessed in the same way as the individual elements of a data array. For example,
array3D[3][15][7]
denotes one of the 576 streams of array3D.
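The stream counts quoted for the three declarations are simply the products of the declared dimensions. The following Python sketch checks that arithmetic and the validity of the example index; `stream_count` and `in_bounds` are hypothetical helper names, and zero-based indexing is assumed because stream arrays follow C array conventions.

```python
import math


def stream_count(dims):
    """Number of individual streams in a stream array with these dimensions."""
    return math.prod(dims)


def in_bounds(index, dims):
    """True if a zero-based index tuple selects a stream inside the array."""
    return len(index) == len(dims) and all(
        0 <= i < d for i, d in zip(index, dims)
    )
```

Here `stream_count([4, 16, 9])` gives 4 × 16 × 9 = 576, and the index (3, 15, 7) lies inside the bounds (4, 16, 9).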

Once a stream array has been declared, the entire array, a sub-array within the array, or an individual stream within the array can be referenced. These three cases are illustrated in the following code fragment.
stream int moduleA(int);         // module declaration
stream int moduleB(int[4]);      // module declaration
stream int moduleC(int[3][4]);   // module declaration

stream int moduleD(int W[3][4])  // module definition
{
      .
      .
      .
    stream int X = moduleA(W[2][0]);   // stream expression
    stream int Y = moduleB(W[2]);      // stream expression
    stream int Z = moduleC(W);         // stream expression
      .
      .
      .
}
This fragment shows declarations of moduleA, moduleB, and moduleC and a partial definition of moduleD. The input types of these four modules are shown below.

The input arguments supplied to the instantiations of moduleA, moduleB, and moduleC within the body of moduleD are shown below.

In each case, the module instantiation argument type matches the module input type, so each module instantiation satisfies Stream C's strong typing requirement.

Accessing the individual streams of a stream array within a stream expression is also straightforward, as the following complex multiplication module shows.
stream int[2] complexMult(int X[2], int Y[2])
{
    out[0] = X[0] * Y[0] - X[1] * Y[1];
    out[1] = X[0] * Y[1] + X[1] * Y[0];
}
Because the operators within a stream expression are active in parallel, the four multiplications, one addition, and one subtraction in the stream expressions X[0] * Y[0] - X[1] * Y[1] and X[0] * Y[1] + X[1] * Y[0] are evaluated in parallel.
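The two output expressions of complexMult are the real and imaginary parts of the product (X[0] + iX[1])(Y[0] + iY[1]). The following Python sketch reproduces the arithmetic sequentially, without the parallel evaluation that Stream C provides; `complex_mult` is a hypothetical name, and element 0 is assumed to hold the real part and element 1 the imaginary part.

```python
def complex_mult(x, y):
    """Multiply two complex numbers given as [real, imaginary] pairs."""
    real = x[0] * y[0] - x[1] * y[1]   # out[0]
    imag = x[0] * y[1] + x[1] * y[0]   # out[1]
    return [real, imag]
```

For example, (1 + 2i)(3 + 4i) = 3 + 4i + 6i + 8i² = -5 + 10i, which matches `complex_mult([1, 2], [3, 4])`.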

Data parallelism, one of the most widely used approaches to parallel processing, is a form of parallelism in which the same task is executed in parallel (concurrently) on different parts of the same data structure, typically an array. In Stream C, data parallelism is supported by module arrays.

A module array is, as its name implies, an array of modules. A module array is declared by inserting the array dimensions, enclosed in square brackets, between the module name and the list of input parameters. The following are two examples of module array declarations.
stream int moduleA[3][4](int, int);
stream (int, int) moduleB[3][4](int, int);
In both cases, the array dimensions are 3 × 4.

Like the definition of an ordinary (standalone) module, the definition of a module array has a body enclosed in curly braces ({ and }). The following are two examples of module array definitions. The first has a single (default) output stream, whereas the second has two named output streams.
stream int moduleA[3][4](int a, int b)
{
    // module body
}

stream (int x, int y) moduleB[3][4](int a, int b)
{
    // module body
}

Once a module array has been declared (by a declaration or a definition), the entire array, a sub-array within the array, or an individual module within the array can be instantiated within a stream expression in the same way as data arrays and stream arrays. These three cases are shown for moduleA[3][4].

An important attribute of module arrays becomes apparent when a module array is instantiated at system initialization. Each element of the module array is instantiated as a separate module instantiation, so that all array elements can operate in parallel. moduleA[3][4] is an example of this concept. When the module array is instantiated, 12 (3 × 4) separate instantiations of moduleA are created, each operating in parallel with the other 11. Furthermore, this multiplication of instantiations applies to each instantiation of moduleA[3][4]. Thus, if there are three instances of moduleA[3][4], 36 (3 × 12) separate instantiations of moduleA are created.
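The instantiation counts in this paragraph can be reproduced by enumerating the element indices of the module array. The following Python sketch only counts instantiations and does not model their parallel execution; `instance_indices` is a hypothetical helper, and zero-based indices are assumed.

```python
from itertools import product


def instance_indices(dims):
    """All element indices of a module array with the given dimensions."""
    return list(product(*(range(d) for d in dims)))
```

For moduleA[3][4], `instance_indices((3, 4))` yields 12 index pairs, one per separate instantiation; three instances of the whole array therefore account for 3 × 12 = 36 instantiations.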

The personalization of a module array instantiation determines which data the instantiation operates on. An instantiation can be personalized by supplying each module instantiation with its own unique data through the instantiation's input streams. An instantiation can also be personalized by allowing each module instantiation to determine its own array indices using the index operator, which allows the instantiation to access its own unique part of a global array.

The first type of personalization, in which a stream array is used to supply unique data to each element of a module array, is described below. The second type of personalization exploits the fact that the array indices of each instantiation of a module array are known at compile time. To access these indices, the programmer uses an operator with the syntax
int index(int i)
where i is an integer expression that evaluates to a constant at compile time. At compile time, index(i) is replaced by the i-th index of the instantiation. If i is outside the array bounds, a compile-time or run-time error results.
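Index-based personalization can be pictured as each instantiation receiving its own fixed index tuple and using it to select its own part of a global array. The following Python sketch is illustrative only: `make_index`, `module_body`, `run_all`, and `GLOBAL_DATA` are hypothetical, the indices are resolved at run time rather than at compile time, and zero-based numbering of the argument i is an assumption that the text does not confirm.

```python
from itertools import product

DIMS = (3, 4)
# Hypothetical global array with one distinct element per instantiation.
GLOBAL_DATA = [[10 * r + c for c in range(DIMS[1])] for r in range(DIMS[0])]


def make_index(instance_indices):
    """Build the index operator for one instantiation of the module array."""
    def index(i):
        if not 0 <= i < len(instance_indices):
            raise IndexError("index argument outside the array dimensions")
        return instance_indices[i]
    return index


def module_body(index):
    """Each instantiation reads only its own element of the global array."""
    return GLOBAL_DATA[index(0)][index(1)]


def run_all():
    """Run the body once per instantiation, sequentially for simplicity."""
    return {
        idx: module_body(make_index(idx))
        for idx in product(range(DIMS[0]), range(DIMS[1]))
    }
```

Every one of the 12 instantiations ends up with a different element, which is the sense in which the index operator personalizes otherwise identical module instances.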

Stream arrays and module arrays are at their most useful when they are coupled using Stream C's special array coupling feature. There are three requirements for coupling: a) the stream array and the module array must have the same dimensions, b) the stream array must be connected (coupled) to an input or output of the module array, and c) the stream array type must match the input/output type of the module.

When such a coupling occurs, each individual stream in the stream array is connected (coupled) to the input/output stream of the individual module of the module array that has the same indices. Thus, when stream array S[D_1][D_2]...[D_n] is coupled to the input/output of module array M[D_1][D_2]...[D_n], each individual stream S[i_1][i_2]...[i_n] is connected to the input/output of the individual module M[i_1][i_2]...[i_n], for 0 <= i_1 < D_1, 0 <= i_2 < D_2, ..., 0 <= i_n < D_n.

The following is an example of a stream array coupled to the output of one module array and the input of another.
stream int moduleA[3][2]();       // first coupled module
stream void moduleB[3][2](int);   // second coupled module
stream void parentModule()
{
    stream int cStrm[3][2];       // coupled stream
    cStrm[][] = moduleA[][]();    // output of moduleA coupled to cStrm
    moduleB[][](cStrm[][]);       // cStrm coupled to input of moduleB
}
Here, the output stream of moduleA[3][2] is coupled to cStrm[3][2], and cStrm[3][2] is coupled to the input stream of moduleB[3][2]. These couplings are legal because
• cStrm[3][2], moduleA[3][2], and moduleB[3][2] all have the same dimensions,
• cStrm[3][2] is connected to the output of moduleA[3][2] and the input of moduleB[3][2], and
• the type of cStrm[3][2], the output type of moduleA[3][2], and the input type of moduleB[3][2] are all int.

The following table lists, for each individual stream of cStrm[3][2], (a) the module whose output is the stream source, (b) the individual stream in cStrm[3][2], and (c) the module whose input is the stream destination.
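The index-for-index correspondence implied by the coupling rule can be enumerated mechanically. The following Python sketch derives source, stream, and destination triples for a 3 × 2 coupling purely from that rule; `couple` is a hypothetical helper, zero-based indices are assumed, and the output is an illustration of the rule rather than a reproduction of the patent's table.

```python
from itertools import product


def couple(dims, source="moduleA", stream="cStrm", dest="moduleB"):
    """List (source module, stream, destination module) for each index."""
    rows = []
    for idx in product(*(range(d) for d in dims)):
        suffix = "".join(f"[{i}]" for i in idx)
        rows.append((source + suffix, stream + suffix, dest + suffix))
    return rows
```

Calling `couple((3, 2))` produces six rows, from `("moduleA[0][0]", "cStrm[0][0]", "moduleB[0][0]")` through `("moduleA[2][1]", "cStrm[2][1]", "moduleB[2][1]")`.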

PING

There are situations in which a module needs to notify another module that a particular operation it performs, a side effect, has completed. For example, when a module performs an operation on a data structure in global memory, perhaps as one of many modules operating on the same data structure, it typically needs to notify a downstream module that the operation has completed so that a downstream operation or task can be started. In these situations there is no need to return a value, only a signal that a particular task has completed. For these situations, where a signal rather than a value is needed, Stream C provides the ping data type. Pings (values of type ping) have no attributes and are completely indistinguishable from one another.

Pings are used with three operators: (1) the join operator, which provides task synchronization, (2) the >> stream access operator, and (3) the << stream access operator. The first use involves streams only, whereas the second and third involve both streams and threads.

The ping keyword is used to declare one or more streams of type ping. For example, the statement
stream ping pStrm0, pStrm1, pStrm2;
declares pStrm0, pStrm1, and pStrm2 to be streams of type ping. The ping keyword is also used in a module prototype or definition to declare that a module input or output is of type ping, as in
stream ping moduleName(int, ping);

The first use of ping involves the join operator, which joins a ping stream with one or more other streams to produce a single output stream. This operator is similar to the rendezvous operation found in some other models of computation. An expression containing the join operator takes one of two forms:
<ping stream array>.join()
<ping stream>.join(<stream expression>)
As with all stream expressions, each evaluation of an expression in one of these forms consumes a single value/ping from each input stream and produces a single value/ping on the expression's (unnamed) output stream. If any input stream is empty (holds no value), evaluation stalls (blocks) until every input stream holds at least one value/ping. No explicit join operation is needed for non-ping expressions, because the effect of the join operation is already subsumed by the semantics of expression evaluation.

When an expression of the first form is evaluated, a single ping is consumed from each stream in the array of ping streams, and a single ping is emitted on the expression's output stream.

When an expression of the second form is evaluated, a single ping from <ping stream> and the result of one evaluation of <stream expression> are consumed. <stream expression> may be of any type, including ping. The value obtained from evaluating <stream expression> is emitted on the output stream of the join operation; if the expression is of type ping, it evaluates to a single ping. The ping stream thus acts as a gatekeeper that allows evaluation to proceed only when a ping is present in <ping stream>, much as with the >> operator described above.

The two forms of the join operation are illustrated in FIGS. 13A and 13B. In FIG. 13A, the individual streams of a one-dimensional ping stream array of size n are joined to produce a single (unnamed) output ping stream. In FIG. 13B, a single ping stream, pingStrm, is joined with the expression expr to produce a single (unnamed) output stream of the same type as expr.
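The firing rule shared by both forms of join, consume one item from every input only when every input holds at least one, can be sketched as follows. This is a single-step Python model, not Stream C: `join_array`, `join_gate`, and the `PING` token are hypothetical, deques stand in for stream FIFO queues, and returning `False` stands in for the stall that the text describes.

```python
from collections import deque

PING = "ping"  # pings carry no data, so one shared token models them all


def join_array(ping_streams, out):
    """First form: one ping from every stream in, one ping out."""
    if not all(len(s) > 0 for s in ping_streams):
        return False                      # some input is empty: stall
    for s in ping_streams:
        s.popleft()
    out.append(PING)
    return True


def join_gate(ping_stream, value_stream, out):
    """Second form: a ping gates one value of the stream expression."""
    if not ping_stream or not value_stream:
        return False                      # ping or value missing: stall
    ping_stream.popleft()
    out.append(value_stream.popleft())    # output has the expression's type
    return True
```

In the second form the ping itself never appears on the output; it only decides when the next value of the stream expression is allowed through.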

One example of the use of the join operation involves a data structure X on which two operations, operation A and operation B, are performed. These operations satisfy the following requirements: a) neither operation A nor operation B is performed except in response to a go signal; b) when a go signal is received, operation A and operation B are performed in parallel; and c) before either operation A or operation B is started, both operations performed in response to the previous go signal must have completed.

A simple solution to this problem is to use two instances of the join operation.
stream ping moduleA(ping pStrm)
{
    while (true)
    {
        pStrm >> ping;
        // perform operation A on data structure X
        out << ping;
    }
}

stream ping moduleB(ping pStrm)
{
    while (true)
    {
        pStrm >> ping;
        // perform operation B on data structure X
        out << ping;
    }
}

stream ping moduleC(ping goStrm)
{
    stream ping startStrm = goStrm.join(doneStrm);
    stream ping StrmA = moduleA(startStrm);
    stream ping StrmB = moduleB(startStrm);
    stream ping doneStrm = StrmA.join(StrmB);
    doneStrm.initialize(ping);
    out = doneStrm;
}
moduleA and moduleB encapsulate operation A and operation B, respectively. Each has an input ping stream that starts one operation per ping and an output ping stream that confirms the completion of one operation per ping. moduleC contains one instance of both moduleA and moduleB and receives the go signal via the goStrm input ping stream.

The six statements in moduleC play the following roles.
stream ping startStrm = goStrm.join(doneStrm);
joins goStrm and doneStrm to create startStrm. A ping is therefore placed on startStrm when there is a ping on goStrm (that is, a go signal) and a ping on doneStrm (indicating that operations A and B performed in response to the preceding go signal have completed).
stream ping StrmA = moduleA(startStrm);
connects startStrm to the input ping stream of moduleA and connects the output ping stream of moduleA to StrmA. This means that operation A is performed in response to a go signal only after both operations associated with the preceding go signal have completed.
stream ping StrmB = moduleB(startStrm);
is similar to the preceding statement and ensures that operation B is performed in response to a go signal only after both operations associated with the preceding go signal have completed. There is, however, no constraint on the order in which operation A and operation B are performed. In other words, operation A and operation B are performed in parallel.
stream ping doneStrm = StrmA.join(StrmB);
joins StrmA, the output ping stream of moduleA, with StrmB, the output ping stream of moduleB. A ping is therefore placed on doneStrm once both operations performed in response to the preceding go signal have completed.
doneStrm.initialize(ping);
places a single ping on doneStrm at system initialization. This indicates that all previous operations, of which there are none, have completed. Without this statement, moduleC would deadlock and no operation would ever be performed.
out = doneStrm;
connects doneStrm to out, the default output stream of moduleC. Each ping on this stream confirms that operations A and B performed in response to a go signal have completed. The behavior of moduleC can be summarized as follows: a go signal (a ping) on the input port of moduleC causes operations A and B to be performed on data structure X in parallel, but only after the previous operations have completed. When both operation A and operation B are complete, moduleC sends a ping on its output port as confirmation.
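The interplay of the two joins in moduleC can be traced with a small Python simulation in which every ping stream is a counter. This is a sketch under stated assumptions, not the runtime described here: the function run_module_c and its parameters are hypothetical, operations A and B are recorded one after the other although the model lets them run in parallel, and the initial ping is assumed to prime only the join with goStrm.

```python
def run_module_c(go_pings, initial_done=1, max_steps=100):
    """Trace moduleC using ping counters; return the ordered event log."""
    go, done = go_pings, initial_done    # doneStrm.initialize(ping)
    start_a = start_b = 0                # startStrm fans out to moduleA and moduleB
    strm_a = strm_b = 0
    log = []
    for _ in range(max_steps):
        progressed = False
        if go > 0 and done > 0:          # startStrm = goStrm.join(doneStrm)
            go -= 1
            done -= 1
            start_a += 1
            start_b += 1
            log.append("start")
            progressed = True
        if start_a > 0:                  # moduleA: pStrm >> ping; ...; out << ping;
            start_a -= 1
            strm_a += 1
            log.append("A")
            progressed = True
        if start_b > 0:                  # moduleB: pStrm >> ping; ...; out << ping;
            start_b -= 1
            strm_b += 1
            log.append("B")
            progressed = True
        if strm_a > 0 and strm_b > 0:    # doneStrm = StrmA.join(StrmB)
            strm_a -= 1
            strm_b -= 1
            done += 1
            log.append("done")
            progressed = True
        if not progressed:
            break
    return log
```

Calling run_module_c(2, initial_done=0) returns an empty log, which corresponds to the deadlock that arises when the initialize statement is omitted.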

A statement of the form
pingStrm >> ping;
(where pingStrm is a stream of type ping) synchronizes the execution of a thread with the pings in pingStrm. When this statement is encountered in a thread, a single ping is read (consumed) from pingStrm. If pingStrm is empty (that is, it contains no pings), the statement blocks (stalls) until a ping becomes available. The statement thus acts as a guard that allows the thread to proceed only when a ping is present in pingStrm. No variable is involved in this operation: only the keyword ping appears to the right of the >> operator, where a variable would normally be expected.

A statement of the form
pingStrm << ping;
(where pingStrm is a stream of type ping) allows a thread to inform interested parties that a particular operation or set of operations has completed. When this statement is encountered in a thread, a single ping is written to (placed on) pingStrm. Unlike the first statement above, this statement never blocks.
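The blocking read and the non-blocking write can be approximated in Python with a counter protected by a condition variable. The class PingStream and the function demo are hypothetical names used only for this sketch; an unbounded counter is assumed, and nothing here is meant to describe the hardware mechanism.

```python
import threading


class PingStream:
    """A ping stream modeled as a counter guarded by a condition variable."""

    def __init__(self):
        self._count = 0
        self._cond = threading.Condition()

    def write(self):
        """Counterpart of `pingStrm << ping;`: never blocks."""
        with self._cond:
            self._count += 1
            self._cond.notify()

    def read(self):
        """Counterpart of `pingStrm >> ping;`: blocks until a ping is queued."""
        with self._cond:
            while self._count == 0:
                self._cond.wait()
            self._count -= 1


def demo():
    """One worker waits for a ping, records its work, and acknowledges."""
    go, ack = PingStream(), PingStream()
    events = []

    def worker():
        go.read()                # stalls until the main thread writes a ping
        events.append("work")
        ack.write()              # signals completion without blocking

    t = threading.Thread(target=worker)
    t.start()
    events.append("go")
    go.write()
    ack.read()                   # main thread waits for the acknowledgement
    t.join()
    return events
```

Because the worker cannot pass go.read() before the main thread calls go.write(), the recorded order is always "go" followed by "work".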

These two forms of stream/thread interaction involving pings are illustrated in the following code fragment.
stream ping moduleA(ping pStrm)
{
    // perform initialization before entering the loop
    while (true)
    {
        pStrm >> ping;
        // perform operations with side effects
        out << ping;
    }
}
moduleA has a single input port and a single output port, both of type ping. Within moduleA is a thread containing an infinite loop. Each iteration of this loop begins with the statement
pStrm >> ping;
which synchronizes the loop iterations with the pings in the module input stream pStrm. When pStrm is empty, the statement blocks; when pStrm is non-empty, the statement consumes a single ping from pStrm. It is followed by statements that necessarily involve activity with side effects; without side effects, moduleA would be equivalent to a no-op. Each iteration ends with the statement
out << ping;
which signals, through the standard output port of moduleA, that another loop iteration has completed.

The join operator is useful when working entirely in the stream domain. There may, however, be situations in which it is more convenient to perform the join within a thread. Consider, for example, joining the individual streams of
stream ping pingStrm[32];
within a thread. This can be achieved by embedding a for loop in the thread:
for (int i = 0; i < 32; ++i)
{
    pingStrm[i] >> ping;
}
This loop blocks until one ping has been consumed from each of the 32 streams in pingStrm. An output stream corresponding to the output stream of pingStrm[].join() is produced by following the for loop with the statement
joinStrm << ping;

To create a module that mimics the behavior of pingStrm[].join(), these two code fragments are embedded in a while (true) loop, which is in turn placed in a module:
stream ping joinArray(ping pingStrm[32])
{
    while (true)
    {
        for (int i = 0; i < 32; ++i)
        {
            pingStrm[i] >> ping;
        }
        out << ping;
    }
}
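A thread-based join of this kind can be tried out in Python by letting a semaphore stand in for each ping stream: acquire plays the role of `>> ping` and release the role of `<< ping`. The functions join_array and demo are hypothetical, and the loop runs for a fixed number of rounds rather than forever so that the example terminates.

```python
import threading


def join_array(ping_streams, out, rounds):
    """Thread body mirroring joinArray: one output ping per full set of input pings."""
    for _ in range(rounds):              # the module itself loops with while (true)
        for s in ping_streams:
            s.acquire()                  # pingStrm[i] >> ping;
        out.release()                    # out << ping;


def demo(n=32, rounds=2):
    streams = [threading.Semaphore(0) for _ in range(n)]
    out = threading.Semaphore(0)
    t = threading.Thread(target=join_array, args=(streams, out, rounds))
    t.start()
    for s in streams[:-1]:               # feed every stream except the last
        s.release()
    early = out.acquire(timeout=0.05)    # times out: the join is still incomplete
    streams[-1].release()                # the missing ping arrives
    first = out.acquire(timeout=5)
    for s in streams:                    # second complete set of pings
        s.release()
    second = out.acquire(timeout=5)
    t.join()
    return early, first, second
```

The first acquire on the output times out because the thread is still stalled on the last input stream, matching the blocking behavior of the for loop described above.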

A module with an embedded thread can also be used to mimic the behavior of pingStrm.join(expr), where expr is an expression. In this case, however, the module requires an input stream not only for pingStrm but also for each input stream of expr. For example, if expr is the expression X * Y + Z, where X, Y, and Z are integers, a module implementing pingStrm.join(expr) would look like this:
stream ping joinExpr(ping pingStrm, int X, int Y, int Z)
{
    while (true)
    {
        pingStrm >> ping;
        out << X * Y + Z;
    }
}
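The need for one input per operand of the expression can be seen in a Python sketch that uses a semaphore for the ping stream and one queue for each of X, Y, and Z. The names join_expr and demo are hypothetical, the operand values are arbitrary, and the loop is bounded so that the example terminates.

```python
import queue
import threading


def join_expr(ping_strm, x, y, z, out, rounds):
    """Thread body mirroring joinExpr for the expression X * Y + Z."""
    for _ in range(rounds):                      # the module loops with while (true)
        ping_strm.acquire()                      # pingStrm >> ping;
        out.put(x.get() * y.get() + z.get())     # out << X * Y + Z;


def demo():
    ping_strm = threading.Semaphore(0)
    x, y, z, out = (queue.Queue() for _ in range(4))
    t = threading.Thread(target=join_expr, args=(ping_strm, x, y, z, out, 2))
    t.start()
    for xv, yv, zv in [(2, 3, 4), (5, 6, 7)]:
        x.put(xv)
        y.put(yv)
        z.put(zv)
    held_back = out.empty()      # no ping yet, so no value has been emitted
    ping_strm.release()
    ping_strm.release()
    results = [out.get(timeout=5), out.get(timeout=5)]
    t.join()
    return held_back, results
```

The operands wait in their queues until a ping arrives, so the ping stream gates the evaluation exactly as in the join form it imitates.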

The pixel-processing example illustrates the use of pings, stream arrays, and module arrays in implementing data parallelism, a form of parallelism in which the same task is performed concurrently (in parallel) on different parts of the same data structure, such as an array. The example consists of a module array and a module.
extern int xScaleFactor, yScaleFactor;

stream ping doPixel[64][256](int* baStrm)    // body is in the thread domain
{
    const int x = xScaleFactor * index(0);
    const int y = yScaleFactor * index(1);
    int* baseAddress;
    while (true)
    {
        baStrm >> baseAddress;
        .    // perform operations on baStrm[x][y] and
        .    // its neighbors
        .
        out << ping;
    }
}

stream void parentModule(int* baStrm)        // body is in the stream domain
{
    stream ping xStrm[64][256];
    stream ping jStrm;
    jStrm.initialize(ping);
    xStrm[][] = doPixel[][](jStrm.join(baStrm));
    jStrm = xStrm[][].join();
}

The two-dimensional module array doPixel[64][256] is sized to match a two-dimensional array of pixels. The base address of the pixel array on which doPixel[64][256] operates is supplied by the input stream baStrm. The x coordinate of the pixel handled by an individual doPixel module is obtained by multiplying index(0) (see section 5.3), the x index of that module, by the global constant xScaleFactor. The y coordinate is obtained likewise by multiplying index(1), the y index of that module, by the global constant yScaleFactor. Processing of each pixel begins by setting the variable baseAddress to the current value of baStrm. Operations are then performed on baStrm[x][y] and its neighbors. When processing is finished, the individual doPixel module signals completion by emitting a ping.

parentModule is responsible for broadcasting the base address of the pixel array to the individual modules in doPixel[64][256]. This is done by the statement
xStrm[][] = doPixel[][](jStrm.join(baStrm));
Here, the expression jStrm.join(baStrm) in the input argument list of doPixel acts as a guard that allows a value in baStrm to pass only when a ping is present in jStrm. The initial ping inserted into jStrm by the statement
jStrm.initialize(ping);
allows the very first base address to pass unimpeded. Thereafter, pings are inserted into jStrm by the statement
jStrm = xStrm[][].join();
where xStrm[64][256] is the array of ping streams produced by the individual modules in doPixel[64][256]. A new ping is therefore inserted into jStrm only when every module in doPixel[64][256] has signaled completion of its previous operation by emitting a ping. This ensures that all operations on one pixel array are complete before operations on the next array begin.
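The frame-by-frame barrier formed by jStrm can be imitated in Python with a thread pool: every pixel task of one frame is submitted, and the next frame is not started until all of them have reported completion. This is a loose analogy under several assumptions: the array is shrunk to 4 x 8, the per-pixel operation (inverting an 8-bit value) is invented for the example, and the names do_pixel and parent_module are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, wait

ROWS, COLS = 4, 8    # stands in for the 64 x 256 module array


def do_pixel(frame, x, y):
    """Hypothetical per-pixel operation: invert an 8-bit value in place."""
    frame[x][y] = 255 - frame[x][y]
    return True      # plays the role of `out << ping;`


def parent_module(frames):
    """Process frames in order; a frame starts only after the previous one is done."""
    completed = []
    with ThreadPoolExecutor(max_workers=8) as pool:
        for n, frame in enumerate(frames):             # next base address on baStrm
            acks = [pool.submit(do_pixel, frame, x, y)  # broadcast to every doPixel
                    for x in range(ROWS) for y in range(COLS)]
            wait(acks)                                 # counterpart of xStrm[][].join()
            completed.append(n)
    return completed
```

The call to wait plays the part of the array join: it returns only when every pixel task has finished, so no task of the next frame can overlap with the current one.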

There is a significant advantage to using pings rather than a standard C data type. With a C data type, a first-in, first-out (FIFO) queue for data values is required at every destination of the stream, that is, at every point where the stream is an input to an expression. Because pings are indistinguishable from one another, however, all that is required at each destination of a ping stream is a counter that records the number of queued pings. This yields a significant cost saving compared with a FIFO queue for data values.
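The difference in storage can be illustrated with two small Python classes, one per kind of destination. Both class names are hypothetical, and the sketch only contrasts the state each destination has to keep; it does not model any hardware cost.

```python
from collections import deque


class DataDestination:
    """Destination of a data stream: values differ, so each is stored in order."""

    def __init__(self):
        self.fifo = deque()

    def put(self, value):
        self.fifo.append(value)

    def get(self):
        return self.fifo.popleft()


class PingDestination:
    """Destination of a ping stream: pings are identical, so a count suffices."""

    def __init__(self):
        self.count = 0

    def put(self):
        self.count += 1

    def get(self):
        if self.count == 0:
            raise LookupError("no ping queued")
        self.count -= 1
```

After a thousand writes the data destination holds a thousand entries, whereas the ping destination still holds a single integer.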

Pragmas are directives to the Stream C compiler/linker/loader.
#pragma InitializeCount(m, p, n)
initializes the consumer/producer count of input/output port p of module m to n. The pragma must immediately follow the module definition.
#pragma FwrdsAckValue(m, s, n)
specifies n as the forward acknowledgement value for the point-to-point connection starting at output stream s of module m. The pragma must immediately follow the module definition.
#pragma BwrdsAckValue(m, s, n)
specifies n as the backward acknowledgement value for the point-to-point connection starting at output stream s of module m. The pragma must immediately follow the module definition.

Examples of the advantages of the concepts described above include support for threads and multithreading, that is, the parallel execution of multiple threads. All forms of parallelism can be expressed, including SIMD, MIMD, instruction-level, task-level, data-parallel, data-flow, and systolic. Deterministic behavior is the default; nondeterminism is added explicitly, and only where a program requires it, as in sequential programming, which makes software testing easier and more reliable. The concepts described above have no explicit parallel constructs: parallelism falls out of code in the stream domain that, at least syntactically, resembles ordinary sequential code. Programmers working in the thread domain can therefore concentrate strictly on sequential problems. The programming model lends itself to model-based design and model-based testing, and it scales to any number of processing cores. It applies equally whether the distances separating the processing cores are measured in nanometers or in thousands of kilometers. There are no foreground or background tasks, only tasks; there are no interrupts or message passing, only streams.

Although the invention has been described with respect to specific embodiments, these embodiments are merely illustrative and do not limit the invention. For example, a node may include any type of processing unit, functional circuitry, or collection of one or more units, and/or resources such as memory, I/O elements, and the like. A node may be a simple register or something more complex, such as a digital signal processing system. Networks or interconnection schemes other than those described herein may also be used. Features or aspects of the invention may also be achieved in systems other than the adaptive system described herein with respect to the preferred embodiments.
The embodiments of the present invention include, without limitation, the following inventions (corresponding to claims 1 and 6 as filed):
<Invention 1>
A system for converting stream-based computer source code so that runtime operations of a multi-core computer system execute efficiently, the system comprising:
a memory device that stores the stream-based computer source code, the source code including one or more module definitions, each module definition having a list of input streams, a list of output streams, and a body;
a plurality of processing cores; and
a conversion system including at least one of the plurality of cores, wherein the conversion system generates, from the stream-based computer source code, a collection of tasks for execution on one or more of the processing cores, each task being an executable version of either a module definition with a thread-domain body in the source code or a module definition with a stream expression in the source code, and each task further having input streams and output streams corresponding to the input streams and output streams of the source code module definition or stream expression.
<Invention 6>
The system according to Invention 1, wherein
the conversion system assigns zero or more tasks to be executed to each processing core, and
each processing core includes a task manager that manages the execution of each of the tasks assigned to that processing core.

Claims

1. A system for converting stream-based computer source code so that runtime operations of a multi-core computer system execute efficiently, the system comprising:
a memory device that stores the stream-based computer source code, the source code including one or more module definitions, each module definition having a list of input streams, a list of output streams, and a body;
a plurality of processing cores; and
a conversion system including at least one of the plurality of cores, wherein the conversion system generates, from the stream-based computer source code, zero or more tasks for execution on one or more of the processing cores, each task being an executable version of either a module definition with a thread-domain body in the source code or a module definition with a stream expression in the source code, and each task further having input streams and output streams that correspond, respectively, to the input streams and output streams of the module definition or of the stream expression.

2. The system according to claim 1, wherein the source code includes function definitions having variables as inputs and outputs, and the function definitions and the module definitions are distinguishable from one another.

3. The system according to claim 1, wherein each task:
when it encounters an instruction to obtain a data value from an input stream of the task, consumes the data value from a first-in, first-out queue on that input stream when the data value is available; and
when it encounters an instruction to place a data value on an output stream, places the data value on that output stream of the task.

4. The system according to claim 1, wherein each task corresponding to a stream expression:
consumes data values from the first-in, first-out queues on the input streams of the task when the data values are available; and
places a data value on the output stream of the task each time the stream expression is evaluated.

5. The system according to claim 1, wherein the stream-based computer source code specifies the interconnection of module instances and stream expressions, a module instance being an instance of a module defined in the computer source code or in an included library, and a stream expression being identical to a C-language computer program expression containing variables except that the variables are replaced by streams, the interconnection consisting of streams connecting one or more of:
an input stream of a module definition to an input stream of a module instance or stream expression;
an output stream of a module instance or stream expression to an input stream of a module instance or stream expression; or
an output stream of a module instance or stream expression to an output stream of a module definition.

6. The system according to claim 1, wherein each processing core includes a task manager that manages the execution of each of the tasks assigned to that processing core.

7. The system according to claim 6, wherein the task manager manages task execution on a plurality of processing cores.

8. The system according to claim 6, wherein the task manager maintains:
a consumer count for each input stream of a task;
a producer count for each output stream of a task;
a ready-to-run queue;
an input count for each task, which determines the number of input streams of that task that must be enabled for the task to be ready to run; and
an output count for each task, which determines the number of output streams of that task that must be enabled for the task to be ready to run.

9. The system according to claim 1, wherein the stream-based computer source code includes streams elevated to first-class status, the first-class status allowing a stream identifier to be bound to an input parameter of a function, to an output of a function, or to a parameter inside or outside an expression.

10. The system according to claim 1, wherein the stream-based computer source code includes a ping stream that conveys pings, which have no associated data value, from a ping-stream source to a ping-stream destination.

11. The system according to claim 1, wherein the stream-based computer source code includes an assignment statement that makes the output stream of a stream expression the source of a second, previously declared stream.

12. The system according to claim 1, further comprising an interconnection network that carries data values between the memory device and the plurality of processing cores, wherein the interconnection network and the plurality of processing cores constitute a runtime system.

13. A method, performed by a multi-core computer system including a plurality of processing cores, of converting stream-based computer source code so that runtime operations of the multi-core computer system execute efficiently, the method comprising:
storing the stream-based computer source code in a memory device, the source code including one or more module definitions, each module definition having a list of input streams, a list of output streams, and a body;
generating, from the stream-based computer source code, zero or more tasks for execution on one or more processing cores, each task being an executable version of either a module definition with a thread-domain body in the source code or a module definition with a stream expression in the source code, and each task further having input streams and output streams that correspond, respectively, to the input streams and output streams of the module definition or of the stream expression; and
executing the generated zero or more tasks on the plurality of processing cores to perform the runtime operations.

14. The method according to claim 13, further comprising conveying data values between the plurality of processing cores and the memory device via an interconnection network.

15. The method according to claim 13, wherein the stream-based computer source code includes function definitions having variables as inputs and outputs, and the function definitions and the module definitions are distinguishable from one another.

16. The method according to claim 13, wherein each task:
when it encounters an instruction to obtain a data value from an input stream of the task, consumes the data value from a first-in, first-out queue on that input stream when the data value is available; and
when it encounters an instruction to place a data value on an output stream, places the data value on that output stream of the task.

17. The method according to claim 13, wherein each task corresponding to a stream expression:
consumes data values from the first-in, first-out queues on the input streams of the task when the data values are available; and
places a data value on the output stream of the task each time the stream expression is evaluated.

18. The method according to claim 13, wherein the stream-based computer source code specifies the interconnection of module instances and stream expressions, a module instance being an instance of a module defined in the computer source code or in an included library, and a stream expression being identical to a C-language computer program expression containing variables except that the variables are replaced by streams, the interconnection consisting of streams connecting one or more of:
an input stream of a module definition to an input stream of a module instance or stream expression;
an output stream of a module instance or stream expression to an input stream of a module instance or stream expression; or
an output stream of a module instance or stream expression to an output stream of a module definition.

19. The method according to claim 13, wherein the stream-based computer source code includes streams elevated to first-class status, the first-class status allowing a stream identifier to be bound to an input parameter of a function, to an output of a function, or to a parameter inside or outside an expression.

20. A programmable computing device, comprising:
a plurality of processing cores;
a memory device that stores stream-based computer source code, the source code including one or more module definitions, each module definition having a list of input streams, a list of output streams, and a body; and
a converter coupled to the memory device, wherein the converter generates, from the stream-based computer source code, zero or more tasks for execution on one or more processing cores, each task being an executable version of either a module definition with a thread-domain body in the source code or a module definition with a stream expression in the source code, and each task further having input streams and output streams that correspond, respectively, to the input streams and output streams of the module definition or of the stream expression,
wherein the converter selects one of the plurality of processing cores for each of the zero or more tasks, and the plurality of cores execute the generated zero or more tasks in parallel to perform the runtime operations.