JPH054712B2

Info

Publication number: JPH054712B2
Application number: JP61011577A
Authority: JP
Inventors: Masaki Aoki; Morie Sagawa
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1986-01-22
Filing date: 1986-01-22
Publication date: 1993-01-20
Also published as: JPS62169272A

Description

[Detailed Description of the Invention]

[Summary]

When compiling a program subject to automatic vectorization, the data dependencies in the outer loop that affect the vectorized vector operation sequence are identified. According to the result, the iteration count of the outer loop is reduced to 1/N and the vector operation sequence is expanded N times, which improves the execution performance of the compiled program.

[Industrial application field]

The present invention relates to a processing method for compiling a program to be executed by a data processing device having a vector computer, and more particularly to a vector operation sequence loop unrolling processing method that unrolls the vector operation sequence in a loop.

[Conventional technology]

To execute a program written in, for example, the FORTRAN language on a vector computer, a compiler is used that automatically generates vector operation sequences for the arrays and other data in DO loops. Raising the vectorization rate of the object code generated by this compiler is regarded as an important issue for improving execution performance on the vector computer. However, to make the most effective use of the vector computer as a hardware resource, the vector operation sequence produced by vectorization must also be scheduled optimally.

Optimal scheduling here means ordering the vector operation sequence so that the data flowing through the load/store pipeline, the addition pipeline, the multiplication pipeline, and so on of the vector computer is denser and execution wait time is reduced.

To achieve this optimal scheduling, users have conventionally tuned programs by hand at the source level, working with a scalar image of the code.

[Problem that the invention seeks to solve]

However, when a user tunes a source program by hand, the following problems arise.

A source program tuned for vector execution through its scalar image may run slower on a general-purpose computer that has no vector processing capability.

Tuning requires a great deal of effort and time.

The readability of the source program is impaired.

The user's tuning may actually lower performance and prove counterproductive.

To solve the above problems, the present invention aims to provide a method that automatically generates optimized object code from a source program by loop-unrolling the vector operation sequence.

[Means for solving problems]

FIG. 1 is a block diagram of an example of the basic configuration of the present invention.

In FIG. 1, 10 is a source program written in a high-level language; 11 is a processing device consisting of a CPU, memory, and the like; 12 is a compiler that translates the source program 10 into machine-language object code; 13 is a program input unit; 14 is a vectorization processing unit; 15 is a data dependency analysis unit; 16 is an unrolling execution condition determination unit; 17 is an unrolling processing unit; 18 is an object generation unit; and 19 is an object program consisting of the machine code sequence corresponding to the source program 10.

The program input unit 13 reads the source statements to be processed from the source program 10. Analyzing this input program produces intermediate text. The compiler 12 has an automatic vectorization function: the vectorization processing unit 14 decodes the intermediate text, detects what can be vectorized, and generates a vector operation sequence.

When the inner loop of a multiple loop has been vectorized by the vectorization processing unit 14, the data dependency analysis unit 15 analyzes the data dependencies of the vectorized vector operation sequence in the outer loop.

Using the analysis result from the data dependency analysis unit 15, the unrolling execution condition determination unit 16 determines whether unrolling is possible by searching a table in which unrolling permission information has been registered in advance for each kind of data dependency. Loop unrolling here means reducing the iteration count of the outer loop to 1/N (N is an integer of 2 or more) and expanding the vector operation sequence N times.

When the unrolling execution condition determination unit 16 determines that unrolling is possible, the unrolling processing unit 17 decomposes the vector operation sequence and performs loop unrolling. The iteration count of the outer loop is reduced to 1/N; if the division leaves a remainder, an instruction sequence for the remaining vector operations is appended outside the loop.
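The N-fold expansion with a remainder described above can be sketched in Python. This is only an illustrative model, not the compiler's implementation: the names `vadd`, `rolled`, and `unrolled` are hypothetical, each dictionary entry stands in for one array column, and one call to `vadd` stands in for one vector operation.

```python
def vadd(x, y):
    """Element-wise sum of two equal-length vectors (models one vector operation)."""
    return [a + b for a, b in zip(x, y)]


def rolled(B, C, trips):
    """Original outer loop: one vector operation per iteration, J = 1..trips."""
    A = {}
    for j in range(1, trips + 1):
        A[j] = vadd(B[j], C[j])
    return A


def unrolled(B, C, trips, n):
    """Outer loop executed trips // n times with the body expanded n-fold."""
    A = {}
    full = trips - trips % n              # last column handled inside the loop
    for j in range(1, full + 1, n):       # increment n: iteration count drops to 1/n
        for k in range(n):                # a compiler would emit these as n statements
            A[j + k] = vadd(B[j + k], C[j + k])
    for j in range(full + 1, trips + 1):  # remainder, appended outside the loop
        A[j] = vadd(B[j], C[j])
    return A
```

Calling `unrolled(B, C, 5, 2)` runs the main loop twice and then handles column 5 separately, producing the same columns as `rolled(B, C, 5)`.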

The vectorized and unrolled intermediate text is further optimized by other means as necessary. The object generation unit 18 finally generates the object program 19.

[Operation]

The operation of the present invention is explained below using loop unrolling of a FORTRAN program as an example.

For example, the double-loop program

      DO 10 J=1,100
      DO 10 I=1,10000
      A(I,J)=B(I,J)+C(I,J)*D(I,J)
   10 CONTINUE

is vectorized over its inner loop by the vectorization processing unit 14 as follows.

      DO 10 J=1,100
      A(*,J)=B(*,J)+C(*,J)*D(*,J)
   10 CONTINUE

Here, "*" in an array subscript is a vector parameter that takes the values 1 through 10000, so the vector length is 10000.

The unrolling processing unit 17 then unrolls this loop as follows.

      DO 10 J=1,100,2
      A(*,J)=B(*,J)+C(*,J)*D(*,J)
      A(*,J+1)=B(*,J+1)+C(*,J+1)*D(*,J+1)
   10 CONTINUE

That is, doubling the increment of the loop control variable halves the iteration count of the outer loop, and the vector operation sequence in the loop body is decomposed and doubled. The same applies to expansion by three or more. Because each expanded vector operation sequence is processed separately by the pipelines of the vector computer, the pipelines can be kept more densely occupied, and pipeline scheduling is optimized.

In addition, in cases such as the following, the vector text can be optimized by common-expression optimization across vector operations. For example, suppose the vector operation sequence after vectorization is

      DO 10 J=1,100
      A(*,J+1)=B(*,J)+A(*,J)
   10 CONTINUE

Here, as in the previous example, "*" in an array subscript is a vector parameter that takes the values 1 through 10000.

After loop unrolling:

      DO 10 J=1,100,2
      A(*,J+1)=B(*,J)+A(*,J)          ...... (1)
      A(*,J+2)=B(*,J+1)+A(*,J+1)      ...... (2)
   10 CONTINUE

The second term on the right-hand side of vector operation (2) has the same value as the left-hand side of vector operation (1). When the vector computer executes vector operation (1), A(*,J+1) is obtained in a vector register, so A(*,J+1) does not need to be loaded when vector operation (2) is executed next. This allows faster execution and makes optimization of the vector text possible.
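The saving from keeping A(*,J+1) in a vector register can be illustrated with a small Python model that counts vector loads. It is a sketch under simplifying assumptions: the function names are hypothetical, a local variable plays the role of the vector register, the iteration count is assumed even, and real hardware costs are not modeled.

```python
def run_rolled(A, B, trips):
    """A(*,J+1) = B(*,J) + A(*,J) for J = 1..trips; returns the number of vector loads."""
    loads = 0
    for j in range(1, trips + 1):
        b, a = B[j], A[j]                 # two vector loads per iteration
        loads += 2
        A[j + 1] = [x + y for x, y in zip(b, a)]
    return loads


def run_unrolled(A, B, trips):
    """Two-fold unrolled version (trips assumed even); returns the number of vector loads."""
    loads = 0
    for j in range(1, trips + 1, 2):
        b, a = B[j], A[j]                 # operation (1): two vector loads
        loads += 2
        reg = [x + y for x, y in zip(b, a)]   # result stays in the "register"
        A[j + 1] = reg
        b_next = B[j + 1]                 # operation (2): only B(*,J+1) is loaded
        loads += 1
        A[j + 2] = [x + y for x, y in zip(b_next, reg)]
    return loads
```

In this model, 100 outer iterations need 200 vector loads before unrolling and 150 after, while both versions store identical columns.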

[Embodiment]

FIG. 2 is an explanatory diagram of the processing of one embodiment of the present invention, FIG. 3 is an example of the unrolling availability table, and FIG. 4 is a diagram explaining the relationship between data dependency values and the unrolling expansion count.

The loop unrolling process according to the present invention is performed, for example, as shown in FIG. 2. This process is invoked when a vectorized operation sequence exists in the loop being processed. The processing steps described below correspond to the numbered steps shown in FIG. 2.

Based on the data dependency value, the unrolling availability table shown in FIG. 3 is searched.

The data dependency value and the unrolling availability table are described in detail later.

Whether unrolling is possible is determined from the result of searching the unrolling availability table. If unrolling is not possible, the unrolling optimization is skipped and processing proceeds to the next optimization.

The other conditions for performing unrolling are also checked. These include, for example, that the loop iteration count is 2 or more (when it is explicitly known), that the loop has a single exit, and that the vector length does not change within the loop. Whether unrolling would actually improve execution efficiency is also checked. If any of these conditions is not satisfied, processing proceeds to the next optimization.

For loop unrolling, the iteration count of the outer loop is reduced to 1/N. To simplify the explanation, the case N = 2 is described below.

The vector operation sequence is doubled. That is, a vector operation sequence in which the array subscript expressions have been advanced by one step is generated and appended to the original vector operation sequence.

It is determined whether the loop iteration count is a constant. If it is not a constant, control passes to the step that generates a run-time test of the iteration count.

It is determined whether the original loop iteration count is even or odd. If it is even, processing proceeds to the next optimization; if it is odd, the following step is performed.

The part of the vector operation sequence that would be executed last in the original loop is appended outside the new loop, and processing proceeds to the next optimization.

If the loop iteration count is a variable, text that determines the iteration count dynamically at run time is generated and added.

According to the result of that test, the vector operations corresponding to the remainder left by halving the iteration count are appended outside the loop. Processing then proceeds to the next optimization.
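The decision flow of the steps above, for the case N = 2, can be summarized in a short Python sketch. The function name, its parameters, and the returned labels are hypothetical and chosen only for illustration; the efficiency check mentioned in the text is left out because the source does not specify how it is evaluated.

```python
def plan_unroll_by_two(table_allows, trip_count, exits, vector_length_changes):
    """Return a hypothetical action label for two-fold unrolling of an outer loop.

    table_allows          -- result of the availability-table lookup
    trip_count            -- iteration count, or None if unknown at compile time
    exits                 -- number of loop exits
    vector_length_changes -- True if the vector length varies inside the loop
    """
    if not table_allows:
        return "skip"                      # table says unrolling is not possible
    if trip_count is not None and trip_count < 2:
        return "skip"                      # known iteration count must be 2 or more
    if exits != 1 or vector_length_changes:
        return "skip"                      # single exit, fixed vector length required
    if trip_count is None:
        return "unroll+runtime-test"       # variable count: generate a run-time test
    if trip_count % 2 == 0:
        return "unroll"                    # even constant count: no remainder
    return "unroll+tail"                   # odd constant count: append the last operation
```

For instance, a loop with a known count of 5 yields `"unroll+tail"`, matching the case where the final vector operation is placed after the new loop.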

When a vector operation sequence is loop-unrolled, the unrolling may cause the vector computer to execute array definitions and references in an unintended order, so that correct results are no longer obtained. In the present invention, therefore, data dependency values such as those described below are computed in advance, and whether unrolling is possible is decided from them.

A data dependency value can be regarded as indicating the relative relationship between the subscript expressions of array references that precede and follow one another in a loop. For example, when the array reference that appears first is A(I) and the one that appears later is A(I+2), the control variable I is common to both, so the data dependency value is obtained by setting I = 0, as follows.

      (0) - (0+2) = -2

The kinds of data dependency values are, for example, as follows.

(Symbol)   (Meaning)
φ          No overlap (no data dependency).
+          Forward data dependency.
-          Backward data dependency.
*          The control variable does not appear.
?          The data dependency is unknown.
+OR-       Forward data dependency (scalar and vector).
0          The same location is accessed.
+ value    Number of positions offset in the forward direction.
- value    Number of positions offset in the backward direction.

Whether unrolling is possible is decided from data dependency values such as these. For this purpose, an unrolling availability table such as the one shown in FIG. 3 is used.
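The calculation of a numeric data dependency value described above (evaluate both subscript expressions with the common control variable set to 0, then subtract) can be sketched in Python. The function names are hypothetical, and subscripts are passed as small functions of I purely for illustration; only the three numeric cases from the list are covered.

```python
def dependency_value(first_subscript, second_subscript):
    """Evaluate both subscript expressions at I = 0 and subtract (first minus second)."""
    return first_subscript(0) - second_subscript(0)


def describe(value):
    """Map a numeric dependency value to the meaning given in the list of kinds."""
    if value == 0:
        return "same location"
    if value > 0:
        return f"offset by {value} in the forward direction"
    return f"offset by {-value} in the backward direction"


# A(I) appears first and A(I+2) appears later, as in the example in the text.
example = dependency_value(lambda i: i, lambda i: i + 2)
```

Here `example` evaluates to -2, the same value as the worked calculation (0) - (0+2).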

In the unrolling availability table shown in FIG. 3, ○ indicates that unrolling is possible, × indicates that unrolling is not possible, and △ indicates that the decision depends on the value. The vertical axis gives the data dependency value for the first dimension, and the horizontal axis gives the data dependency value for the second dimension.

      DO 10 J=1,N
      DO 10 I=1,M
      A(I,J)=......
      ......
   10 CONTINUE

In a case like this, I is the first dimension and J is the second dimension.

In FIG. 3, for cases marked ×, unrolling would create a data dependency that did not previously exist, so unrolling is not possible. For cases marked △, the decision depends on the data dependency value and the unrolling expansion count, as shown in FIG. 4. For example, if the data dependency value is ±2, unrolling is possible with two-fold expansion (N = 2) but not with three-fold expansion or more (N ≥ 3).

Next, concrete examples of loop unrolling are shown using example FORTRAN programs.

(a) When the loop iteration count is explicitly known and is even

[Before loop unrolling]

      DO 10 J=1,4
      A(*,J)=B(*,J)+C(*,J)
   10 CONTINUE

[After loop unrolling]

      DO 10 J=1,4,2
      A(*,J)=B(*,J)+C(*,J)
      A(*,J+1)=B(*,J+1)+C(*,J+1)
   10 CONTINUE

(b) When the iteration count is odd

[Before loop unrolling]

      DO 10 J=1,5
      A(*,J)=B(*,J)+C(*,J)
   10 CONTINUE

[After loop unrolling]

      DO 10 J=1,3,2
      A(*,J)=B(*,J)+C(*,J)
      A(*,J+1)=B(*,J+1)+C(*,J+1)
   10 CONTINUE
      A(*,5)=B(*,5)+C(*,5)

The vector operation for J = 5 is appended at the end.

(c) When the iteration count is unknown

[Before loop unrolling]

      DO 10 J=1,N
      A(*,J)=B(*,J)+C(*,J)
   10 CONTINUE

[After loop unrolling]

      IF (N.EQ.1) GOTO 20
      DO 10 J=1,N-1,2
      A(*,J)=B(*,J)+C(*,J)
      A(*,J+1)=B(*,J+1)+C(*,J+1)
   10 CONTINUE
      IF (MOD(N,2).EQ.0) GOTO 30
   20 CONTINUE
      A(*,N)=B(*,N)+C(*,N)
   30 CONTINUE

In the embodiment above the expansion count is 2, but multiple expansion is also possible; for example, when the loop iteration count is explicitly 3, three-fold expansion can be used. The user may also be allowed to specify the expansion count externally by means of a so-called optimization control line. In that case, the user writes an optimization control line such as the following in the source program.

"*VOCL LOOP,UNROL(4)"

Here, *VOCL indicates that this line is an optimization control line. LOOP indicates that the optimization applies to a loop. UNROL(4) specifies four-fold expansion. With four-fold expansion, the result is, for example, as follows.

[Before loop unrolling]

*VOCL LOOP,UNROL(4)
      DO 10 J=1,N
      A(*,J)=B(*,J)+C(*,J)
   10 CONTINUE

[After loop unrolling]

      IF (N.LT.4) GOTO 20
      DO 10 J=1,N-3,4
      A(*,J)=B(*,J)+C(*,J)
      A(*,J+1)=B(*,J+1)+C(*,J+1)
      A(*,J+2)=B(*,J+2)+C(*,J+2)
      A(*,J+3)=B(*,J+3)+C(*,J+3)
   10 CONTINUE
   20 M=MOD(N,4)
      IF (M.EQ.0) GOTO 50
      IF (M.EQ.1) GOTO 40
      IF (M.EQ.2) GOTO 30
      A(*,N-2)=B(*,N-2)+C(*,N-2)
   30 A(*,N-1)=B(*,N-1)+C(*,N-1)
   40 A(*,N)=B(*,N)+C(*,N)
   50 CONTINUE

In this example, unrolling is performed with four-fold expansion in accordance with the optimization control line specified by the user. In addition, because the loop limit is the variable N and the iteration count is unknown at compile time, text that tests the iteration count is generated and added after the loop.
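The four-fold expansion with a run-time remainder test can be checked with a small Python model. This is a sketch, not the generated object code: the function names are hypothetical, dictionary entries stand in for array columns, and the chain of `if` statements imitates the fall-through through labels 30 and 40. The main loop's upper bound is taken as N-3 so that each pass always has four whole columns; a looser bound such as N-1 would let the last pass run past column N when N = 6.

```python
def vadd(x, y):
    """Element-wise sum of two equal-length vectors (models one vector operation)."""
    return [a + b for a, b in zip(x, y)]


def rolled(B, C, n):
    """Original loop: A(*,J) = B(*,J) + C(*,J) for J = 1..n."""
    return {j: vadd(B[j], C[j]) for j in range(1, n + 1)}


def unrolled_by_four(B, C, n):
    """Four-fold unrolled loop followed by a run-time remainder dispatch."""
    A = {}
    if n >= 4:                               # IF (N.LT.4) GOTO 20
        for j in range(1, n - 3 + 1, 4):     # main loop, J = 1, 5, 9, ... up to N-3
            A[j] = vadd(B[j], C[j])
            A[j + 1] = vadd(B[j + 1], C[j + 1])
            A[j + 2] = vadd(B[j + 2], C[j + 2])
            A[j + 3] = vadd(B[j + 3], C[j + 3])
    m = n % 4                                # label 20: M = MOD(N,4)
    if m == 3:                               # falls through to the next two cases
        A[n - 2] = vadd(B[n - 2], C[n - 2])
    if m >= 2:                               # label 30
        A[n - 1] = vadd(B[n - 1], C[n - 1])
    if m >= 1:                               # label 40
        A[n] = vadd(B[n], C[n])
    return A                                 # label 50
```

For every N tried in this model, `unrolled_by_four` writes exactly the columns 1 through N and matches `rolled`.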

[Effect of the invention]

As explained above, according to the present invention, loop unrolling of the vector operation sequence is performed automatically on the basis of the identified data dependencies, which makes it possible to optimize pipeline scheduling. Optimization of the vector text also becomes possible. Execution performance therefore improves, and the time users spend on tuning can be reduced. In addition, the readability of source programs such as FORTRAN programs is preserved, and source-level compatibility with general-purpose computers is maintained.

[Brief explanation of the drawing]

FIG. 1 is a block diagram of an example of the basic configuration of the present invention, FIG. 2 is an explanatory diagram of the processing of one embodiment of the present invention, FIG. 3 is an example of the unrolling availability table, and FIG. 4 is a diagram explaining the relationship between data dependency values and the unrolling expansion count.

In the figures, 10 is the source program, 11 is the processing device, 12 is the compiler, 13 is the program input unit, 14 is the vectorization processing unit, 15 is the data dependency analysis unit, 16 is the unrolling execution condition determination unit, 17 is the unrolling processing unit, 18 is the object generation unit, and 19 is the object program.

Claims

[Claims] 1. A vector operation sequence loop unrolling processing method in a data processing system having a compile processing function that performs automatic vectorization, characterized by comprising: a program input unit 13 that inputs a source program 10 to be compiled; a vectorization processing unit 14 that decodes the intermediate text obtained by analyzing the program input by the program input unit 13, detects what can be vectorized, and generates a vector operation sequence; a data dependency analysis unit 15 that, when the inner loop of a multiple loop in the program to be compiled has been vectorized by the vectorization processing unit 14, analyzes the data dependencies relevant to unrolling in the outer loop of the vectorized vector operation sequence; an unrolling execution condition determination unit 16 that determines whether unrolling is possible according to the analysis result of the data dependency analysis unit 15; and an unrolling processing unit 17 that, when the unrolling execution condition determination unit 16 determines that unrolling is possible, sets the iteration count of the outer loop to 1/N (N is an integer of 2 or more) and expands the vector operation sequence N times.