JP4783387B2

JP4783387B2 - Semiconductor device

Info

Publication number: JP4783387B2
Application number: JP2008026588A
Authority: JP
Inventors: 和淑小林; 有理杉原; 洋平久米; 秀俊小野寺
Original assignee: 株式会社半導体理工学研究センター
Priority date: 2008-02-06
Filing date: 2008-02-06
Publication date: 2011-09-28
Anticipated expiration: 2028-02-06
Also published as: JP2009188193A

Description

The present invention relates to a semiconductor device that includes a plurality of arithmetic circuits arranged in a matrix and in which the wiring paths between the arithmetic circuits are reconfigurable.

Following Moore's Law, integrated circuits are being scaled down year after year. Advances in manufacturing technology continue to improve LSI performance, but the variation in circuit performance caused by variation in device characteristics is becoming more serious. As shown in FIG. 17, the dominant component used to be inter-chip variation, in which device characteristics vary smoothly across the wafer. In recent fine processes, however, the dominant component is intra-chip variation, in which device characteristics vary within a chip according to a probability distribution. In the figure, "frequency" corresponds to chip performance. Intra-chip variation in a fine process arises from random fluctuation in the impurity density, which affects the threshold voltage of each transistor.

Variation-aware design techniques such as statistical static timing analysis (SSTA) (see, for example, Non-Patent Document 1) and DFM have been devised to address this variation in device characteristics. Conventional approaches to intra-chip variation merely suppress the variation by means of process or circuit techniques. One example is substrate bias control, which controls the substrate voltage for each chip or functional block (see Non-Patent Document 2). However, as described in Non-Patent Document 3, it is impossible to supply different voltages to small regions with different characteristics, so substrate bias control has little effect at high frequencies and in current fine processes.

Suppressing device variation is expected to be difficult in future scaled processes. The inventors of the present application therefore proposed a device that exploits intra-chip variation rather than suppressing it and that can be reconfigured according to the variation (see Non-Patent Document 4). With this approach, the larger the intra-chip variation, the more the circuit performance can be improved.

To observe variation in an FPGA (field-programmable gate array (registered trademark)) in a current fine process, an island-style FPGA was prototyped in a 90 nm process. FIG. 18(a) is a schematic diagram of the overall structure of the FPGA, and FIG. 18(b) is a photograph of the chip.

As shown in FIG. 18(a), the FPGA has a 48 × 48 logic block array, and the chip measures 3 mm square. The chip also includes a PLL, an SRAM, and a counter.

FIG. 19 is a schematic diagram of the structure of this FPGA. The FPGA comprises a plurality of CLBs 130 (configurable logic blocks), SBs 140 (switch blocks), and CBs 170 (input/output switches). The layout unit shown in the figure is turned into a macro and placed in a 48 × 48 array, so that 47 × 47 of the units form a nearly homogeneous layout.

FIG. 20(a) is a circuit diagram of the CLB 130. The CLB 130 includes a four-input LUT 131, four selectors 132a, 132b, 132c, and 132d, and five DFFs 133a, 133b, 133c, 133d, and 133e. The selectors 132a, 132b, and 132d and the DFFs 133b and 133d were added for this prototype FPGA. The additions incur an area overhead of 23%, but the increase over the whole chip is only 1%.

FIG. 20(b) is a circuit diagram of the DFF 133b. The DFF 133b is used as a frequency divider: it divides the oscillation waveform from the selector 132c several times and outputs the divided waveform. Because the oscillation frequency depends on the performance (delay time) of the logic block array, that performance can be determined by measuring the oscillation frequency.

The oscillation waveform is generated by a ring oscillator (RO) built inside the FPGA. FIG. 21(a) is a circuit diagram of an example ring oscillator. A ring oscillator oscillates because a plurality of inverters are connected in a ring.

As shown in FIG. 21(b), such a ring oscillator is built from a single logic block and made to oscillate. The oscillation waveform is divided by the DFF 133b shown in FIG. 20(b), and a counter counts the number of oscillations in a fixed time. With d the number of frequency divisions, N the counted number of oscillations, and T the counting time, the frequency f is

f = (N × 2^d) / T
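
As a minimal sketch of the relation above, the frequency can be computed as follows. The function name and the sample reading are hypothetical and are not taken from the measurements; the sketch also assumes that each division halves the frequency, as the factor 2^d implies.

```python
def ring_oscillator_frequency(count: int, divisions: int, window_s: float) -> float:
    """Return f = (N * 2**d) / T in hertz.

    count     -- N, oscillations counted after the divider
    divisions -- d, number of divide-by-two stages (assumed)
    window_s  -- T, counting time in seconds
    """
    if window_s <= 0:
        raise ValueError("counting time T must be positive")
    return count * 2 ** divisions / window_s


# Hypothetical reading, not a measured value: N = 12,500, d = 4, T = 1 ms.
f_hz = ring_oscillator_frequency(count=12_500, divisions=4, window_s=1e-3)
print(f"{f_hz / 1e6:.1f} MHz")
```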

In this way, the frequency f is measured for 47 × 48 × 8 blocks per chip. FIG. 22 is a schematic diagram of the FPGA variation measurements, in which the shading of each logic block represents its performance. Even at the same location on samples A to D, the speed differs with the configuration, and even adjacent logic blocks can differ completely in performance. This shows that random variation without positional correlation is dominant in the prototype chips.

FIG. 23 is a histogram of the FPGA variation measurements, showing the distribution of the oscillation frequencies of the 16192 ring oscillators on one chip. The histogram closely matches the curve obtained by substituting the measured μ and σ into the probability density function of the normal distribution. Averaged over 10 chips, the skewness is 0.0055 and the kurtosis is 0.025. The intra-chip delay variation therefore follows a normal distribution.
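
The normality check described above can be sketched as follows. The data here are synthetic samples drawn from a normal distribution, not the measured frequencies, and the sketch assumes that the reported kurtosis is excess kurtosis (0 for a normal distribution); the function name is hypothetical.

```python
import random
import statistics


def skewness_and_excess_kurtosis(samples):
    """Return (skewness, excess kurtosis) computed from population moments."""
    n = len(samples)
    mu = statistics.fmean(samples)
    sigma = statistics.pstdev(samples, mu)
    m3 = sum((x - mu) ** 3 for x in samples) / n
    m4 = sum((x - mu) ** 4 for x in samples) / n
    return m3 / sigma ** 3, m4 / sigma ** 4 - 3.0


# Synthetic stand-in for one chip: 16192 normally distributed frequencies (MHz).
rng = random.Random(0)
frequencies = [rng.gauss(200.0, 5.0) for _ in range(16192)]
skew, kurt = skewness_and_excess_kurtosis(frequencies)
print(f"skewness={skew:.4f}, excess kurtosis={kurt:.4f}")
```

Both statistics come out close to zero for normally distributed data, which is the behavior the measured 10-chip averages are compared against.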

Non-Patent Document 5 discloses three techniques that exploit intra-chip variation in an FPGA: region-based function assignment (region relocation), multiple configurations, and critical path reconfiguration.

Region-based function assignment suits systematic variation, in which device characteristics vary smoothly across the chip. When systematic variation is dominant, device characteristics change with position on the chip. If the variations of neighboring elements are correlated in this way, a variation map of each chip is easy to obtain by measuring delay variation. Multiple configurations and critical path reconfiguration prevent the loss of circuit performance caused by random variation by assigning functional blocks with relatively long critical paths to regions that systematic variation has made faster.

However, as shown in FIG. 22, random variation without positional correlation is dominant in current fine processes. When uncorrelated random variation dominates, the individual transistors that make up these elements all have different characteristics. For example, each switch in a switch block operates at a different speed.

With region-based function assignment, it is therefore very difficult to predict the performance of individual logic blocks, switch blocks, and connection blocks and to measure the speed variation of individual switches. As pointed out in Non-Patent Document 6, the detailed variation information needed to optimize placement for each chip is hard to obtain, so it is impossible to derive an optimal configuration for each chip from a measured variation map.

Critical path reconfiguration and multiple configurations are therefore the suitable techniques for recovering the speed and yield lost to random variation.

FIG. 24 is a conceptual diagram of multiple configurations. Non-Patent Document 7 proposes a multiple-configuration technique called MECPCs (Mutually Exclusive Critical Path Configurations). In this technique, several configurations with nearly the same performance are prepared. Each configuration is routed so that its critical paths do not use elements already used by the critical paths of previously generated configurations. The generated configurations are loaded onto the FPGA, and timing verification is repeated until a configuration that satisfies the timing constraints is found.
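
The selection step of the multiple-configuration technique can be sketched as below. This is an illustration under simplifying assumptions, not the procedure of Non-Patent Document 7: each configuration is represented only by hypothetical critical path delays on one chip, and timing verification is reduced to comparing the worst delay with a limit.

```python
def first_passing_configuration(configs, limit_ns):
    """Return the index of the first configuration whose critical paths
    all meet the timing limit on this chip, or None if none does."""
    for index, critical_delays in enumerate(configs):
        if max(critical_delays) <= limit_ns:
            return index
    return None


# Hypothetical per-chip critical path delays (ns) for three prepared configurations.
configs = [[5.1, 4.8], [4.9, 5.3], [4.7, 4.9]]
print(first_passing_configuration(configs, limit_ns=5.0))
```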

FIG. 25 is a conceptual diagram of critical path reconfiguration. In critical path reconfiguration, every chip starts from the same initial configuration. Critical paths that fail the timing constraints are rerouted, and timing is verified again. Rerouting and timing verification are repeated until every critical path satisfies the timing constraints or a set number of iterations is exceeded.

With multiple configurations, every configuration is tested. As the number of critical paths grows, so does the probability that a critical path violates the timing constraints.

In critical path reconfiguration, by contrast, critical paths that satisfy the timing constraints are fixed, and only the paths that violate them are rerouted. The configuration is thus optimized step by step to match the random variation of each individual chip.
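
The reroute-and-verify loop of critical path reconfiguration can be sketched as follows. The sketch assumes, for illustration only, that each path has a fixed list of alternative routes with known delays and that rerouting one path does not disturb the others; the function name, data layout, and delay values are hypothetical.

```python
def reconfigure_critical_paths(route_delays, limit_ns, max_rounds=10):
    """Reroute only the paths that violate the timing limit.

    route_delays -- {path name: [delay in ns of each alternative route]},
                    hypothetical per-chip values standing in for timing checks
    Returns (chosen route index per path, whether timing is met).
    """
    choice = {name: 0 for name in route_delays}  # same initial configuration
    for _ in range(max_rounds):
        violating = [p for p, i in choice.items() if route_delays[p][i] > limit_ns]
        if not violating:
            return choice, True
        for p in violating:
            # Paths that already meet timing stay fixed; try the next route here.
            choice[p] = (choice[p] + 1) % len(route_delays[p])
    met = all(route_delays[p][i] <= limit_ns for p, i in choice.items())
    return choice, met


chip = {"A": [5.2, 4.7], "B": [4.1, 4.9], "C": [5.5, 5.3, 4.8]}
print(reconfigure_critical_paths(chip, limit_ns=5.0))
```

In this example path B meets the limit from the start and is never touched, while paths A and C are rerouted until they pass.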

FIG. 26 is a block diagram of a conventional semiconductor device 91. The semiconductor device 91 is a field-programmable device and includes a substrate 2. A plurality of CLBs 93a to 93o are arranged in a matrix on the substrate 2, and SBs 94a to 94h are each provided at a position surrounded by four CLBs. A plurality of tracks 5 run between adjacent CLBs 93, and each SB 94 connects one of the tracks 5 between one pair of adjacent CLBs 93 to one of the tracks 5 between another pair of adjacent CLBs 93.

The SB 94 is the switch block disclosed in Non-Patent Document 8, which describes a general FPGA with an island-style architecture. FIG. 27 is a schematic diagram of the structure of the SB 94, with four wiring tracks between CLBs. Wiring track L1 can be connected to wiring tracks T1, R1, and B1, and likewise every wiring track can be connected to wiring tracks in the three other directions. The degree of freedom Fs of the switch block SB 94 is therefore 3. Here, the degree of freedom is the number of other wiring tracks to which each wiring track can be connected (see Non-Patent Document 9).
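
The Fs = 3 connection pattern described above can be modeled in a few lines. The side labels follow the track names L, T, R, and B in the text; the function names are hypothetical, and the sketch assumes that every track follows the same same-index pattern as track L1.

```python
SIDES = ("L", "T", "R", "B")


def disjoint_neighbors(side, track):
    """Tracks reachable from (side, track) when each track connects only to
    the same-numbered track on the three other sides (Fs = 3)."""
    return [(other, track) for other in SIDES if other != side]


def degree_of_freedom(side, track):
    """Fs: the number of other tracks this track can be connected to."""
    return len(disjoint_neighbors(side, track))


print(disjoint_neighbors("L", 1), degree_of_freedom("L", 1))
```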

FIG. 28 is a conceptual diagram of swapping the tracks of a critical path. When a critical path violates the timing constraints, the wiring path changing unit 6 shown in FIG. 26 reroutes the path between CLBs onto an adjacent wiring track. A rerouting onto an adjacent wiring track is accepted only if it improves the delay.

To perform rerouting, it is also necessary to know which path is the optimum path (critical path). Consider searching, in FIG. 29, for the critical path on the path from CLB 93a through CLB 93e and CLB 93i to CLB 93l and on the path from CLB 93d through CLB 93k to CLB 93o.

FIG. 30 is an explanatory diagram of critical path search by the Path-Delay method. The Path-Delay method is a general search method disclosed in, for example, Non-Patent Documents 10 and 11. To search for the critical path from CLB 93a through CLB 93e and CLB 93i to CLB 93l with the Path-Delay method, CLB 93a is taken as the signal transmission start point (SRC) and CLB 93l as the signal transmission end point (SINK). To measure the delay between CLBs accurately, measurements are made at different clock periods: the clock period is changed little by little to find the point up to which signals are still transmitted correctly between the CLBs.

Non-Patent Document 1: M. Chao, L. Wang, K. Cheng, and S. Kundu. Static Statistical Timing Analysis for Latch-based Pipeline Design. Proceedings of the International Conference of Computer Aided Design, pages 468-472, 2004.
Non-Patent Document 2: M. Sumita, S. Sakiyama, M. Kinoshita, Y. Araki, Y. Ikeda, and K. Fukuoka. Mixed body-bias techniques with fixed Vt and Ids generation circuits. IEEE International Solid-State Circuits Conference, XVII, pages 158-159, 2004.
Non-Patent Document 3: S. Ohkawa, M. Aoki, and H. Masuda. Analysis and Characterization of Device Variations in an LSI Chip Using an Integrated Device Matrix Array. IEEE Trans. on Semiconductor Manufacturing, Vol. 17, No. 2, pages 155-165, 2004.
Non-Patent Document 4: K. Katsuki, M. Kotani, K. Kobayashi, and H. Onodera. A Yield and Speed Enhancement Scheme under Within-die Variations on 90nm LUT Array. In Proceedings of IEEE 2005 Custom Integrated Circuits Conference, pages 601-604, 2005.
Non-Patent Document 5: P. Sedcole and P. Cheung. Parametric yield in FPGAs due to within-die delay variations: a quantitative analysis. Proceedings of the 2007 ACM/SIGDA 15th International Symposium on Field Programmable Gate Arrays, pages 178-187, 2007.
Non-Patent Document 6: L. Cheng, J. Xiong, L. He, and M. Hutton. FPGA Performance Optimization via Chipwise Placement Considering Process Variations. Proceedings of 2006 International Conference on Field Programmable Logic and Applications, pages 44-49, 2006.
Non-Patent Document 7: Y. Matsumoto, M. Hioki, T. Kawanami, T. Tsutsumi, K. Nakagawa, T. Sekikawa, and H. Koike. Performance and Yield Enhancement of FPGAs with Within-die Variation using Multiple Configurations. International Symposium on Field-Programmable Gate Arrays, pages 169-177, 2007.
Non-Patent Document 8: Y.-W. Chang, D. F. Wong, and C. K. Wong. Universal switch modules for FPGA design. ACM Trans. Des. Autom. Electron. Syst., 1(1):80-101, 1996.
Non-Patent Document 9: J. Rose and S. Brown. Flexibility of interconnection structures for field-programmable gate arrays. IEEE Journal of Solid-State Circuits, Volume 26, Issue 3, pages 277-282, 1991.
Non-Patent Document 10: S. Bose, P. Agrawal, and V. D. Agrawal. A rated-clock test method for path delay faults. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, pages 323-331.
Non-Patent Document 11: H. Chang and J. A. Abraham. Delay test techniques for boundary scan based architectures. 1992 IEEE Custom Integrated Circuits Conference.

However, the conventional critical path reconfiguration technique has the problem that it is difficult to keep reconfiguring the configuration so that the timing constraints are satisfied.

FIG. 31(a) is a schematic diagram of the connections in the initial routing, in which five paths A to E pass through a switch block. Assume that Path A is a critical path and the others are not. Consider rerouting Path A from track 1 to track 0 because it violates the timing constraints.

FIG. 31(b) is a schematic diagram of the connections after rerouting when the SB 94 with Fs = 3 shown in FIG. 27 is used. When Path A is rerouted from track 1 to track 0, Paths B and C must also be rerouted in their entirety, onto tracks 1 and 0. This can in turn force other paths to be rerouted in other switch blocks and can change the routing topology. The path delay distribution of the circuit may therefore change, and new critical paths may appear.

FIG. 32 is a schematic diagram of the rerouting connections in the semiconductor device 91. FIG. 32(a) shows the initial routing, and FIG. 32(b) shows the routing after the path change. As the arrows indicate, swapping the track of critical path A forces paths B, C, and D to be rerouted as well, which may increase the delay.

Furthermore, the Path-Delay method shown in FIG. 30 needs the absolute delay of every path in order to extract the shortest of several paths. For example, the path from CLB 93d through CLB 93k to CLB 93o has two sections, CLB 93d to CLB 93k and CLB 93k to CLB 93o, and each section has two wiring paths. There are thus 2^2 = 4 wiring paths in total between CLB 93d and CLB 93o, and four measurements are needed to learn the absolute delay of each. Similarly, the path from CLB 93a through CLB 93e and CLB 93i to CLB 93l has three sections, CLB 93a to CLB 93e, CLB 93e to CLB 93i, and CLB 93i to CLB 93l, each with two wiring paths. There are thus 2^3 = 8 wiring paths in total between CLB 93a and CLB 93l, and eight measurements are needed to learn the absolute delay of each. Because so many measurements are required, the measurement cost increases.
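
The measurement counts above follow from enumerating every combination of per-section wiring paths, which can be checked with a short sketch. The function name is hypothetical; each tuple stands for one end-to-end wiring path whose absolute delay the Path-Delay method would have to measure.

```python
from itertools import product


def end_to_end_paths(paths_per_section):
    """List every combination of per-section wiring paths between SRC and SINK."""
    return list(product(*(range(n) for n in paths_per_section)))


two_sections = end_to_end_paths([2, 2])       # CLB 93d - 93k - 93o
three_sections = end_to_end_paths([2, 2, 2])  # CLB 93a - 93e - 93i - 93l
print(len(two_sections), len(three_sections))
```

The count doubles with every added section, which is why the measurement cost grows quickly for longer paths.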

The present invention has been made in view of the above problems, and its object is to realize a semiconductor device that can keep down the cost of measuring critical paths between a plurality of arithmetic circuits.

To solve the above problems, a semiconductor device according to the present invention comprises: a plurality of arithmetic circuits arranged in a matrix; a plurality of wiring tracks arranged so as to pass between two mutually adjacent arithmetic circuits out of four mutually adjacent arithmetic circuits among the plurality of arithmetic circuits; a switch block that is provided at a position surrounded by the four arithmetic circuits and that connects one of the wiring tracks between the two mutually adjacent arithmetic circuits to one of the wiring tracks between another two mutually adjacent arithmetic circuits; a first delay comparator that is provided in another one of the arithmetic circuits and that compares the delays of a first wiring path and a second wiring path, each formed by the wiring tracks and the switch block, for transmitting a signal from one of the plurality of arithmetic circuits to that other one; and a second delay comparator that is provided in yet another one of the arithmetic circuits and that compares the delays of a third wiring path and a fourth wiring path, each formed by the wiring tracks and the switch block, for transmitting a signal from the other one of the arithmetic circuits to that yet another one.

Consider searching for the critical path of a route that starts at one of the arithmetic circuits (the signal transmission start point), passes through another one, and ends at yet another one (the signal transmission end point). The conventional Path-Delay method requires 2^2 = 4 measurements. With the above configuration, by contrast, the first delay comparator compares the delays of the first and second wiring paths in a first measurement, and the second delay comparator compares the delays of the third and fourth wiring paths in a second measurement. The critical path search thus needs only two measurements, and the absolute values of the delays need not be measured. A semiconductor device that keeps down the cost of measuring critical paths between a plurality of arithmetic circuits can therefore be realized.
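
The reduction from four measurements to two can be illustrated with the sketch below. It assumes that the section delays simply add up along a route, and it uses hypothetical delay values that the search itself never reads as absolute numbers: the stand-in comparator only reports which of two wiring paths is faster. All names are illustrative.

```python
from itertools import product


def faster_path(delay_0, delay_1):
    """Stand-in for a delay comparator: reports only which wiring path wins."""
    return 0 if delay_0 <= delay_1 else 1


def select_paths(sections):
    """Pick the faster of two wiring paths per section, one comparison each."""
    picks = [faster_path(d0, d1) for d0, d1 in sections]
    return picks, len(sections)


# Hypothetical delays in ns for two sections with two wiring paths each.
sections = [(1.9, 2.3), (2.8, 2.1)]
picks, comparisons = select_paths(sections)

# Cross-check against exhaustive search over all 2**2 end-to-end routes.
best = min(product((0, 1), repeat=len(sections)),
           key=lambda combo: sum(sec[i] for sec, i in zip(sections, combo)))
print(picks, comparisons, list(best) == picks)
```

Under the additive-delay assumption, the per-section winners form the same route that the exhaustive search finds, using one comparison per section instead of one measurement per route.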

In the semiconductor device according to the present invention, the first delay comparator and the second delay comparator are preferably each constituted by an SR latch.

With this configuration, the first and second delay comparators can determine which of their input signals arrived first, and each can be built from two NAND gates. The increase in circuit area is therefore small.
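
How two cross-coupled NAND gates can report which input arrived first is sketched below as a simple logic-level simulation. The signal polarity is an assumption made for illustration (both inputs start low and each signal arrives as a rising edge); it is not taken from the description, and the function names are hypothetical.

```python
def nand(a, b):
    """Two-input NAND on 0/1 values."""
    return 0 if (a and b) else 1


def sr_latch_arbiter(arrival_a, arrival_b):
    """Return 'A' or 'B' for whichever rising edge reaches the latch first."""
    if arrival_a == arrival_b:
        raise ValueError("simultaneous arrival is not resolved by this model")
    q, qb = 1, 1  # both inputs low, so both NAND outputs start high
    for t in sorted({arrival_a, arrival_b}):
        a = 1 if t >= arrival_a else 0
        b = 1 if t >= arrival_b else 0
        for _ in range(4):  # let the cross-coupled gates settle
            q, qb = nand(a, qb), nand(b, q)
    # The output of the gate driven by the earlier input falls and blocks the other gate.
    return "A" if q == 0 else "B"


print(sr_latch_arbiter(1.9, 2.3), sr_latch_arbiter(2.8, 2.1))
```

Once the first edge pulls one output low, the later edge cannot change the state, so the latch holds the comparison result without any absolute delay being measured.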

In the semiconductor device according to the present invention, the other one and the yet another one of the arithmetic circuits preferably each have clustered LUTs provided in the stage preceding the SR latch.

With this configuration, an arithmetic circuit can serve as a signal transmission start point when its LUT is configured as an inverter, and as a signal transmission end point when its LUT is configured as a NAND gate.

In the semiconductor device according to the present invention, each arithmetic circuit may be constituted by a configurable logic block.

As described above, the semiconductor device according to the present invention comprises: a plurality of arithmetic circuits arranged in a matrix; a plurality of wiring tracks arranged so as to pass between two mutually adjacent arithmetic circuits out of four mutually adjacent arithmetic circuits among the plurality of arithmetic circuits; a switch block that is provided at a position surrounded by the four arithmetic circuits and that connects one of the wiring tracks between the two mutually adjacent arithmetic circuits to one of the wiring tracks between another two mutually adjacent arithmetic circuits; a first delay comparator that is provided in another one of the arithmetic circuits and that compares the delays of a first wiring path and a second wiring path, each formed by the wiring tracks and the switch block, for transmitting a signal from one of the plurality of arithmetic circuits to that other one; and a second delay comparator that is provided in yet another one of the arithmetic circuits and that compares the delays of a third wiring path and a fourth wiring path, each formed by the wiring tracks and the switch block, for transmitting a signal from the other one of the arithmetic circuits to that yet another one. A semiconductor device that keeps down the cost of measuring critical paths between a plurality of arithmetic circuits can therefore be realized.

An embodiment of the present invention is described below with reference to FIGS. 1 to 16.

FIG. 1 is a block diagram of a semiconductor device 1 according to the present embodiment. The semiconductor device 1 is the semiconductor device 91 shown in FIG. 26 with CLBs 93a to 93o replaced by CLBs 3a to 3o and SBs 94a to 94h replaced by SBs 4a to 4h. Whereas the degree of freedom Fs of SBs 94a to 94h is 3, that of SBs 4a to 4h is 6. The CBs shown in FIG. 19 are omitted from FIG. 1.

FIG. 2 is a schematic diagram of the connections of the SB 4. Compared with the SB 94 shown in FIG. 27, track L1 of the SB 4 can be connected to tracks T0, R0, and B0 in addition to tracks T1, R1, and B1. Likewise, every track can be connected to six other tracks.
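
The Fs = 6 connection pattern can be modeled as below. The text fixes only the connections of track L1; the rule used here for the remaining tracks (the same-numbered track and the next lower-numbered track on each other side, wrapping around at track 0) and the default of four tracks per side are assumptions made for illustration. The names are hypothetical.

```python
SIDES = ("L", "T", "R", "B")


def reachable_tracks(side, track, width=4, offsets=(0, -1)):
    """Tracks reachable from (side, track) in one switch block.

    offsets=(0,)    reproduces the same-index Fs = 3 pattern;
    offsets=(0, -1) is an assumed Fs = 6 pattern consistent with track L1.
    """
    return [(other, (track + off) % width)
            for other in SIDES if other != side
            for off in offsets]


print(reachable_tracks("L", 1))
print(len(reachable_tracks("L", 1)), len(reachable_tracks("L", 1, offsets=(0,))))
```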

FIG. 3 is a schematic diagram showing the connections after rerouting from the initial routing of FIG. 31(a) when SB 4 with Fs = 6 is used. When SB 94 with Fs = 3 is used, as shown in FIG. 31(b), rerouting Path A from track 1 to track 0 forces Paths B and C to be rerouted over their entire length as well, onto tracks 1 and 0. In FIG. 3, by contrast, rerouting Path A requires only Path B to be rerouted; the other paths, Path C to Path E, need no rerouting. This lowers the likelihood that paths must be rerouted in other switch blocks and makes it easier to rebuild the configuration.

FIG. 4(a) is a schematic diagram showing the initial routing when SB 4 with Fs = 6 is used, and FIG. 4(b) is a schematic diagram showing the connections after rerouting. FIG. 4(a) is identical to FIG. 32(a). In this case, changing the track of Path A does not require Paths B and C to be rerouted in their entirety; only a small part of Path B is rerouted. The effect of rerouting a critical path on other paths is therefore very small, and the track swap has little effect on the path delay distribution of the circuit. Increasing the degree of freedom of the switch block from 3 to 6 thus makes it easy to reroute, by a track change, a critical path that violates the timing constraint.

As described above, increasing the degree of freedom Fs of the switch block suppresses chain reactions of track changes among paths, which makes it easier to rebuild the configuration. The present embodiment has described an increase from Fs = 3 (FIG. 5(a)) to Fs = 6 (FIG. 5(b)), but Fs = 9 may also be used for some of the wiring tracks (FIG. 5(c)).

FIG. 5(c) is a schematic diagram showing SB 14, in which Fs = 9 for some of the wiring tracks. The dots inside the switch block indicate connection switches. In SB 14, Fs = 6 for tracks L0, L3, B0, B3, R0, R3, T0, and T3, and Fs = 9 for tracks L1, L2, B1, B2, R1, R2, T1, and T2. Raising the routing freedom further in this way strengthens the delay reduction obtained by exploiting variation. Fs may also be set to any multiple of 3 that is 9 or greater, which strengthens the delay reduction further.

On the other hand, raising the degree of freedom of the switch block requires extra circuitry. Because an NMOS switch is provided between each pair of connectable tracks, with four wiring tracks a switch block with Fs = 3 requires 24 (= 6 × 4) switches and a switch block with Fs = 6 requires 48 (= 6 × 4 × 2) switches. A switch block with Fs = 6 allows track 0 to be swapped with track 1 and track 2 with track 3, but in a conventional switch block built from CMOS or NMOS switches, going to Fs = 6 doubles the area. In addition, in fine processes, using such unbuffered switches causes problems of signal attenuation and slew-rate degradation.
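
The switch counts quoted above follow from simple arithmetic, sketched here in Python. The function name `switch_count` is hypothetical, and the sketch assumes a four-sided switch block in which every track has the same Fs; it does not model a mixed block such as SB 14.

```python
def switch_count(num_tracks: int, fs: int) -> int:
    """Number of track-to-track switches in one four-sided switch block.

    Assumption: every track has the same Fs, and Fs is a positive multiple of 3
    (Fs / 3 connectable tracks on each of the other three sides).
    """
    if fs <= 0 or fs % 3 != 0:
        raise ValueError("Fs is assumed to be a positive multiple of 3")
    side_pairs = 6  # 4 sides taken two at a time
    return side_pairs * num_tracks * (fs // 3)
```

With four tracks this gives 24 switches for Fs = 3 and 48 for Fs = 6, the figures stated in the description. The same numbers result from counting 4 sides × 4 tracks × Fs connections and dividing by 2, since each switch joins two tracks.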

When switches of the bufp architecture shown in FIG. 6 are used instead, the area penalty becomes very small. With Fs = 6, the 12 NMOS switches in the center (FIG. 6(a)) double to 24 (FIG. 6(b)) compared with Fs = 3, but the other elements are unchanged.

Table 1 shows the relationship among the number of NMOS switches, the area, and the delay for the switch blocks shown in FIG. 6. The number of NMOS switches is proportional to Fs and therefore doubles, but the increases in area and delay are only 18.5% and 5.0%, respectively, which shows that the area overhead is small.

Next, the track swapping procedure is described.

FIG. 7 is a flowchart of critical path reconfiguration by track swapping. After the manufacturing process (step S1), an initial configuration is created with a conventional place-and-route tool without taking variation into account (step S2). This configuration is common to all chips. Because the critical path reconfiguration method reroutes paths that can become critical, it is necessary to estimate which paths may become critical paths (step S3). The delays of the estimated critical path candidates are measured, and if the delay constraint is violated ("fail" in step S4), rerouting is performed by track swapping (step S5). After rerouting, the delay is measured again (step S6). If the destination track of a swap is also used by a critical path candidate ("pass" in step S6), the swap is not performed (step S7). A path that violates the delay constraint may be rerouted with a changed topology (step S8), and the optimization may be completed when all critical path candidates satisfy the delay constraint. As shown in FIG. 8, the track swap in step S5 is performed between two adjacent tracks.
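
One possible reading of steps S4 to S7 is sketched below in Python. It is a deliberately simplified model and not the flowchart of FIG. 7 itself: each path is assumed to occupy a single track number, the adjacent track is taken to be `track ^ 1`, and the names `reconfigure_critical_paths` and `measure_delay` are hypothetical. Steps S1 to S3 and the topology change of step S8 are not modelled.

```python
from typing import Callable, Dict, List


def reconfigure_critical_paths(
    candidates: List[str],
    track_of: Dict[str, int],
    measure_delay: Callable[[str, int], float],
    delay_limit: float,
) -> Dict[str, int]:
    """One pass of adjacent-track swapping over the critical path candidates."""
    assignment = dict(track_of)
    for path in candidates:
        current = assignment[path]
        if measure_delay(path, current) <= delay_limit:
            continue  # S4: the path already meets the delay constraint
        neighbour = current ^ 1  # S5: adjacent track of the pair (0<->1, 2<->3)
        used = {assignment[p] for p in candidates if p != path}
        if neighbour in used:
            continue  # S7: destination track is used by another candidate
        if measure_delay(path, neighbour) <= delay_limit:
            assignment[path] = neighbour  # S6: re-measured delay passes, keep swap
    return assignment
```

Paths that still violate the constraint after this pass would be the ones handed to the topology-changing reroute of step S8.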

Next, the critical path search method of the present embodiment is described.

As shown in FIG. 30, the conventional Path-Delay method requires eight measurements to extract the critical path from CLB 93a to CLB 93l. In the present embodiment, by contrast, the critical path is searched for using delay comparators.

FIG. 9 is an explanatory diagram of the critical path search method according to the present embodiment. To search for the critical paths along the path from CLB 3a through CLB 3e and CLB 3i to CLB 3l and along the path from CLB 3d through CLB 3k to CLB 3o, the first measurement compares, as shown in FIG. 9(a), the delays of two paths in each of the sections CLB 3a to CLB 3e, CLB 3i to CLB 3l, and CLB 3d to CLB 3k. In this measurement, CLB 3a, CLB 3i, and CLB 3d serve as signal transmission start points (SRC), and a delay comparator (DD, the first delay comparator) is provided in each of CLB 3e, CLB 3l, and CLB 3k, which serve as signal transmission end points. Each delay comparator compares the delays of the two wiring paths (the first wiring path and the second wiring path) in its section. The route for each section is thereby selected as indicated by the broken lines in FIG. 9(b).

In the second measurement, the delays of two paths are compared in each of the sections CLB 3e to CLB 3i and CLB 3k to CLB 3o. In this measurement, CLB 3e and CLB 3k serve as signal transmission start points (SRC), and a delay comparator (DD, the second delay comparator) is provided in each of CLB 3i and CLB 3o, which serve as signal transmission end points. Each delay comparator compares the delays of the two wiring paths (the third wiring path and the fourth wiring path) in its section and selects the route for that section.

In this way, the critical paths along the path from CLB 3a through CLB 3e and CLB 3i to CLB 3l and along the path from CLB 3d through CLB 3k to CLB 3o can be found with two measurements. The number of measurements is therefore far smaller than with the conventional Path-Delay method, which keeps the measurement cost low.
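
The grouping of sections into two measurements can be reproduced with a short Python sketch. The rule used here, alternating sections along each path between the first and second measurement, is an assumption inferred from the example above (it keeps any one CLB from acting as start point and delay comparator in the same measurement); the function name `schedule_sections` is hypothetical.

```python
from typing import Dict, List, Tuple


def schedule_sections(paths: List[List[str]]) -> Dict[int, List[Tuple[str, str]]]:
    """Assign each (start CLB, end CLB) section of each path to measurement 1 or 2."""
    rounds: Dict[int, List[Tuple[str, str]]] = {1: [], 2: []}
    for path in paths:
        for k in range(len(path) - 1):
            # Sections alternate along a path: 1st, 3rd, ... go to measurement 1.
            rounds[1 if k % 2 == 0 else 2].append((path[k], path[k + 1]))
    return rounds
```

Applied to the two paths of FIG. 9, the sketch places the sections 3a to 3e, 3i to 3l, and 3d to 3k in the first measurement and 3e to 3i and 3k to 3o in the second, as in the description.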

Next, the detailed configuration of the CLB that incorporates the delay comparator is described.

FIG. 10 is a circuit diagram showing the configuration of CLB 3. CLB 3 receives signals from paths A and B and outputs signals to paths QA and QB. CLB 3 has two clustered LUTs 31a and 31b, an SR latch 32, two DFFs 33a and 33b, and two selectors 34a and 34b. LUTs 31a and 31b are connected to paths A and B, respectively. The signals from LUTs 31a and 31b are output to the SR latch 32 and also to selectors 34a and 34b, respectively. One of the signals from the SR latch 32 is output to selector 34a via DFF 33a, and the other is output to selector 34b via DFF 33b. The signal from selector 34a is output to path QA and is also fed back to LUT 31a or LUT 31b. Likewise, the signal from selector 34b is output to path QB and is also fed back to LUTs 31a and 31b.

In each measurement of FIG. 9, when CLB 3 serves as a signal transmission start point (SRC), LUTs 31a and 31b are configured as inverters, as shown in FIG. 11(a). Selectors 34a and 34b select the signals from DFFs 33a and 33b, respectively. The signals from selectors 34a and 34b are output to paths QA and QB, respectively, and are also fed to LUT 31a or LUT 31b acting as an inverter. The SR latch 32 is omitted from FIG. 11(a); when the SR latch 32 applies a single clock pulse to DFFs 33a and 33b, signals are output simultaneously from CLB 3 to paths QA and QB.

When CLB 3 serves as a signal transmission end point (delay comparator mode), LUTs 31a and 31b are configured as NAND gates, as shown in FIG. 11(b). Selectors 34a and 34b select the signals from LUTs 31a and 31b, respectively. The signals from selectors 34a and 34b are output to paths QA and QB, respectively, and are also fed to LUTs 31b and 31a, respectively. The SR latch 32 is omitted from FIG. 11(b) as well.

The operation of CLB 3 as a signal transmission end point is described with reference to FIG. 12. CLB 3 in the delay comparator mode shown in FIG. 11(b) is equivalent to the flip-flop shown in FIG. 12.

In the initial state, as shown in FIG. 12(a), the inputs from paths A and B are 0:0, so the outputs to paths QA and QB are 1:1. If the input from path A then becomes 1 first, as shown in FIG. 12(b), the outputs to paths QA and QB become 0:1. When the input from path B subsequently also becomes 1, as shown in FIG. 12(c), the outputs to paths QA and QB remain 0:1. Conversely, if the input from path B becomes 1 first, the outputs to paths QA and QB become 1:0, and they remain 1:0 even when the input from path A later becomes 1. In this way, CLB 3 in the delay comparator mode can determine on which of paths A and B the signal arrived first.
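
The behaviour described for FIG. 12 can be checked with a small logic-level simulation in Python. The sketch assumes the cross-coupled form QA = NAND(A, QB), QB = NAND(B, QA) implied by the NAND-configured LUTs with crossed feedback; it ignores gate delays and metastability, and the function names `settle` and `first_arrival` are hypothetical.

```python
from typing import Tuple


def nand(x: int, y: int) -> int:
    return 0 if (x and y) else 1


def settle(a: int, b: int, qa: int, qb: int) -> Tuple[int, int]:
    """Iterate the cross-coupled NAND pair until the outputs stop changing."""
    for _ in range(8):  # a handful of passes is enough for this two-gate loop
        new_qa = nand(a, qb)
        new_qb = nand(b, new_qa)
        if (new_qa, new_qb) == (qa, qb):
            break
        qa, qb = new_qa, new_qb
    return qa, qb


def first_arrival(first: str) -> str:
    """Return which input ('A' or 'B') the comparator reports as arriving first."""
    qa, qb = settle(0, 0, 1, 1)            # initial state: inputs 0:0, outputs 1:1
    a, b = (1, 0) if first == "A" else (0, 1)
    qa, qb = settle(a, b, qa, qb)          # the earlier signal arrives
    qa, qb = settle(1, 1, qa, qb)          # the later signal arrives
    return "A" if (qa, qb) == (0, 1) else "B"
```

In this model the outputs hold 0:1 when A rises first and 1:0 when B rises first, even after both inputs are 1, which is the latching behaviour the paragraph above describes.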

Consequently, a plurality of sections can be measured simultaneously even if the path lengths between their signal transmission start points and end points differ. Without the SR latch 32, CLB 3 still functions as a delay comparator with no circuit overhead, but it is difficult to resolve delay differences of 30 ps or less. With the SR latch 32, delay differences of 1 ps or less can also be resolved, so the comparison performance is far higher than that of the conventional Path-Delay method.

The SR latch 32 increases the delay between LUTs 31a and 31b and DFFs 33a and 33b, but the delay of the SR latch 32 is a small fraction of the total delay, so its effect is slight. For example, when the paths shown in FIG. 28 are compared, the intermediate CLBs 3 use only their LUTs and not their FFs. Because the SR latch 32 is placed immediately before the FF, the signal does not pass through the SR latch 32 in those CLBs. The delay therefore increases only in the CLB 3 at the signal transmission end point.

Next, the estimation of critical paths by the statistical MAX operation is described.

Because path delays vary with a certain distribution owing to process variation, it is necessary to reconfigure not only the maximum-delay path obtained by static timing analysis but every path that can become a critical path.

FIG. 13 is a graph showing the critical path estimation method. Let d_{circuit} be the mean of the circuit delay distribution; any path whose delay may exceed d_{circuit} is treated as a critical path candidate. Let d_i and σ_i be the mean delay and standard deviation of path i, and d_{longest} and σ_{longest} the mean delay and standard deviation of the longest path. A path i that satisfies d_i + 3σ_i > d_{longest} − 3σ_{longest} is treated as a sub-critical path candidate. d_{circuit} is obtained by a statistical MAX operation over the distributions of the sub-critical path candidates using Clark's method (see C. E. Clark, "The greatest of a finite set of random variables," Operations Research, 9(2):145-162, 1961). A path i that satisfies d_i + 3σ_i > d_{circuit} is a critical path candidate. Critical path reconfiguration is applied to the critical path candidates obtained in this way.
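
The selection rule can be sketched in Python as follows. The pairwise MAX uses Clark's moment-matching formulas for two Gaussian variables; treating the path delays as independent, folding the candidates in pairwise, and taking the longest path to be the one with the largest mean delay are assumptions made for this sketch, and the function names are hypothetical.

```python
import math
from typing import List, Tuple


def _pdf(x: float) -> float:
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def clark_max(m1: float, s1: float, m2: float, s2: float) -> Tuple[float, float]:
    """Mean and standard deviation of max(X1, X2) for independent Gaussians."""
    a = math.sqrt(s1 * s1 + s2 * s2)
    if a == 0.0:
        return max(m1, m2), 0.0
    alpha = (m1 - m2) / a
    mean = m1 * _cdf(alpha) + m2 * _cdf(-alpha) + a * _pdf(alpha)
    second = ((m1 * m1 + s1 * s1) * _cdf(alpha)
              + (m2 * m2 + s2 * s2) * _cdf(-alpha)
              + (m1 + m2) * a * _pdf(alpha))
    return mean, math.sqrt(max(second - mean * mean, 0.0))


def critical_candidates(paths: List[Tuple[float, float]]) -> List[int]:
    """Indices of paths (d_i, sigma_i) with d_i + 3*sigma_i > d_circuit."""
    d_long, s_long = max(paths, key=lambda p: p[0])  # assumed: largest mean delay
    sub = [p for p in paths if p[0] + 3 * p[1] > d_long - 3 * s_long]
    d_circuit, sigma = sub[0]
    for d, s in sub[1:]:  # fold the sub-critical candidates with the MAX operation
        d_circuit, sigma = clark_max(d_circuit, sigma, d, s)
    return [i for i, (d, s) in enumerate(paths) if d + 3 * s > d_circuit]
```

Real path delays share wire segments and are therefore correlated; Clark's method also handles a correlation coefficient, which this sketch omits for brevity.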

The critical path reconfiguration method based on track swapping was implemented in VPR (see V. Betz and J. Rose, "VPR: A new packing, placement and routing tool for FPGA research," in Proc. of Int. Workshop on Field Programmable Logic and Applications, pages 213-222, 1997), and the LGSynth93 benchmark circuits mult16b and daio were placed and routed. The speed and yield improvements were evaluated by Monte Carlo simulation.

In this experiment, the global-routing change flow of step S8 shown in FIG. 7 is not implemented, and reconfiguration is performed only once. A larger performance gain can be expected if reconfiguration is repeated.

The FPGA architecture assumed here is an island-style architecture in which each CLB contains one 4-input LUT. Wiring segments are connected to one another by switch blocks of the bufp architecture shown in FIG. 28, and the segment length is 1. Each benchmark circuit was placed on an array of minimum size, and the routing channel width was set to 1.2 times the minimum channel width. The experiment assumed that the resistance and capacitance of the switches in the switch blocks and connection blocks vary. The speed and yield improvements were obtained from 10,000 Monte Carlo simulation runs. The critical path reconfiguration method reroutes critical paths that violate a given timing constraint. In this experiment, the timing constraint was set to μ_{circuit} + iσ_{circuit} (i = 1, 2, 3), and critical path reconfiguration was performed on circuits whose delay exceeded this value. μ_{circuit} and σ_{circuit} are the mean and standard deviation of the circuit delay distribution, respectively.
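
How a timing constraint of the form μ_{circuit} + iσ_{circuit} translates into a baseline yield can be illustrated with a toy Monte Carlo run in Python. This is not the VPR-based experiment: the circuit delay is simply drawn from a normal distribution (an assumption made here), no reconfiguration is modelled, and the function name `baseline_yield` is hypothetical.

```python
import random
import statistics


def baseline_yield(mu: float, sigma: float, i: int,
                   trials: int = 10_000, seed: int = 0) -> float:
    """Fraction of simulated chips whose delay meets d_spec = mean + i * stdev."""
    rng = random.Random(seed)
    # Assumed model: circuit delay of each chip is an independent normal sample.
    delays = [rng.gauss(mu, sigma) for _ in range(trials)]
    d_spec = statistics.fmean(delays) + i * statistics.pstdev(delays)
    return sum(1 for d in delays if d <= d_spec) / trials
```

Under this normal model the chips that miss d_spec are the ones that would be handed to critical path reconfiguration. Roughly 84% of chips meet the constraint for i = 1, and close to 99.87% for i = 3.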

FIG. 14(a) is a histogram of circuit delay for the initial configuration of mult16b, and FIG. 14(b) is a histogram showing the circuit delay distribution after critical path reconfiguration by track swapping has been performed once. The histogram in FIG. 14(b) confirms that chips that violated the timing constraint have been reconfigured. For example, if the delay constraint is defined as the speed that achieves a 50% yield, swapping the paths that fail the delay constraint by using the routing freedom allows half of the devices that fail the constraint to be reconfigured so that they meet it. The speed improvement grows as the variation grows, so the benefit will become even larger as devices continue to shrink.

Table 2 shows the speed and yield improvements for μ/σ = 10% and d_spec = μ_{circuit} + σ_{circuit}; performance improves for both mult16b and daio.

If a rerouting function based on topology changes is implemented and rerouting is repeated two or more times, further optimization is possible. Performing more reconfigurations in this way can be expected to improve performance further.

Table 3 shows the speed and yield improvements for mult16b and daio as a function of the process variation width μ/σ and the required speed d_spec. Yield is compared at the required specification, and speed improvement is compared at the delay value at which the yield is 99.87%. Table 3 shows that the yield improves by up to 26.62% and the speed improves by up to 5.81%.

FIG. 15 is a graph, drawn from Table 3, showing the yield improvement for mult16b at d_spec = μ + σ, μ + 2σ, and μ + 3σ for process variation widths μ/σ = 2%, 4%, and 10%. In every case, a yield close to 100% is achieved at the required speed.

FIG. 16 is a set of graphs showing the speed improvement against the variation width of the circuit; (a) shows the case of mult16b and (b) the case of daio. A larger variation width means that faster elements exist, so with a critical path reconfiguration method that exploits variation, the circuit performance gain after reconfiguration grows as process variation increases.

The present invention is not limited to the embodiment described above, and various modifications are possible within the scope of the claims. That is, embodiments obtained by combining technical means modified as appropriate within the scope of the claims are also included in the technical scope of the present invention.

The present invention can be suitably applied to a semiconductor device that includes a plurality of arithmetic circuits arranged in a matrix and in which the wiring paths between the arithmetic circuits can be changed.

FIG. 1 is a block diagram showing the configuration of a semiconductor device according to an embodiment of the present invention.
FIG. 2 is a schematic diagram showing the connections of a switch block included in the semiconductor device shown in FIG. 1.
FIG. 3 is a schematic diagram showing the connections after rerouting from the initial routing of FIG. 31(a) when the switch block shown in FIG. 2 is used.
FIG. 4(a) is a schematic diagram showing the initial routing when the switch block shown in FIG. 2 is used, and FIG. 4(b) is a schematic diagram showing the connections after rerouting.
FIG. 5(a) is a conceptual diagram showing the configuration of a switch block with Fs = 3, FIG. 5(b) is a conceptual diagram showing the configuration of a switch block with Fs = 6, and FIG. 5(c) is a conceptual diagram showing the configuration of a switch block with Fs = 6 and 9.
FIG. 6(a) is a circuit diagram showing the configuration of a switch block with Fs = 3, and FIG. 6(b) is a circuit diagram showing the configuration of a switch block with Fs = 6.
FIG. 7 is a flowchart showing critical path reconfiguration by track swapping according to the present invention.
FIG. 8 is an explanatory diagram showing a swap between two adjacent tracks.
FIG. 9 is an explanatory diagram of the critical path search method according to the embodiment of the present invention; (a) shows the first measurement and (b) shows the second measurement.
FIG. 10 is a circuit diagram showing the configuration of a CLB included in the semiconductor device shown in FIG. 1.
FIG. 11 is a circuit diagram showing the configuration of a CLB included in the semiconductor device shown in FIG. 1; (a) shows the signal start point mode and (b) shows the delay comparator mode.
FIG. 12 is an equivalent circuit diagram showing the operation of the CLB in the delay comparator mode.
FIG. 13 is a graph showing the critical path estimation method according to the present invention.
FIG. 14(a) is a histogram of circuit delay for the initial configuration of mult16b, and FIG. 14(b) is a histogram showing the circuit delay distribution after critical path reconfiguration by track swapping has been performed once.
FIG. 15 is a graph showing the yield improvement for mult16b at d_spec = μ + σ, μ + 2σ, and μ + 3σ for process variation widths μ/σ = 2%, 4%, and 10%.
FIG. 16 is a set of graphs showing the speed improvement against the variation width of the circuit; (a) shows the case of mult16b and (b) the case of daio.
FIG. 17 is a graph showing the distribution of chip performance against the number of chips.
FIG. 18(a) is a schematic diagram showing the overall structure of a prototype FPGA, and FIG. 18(b) is a photograph of a chip including the FPGA.
FIG. 19 is a schematic diagram showing the structure of the FPGA shown in FIG. 18.
FIG. 20(a) is a circuit diagram showing the configuration of a CLB provided in the FPGA, and FIG. 20(b) is a circuit diagram showing the configuration of a DFF provided in the CLB.
FIG. 21(a) is a circuit diagram showing a configuration example of a ring oscillator, and FIG. 21(b) is a conceptual diagram showing a method of determining the performance of a logic block using the ring oscillator.
FIG. 22 is a schematic diagram showing the variation measurement results of the prototype FPGA.
FIG. 23 is a histogram showing the variation measurement results of the prototype FPGA.
FIG. 24 shows the prior art and is a conceptual diagram of multiple configurations.
FIG. 25 shows the prior art and is a conceptual diagram of critical path reconfiguration.
FIG. 26 is a block diagram showing the configuration of a conventional semiconductor device.
FIG. 27 is a schematic diagram showing the structure of a switch block included in the semiconductor device shown in FIG. 26.
FIG. 28 is a conceptual diagram showing track swapping of a critical path in a conventional semiconductor device.
FIG. 29 is an explanatory diagram showing a critical path search method.
FIG. 30 is an explanatory diagram showing the critical path search method based on the Path-Delay method.
FIG. 31(a) is a schematic diagram showing the connections of the initial routing in a switch block, and FIG. 31(b) is a schematic diagram showing the connections after rerouting when the switch block shown in FIG. 27 is used.
FIG. 32 is a schematic diagram showing the rerouting connections in the switch block shown in FIG. 27; (a) shows the initial routing and (b) shows the routing after the route change.

Explanation of symbols

1 Semiconductor device
3, 3a to 3o Variable logic block (arithmetic circuit)
4, 4a to 4h, 14 Switch block
5 Track (wiring track)
6 Wiring path changing unit (wiring path changing means)
31a, 31b LUT
32 SR latch

Claims

1. A semiconductor device comprising:
a plurality of arithmetic circuits arranged in a matrix;
a plurality of wiring tracks arranged so as to pass between two mutually adjacent arithmetic circuits out of four mutually adjacent arithmetic circuits among the plurality of arithmetic circuits;
a switch block that is provided at a position surrounded by the four arithmetic circuits and connects one of the plurality of wiring tracks between the two adjacent arithmetic circuits to one of the plurality of wiring tracks between another two mutually adjacent arithmetic circuits;
a first delay comparator provided in another one of the arithmetic circuits in order to compare the delays of a first wiring path and a second wiring path, each formed by the wiring tracks and the switch block, for transmitting a signal from one of the plurality of arithmetic circuits to said another one; and
a second delay comparator provided in still another one of the arithmetic circuits in order to compare the delays of a third wiring path and a fourth wiring path, each formed by the wiring tracks and the switch block, for transmitting a signal from said another one of the plurality of arithmetic circuits to said still another one.

2. The semiconductor device according to claim 1, wherein the first delay comparator and the second delay comparator are each constituted by an SR latch.

3. The semiconductor device according to claim 2, wherein said another one and said still another one of the arithmetic circuits each have clustered LUTs provided in a stage preceding the SR latch.

4. The semiconductor device according to claim 1, wherein each of the arithmetic circuits is constituted by a variable logic block.