JP7654076B2

JP7654076B2 - Floating-point calculation system, method, and program with threshold prediction

Info

Publication number: JP7654076B2
Application number: JP2023528073A
Authority: JP
Inventors: カン、ミング; ウ、ソンフン; キュンリー、エウン
Original assignee: International Business Machines Corp
Current assignee: International Business Machines Corp
Priority date: 2020-11-23
Filing date: 2021-11-02
Publication date: 2025-03-31
Anticipated expiration: 2041-11-02
Also published as: KR20230093279A; KR102753819B1; DE112021005263T5; AU2021382976A1; CN116529708A; GB2615029A; JP2023550308A; US11853715B2; AU2021382976A9; WO2022106944A1; GB2615029B; GB202306201D0; US20220164163A1; AU2021382976B2

Description

The present invention relates generally to performing floating-point computations for use in artificial intelligence systems.

Machine learning algorithms, which are considered examples of artificial intelligence systems, are becoming widely used in computing applications that benefit from improved iterative computational accuracy. Neural network-based algorithms are among the most widely used types of machine learning algorithms. A neural network is a model that recognizes underlying relationships in a set of data through a process that mimics the way the human brain operates. A neural network model is first trained using a training data set (the training phase), and the trained model is then used to recognize relationships in a target data set (the inference phase). While the inference phase may rely on low-precision fixed-point arithmetic, the training phase typically requires floating-point arithmetic.

In one illustrative embodiment, a system comprises a floating-point computation unit configured to perform a dot product operation on a first floating-point value and a second floating-point value, and detection logic operably coupled to the floating-point computation unit. The detection logic is configured to compute a difference between fixed-point sums of the exponent parts of the first floating-point value and the second floating-point value and, based on the computed difference, to detect the presence of a condition before the floating-point computation unit completes the dot product operation. In response to detecting the presence of the condition, the detection logic is further configured to cause the floating-point computation unit to avoid performing a subset of the computations that would otherwise be performed as part of the dot product operation.

Further illustrative embodiments are provided, respectively, in the form of an apparatus comprising a processor and a memory configured to execute instruction code, a method configured to perform the detecting and causing-to-avoid steps, and a non-transitory processor-readable storage medium having embodied therein executable instruction code that, when executed by a processor, causes the processor to perform those steps.

Advantageously, by way of example, illustrative embodiments predict a negative inner product output, or an acceptably small positive inner product output, before the inner product is computed, so as to save computational overhead at an early stage on the hardware used to support the computational algorithm.

These and other features and advantages of the embodiments described herein will become more apparent from the accompanying drawings and the detailed description that follows.

FIG. 1A depicts a floating-point number field format with which one or more illustrative embodiments can be implemented.
FIG. 1B depicts an arithmetic representation of a floating-point number with which one or more illustrative embodiments can be implemented.
FIG. 1C depicts a computation kernel associated with a floating-point dot product computation with which one or more illustrative embodiments can be implemented.
FIG. 2 depicts floating-point multiply-accumulate logic with which one or more illustrative embodiments can be implemented.
FIG. 3 depicts threshold detection logic for floating-point dot product computation, according to an illustrative embodiment.
FIGS. 4A and 4B depict process flows without (A) and with (B) voltage scaling, according to an illustrative embodiment.
FIG. 5 depicts a method for threshold detection logic for floating-point dot product computation, according to an illustrative embodiment.
FIG. 6 depicts an exemplary implementation of an artificial intelligence system, according to an illustrative embodiment.
FIG. 7 depicts an exemplary processor system, according to an illustrative embodiment.
FIG. 8 depicts a cloud computing environment, according to an illustrative embodiment.
FIG. 9 depicts abstraction model layers, according to an illustrative embodiment.

Illustrative embodiments may be described herein with reference to exemplary computing environments, cloud infrastructure, data repositories, data centers, data processing systems, information processing systems, computer systems, data storage systems, and associated servers, computers, storage units, and devices, as well as other processing and computing devices. It is to be appreciated, however, that embodiments of the invention are not restricted to use with the particular illustrative system and device configurations shown. Moreover, phrases such as "cloud platform," "cloud computing environment," "cloud infrastructure," "data repository," "data center," "data processing system," "information processing system," "computer system," "data storage system," and "computing environment" as used herein are intended to be broadly construed so as to encompass, for example, private and/or public cloud computing or storage systems, as well as other types of systems comprising distributed virtual infrastructure. A given embodiment may, however, more generally comprise any arrangement of one or more processing devices.

As mentioned above in the background section, the training phase of neural network models used in artificial intelligence (e.g., machine learning) systems typically requires floating-point arithmetic. By way of example, such floating-point arithmetic may include computations defined in the Institute of Electrical and Electronics Engineers (IEEE) standard entitled "IEEE 754-2019: IEEE Standard for Floating-Point Arithmetic." The IEEE 754 standard specifies arithmetic formats and methods for binary (base 2) and decimal (base 10) floating-point arithmetic in computer programming environments. The IEEE indicates that an implementation of a floating-point system conforming to this standard may be realized entirely in software, entirely in hardware, or in any combination of software and hardware.

IEEE 754 defines a half-precision format that uses 16 bits per number, a single-precision format that uses 32 bits per number, and a double-precision format that uses 64 bits per number. Each format includes a sign "s", an exponent "e", and a mantissa expressed as a fraction value "f" (e.g., a number is normalized by assuming a leading 1 bit followed by a fraction part that holds the significant value to the right of the radix point). Thus, the mantissa is the part of a floating-point number that represents the significant digits of the number; it is multiplied by the base raised to the exponent to give the actual value of the number.

FIG. 1A depicts a floating-point number field format 100 that conforms to the IEEE 754 standard and with which one or more illustrative embodiments can be implemented. The variables n and p, which denote the number of bits in the exponent and fraction parts, respectively, depend on the selected precision. FIG. 1B depicts an arithmetic representation 110 of a floating-point number, shown as value (x), that conforms to the field format 100 of FIG. 1A and with which one or more illustrative embodiments can be implemented.

One of the floating-point computations performed in the training phase of neural network models used in artificial intelligence systems is the floating-point dot product computation. The dot product computation is typically performed in a multiply-accumulate (MAC) unit of the artificial intelligence system. The dot product is an algebraic operation that takes two equal-length sequences of numbers (e.g., of floating-point numbers) as input and returns a single number. More specifically, the dot product is the sum of the products of the corresponding entries of the two sequences. When the two inputs are vectors, the result of the dot product operation (also called the inner product) is a scalar value.

Furthermore, computation kernels in modern artificial intelligence systems take the inner product (the result of a floating-point dot product operation) and apply a rectified linear unit (Relu) function to it. FIG. 1C depicts a computation kernel 120 that applies the Relu function to the inner product $y = \sum_i x_i w_i$ computed for two floating-point values x and w. The Relu function passes only positive inputs: as long as y is greater than 0, the Relu function outputs y; otherwise, it outputs 0.

It is realized herein that it would be energy efficient if a negative value could be detected even before the entire dot product $\sum_i x_i w_i$ is computed, so that an output of 0 can be provided without fully computing the inner product. Illustrative embodiments provide techniques for predicting such a negative inner product output before the inner product is computed (i.e., before the inner product computation completes), so as to save computational overhead at an early stage on the hardware used to support the computational algorithm.

Recall that, as explained above and shown in FIGS. 1A and 1B, a floating-point number is represented using three fields: sign (s), exponent (e), and fraction (f). For the IEEE 754 half-precision format, for example, s = 1 bit, e = 5 bits, and f = 10 bits. For normal numbers, a floating-point number is expressed as $\text{value}(x) = (-1)^{s} \cdot 2^{[e]} \cdot (1.f)$. As is evident, the magnitude of the value depends mainly on "e", which has fewer bits, while the longer bit stream "f" serves to fine-tune the magnitude. For example, the magnitude of a number x lies in the range $2^{[e]} \le |x| < 2^{[e+1]}$ regardless of the value of "f". It is realized herein that this property advantageously provides a significant opportunity for zero prediction in dot product computations, in accordance with illustrative embodiments.
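The field layout and the magnitude bound can be checked with a minimal Python sketch. The function names `decode_half` and `value_from_fields` are hypothetical, and the sketch assumes that [e] denotes the unbiased exponent (the 5-bit field minus the half-precision bias of 15) and that only normal numbers are of interest.

```python
import struct

def decode_half(x: float):
    """Round x to IEEE 754 half precision and split it into (s, e, f) bit fields."""
    (bits,) = struct.unpack("<H", struct.pack("<e", x))
    s = bits >> 15            # 1 sign bit
    e = (bits >> 10) & 0x1F   # 5 exponent bits, stored with a bias of 15
    f = bits & 0x3FF          # 10 fraction bits
    return s, e, f

def value_from_fields(s: int, e: int, f: int) -> float:
    """Rebuild a normal number as (-1)^s * 2^(e - 15) * (1.f)."""
    if not 0 < e < 31:
        raise ValueError("this sketch handles normal numbers only")
    return (-1) ** s * 2.0 ** (e - 15) * (1 + f / 1024)

s, e, f = decode_half(-6.5)
print(s, e, f)                      # 1 17 640
x = value_from_fields(s, e, f)
# Whatever f is, the magnitude stays inside [2^[e], 2^[e+1]).
print(2.0 ** (e - 15) <= abs(x) < 2.0 ** (e - 14))   # True
```

Here -6.5 equals -1.625 * 2^2, so the stored exponent field is 2 + 15 = 17 and the fraction field is 0.625 * 1024 = 640.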

The product of two floating-point numbers X and Y is bounded as follows:
$$X = (-1)^{sx} \cdot 2^{[ex]} \cdot (1.fx) \;\Rightarrow\; 2^{[ex]} \le |X| < 2^{[ex+1]}$$
$$Y = (-1)^{sy} \cdot 2^{[ey]} \cdot (1.fy) \;\Rightarrow\; 2^{[ey]} \le |Y| < 2^{[ey+1]}$$
$$\therefore\; 2^{[ex+ey]} \le |XY| < 2^{[ex+ey+2]}$$
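As a hedged illustration of this bound, the following Python sketch derives both limits from a single integer addition of exponents. The helper names are hypothetical, and `math.frexp` stands in for reading the exponent field directly from the hardware representation.

```python
import math

def unbiased_exponent(x: float) -> int:
    """Return [e] such that 2**[e] <= |x| < 2**([e] + 1), for nonzero finite x."""
    _, exp = math.frexp(x)    # x = m * 2**exp with 0.5 <= |m| < 1
    return exp - 1

def product_bounds(x: float, y: float):
    """Lower and upper bounds on |x * y| using only one exponent addition."""
    e_sum = unbiased_exponent(x) + unbiased_exponent(y)   # ex + ey
    return 2.0 ** e_sum, 2.0 ** (e_sum + 2)

lower, upper = product_bounds(3.0, -5.0)
print(lower, upper)                        # 8.0 32.0
print(lower <= abs(3.0 * -5.0) < upper)    # True
```

For X = 3.0 and Y = -5.0 the exponents are 1 and 2, so the product magnitude 15 falls between 2^3 = 8 and 2^5 = 32 without any multiplication having been performed to obtain the bounds.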

It is further realized herein that, in accordance with illustrative embodiments, computing the maximum and minimum of the range of the product requires no multiplication, only one fixed-point addition of the exponent values of the two floating-point values X and Y, namely ex + ey. In a dot product computation, many (N) such products are summed, for example $\sum_{i=1}^{N} X_i Y_i$. The products can therefore be separated into a group of positive products ($S_p$) and a group of negative products ($S_n$).

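The ranges of the sums over the positive and negative groups are as follows.
Sum of positive products:
$$\sum_{i \in S_p} 2^{[ex_i+ey_i]} \le \sum_{i \in S_p} X_i Y_i < \sum_{i \in S_p} 2^{[ex_i+ey_i+2]}$$
Sum of negative products:
$$-\sum_{i \in S_n} 2^{[ex_i+ey_i+2]} < \sum_{i \in S_n} X_i Y_i \le -\sum_{i \in S_n} 2^{[ex_i+ey_i]}$$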
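Thus, in illustrative embodiments, the following condition is checked to confirm that the total sum of the products satisfies $\sum_{i=1}^{N} X_i Y_i < 0$:
$$\sum_{i \in S_p} 2^{[ex_i+ey_i+2]} - \sum_{i \in S_n} 2^{[ex_i+ey_i]} < 0 \tag{1}$$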

Note that the computation required by equation (1) above involves only fixed-point summation, with no expensive multiplications or floating-point operations. Advantageously, this computationally inexpensive pre-check avoids nearly 50 percent of the computations (those in which a negative number arises). If the number turns out to be positive, the normal floating-point computation continues.
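The pre-check can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented circuit: it assumes the condition compares the largest possible sum of positive products (each bounded above by 2^(ex+ey+2)) with the smallest possible magnitude of the sum of negative products (each bounded below by 2^(ex+ey)); the names `exp_of` and `predicts_negative` are hypothetical; and zero operands, which have no normal exponent, are simply skipped.

```python
import math

def exp_of(x: float) -> int:
    """Unbiased exponent [e] with 2**[e] <= |x| < 2**([e] + 1)."""
    return math.frexp(x)[1] - 1

def predicts_negative(xs, ws) -> bool:
    """True only if the dot product of xs and ws is guaranteed to be negative."""
    pos_upper = 0.0   # upper bound on the sum of positive products
    neg_lower = 0.0   # lower bound on the magnitude of the sum of negative products
    for x, w in zip(xs, ws):
        if x == 0.0 or w == 0.0:
            continue                       # assumption: a zero product contributes nothing
        e_sum = exp_of(x) + exp_of(w)      # one integer addition, no multiplication
        if (x < 0) != (w < 0):             # signs differ, so the product is negative
            neg_lower += 2.0 ** e_sum
        else:
            pos_upper += 2.0 ** (e_sum + 2)
    return pos_upper - neg_lower < 0

print(predicts_negative([1.0, 1.0], [-16.0, 1.0]))   # True: the dot product is -15
print(predicts_negative([1.0, 1.0], [-1.5, 1.0]))    # False, although the dot product is -0.5
```

The second call shows that the check is conservative: it never reports a positive result as negative, but it can miss results that are only slightly negative.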

As mentioned above, floating-point dot product computations are typically performed in a MAC hardware unit associated with an artificial intelligence system. FIG. 2 depicts floating-point multiply-accumulate logic associated with a MAC unit 200 with which one or more illustrative embodiments can be implemented. The MAC unit combines three values A, B, and C by adding value C to, or subtracting value C from, the dot product of values A and B. For floating-point values, MAC unit 200 processes inputs EA, EB, and EC, where E refers to the exponent part of each of values A, B, and C. MAC unit 200 also processes inputs MA, MB, and MC, where M refers to the mantissa (fraction) part of each of values A, B, and C. Floating-point MAC unit 200 is relatively complex, comprising alignment logic, adder logic, multiplier logic, shifter logic, comparator logic, and other functional logic (as illustrated in FIG. 2). The conventional logic operation of a multiply-accumulate unit such as floating-point MAC unit 200 will be understood by those of ordinary skill in the art and is therefore not described in further detail herein.

Assuming that the energy of a fixed-point addition is 10% of that of a floating-point-based MAC operation, it is realized herein that about 40% of MAC computations can be skipped by predicting negative numbers at a 10% prediction overhead. On the other hand, when the accumulated number turns out not to satisfy the pre-check condition, the 10% prediction overhead is wasted. The net result is 0.4 * (100% savings - 10% overhead) + 0.6 * (-10% overhead) = 30% energy savings.

It is realized herein that, in some circumstances, equation (1) may be more conservative than necessary or desired, since the maximum possible magnitude of the positive products is compared with the minimum possible magnitude of the negative products. Thus, in alternative embodiments, the pre-check condition is relaxed by replacing the "0" on the right-hand side of equation (1) with a threshold "Th", as depicted in equation (2):
$$\sum_{i \in S_p} 2^{[ex_i+ey_i+2]} - \sum_{i \in S_n} 2^{[ex_i+ey_i]} < Th \tag{2}$$

It is to be understood that Th > 0 and that the magnitude of Th is generally set relatively small. In this way, the computation can be skipped when the left-hand term is positive but very close to 0. By controlling the value of Th, higher energy efficiency is achieved at the cost of an acceptably lower accuracy. That is, if an additional 10% of the computations can be avoided based on the setting of Th, the energy savings are 0.5 * (100% savings - 10% overhead) + 0.5 * (-10% overhead) = 40%.
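The two energy estimates in the text (30% without a threshold, 40% with one) follow the same weighted sum, which a short Python sketch makes explicit. The function name `net_energy_savings` is hypothetical, and both arguments are fractions of the energy of one floating-point MAC operation.

```python
def net_energy_savings(skip_fraction: float, overhead: float) -> float:
    """Expected fraction of MAC energy saved, given a prediction overhead paid on every input."""
    saved_when_skipped = skip_fraction * (1.0 - overhead)        # MAC skipped, overhead still paid
    lost_when_computed = (1.0 - skip_fraction) * (-overhead)     # MAC runs, overhead is wasted
    return saved_when_skipped + lost_when_computed

print(round(net_energy_savings(0.4, 0.1), 6))   # 0.3
print(round(net_energy_savings(0.5, 0.1), 6))   # 0.4
```

Algebraically the expression reduces to the skip fraction minus the overhead, which is why each percentage point of additional skipped computation translates directly into a percentage point of savings.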

FIG. 3 depicts a logic implementation 300 of threshold detection for floating-point dot product computation, according to an illustrative embodiment. Logic implementation 300 represents one illustrative embodiment of a hardware implementation of equation (2) above. Alternative embodiments may have other logic implementations.

As shown, logic implementation 300 comprises a threshold detector 310 (detection logic) operably coupled to a floating-point MAC unit 340, which in turn is operably coupled to a multiplexer 342. Threshold detector 310 comprises logic components including adder 312, adder 314, demultiplexer 316, register 320, register 322, multiplexer 324, exclusive-OR (XOR) gate 326, adder 328, and comparator 330. Where a data input, data output, select bit, control bit (or digital signal), or the like is said to be input to and/or output from a given logic component, it is assumed that the given logic component has a corresponding terminal that enables connectivity with other logic components for receiving such an input and/or sending such an output.

Recall from the explanation above that computing the maximum and minimum of the range of the product of two floating-point values X and Y requires no multiplication, only one addition of the exponent values of the two floating-point values X and Y, namely ex + ey, and that in a dot product computation many (N) such products are summed, for example $\sum_{i=1}^{N} X_i Y_i$. Accordingly, as shown in threshold detector 310 of FIG. 3, each corresponding pair of exponent bits (exponent parts) ex_i and ey_i for X and Y is input to adder 312, so that the output of adder 312 is ex_i + ey_i. The output of adder 312 is input to adder 314 together with the output of multiplexer 324. The output of adder 314 is input to demultiplexer 316.

Demultiplexer 316 produces the accumulated sum for the positive group $S_p$ on a first output, representing the positive result, and the accumulated sum for the negative group $S_n$ on a second output, representing the negative result. The positive result is input to register 320 and the negative result is input to register 322. In addition, the positive result is output from positive register 320 and the negative result is output from register 322; both are input to multiplexer 324, which produces the output that serves as an input to adder 314, as mentioned above.

As further shown in FIG. 3, XOR gate 326 receives as inputs sx_i and sy_i, the sign bits of floating-point values X and Y, respectively. When sx and sy differ, the sign of the product of X and Y is negative, so the value is accumulated with the value in negative register 322. When sx and sy are the same, the sign of the product of X and Y is positive, so the value is accumulated with the value in positive register 320. Note that the output of XOR gate 326 is accordingly used to control demultiplexer 316 and multiplexer 324.

The positive result from positive register 320 and the negative result from register 322 are also input to adder 328, and the output of adder 328 is checked against the threshold Th in comparator 330. This is where the pre-check condition of equation (2) is evaluated. The output of comparator 330 is provided as an input to EN of floating-point MAC unit 340 and is also provided as a control signal to multiplexer 342, which has a first input that is 0 and a second input that is the output of floating-point MAC unit 340. Note that EN represents an enable switch for floating-point MAC unit 340, such that floating-point MAC unit 340 operates only when the output of comparator 330 is a logical 1 (EN = 1). Thus, assuming floating-point MAC unit 340 is configured as shown for floating-point MAC unit 200 of FIG. 2, the input values (i.e., EA, EB, MA, MB, ...) are not updated in the input registers when EN = 0. In this way, the circuitry of unit 200 of FIG. 2 remains inactive and retains its previous computation state.

Multiplexer 342 represents the Relu function described above. Thus, if the value checked by comparator 330 (the output of adder 328) is less than the threshold Th, the output of multiplexer 342 is 0; otherwise, the output of floating-point MAC unit 340 is selected as the output of multiplexer 342. Recall that in some embodiments the threshold Th can be 0, while in other embodiments the threshold Th can be some relatively small, acceptable positive number. By way of example only, a small acceptable positive number may be a value less than about 1% of the maximum value of the dot product.
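The data path of FIG. 3 can be approximated by a small behavioral model in Python. It is only a sketch under several assumptions that the text does not spell out: each product contributes the power-of-two bound derived from ex_i + ey_i (2^(ex+ey+2) to the positive register, 2^(ex+ey) to the negative register); zero operands are skipped; and a final clamp at 0 is applied to the MAC output so that the result matches the Relu kernel of FIG. 1C. The class and function names are hypothetical.

```python
import math

class ThresholdDetectorModel:
    """Behavioral stand-in for threshold detector 310 (not a gate-level model)."""

    def __init__(self, th: float = 0.0):
        self.th = th
        self.pos_reg = 0.0   # plays the role of positive register 320
        self.neg_reg = 0.0   # plays the role of negative register 322

    def accumulate(self, x: float, y: float) -> None:
        if x == 0.0 or y == 0.0:
            return                                   # assumption: zero products are ignored
        e_sum = (math.frexp(x)[1] - 1) + (math.frexp(y)[1] - 1)   # adder 312: ex_i + ey_i
        product_is_negative = (x < 0) ^ (y < 0)      # XOR gate 326 on the sign bits
        if product_is_negative:                      # routing done by demux 316 / mux 324
            self.neg_reg += 2.0 ** e_sum
        else:
            self.pos_reg += 2.0 ** (e_sum + 2)

    def enable(self) -> bool:
        # adder 328 forms the difference; comparator 330 drives EN
        return (self.pos_reg - self.neg_reg) >= self.th

def gated_dot_relu(xs, ys, th: float = 0.0):
    """Return (output, mac_enabled) for one dot product followed by Relu."""
    detector = ThresholdDetectorModel(th)
    for x, y in zip(xs, ys):
        detector.accumulate(x, y)
    if not detector.enable():
        return 0.0, False                            # EN = 0: MAC idle, mux 342 selects 0
    mac_out = sum(a * b for a, b in zip(xs, ys))     # floating-point MAC unit 340
    return max(mac_out, 0.0), True                   # assumption: clamp to match Relu

print(gated_dot_relu([1.0, 1.0], [-16.0, 1.0]))      # (0.0, False)
print(gated_dot_relu([1.0], [0.5]))                  # (0.5, True)
print(gated_dot_relu([1.0], [0.5], th=2.5))          # (0.0, False)
```

The last two calls illustrate the accuracy trade-off: with Th = 0 the small positive result 0.5 is computed, whereas a larger threshold causes it to be skipped and reported as 0.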

It is to be appreciated that, in additional embodiments, the circuitry in threshold detector 310 can be implemented with a voltage scaling function so that threshold detector 310 operates below the typical supply voltage in order to save energy. This voltage scaling function is controlled by a voltage scaling controller 344. Controller 344 is configured to enable or disable the voltage scaling function as needed or desired.

FIGS. 4A and 4B depict, respectively, a process flow 400 without voltage scaling and a process flow 410 with voltage scaling, according to an illustrative embodiment. Process flows 400 and 410 correspond to the processing of sequential data (data 1, data 2, data 3, data 4, and so on) by threshold detector 310 and floating-point MAC unit 340. For example, while floating-point MAC unit 340 performs the computation for a given data item (data 1), threshold detector 310 performs the prediction (based on threshold detection) for the next data item (data 2), and so on.

Because the prediction-stage computation performed by threshold detector 310 is relatively simple, its processing delay is small compared with the processing delay of the floating-point processing in the MAC unit; this difference in processing time is evident in FIG. 4A. Voltage scaling can therefore be applied to the prediction operation. For example, voltage scaling controller 344 causes a percentage reduction in the logic supply voltage VDD (the operating or reference voltage) supplied to the circuitry of threshold detector 310. In some embodiments, VDD is reduced by 30% from the typical (normal) operating voltage level. With the reduced voltage, threshold detector 310 operates proportionally slower, i.e., its processing delay increases. In that case, as depicted in process flow 410 of FIG. 4B, the time period (time delay) of the floating-point computation can be fully utilized by the prediction stage. For example, the processing time for the prediction on data 2 lengthens so that it occupies all or most of the processing time of the floating-point computation on data 1.

In some embodiments, reducing VDD for the prediction operation cuts the overhead roughly in half (0.7 * 0.7 = 0.49, by the relation $\text{Energy} = C \cdot V_{DD}^2$), which results in 45% energy savings. Such energy savings can be significant, since a deep neural network consists of multiple layers (e.g., up to 50), each of which includes a convolution, optional batch normalization, and a Relu function. The scaling of VDD can be applied by controller 344 using any typical voltage control mechanism.
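The 45% figure can be reproduced as a worked calculation. The pairing of inputs is an assumption rather than something the text states explicitly: the 10% baseline prediction overhead, the 50% skip rate of the thresholded case, and a 30% reduction in VDD.

```latex
\begin{aligned}
\text{overhead}_{\text{scaled}} &= 0.10 \times (0.7)^2 = 0.049 \approx 0.05 \\
\text{savings} &\approx 0.5\,(1 - 0.05) + 0.5\,(-0.05) = 0.45
\end{aligned}
```

Under these assumptions, halving the prediction overhead raises the net savings from 40% to about 45%.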

FIG. 5 depicts a method 500 of threshold detection for floating-point dot product computation, according to an illustrative embodiment. It is to be appreciated that method 500 can be performed, for example, by a system comprising logic implementation 300 of FIG. 3. In alternative embodiments, however, method 500 can be performed by other systems configured to perform the steps of the method.

Step 502 computes a difference between fixed-point sums of the exponent parts of a first floating-point value and a second floating-point value.

Step 504 detects, based on the computed difference, the presence of a condition before completion of a dot product operation by a floating-point computation unit configured to perform the dot product operation on the first floating-point value and the second floating-point value.

Step 506, in response to detecting the presence of the condition, causes the floating-point computation unit to avoid performing a subset of the computations that would otherwise be performed as part of the dot product operation.

FIG. 6 depicts an exemplary implementation of an artificial intelligence system 600, according to an illustrative embodiment. As shown, system 600 comprises a training data set 610, a neural network model 620, threshold detection logic 630, a floating-point MAC unit 640, and a rectified linear unit 650. With respect to the exemplary hardware implementation depicted in FIG. 3 and described above, it is to be appreciated that adder 312, adder 314, demultiplexer 316, register 320, register 322, multiplexer 324, XOR gate 326, adder 328, comparator 330, and controller 344 can be implemented as part of threshold detection logic 630. Further, floating-point MAC unit 340 can be implemented by floating-point MAC unit 640, and multiplexer 342 can be implemented by rectified linear unit 650. It is to be understood that threshold detection logic 630, floating-point MAC unit 640, and rectified linear unit 650 are used to perform computations during training of neural network model 620 based on training data set 610.

In one exemplary embodiment, threshold detection logic 630, floating-point MAC unit 640, and rectified linear unit 650 of artificial intelligence system 600 are implemented by one or more application-specific integrated circuits (ASICs). An ASIC is an integrated circuit (IC) chip or device customized for a particular purpose, comprising logic (e.g., circuitry, processors, memory, etc.) that is programmed or otherwise configured with executable program code (e.g., instruction code, computer program code, etc.) for that purpose. In this exemplary case, the particular purpose is the implementation and execution of an artificial intelligence system (e.g., a machine learning algorithm) and, more specifically, the training phase of neural network model 620 using training data set 610. An ASIC may also be considered a system-on-chip (SoC). Some ASIC implementations that can be used with one or more illustrative embodiments employ cell libraries of user-selectable basic logic functions (e.g., multiplexers and comparators composed of multiple VLSI transistor devices to provide various functions such as switching and comparing) to enable configuration (and reconfiguration) of the system.

It is to be further appreciated that artificial intelligence system 600, and portions thereof, can be realized in alternative circuit/processor-based technologies, such as technologies including one or more multi-core central processing units (CPUs), one or more graphics processing units (GPUs), and one or more field-programmable gate arrays (FPGAs). In some embodiments, artificial intelligence system 600 can be implemented as a combination of two or more circuit/processor-based technologies (e.g., ASIC, CPU, GPU, FPGA, etc.).

The techniques depicted in FIGS. 1-6 can also, as described herein, include providing a system that includes individual software modules, each of which is embodied on a tangible computer-readable recordable storage medium. All of the modules (or any subset thereof) can be on the same medium, or each module can be on a different medium, for example. The modules can include any or all of the components shown in the figures and/or described herein. In one embodiment of the invention, the modules can run, for example, on a hardware processor. In that case, the method steps can be carried out using the individual software modules of the system, as described above, executing on the hardware processor. Further, a computer program product can include a tangible computer-readable recordable storage medium with code adapted to be executed to carry out at least one method step described herein, including the provisioning of the system with the individual software modules.

Additionally, the techniques depicted in FIGS. 1-6 can be implemented via a computer program product that can include computer-usable program code stored in a computer-readable storage medium in a data processing system, where the computer-usable program code was downloaded over a network from a remote data processing system. Also, in one embodiment of the invention, the computer program product can include computer-usable program code stored in a computer-readable storage medium in a server data processing system, where the computer-usable program code is downloaded over a network to a remote data processing system for use in a computer-readable storage medium with the remote system.

An embodiment of the invention, or elements thereof, can be implemented in the form of an apparatus comprising a memory and at least one processor that is coupled to the memory and configured to perform exemplary method steps.

Additionally, an embodiment of the present invention can make use of software running on a computer or workstation. With reference to FIG. 7, such an implementation might employ, for example, a processor 702, a memory 704, and an input/output interface formed, for example, by a display 706 and a keyboard 708. The term "processor" as used herein is intended to include any processing device, such as one that includes a multi-core CPU, a GPU, an FPGA, or other forms of processing circuitry such as one or more ASICs, or a combination thereof. Further, the term "processor" may refer to more than one individual processor. The term "memory" is intended to include memory associated with a processor (e.g., CPU, GPU, FPGA, ASIC, etc.), such as RAM (random access memory), ROM (read-only memory), a fixed memory device (e.g., a hard drive), a removable memory device (e.g., a diskette), flash memory, and the like. In addition, the phrase "input/output interface" as used herein is intended to include, for example, a mechanism for inputting data to the processing unit (e.g., a mouse) and a mechanism for providing results associated with the processing unit (e.g., a printer). The processor 702, memory 704, and input/output interface such as display 706 and keyboard 708 can be interconnected, for example, via bus 710 as part of a data processing unit 712. Suitable interconnections, for example via bus 710, can also be provided to a network interface 714, such as a network card, which can be provided to interface with a computer network, and to a media interface 716, such as a diskette or CD-ROM drive, which can be provided to interface with media 718.

Accordingly, computer software including instructions or code for performing the methods of the invention, as described herein, may be stored in an associated memory device (e.g., ROM, fixed or removable memory) and, when ready to be utilized, loaded in part or in whole (e.g., into RAM) and implemented by a CPU. Such software could include, but is not limited to, firmware, resident software, microcode, and the like.

A data processing system suitable for storing and/or executing program code will include at least one processor 702 coupled directly or indirectly to memory elements 704 through a system bus 710. The memory elements can include local memory employed during actual implementation of the program code, bulk storage, and cache memories that provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during implementation.

Input/output (I/O) devices (including, but not limited to, keyboards 708, displays 706, pointing devices, and the like) can be coupled to the system either directly (such as via bus 710) or through intervening I/O controllers (omitted for clarity).

Network adapters such as network interface 714 may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or to remote printers or storage devices through intervening private or public networks. Modems, cable modems, and Ethernet cards are just a few of the currently available types of network adapters.

As used herein, including the claims, a "server" includes a physical data processing system (e.g., system 712 as shown in FIG. 7) running a server program. It will be understood that such a physical server may or may not include a display and keyboard.

The present invention may be a system, a method, and/or a computer program product at any possible technical detail level of integration. The computer program product may include a computer-readable storage medium (or media) having computer-readable program instructions thereon for causing a processor to carry out aspects of the present invention.

The computer-readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. The computer-readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer-readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer-readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.

Computer-readable program instructions described herein can be downloaded to respective computing/processing devices from a computer-readable storage medium, or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network, and/or a wireless network. The network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium within the respective computing/processing device.

Computer-readable program instructions for carrying out operations of the present invention may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, configuration data for integrated circuitry, or either source code or object code written in any combination of one or more programming languages, including an object-oriented programming language such as Smalltalk or C++, and procedural programming languages such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (e.g., through the Internet using an Internet Service Provider). In some embodiments, electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGAs), or programmable logic arrays (PLAs) may execute the computer-readable program instructions by utilizing state information of the computer-readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present invention.

Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.

These computer-readable program instructions may be provided to a processor of a computer or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable storage medium having instructions stored therein comprises an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.

The computer-readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus, or other device to produce a computer-implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be accomplished as one step, executed concurrently, substantially concurrently, or in a partially or wholly temporally overlapping manner, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts or carry out combinations of special-purpose hardware and computer instructions.

It should be noted that any of the methods described herein can include an additional step of providing a system comprising individual software modules embodied on a computer-readable storage medium. The modules can include, for example, any or all of the components detailed herein. The method steps can then be carried out using the individual software modules and/or sub-modules of the system, as described above, executing on a hardware processor 702. Further, a computer program product can include a computer-readable storage medium with code adapted to be implemented to carry out at least one method step described herein, including the provisioning of the system with the individual software modules.

In any case, it should be understood that the components illustrated herein may be implemented in various forms of hardware, software, or combinations thereof, for example, application-specific integrated circuits (ASICs), functional circuitry, or an appropriately programmed digital computer with associated memory. Given the teachings of the invention provided herein, one of ordinary skill in the related art will be able to contemplate other implementations of the components of the invention.

Although this disclosure includes a detailed description of cloud computing, it is to be understood that implementation of the teachings recited herein is not limited to a cloud computing environment. Rather, embodiments of the present invention are capable of being implemented in conjunction with any other type of computing environment now known or later developed.

Cloud computing is a model of service delivery for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, network bandwidth, servers, processing, memory, storage, applications, virtual machines, and services) that can be rapidly provisioned and released with minimal management effort or interaction with a provider of the service. This cloud model may include at least five characteristics, at least three service models, and at least four deployment models.

The characteristics are as follows:

On-demand self-service: A cloud consumer can unilaterally provision computing capabilities, such as server time and network storage, as needed and automatically, without requiring human interaction with the service's provider.

Broad network access: Capabilities are available over a network and accessed through standard mechanisms that promote use by heterogeneous thin or thick client platforms (e.g., mobile phones, laptops, and PDAs).

Resource pooling: The provider's computing resources are pooled to serve multiple consumers using a multi-tenant model, with different physical and virtual resources dynamically assigned and reassigned according to demand. There is a sense of location independence in that the consumer generally has no control over or knowledge of the exact location of the provided resources but may be able to specify location at a higher level of abstraction (e.g., country, state, or datacenter).

Rapid elasticity: Capabilities can be rapidly and elastically provisioned, in some cases automatically, to quickly scale out, and rapidly released to quickly scale in. To the consumer, the capabilities available for provisioning often appear to be unlimited and can be purchased in any quantity at any time.

Measured service: Cloud systems automatically control and optimize resource use by leveraging a metering capability at some level of abstraction appropriate to the type of service (e.g., storage, processing, bandwidth, and active user accounts). Resource usage can be monitored, controlled, and reported, providing transparency for both the provider and the consumer of the utilized service.

The service models are as follows:

Software as a Service (SaaS): The capability provided to the consumer is to use the provider's applications running on a cloud infrastructure. The applications are accessible from various client devices through a thin client interface such as a web browser (e.g., web-based email). The consumer does not manage or control the underlying cloud infrastructure, including network, servers, operating systems, storage, or even individual application capabilities, with the possible exception of limited user-specific application configuration settings.

Platform as a Service (PaaS): The capability provided to the consumer is to deploy onto the cloud infrastructure consumer-created or acquired applications created using programming languages and tools supported by the provider. The consumer does not manage or control the underlying cloud infrastructure, including networks, servers, operating systems, or storage, but has control over the deployed applications and possibly over application hosting environment configurations.

Infrastructure as a Service (IaaS): The capability provided to the consumer is to provision processing, storage, networks, and other fundamental computing resources on which the consumer is able to deploy and run arbitrary software, which can include operating systems and applications. The consumer does not manage or control the underlying cloud infrastructure but has control over operating systems, storage, and deployed applications, and possibly limited control of select networking components (e.g., host firewalls).

The deployment models are as follows:

Private cloud: The cloud infrastructure is operated solely for an organization. It may be managed by the organization or a third party and may exist on-premises or off-premises.

Community cloud: The cloud infrastructure is shared by several organizations and supports a specific community that has shared concerns (e.g., mission, security requirements, policy, and compliance considerations). It may be managed by the organizations or a third party and may exist on-premises or off-premises.

Public cloud: The cloud infrastructure is made available to the general public or a large industry group and is owned by an organization selling cloud services.

Hybrid cloud: The cloud infrastructure is a composition of two or more clouds (private, community, or public) that remain unique entities but are bound together by standardized or proprietary technology that enables data and application portability (e.g., cloud bursting for load balancing between clouds).

A cloud computing environment is service-oriented, with a focus on statelessness, low coupling, modularity, and semantic interoperability. At the heart of cloud computing is an infrastructure that includes a network of interconnected nodes.

Referring now to FIG. 8, an illustrative cloud computing environment 850 is depicted. As shown, cloud computing environment 850 includes one or more cloud computing nodes 810 with which local computing devices used by cloud consumers, such as, for example, a personal digital assistant (PDA) or cellular telephone 854A, a desktop computer 854B, a laptop computer 854C, and/or an automobile computer system 854N, may communicate. Nodes 810 may communicate with one another. They may be grouped (not shown) physically or virtually, in one or more networks, such as private, community, public, or hybrid clouds as described above, or a combination thereof. This allows cloud computing environment 850 to offer infrastructure, platforms, and/or software as services for which a cloud consumer does not need to maintain resources on a local computing device. It is understood that the types of computing devices 854A-N shown in FIG. 8 are intended to be illustrative only and that computing nodes 810 and cloud computing environment 850 can communicate with any type of computerized device over any type of network and/or network-addressable connection (e.g., using a web browser).

Referring now to FIG. 9, a set of functional abstraction layers provided by cloud computing environment 850 (FIG. 8) is shown. It should be understood in advance that the components, layers, and functions shown in FIG. 9 are intended to be illustrative only, and embodiments of the present invention are not limited thereto. As depicted, the following layers and corresponding functions are provided.

Hardware and software layer 960 includes hardware and software components. Examples of hardware components include mainframes 961, RISC (Reduced Instruction Set Computer) architecture-based servers 962, servers 963, blade servers 964, storage devices 965, and networks and networking components 966. In some embodiments, software components include network application server software 967 and database software 968.

Virtualization layer 970 provides an abstraction layer from which the following examples of virtual entities may be provided: virtual servers 971; virtual storage 972; virtual networks 973, including virtual private networks; virtual applications and operating systems 974; and virtual clients 975. In one example, management layer 980 may provide the functions described below. Resource provisioning 981 provides dynamic procurement of computing resources and other resources that are utilized to perform tasks within the cloud computing environment. Metering and pricing 982 provides cost tracking as resources are utilized within the cloud computing environment, and billing or invoicing for consumption of these resources.

In one example, these resources may include application software licenses. Security provides identity verification for cloud consumers and tasks, as well as protection for data and other resources. User portal 983 provides access to the cloud computing environment for consumers and system administrators. Service level management 984 provides cloud computing resource allocation and management such that required service levels are met. Service level agreement (SLA) planning and fulfillment 985 provides pre-arrangement for, and procurement of, cloud computing resources for which a future requirement is anticipated in accordance with an SLA.

Workload layer 990 provides examples of functionality for which the cloud computing environment may be utilized. Examples of workloads and functions that may be provided from this layer, in accordance with one or more embodiments of the present invention, include mapping and navigation 991, software development and lifecycle management 992, virtual classroom education delivery 993, data analytics processing 994, transaction processing 995, and artificial intelligence algorithm processing 996 (including threshold detection and floating-point calculation).

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the invention. As used herein, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, steps, operations, elements, and/or components, but do not preclude the presence or addition of other features, steps, operations, elements, components, and/or groups thereof.

At least one embodiment of the present invention may provide a beneficial effect such as, for example, a framework (e.g., a set of one or more framework configurations) that replaces the complex manual (e.g., custom-built) development of model restoration logic. As illustratively described herein, the framework is configured and instantiated with a set of failure detection components and associated model restoration pipelines. Once instantiated, the framework plugs into a given lifecycle using logs as inputs and delivers new model artifacts for a new model version into the existing lifecycle pipelines. In one or more illustrative embodiments, the framework is a cloud-based framework and platform for end-to-end development and lifecycle management of AI applications.

The descriptions of the various embodiments of the present invention have been presented for purposes of illustration, but are not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application, or the technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims

1. A system comprising:
a floating-point calculation unit configured to perform a dot product operation on a first floating-point value and a second floating-point value; and
detection logic operably coupled to the floating-point calculation unit,
wherein the detection logic is configured to calculate a difference between fixed-point sums of the exponents of the first floating-point value and the second floating-point value and, based on the calculated difference, to detect the presence of a condition prior to completion of the dot product operation by the floating-point calculation unit, and
wherein, in response to detecting the presence of the condition, the detection logic is further configured to cause the floating-point calculation unit to avoid performing a subset of calculations that would otherwise be performed as part of the dot product operation.

2. The system of claim 1, wherein the detected condition is whether the calculated difference between the fixed-point sums of the exponents of the first floating-point value and the second floating-point value is less than a threshold value.

3. The system of claim 2, wherein the threshold value is one of a positive value and zero.

4. The system of claim 2, wherein the presence of the detected condition serves as a predictor that the avoided subset of calculations would have resulted in a dot product of the first floating-point value and the second floating-point value that is less than the threshold value.
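The prediction recited in claims 1 to 4 can be illustrated with a short software sketch. The claims do not give the exact expression here, so the following rests on assumptions: each product is bounded from its operand exponents alone (as returned by `math.frexp`), positive products are over-estimated, negative products are under-estimated, and the two groups are summed as scaled integers. The names `product_exponent`, `predicts_below`, and `dot_or_skip` are hypothetical, and the bound is one sound choice, not necessarily the expression used by the detection logic.

```python
import math
from fractions import Fraction

def product_exponent(x: float, y: float) -> int:
    """Return e with 2**(e-2) <= |x*y| < 2**e (x and y finite and nonzero)."""
    # math.frexp(v) returns (m, e) with v == m * 2**e and 0.5 <= |m| < 1.
    return math.frexp(x)[1] + math.frexp(y)[1]

def predicts_below(xs, ys, threshold: float = 0.0) -> bool:
    """True if exponent-only bounds already show that dot(xs, ys) < threshold."""
    # Keep (exponent, is_negative) for every nonzero product; no mantissa is used.
    terms = [(product_exponent(x, y), (x < 0) != (y < 0))
             for x, y in zip(xs, ys) if x != 0.0 and y != 0.0]
    if not terms:
        return 0.0 < threshold
    base = min(e for e, _ in terms) - 2  # common scale: every term is an integer
    # Assumed bound: positive products are less than 2**e,
    # negative products have magnitude of at least 2**(e-2).
    s_p = sum(1 << (e - base) for e, negative in terms if not negative)
    s_n = sum(1 << (e - 2 - base) for e, negative in terms if negative)
    difference = s_p - s_n  # upper bound on the dot product, in units of 2**base
    return difference * Fraction(2) ** base < Fraction(threshold)

def dot_or_skip(xs, ys, threshold: float = 0.0):
    """Return None when the multiply-accumulate work is skipped, else the dot product."""
    if predicts_below(xs, ys, threshold):
        return None
    return sum(x * y for x, y in zip(xs, ys))
```

For example, `dot_or_skip([1.0, -8.0], [1.0, 8.0])` skips the multiplications because the single negative product dominates, whereas `dot_or_skip([2.0, 3.0], [1.0, 1.0])` computes the full result. Because the bound is conservative, some dot products that are in fact below the threshold are still computed.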

5. The system of any one of claims 1 to 4, further comprising a voltage scaling controller operably coupled to the detection logic and configured to reduce an operating voltage of the detection logic in proportion to a processing delay associated with the floating-point calculation unit.

6. The system of any one of claims 1 to 5, wherein the dot product operation is part of a training phase for a neural network model used in an artificial intelligence system.

7. The system of any one of claims 1 to 6, implemented as part of one or more integrated circuits.

8. A method comprising:
calculating a difference between fixed-point sums of the exponents of a first floating-point value and a second floating-point value;
detecting, based on the calculated difference, the presence of a condition prior to completion of a dot product operation by a floating-point calculation unit configured to perform the dot product operation on the first floating-point value and the second floating-point value; and
in response to detecting the presence of the condition, causing the floating-point calculation unit to avoid performing a subset of calculations that would otherwise be performed as part of the dot product operation,
wherein one or more of the steps are performed by a processing circuit configured to execute instruction code.

9. The method of claim 8, wherein the detected condition is whether the calculated difference between the fixed-point sums of the exponents of the first floating-point value and the second floating-point value is less than a threshold value.

10. The method of claim 9, wherein the threshold value is one of a positive value and zero.

11. The method of claim 9, wherein the presence of the detected condition serves as a predictor that the avoided subset of calculations would have resulted in a dot product of the first floating-point value and the second floating-point value that is less than the threshold value.

12. The method of claim 9, wherein the difference between the fixed-point sums of the exponents of the first floating-point value and the second floating-point value is expressed as

[equation not reproduced in the source text]

where ex_i represents the exponent of one of the first and second floating-point values, ey_i represents the exponent of the other of the first and second floating-point values, N represents the number of components that make up the exponent, S_p represents a group of positive products, and S_n represents a group of negative products.

13. The method of any one of claims 8 to 12, wherein the dot product operation is part of a training phase for a neural network model used in an artificial intelligence system.

14. An apparatus comprising:
at least one processor; and
at least one memory containing instruction code,
wherein the at least one memory and the instruction code are configured, together with the at least one processor, to cause the apparatus to perform the steps of the method of any one of claims 8 to 13.

15. A computer program for causing a computer to execute the steps of the method of any one of claims 8 to 13.

16. A system comprising:
a rectified linear unit;
a floating-point calculation unit operably coupled to the rectified linear unit and configured to perform a dot product operation on a first floating-point value and a second floating-point value; and
detection logic operably coupled to the floating-point calculation unit,
wherein the detection logic is configured to detect the presence of a condition prior to completion of the dot product operation by the floating-point calculation unit by calculating a difference between fixed-point sums of the exponents of the first floating-point value and the second floating-point value and comparing the calculated difference to a threshold, and
wherein, in response to the calculated difference being less than the threshold, the detection logic is further configured to cause the floating-point calculation unit to avoid performing a subset of calculations that would otherwise be performed as part of the dot product operation and to cause an output of the floating-point calculation unit to be controlled by the rectified linear unit.
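The arrangement of claim 16, in which a rectified linear unit governs the output whenever the calculation is skipped, can be sketched in software as follows. This is an illustrative model under stated assumptions, not the claimed circuit: `exponent_bound` and `relu_layer` are hypothetical names, the bound is an assumed conservative estimate built only from operand exponents, and the threshold defaults to zero so that a skipped neuron has an output identical to the one the full computation would produce.

```python
import math
from fractions import Fraction

def exponent_bound(xs, ws) -> Fraction:
    """Upper bound on dot(xs, ws) using only operand exponents (an assumed bound)."""
    bound = Fraction(0)
    for x, w in zip(xs, ws):
        if x == 0.0 or w == 0.0:
            continue
        # frexp gives 0.5 <= |m| < 1, so 2**(e-2) <= |x*w| < 2**e.
        e = math.frexp(x)[1] + math.frexp(w)[1]
        if (x < 0) != (w < 0):
            bound -= Fraction(2) ** (e - 2)  # a negative product is at least this large
        else:
            bound += Fraction(2) ** e        # a positive product is smaller than this
    return bound

def relu_layer(xs, weight_rows, threshold: float = 0.0):
    """Return (outputs, skipped): ReLU outputs and the number of dot products avoided."""
    outputs, skipped = [], 0
    for ws in weight_rows:
        if exponent_bound(xs, ws) < Fraction(threshold):
            outputs.append(0.0)  # ReLU drives the output; no multiplications are run
            skipped += 1
        else:
            pre_activation = sum(x * w for x, w in zip(xs, ws))
            outputs.append(max(0.0, pre_activation))
    return outputs, skipped
```

With inputs `[1.0, 2.0]` and weight rows `[[1.0, 1.0], [0.5, -8.0], [1.0, -0.5]]`, only the second row is skipped. The third row also yields an output of zero, but the exponent-only bound is too coarse to prove that in advance, so its dot product is still computed.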

17. The system of claim 16, implemented as part of an artificial intelligence system.

18. The system of claim 16, implemented as part of one or more integrated circuits.