JP7316073B2

JP7316073B2 - Arithmetic unit of neural network

Info

Publication number: JP7316073B2
Application number: JP2019056641A
Authority: JP
Inventors: 一嘉石渡; 亜季鈴木
Original assignee: Denso Corp; NSI Texe Inc
Current assignee: Denso Corp; NSI Texe Inc
Priority date: 2019-03-25
Filing date: 2019-03-25
Publication date: 2023-07-27
Anticipated expiration: 2039-03-25
Also published as: WO2020196586A1; JP2020160564A

Description

The present invention relates to an arithmetic device for a neural network.

Conventionally, neural network arithmetic devices may use a plurality of activation functions (see, for example, Patent Document 1). Activation functions in use include, for example, the sigmoid function, the Softmax function, the ReLU (Rectified Linear Unit) function, and the identity function. New activation functions are continually being devised to reduce the amount of computation.

Patent Document 1: JP 2018-92560 A

In conventional neural network arithmetic devices, hardware dedicated to the activation function program is sometimes designed so that the activation function is implemented in hardware to speed up processing. In that case, it becomes difficult to support new activation functions.

An object of the present invention is to provide a neural network arithmetic device in which activation functions are implemented in hardware to speed up processing and in which the activation function to be applied can be changed easily.

The present disclosure employs the following technical means to solve the above problem. The reference signs in parentheses in the claims and in this section are examples that indicate correspondence with the specific means described in the embodiment below as one aspect, and they do not limit the technical scope of the present invention.

To achieve the above object, the present invention is a neural network arithmetic device (116) comprising: an arithmetic unit (206) having an activation function circuit (208) that performs operations using activation functions of a plurality of types; and a register (214) that stores a designation of the activation function type in association with a thread.

According to the present disclosure, the activation function is implemented in hardware to speed up processing, and the type of activation function is designated by a register, so the activation function to be applied can be changed easily.

FIG. 1 is a block diagram showing the configuration of an arithmetic device according to an embodiment of the present invention.
FIG. 2 is a diagram for explaining the threads processed by the arithmetic device according to the embodiment of the present invention.
FIG. 3 is a block diagram showing the configuration of a dataflow processor according to the embodiment of the present invention.
FIG. 4 is a block diagram showing the configuration of a PE (Processing Element) according to the embodiment of the present invention.
FIG. 5(a) is a diagram showing assembler code that designates the Parametric ReLU function as the activation function according to the embodiment of the present invention. FIG. 5(b) is a diagram showing assembler code that designates the Hyperbolic Tangent function as the activation function according to the embodiment of the present invention.
FIG. 6(a) is a diagram showing graphs of the Parametric ReLU function and its derivative function as the activation function according to the embodiment of the present invention. FIG. 6(b) is a diagram showing graphs of the Hyperbolic Tangent function and its derivative function as the activation function according to the embodiment of the present invention.

Embodiments of the present invention are described below with reference to the drawings. The embodiment described below is one example of carrying out the present invention and does not limit the present invention to the specific configuration described. In carrying out the present invention, a specific configuration suited to the embodiment may be adopted as appropriate.

[Configuration of arithmetic device 10]
FIG. 1 is a block diagram showing the configuration of an arithmetic device 10 according to an embodiment of the present invention. The arithmetic device 10 has a dataflow processor 100, a host CPU (Central Processing Unit) 200, a system bus 300, a ROM (Read Only Memory) 400, a RAM (Random Access Memory) 500, an external interface 600, and an event handler 700.

FIG. 2 is a diagram for explaining the threads processed by the arithmetic device 10 according to the embodiment of the present invention. As shown in FIG. 2, the program code has a graph structure in which data and processing are separated. This graph structure preserves the task parallelism and graph parallelism of the program. The arithmetic device 10 processes the large number of threads obtained by dividing this graph-structured program.

When the arithmetic device 10 performs, for example, convolution or pooling in a convolutional neural network (CNN), a heavy load is placed on the dataflow processor 100. The arithmetic device 10 reduces this load by including the activation function module 208 and the activation function differentiation module 210 described later, that is, by implementing the activation functions in hardware.

Returning to FIG. 1, the description continues. The dataflow processor 100 is connected to the event handler 700, which generates interrupts, and supports the interrupts that the event handler 700 generates. The specific configuration of the dataflow processor 100 is described later with reference to FIG. 3.

The host CPU 200 is the central processing unit (CPU) of the arithmetic device 10. The host CPU 200 can exchange information with each part of the arithmetic device 10 via the system bus 300. The host CPU 200 implements the functions of each part of the arithmetic device 10 by executing programs stored in the ROM 400 or the RAM 500.

The ROM 400 is a read-only non-volatile memory. The RAM 500 is a readable and writable volatile memory. The ROM 400 and the RAM 500 store the programs with which the host CPU 200 implements the functions of each part of the arithmetic device 10. The ROM 400 and the RAM 500 may also store the operation results of the dataflow processor 100.

The external interface 600 is an input/output device through which the arithmetic device 10 receives information from the outside or outputs information to the outside. The external interface 600 is, for example, a camera or an ultrasonic sensor.

[Configuration of dataflow processor 100]
FIG. 3 is a block diagram showing the configuration of the dataflow processor 100. The dataflow processor 100 has a command unit 102, a memory subsystem 104, a thread scheduler 112, and an execution core 114.

The command unit 102 is an instruction section that directs execution of the functions of the dataflow processor 100. The command unit 102 may communicate with a config interface, which is an interface for user-settable configuration items. The command unit 102 may also function as a command buffer that temporarily stores commands such as data read, write, and erase instructions.

The memory subsystem 104 is a memory system placed between the RAM 500 and the execution core 114. The memory subsystem 104 may communicate with a system bus interface or a ROM interface. The memory subsystem 104 has an arbiter 106, an L1 cache 108, and an L2 cache 110.

The arbiter 106 is placed between the PEs 116 and the L1 cache 108. When a plurality of data access requests conflict, it arbitrates among them according to a predetermined rule based on priority or the like. The L1 cache 108 is a memory that can be read and written faster than the RAM 500. The L2 cache 110 sits below the L1 cache 108 and is likewise a memory that can be read and written faster than the RAM 500.

The thread scheduler 112 assigns a plurality of threads to the execution core 114 according to the priority with which they should be processed.
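The priority-based assignment described above can be sketched in software as follows. This is only an illustrative model, not the circuit of the thread scheduler 112: the class name `ThreadScheduler`, its methods, and the convention that a smaller number means higher priority are all assumptions introduced here.

```python
import heapq
from itertools import count


class ThreadScheduler:
    """Toy model: hand the highest-priority ready thread to each free PE."""

    def __init__(self):
        self._heap = []
        self._order = count()  # tie-breaker: FIFO order within one priority

    def submit(self, thread_id, priority):
        # Assumption: a smaller number means a more urgent thread.
        heapq.heappush(self._heap, (priority, next(self._order), thread_id))

    def dispatch(self, free_pes):
        """Assign queued threads to the given free PEs, most urgent first."""
        assignments = {}
        for pe in free_pes:
            if not self._heap:
                break  # nothing left to schedule
            _, _, thread_id = heapq.heappop(self._heap)
            assignments[pe] = thread_id
        return assignments
```

With three queued threads and two free PEs, the two most urgent threads are dispatched first and the third waits for the next free PE.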

The execution core 114 is an execution section that executes thread processing. The execution core 114 has a plurality of processing elements (PEs) 116. Each PE 116 may be a scalar processing element that performs scalar operations or a vector processing element that performs vector operations. In this embodiment, at least one PE 116 is a vector processing element.

[Configuration of PE 116]
FIG. 4 is a block diagram showing the configuration of the PE 116 according to the embodiment of the present invention. The PE 116 has an instruction decoder/control unit 202, a general-purpose register file 204, an arithmetic unit 206, and a setting register 212.

The instruction decoder/control unit 202 decodes instructions and controls each part. It designates, to the general-purpose register file 204, the registers used for an operation; to the setting register 212, the thread being operated on; and to the arithmetic unit 206, the module to be used for the operation.

The arithmetic unit 206 is an arithmetic section that generates output data from input data. As described above, the arithmetic unit 206 operates on the values of the registers designated by the instruction decoder/control unit 202, using the module designated by the instruction decoder/control unit 202. In this embodiment, the arithmetic unit 206 performs vector operations. The arithmetic unit 206 has an activation function module 208 and an activation function differentiation module 210.

The activation function module 208 is an activation function circuit that applies an activation function in an operation, and it is implemented in hardware. The activation function module 208 provides a plurality of activation functions of different types.

Examples of activation functions include the identity function, step function, linear function, softplus function, sigmoid function, hard sigmoid function, Hyperbolic Tangent function, softsign function, ReLU function, Leaky ReLU function, Parametric ReLU function, Thresholded ReLU function, eLU (Exponential Linear Units) function, SeLU (Scaled Exponential Linear Unit) function, Softmax function, Gumbel-Softmax function, and Swish function. FIG. 4 shows an example in which the activation function module 208 has three activation functions: activation function A, activation function B, and activation function C.

The activation function differentiation module 210 is a derivative function circuit that applies the derivative of an activation function in an operation, and it is implemented in hardware. The activation function differentiation module 210 provides the derivative functions of a plurality of activation functions of different types. These derivative functions are, for example, the derivatives of the activation functions held by the activation function module 208. FIG. 4 shows an example in which the activation function differentiation module 210 has the derivative function of activation function A, the derivative function of activation function B, and the derivative function of activation function C.

In this embodiment, the arithmetic unit 206 has both the activation function module 208 and the activation function differentiation module 210, but the arithmetic unit 206 may have only the activation function module 208.

The setting register 212 is a register that stores data designating the type of activation function and data designating the coefficient of the activation function. One setting register 212 is provided for each thread that the processor supports. The setting register 212 has a register 214 that stores data designating the type of activation function. In this embodiment, the register value designating activation function A is "1", the register value designating activation function B is "2", and the register value designating activation function C is "3". The activation function corresponding to the register value set for a thread is called the "default activation function" of that thread.

The setting register 212 may have a register 216 that stores data designating the coefficient of the activation function. In this embodiment, the setting register 212 designates both the type and the coefficient of the activation function.

The instruction decoder/control unit 202 designates the type of activation function and its coefficient to the arithmetic unit 206 through the instruction code. The type of activation function is determined from the bits of the instruction code that designate the activation function type. When those bits are "0", the activation function selected by the value of the register designating the activation function type (that is, the default activation function) is executed. Here, "0" is a predetermined value that differs from the register values corresponding to activation functions A to C. When those bits are "1" to "3", the arithmetic unit 206 executes whichever of activation functions A to C corresponds to the designated value.
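The selection rule in this paragraph can be sketched as a small lookup. This is a software illustration only, not the hardware decode logic; the names `resolve_function_type`, `USE_DEFAULT`, and `type_registers` are hypothetical, and the per-thread dictionary stands in for the register 214 held for each thread.

```python
USE_DEFAULT = 0                    # selector value meaning "use register 214"
FUNC_A, FUNC_B, FUNC_C = 1, 2, 3   # register/selector values for A, B, C


def resolve_function_type(selector, thread_id, type_registers):
    """Return the activation function type that the arithmetic unit runs.

    selector       -- type bits taken from the instruction code (0..3)
    thread_id      -- thread that issued the instruction
    type_registers -- assumed mapping: thread id -> value of its register 214
    """
    if selector == USE_DEFAULT:
        # Fall back to the thread's default activation function.
        return type_registers[thread_id]
    if selector in (FUNC_A, FUNC_B, FUNC_C):
        # The instruction code names the function directly.
        return selector
    raise ValueError(f"unsupported type selector: {selector}")
```

For a thread whose register 214 holds 3, a selector of 0 resolves to activation function C, while a selector of 2 resolves to activation function B regardless of the register.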

Similarly, the coefficient of the activation function is determined from the bits of the instruction code that designate the coefficient. When the bits designating the activation function type in the instruction code are "0", the coefficient is taken from the register that designates the coefficient of the activation function (that is, the coefficient of the default activation function is used). When the bits designating the activation function type are "1" or greater, the instruction code carries a coefficient where one is needed, and the arithmetic unit 206 executes the activation function using that value.
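A minimal sketch of resolving the type and coefficient together, following this paragraph as written (the register coefficient is used when the type bits are "0"). The claims also describe a case in which a predetermined value in the coefficient field itself selects the register coefficient; that case is not modeled here. The classes `SettingRegister` and `Instruction` and the function `resolve` are hypothetical names, not part of the patent.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class SettingRegister:
    func_type: int   # stands in for register 214 (type: 1, 2 or 3)
    coeff: float     # stands in for register 216 (default coefficient)


@dataclass
class Instruction:
    type_selector: int             # 0 = use the thread's defaults
    coeff: Optional[float] = None  # only present when the function needs one


def resolve(instr: Instruction, reg: SettingRegister) -> Tuple[int, Optional[float]]:
    """Return (function type, coefficient) that the arithmetic unit uses."""
    if instr.type_selector == 0:
        # Defaults: both the type and the coefficient come from the registers.
        return reg.func_type, reg.coeff
    # Otherwise the instruction code overrides the setting register.
    return instr.type_selector, instr.coeff
```

A thread whose setting register holds type 1 and coefficient 0.25 gets `(1, 0.25)` for a selector of 0, and `(2, None)` when an instruction names function 2 without a coefficient.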

The designation of the activation function type and coefficient described above is explained with reference to FIGS. 5 and 6. FIG. 5(a) is a diagram showing assembler code that designates the Parametric ReLU function as the activation function according to the embodiment of the present invention. FIG. 6(a) is a diagram showing graphs of the Parametric ReLU function and its derivative function as the activation function according to the embodiment of the present invention.

Assembler code (1) 302 describes the data that designates the type of activation function. In this embodiment, the argument of assembler code (1) 302 that designates the activation function type is "1", which designates the Parametric ReLU function associated with "1".

Assembler code (2) 304 describes setting the data (actfunc#sel) that designates the activation function chosen in assembler code (1) 302 into register R1, which designates the type of activation function.

Assembler code (3) 306 describes designating the coefficient of the activation function by an argument and applying the activation function in the operation. In this embodiment, assembler code (3) 306 designates the coefficient "1" of the Parametric ReLU function with the argument "1" and applies the Parametric ReLU function in the operation.

Assembler code (4) 308 describes designating the coefficient of the derivative function of the activation function by an argument and applying that derivative function in the operation. In this embodiment, assembler code (4) 308 designates the coefficient of the derivative of the Parametric ReLU function with the argument "1" and applies the derivative of the Parametric ReLU function in the operation.

Graphing the Parametric ReLU function applied in the operation by the assembler code shown in FIG. 5(a) gives the graphs of the Parametric ReLU function with coefficient a = 1 and of its derivative function, as shown in FIG. 6(a).
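For reference, the text does not write out the formula, so the commonly used definition of the Parametric ReLU function with coefficient a is assumed here, together with the convention that the derivative at x = 0 takes the value a:

```latex
f(x) =
\begin{cases}
  x,   & x > 0 \\
  a x, & x \le 0
\end{cases}
\qquad
f'(x) =
\begin{cases}
  1, & x > 0 \\
  a, & x \le 0
\end{cases}
```

Under this definition, the coefficient a = 1 designated by the assembler code reduces the function to f(x) = x for every x, with f'(x) = 1 everywhere.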

Next, the assembler code that changes the activation function applied in the operation from the Parametric ReLU function and its derivative function to the Hyperbolic Tangent function and its derivative function is described. FIG. 5(b) is a diagram showing assembler code that designates the Hyperbolic Tangent function as the activation function according to the embodiment of the present invention, and FIG. 6(b) is a diagram showing graphs of the Hyperbolic Tangent function and its derivative function as the activation function according to the embodiment of the present invention.

As shown in FIG. 5(b), in this embodiment the value in assembler code (1) 402 that designates the activation function type is changed from "1" to "2". This change switches the designated activation function and derivative function from the Parametric ReLU function associated with the value "1" to the Hyperbolic Tangent function associated with the value "2".

Assembler code (2) 404 through assembler code (4) 408 are identical to assembler code (2) 304 through assembler code (4) 308. Thus the default activation function applied in the operation can be changed simply by designating the function type in assembler code (1) 402, without changing any other assembler code.

The Hyperbolic Tangent function has no coefficient, so no coefficient needs to be designated. Therefore, the coefficient designated by the argument "1" in assembler code (3) 406 and the coefficient designated by the argument "1" in assembler code (4) 408 are both ignored.

Graphing the activation function applied by the assembler code shown in FIG. 5(b) gives the graphs of the Hyperbolic Tangent function, which has no coefficient, and of its derivative function, as shown in FIG. 6(b).
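The switch between FIG. 5(a) and FIG. 5(b) can be modeled in a few lines. This is a sketch, not the assembler code of the figures, which is not reproduced in this text: the function `run_sequence`, the table `FUNCTIONS`, and the standard formulas for the Parametric ReLU function and the derivative of tanh (1 - tanh²) are assumptions introduced for illustration.

```python
import math


def prelu(x, a):
    return x if x > 0 else a * x


def prelu_derivative(x, a):
    # Convention assumed here: the slope at x == 0 is a.
    return 1.0 if x > 0 else a


def tanh(x, _coeff):
    return math.tanh(x)  # tanh has no coefficient, so the argument is ignored


def tanh_derivative(x, _coeff):
    return 1.0 - math.tanh(x) ** 2


# Type value -> (activation function, derivative), as in the embodiment:
# "1" is Parametric ReLU and "2" is Hyperbolic Tangent.
FUNCTIONS = {1: (prelu, prelu_derivative), 2: (tanh, tanh_derivative)}


def run_sequence(type_value, x, coeff=1.0):
    register_r1 = type_value           # steps (1)-(2): choose type, store in R1
    f, df = FUNCTIONS[register_r1]
    return f(x, coeff), df(x, coeff)   # steps (3)-(4): apply f and its derivative
```

Only the first argument differs between the two cases: `run_sequence(1, x)` evaluates the Parametric ReLU function with a = 1, and `run_sequence(2, x)` evaluates the Hyperbolic Tangent function, for which the coefficient has no effect.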

As described above, the neural network arithmetic device 10 includes the arithmetic unit 206 having the activation function circuit and the setting register 212 that stores the designation of the activation function type and coefficient in association with a thread. The activation function is therefore implemented in hardware to speed up processing, and the activation function to be applied can be changed easily.

In addition, because at least one activation function circuit of the neural network arithmetic device 10 performs vector operations, a plurality of data items can be processed in parallel.

In addition, because the arithmetic unit 206 of the neural network arithmetic device 10 includes activation function circuits of a plurality of types and their derivative function circuits, a single instruction can switch among and use the plurality of activation functions and their derivatives.

In addition, because the type and coefficient of the activation function are designated either by the value of the setting register 212 or by an argument of the instruction code, different activation functions and coefficients can be supported simply by changing the default value in the setting register 212, and rewriting of ROM code to change the default activation function can be eliminated or reduced.

In addition, because designation by the instruction code can take priority over designation by the setting register 212, a thread that should not be governed by the setting register 212 can designate the activation function and its coefficient in the instruction code. Even when designation by the instruction code can take priority, the activation function and coefficient designated by the setting register 212 are used when a predetermined value is given as the argument of the instruction code.

[Modification]
The embodiment of the present invention uses the dataflow processor 100, but a conventional processor such as a CPU (Central Processing Unit) or a GPU (Graphics Processing Unit) may be used in place of the dataflow processor 100.

10 ... arithmetic device, 100 ... dataflow processor,
200 ... host CPU, 300 ... system bus, 400 ... ROM,
500 ... RAM, 600 ... external interface,
700 ... event handler, 102 ... command unit,
104 ... memory subsystem, 106 ... arbiter,
108 ... L1 cache, 110 ... L2 cache,
112 ... thread scheduler, 114 ... execution core, 116 ... PE,
202 ... instruction decoder/control unit, 204 ... general-purpose register file,
206 ... arithmetic unit, 208 ... activation function module,
210 ... activation function differentiation module, 212 ... setting register,
214 ... register designating the type of activation function,
216 ... register designating the coefficient of the activation function

Claims

1. A neural network arithmetic device (116) comprising:
an arithmetic unit (206) having an activation function circuit (208) that performs operations using activation functions of a plurality of types; and
a register (214) that stores a designation of the type of the activation function in association with a thread,
wherein, when the type of the activation function is designated in an instruction code of the thread sent to the arithmetic unit, the arithmetic unit performs the operation using the activation function designated in the instruction code regardless of the designation in the register.

2. The neural network arithmetic device according to claim 1, wherein, when a predetermined value is specified as the designation of the type of the activation function in the instruction code of the thread sent to the arithmetic unit, the arithmetic unit performs the operation using the activation function designated by the register in association with the thread.

3. The neural network arithmetic device according to claim 1, further comprising a register (216) that stores a designation of the coefficient of the activation function in association with the thread.

4. The neural network arithmetic device according to claim 3, wherein, when a predetermined value is specified as the coefficient of the activation function in the instruction code of the thread sent to the arithmetic unit, the arithmetic unit performs the operation using the coefficient of the activation function designated by the register in association with the thread.

5. A neural network arithmetic device comprising:
an arithmetic unit (206) having an activation function circuit (208) that performs operations using activation functions of a plurality of types;
a register (214) that stores a designation of the type of the activation function in association with a thread; and
a register (216) that stores a designation of the coefficient of the activation function in association with a thread,
wherein, when the coefficient of the activation function is designated in an instruction code of the thread sent to the arithmetic unit, the arithmetic unit performs the operation using the coefficient of the activation function designated in the instruction code regardless of the designation in the register.

6. The neural network arithmetic device according to claim 1, wherein at least one of the activation function circuits of the plurality of types performs vector operations on a plurality of data items in parallel.

7. The neural network arithmetic device according to claim 1, wherein the arithmetic unit further comprises derivative function circuits (210) that compute the respective derivative functions of the activation functions of the plurality of types, and performs operations using the activation function and the derivative function of the activation function.