JP7014393B2

JP7014393B2 - Data processing device and data processing method therein

Info

Publication number: JP7014393B2
Application number: JP2017117686A
Authority: JP
Inventors: 洋一富岡; セドゥーキンスタニスラフ
Original assignee: University of Aizu
Current assignee: University of Aizu
Priority date: 2017-06-15
Filing date: 2017-06-15
Publication date: 2022-02-01
Anticipated expiration: 2037-06-15
Also published as: JP2019003414A

Description

The present invention relates to a data processing device and a data processing method for that device, and in particular to a data processing device suited to the convolution operations of a convolutional neural network and a data processing method for that device.

A convolutional neural network (hereinafter abbreviated as CNN where appropriate), that is, a neural network to which convolution has been added, is widely recognized as a machine learning technique that is particularly effective for image recognition.

FIG. 1 shows an outline of the system configuration of a CNN. The input data is processed by a structure of multiple layers (layers L1 to L5).

In FIG. 1, each of layers L1 and L2 contains a convolutional layer and a pooling layer, and this pair is repeated.

The convolutional layer multiplies the input data by filter (kernel) features, that is, it convolves feature values with the input. When the input data is an image, the image is multiplied by each of several different filters, yielding as many output images as there are filters. Using multiple filters captures various features of the input image, and convolving the features makes it possible to detect patterns in the image.
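As a minimal sketch of this multiplication by filters, the following Python computes one output image per filter for a small input. The function name conv2d_valid, the stride of 1, the absence of padding, and the sample values are illustrative assumptions, not details taken from this description.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` and sum the elementwise products.

    Assumes stride 1 and no padding, so the output is smaller than the input.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0
            for p in range(kh):
                for q in range(kw):
                    acc += kernel[p][q] * image[i + p][j + q]
            row.append(acc)
        out.append(row)
    return out


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
# Two hypothetical filters; each one yields its own output image.
vertical_edge = [[1, -1],
                 [1, -1]]
horizontal_edge = [[1, 1],
                   [-1, -1]]
feature_maps = [conv2d_valid(image, k) for k in (vertical_edge, horizontal_edge)]
```

With two filters the list feature_maps holds two output images, which matches the statement that the number of images obtained equals the number of filters.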

The pooling layer is placed immediately after the convolutional layer. It shrinks the layer so that it is easier to handle and lowers the sensitivity of the extracted features to position.
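The shrinking performed by a pooling layer can be illustrated with max pooling, which is the operation the MAX function block of this device supports. The sketch below is only an example: the window size of 2, the non-overlapping windows, and the name max_pool are assumptions.

```python
def max_pool(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    out = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            window = [feature_map[i + p][j + q]
                      for p in range(size) for q in range(size)]
            row.append(max(window))
        out.append(row)
    return out


fm = [[1, 3, 2, 0],
      [4, 2, 1, 5],
      [0, 1, 7, 2],
      [6, 2, 3, 3]]
pooled = max_pool(fm)  # a 4 x 4 map shrinks to 2 x 2
```

Each output value reports only that a strong response occurred somewhere in its window, which is why the exact position of a feature matters less after pooling.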

The CNN then recognizes the input data (image) with a fully connected multilayer perceptron formed by layers L3 to L5.

The convolution operations performed in the early layers require an enormous number of calculations, so reducing the power consumed by this part is a very important problem.

However, when multiple processing elements (PEs) are arranged as an array-type parallel processor, computing similarity with peripheral functional blocks requires large amounts of wiring resources and transfer time.

In view of this, the present inventors previously proposed a highly scalable data processing device that avoids data collisions in communication between PEs and allows the number of PEs to be increased without bias toward any particular direction (Patent Document 1).

In that earlier invention, all PEs arranged along each of the n dimensions of an n-dimensional network input and output data in synchronization with a transfer clock. Each PE receives first data from a first adjacent PE in the shift direction (the direction in which data is input and output) and outputs second data to a second adjacent PE on the opposite side. The data transfer rate between adjacent PEs is the same regardless of the shift direction.

Patent Document 1: Japanese Patent No. 5939572

In existing techniques for CNN computation, even when the invention proposed in Patent Document 1 is followed, each PE carries a large processing load in parallel computation. That is, the number of PEs or cores, which are technically the fastest but energy-inefficient, is far smaller than the number of data items stored in memory.

In other words, the number of processors and memory operations active in each computation step has been fundamentally smaller than what the CNN algorithm allows.

As a result, the time needed to compute the CNN is far greater than the minimum, and the energy required is high.

In view of this, an object of the present invention is to provide a data processing device in which a large number of processing elements (PEs) are arranged three-dimensionally and which can compute at high speed and low power while maintaining parallelism, and a data processing method for that device.

A first aspect of the present invention that achieves the above object is a data processing device having multiple processing elements, each with multiplication and addition functions, arranged in three dimensions. The device has multiple two-dimensional planes stacked in the Z-axis direction, each plane having multiple processing elements arranged in two dimensions, and a filter memory for each of the two-dimensional planes in which feature weights are placed. Input data is placed starting from the upper two-dimensional plane in the Z-axis direction. The processing elements on a given plane use the multiplication function to compute successive products of the input data and the feature weights, producing two-dimensional convolution data. Each processing element further adds the data transferred from the plane below to its own data and transfers the result to the adjacent processing element on the plane above.

In a first mode of the first aspect, the processing elements arranged in two dimensions are connected by a torus network, and in the Z-axis direction the PEs adjacent on the planes above and below are connected bidirectionally by a network.

In a second mode of the first aspect, the two-dimensional convolution data is obtained by adding data transferred from an adjacent processing element to the element's own data and then transferring the sum to the adjacent processing element in the shift direction.

In a third mode of the first aspect, the processing elements on the uppermost plane in the Z-axis direction add the two-dimensional convolution data transferred from the processing elements on the planes below to their own data, producing 2.5-dimensional convolution data.

In a fourth mode, which builds on the third mode of the first aspect, the 2.5-dimensional convolution data is shifted successively to the processing elements on the planes below.

In any of the above modes of the first aspect, the feature weights are divided by the number of two-dimensional planes on which input data is placed and are stored in the filter memory corresponding to each such plane. During the convolution operation, the feature weights stored in each filter memory are broadcast to all processing elements on the corresponding plane.

A second aspect of the present invention that achieves the above object is a data processing method for a data processing device having multiple processing elements, each with multiplication and addition functions, arranged in three dimensions. The device has multiple two-dimensional planes stacked in the Z-axis direction, each plane having multiple processing elements arranged in two dimensions, and a filter memory for each of the two-dimensional planes in which feature weights are placed. The method comprises: placing input data starting from the upper two-dimensional plane in the Z-axis direction; computing, in the processing elements on a given plane, successive products of the input data and the feature weights with the multiplication function to produce two-dimensional convolution data; and adding the data transferred from the plane below to each element's own data and transferring the result to the adjacent processing element on the plane above.

In a first mode of the second aspect, the method includes a step in which, to obtain the two-dimensional convolution data, data transferred from an adjacent processing element is added to the element's own data and the sum is transferred to the adjacent processing element in the shift direction.

In a second mode of the second aspect, the method includes a step in which the processing elements on the uppermost plane in the Z-axis direction add the two-dimensional convolution data transferred from the processing elements on the planes below to their own data, producing 2.5-dimensional convolution data.

In a third mode of the second aspect, the method includes a step of shifting the 2.5-dimensional convolution data successively to the processing elements on the planes below.

With the features of the present invention described above, parallel computation is performed with as many processors as there are data items to be processed, so each layer of the CNN can be computed in the minimum number of execution steps. The device can therefore run at the lowest operating clock frequency that still satisfies the execution-time constraint required for real-time processing, enabling real-time, low-power computation.

FIG. 1 shows an outline of the system configuration of a CNN.
FIG. 2 shows the three-dimensional array of PMEs (Processor in Memory) in a TAP (Tensor Array Processor).
FIG. 3 is a block diagram of an example functional configuration of each PME.
FIG. 4 shows the three-dimensional array of the TAP expanded into its individual XY planes.
FIG. 5 is an enlarged view of the PMEs on one plane of FIG. 4.
FIG. 6 shows how the convolution operation changes the data.
FIG. 7A shows the initial state of a layer in the data processing method of the present invention.
FIG. 7B shows the two-dimensional convolution being performed for the first filter.
FIG. 7C shows the first two-dimensional convolution data being transferred from the plane below and added to each plane's own convolution result to obtain the 2.5-dimensional convolution result.
FIG. 7D shows the first 2.5-dimensional convolution result (white circles) shifted to the plane below.
FIG. 7E shows the two-dimensional convolution being performed for the second filter.
FIG. 7F shows the two-dimensional convolution data for the second filter, transferred from the plane below, being added to each plane's own convolution result to obtain the 2.5-dimensional convolution result.
FIG. 7G shows the first and second 2.5-dimensional convolution results (white circles) shifted to the planes below.
FIG. 7H shows the two-dimensional convolution being performed for the third filter.
FIG. 7I shows the two-dimensional convolution data for the third filter, transferred from the plane below, being added to each plane's own convolution result to obtain the 2.5-dimensional convolution result.
FIG. 7J shows the first to third 2.5-dimensional convolution results (white circles) shifted to the planes below.
FIG. 7K shows the two-dimensional convolution being performed for the fourth filter.
FIG. 7L shows the two-dimensional convolution data for the fourth filter, transferred from the plane below, being added to each plane's own convolution result to obtain the 2.5-dimensional convolution result.
FIG. 7M shows the result of computing the convolutional layer with four filters.
FIG. 8 illustrates an example of the convolution operation with concrete numbers.
FIG. 9 examines the shift direction over the data when the height and width of the feature weights differ.
FIG. 10 is a time chart of the 2.5-dimensional convolution operation according to the present invention.

Embodiments of the present invention are described below with reference to the accompanying drawings. These embodiments are intended to aid understanding of the invention, and application of the invention is not limited to them. The scope of protection of the invention also extends to what is identical or equivalent to the claims.

The data processing device according to the present invention is a system consisting of a network and PMEs (Processor in Memory), which are processing elements with a memory function arranged in a three-dimensional array. It is hereinafter called a TAP (Tensor Array Processor).

FIG. 2 shows the three-dimensional array of PMEs in the TAP. Multiple PMEs, each with a computation module, are stacked and arranged in the three dimensions (X, Y, Z).

Such a structure can be fabricated as a three-dimensional processor with semiconductor technology: two-dimensional semiconductor planes, each with multiple PMEs arranged in the X and Y directions, can be stacked in the Z direction to form a three-dimensional structure on a single chip.

In FIG. 2, the PMEs are arranged in a three-dimensional array of Ws × Hs × Cs (Ws along the X axis, Hs along the Y axis, and Cs along the Z axis). On each XY plane, the PMEs are connected by a torus network (Lx, Ly) and can shift the data they hold in four directions: positive X, negative X, positive Y, and negative Y.
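The four-direction shift on a torus can be sketched as follows for one XY plane of 4 × 3 PMEs. This is an illustration only: the function name torus_shift, the list-of-rows layout, and the convention that positive Y means an increasing row index are assumptions.

```python
def torus_shift(plane, dx=0, dy=0):
    """Move every value dx columns and dy rows; values leaving one edge
    re-enter at the opposite edge, as on a torus network."""
    h, w = len(plane), len(plane[0])
    out = [[None] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[(y + dy) % h][(x + dx) % w] = plane[y][x]
    return out


# One XY plane with Ws = 4 columns and Hs = 3 rows.
plane = [[0, 1, 2, 3],
         [4, 5, 6, 7],
         [8, 9, 10, 11]]
x_positive = torus_shift(plane, dx=1)    # every PME passes its value to the +X neighbor
y_negative = torus_shift(plane, dy=-1)   # every PME passes its value to the -Y neighbor
```

Because every PME sends and receives exactly one value per shift, all transfers in a step can happen at once without collisions.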

In the Z-axis direction, PMEs adjacent on the planes above and below are connected bidirectionally by a network Lz, so that data can be transferred in the positive Z direction (upward) and the negative Z direction (downward) at the same time. Direction control for these data shifts, and the common shift clock they use, are supplied by a control processor, which is shown later in the expanded view of the data processing device.

FIG. 3 is a block diagram of an example functional configuration of each PME. It has a multiply-and-add function block 10, needed for computing the convolutional layer, and a MAX (maximum value) function block 11, needed for computing the pooling layer. Further function blocks can be added as needed.

FIG. 4 shows the three-dimensional array of the TAP expanded into its individual XY planes. The TAP has a control processor 20 and an instruction memory 21 that are common to the whole system.

A filter memory FM1 to FMS is provided for each XY plane of the three-dimensional array (Z = 1, Z = 2, ..., Z = S), and each filter memory is connected to all PMEs on its plane. During computation, each filter memory can broadcast the feature weights of a filter to all PMEs on the corresponding plane.

The following assumes that the data processing method of the present invention is used for the convolution processing of a CNN.

The filter size and number of filters of each layer, obtained by prior training, are loaded into the instruction memory 21 as the structure of the convolutional neural network. Based on this, the control processor 20 supplies the corresponding weights to the filter memories FM1 to FMS of the planes, the common clock signal, and so on.

In a convolutional layer, each PME multiplies the data it holds by the weight broadcast from the corresponding filter memory, adds the data transferred from an adjacent PME, and transfers the result to the adjacent PME on the opposite side. In a pooling layer, each PME takes the maximum of its own data and the data transferred from an adjacent PME and passes it on to the next adjacent PME in turn.
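The two per-element operations just described can be modeled with a small class. This is a hypothetical model for illustration: the class name PME, the method names conv_step and pool_step, and the three-element chain are assumptions and do not describe the actual circuit.

```python
class PME:
    """Hypothetical model of one processing element holding one data value."""

    def __init__(self, data):
        self.data = data

    def conv_step(self, weight, incoming):
        # Multiply own data by the broadcast weight, then add the partial
        # sum received from the neighbor; the result is passed onward.
        return self.data * weight + incoming

    def pool_step(self, incoming):
        # Keep the larger of own data and the value received from the neighbor.
        return max(self.data, incoming)


# A partial sum travelling through three neighboring PMEs; one weight is
# broadcast per step.
chain = [PME(2), PME(5), PME(1)]
weights = [3, 1, 4]
partial = 0
for pme, w in zip(chain, weights):
    partial = pme.conv_step(w, partial)

# A running maximum travelling through the same three PMEs.
running = chain[0].data
for pme in chain[1:]:
    running = pme.pool_step(running)
```

Here partial accumulates 2·3 + 5·1 + 1·4, and running ends as the largest of the three stored values.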

As described above, based on instruction data obtained by prior training and held in the instruction memory 21, the control processor 20 sends to all PMEs the operation to perform, the transfer direction, and the timing clock for the common shift.

FIG. 5 is an enlarged view of the PMEs on one plane of FIG. 4. The example shows the plane Z = k for Ws = 4 and Hs = 3. The filter memory FMk supplies the filter (feature weights) in common to all PMEs on the plane Z = k.

Three-dimensional tensor data of W_o columns × H_o rows × C_o channels is input to a TAP of this configuration as the first layer of the CNN.

Applying the CNN convolutional layer computation repeatedly to this input data changes the data size at each layer.

Let the data size of the k-th layer be W_k × H_k × C_k. In the system of the present invention, the following is assumed:

[Expression not reproduced in the source text.]

When the number of input channels is smaller than this value, the input data is placed starting from the top XY plane of the TAP and working downward. A plane whose PMEs hold input data is active; any other plane is inactive. Data is still transferred through the PMEs of an inactive plane.

For example, a single image as input data can be handled as follows. The image is divided into regions of equal size, each region is cut out as region data (a channel), and the region data is placed in the PMEs of successive planes, starting from the top plane of the TAP.

The filter (feature weights) is then handled as follows. One set of feature weights is divided according to the number of planes on which the image data is placed, and each part is stored in the filter memory FM of the corresponding plane. During computation, the feature weights held in each filter memory FM are broadcast to all PMEs on that plane.
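The division of feature weights across planes can be sketched as below. The data layout is an assumption made for illustration: filters[o][c] is taken to be the two-dimensional weight slice of filter o for channel c, plane index 0 is the top plane, and the name fill_filter_memories is hypothetical.

```python
def fill_filter_memories(filters, num_planes):
    """Give plane z the z-th slice of every filter.

    Planes beyond the number of input channels hold no data and therefore
    receive no weights (they are inactive).
    """
    num_channels = len(filters[0])
    memories = []
    for z in range(num_planes):
        if z < num_channels:
            memories.append([f[z] for f in filters])
        else:
            memories.append([])
    return memories


filters = [
    [[[1, 0], [0, 1]], [[2, 2], [2, 2]]],   # filter 0: slices for channels 0 and 1
    [[[0, 1], [1, 0]], [[3, 3], [3, 3]]],   # filter 1: slices for channels 0 and 1
]
fm = fill_filter_memories(filters, num_planes=3)
active = [bool(m) for m in fm]   # which planes hold weights
```

With two input channels and three planes, the top two planes are active and the third receives nothing.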

Each PME computes the product of the broadcast feature weight and its own data. It then adds the data transferred from the adjacent PME in one direction to that product and transfers the sum to the adjacent PME in the opposite direction. Repeating this process performs the two-dimensional convolution. Transfer control in this case follows the invention of Patent Document 1 described above.
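One way to realize this multiply, add, and forward cycle on a torus is sketched below: the input data stays in place while the partial sums travel one hop per broadcast weight. This is an illustration under stated assumptions and not the transfer schedule of Patent Document 1: the snake-shaped visiting order, the wrap-around boundary, and all function names are hypothetical.

```python
def shift(plane, dx, dy):
    """Move every value dx columns and dy rows on a torus (edges wrap)."""
    h, w = len(plane), len(plane[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[(y + dy) % h][(x + dx) % w] = plane[y][x]
    return out


def snake_order(kh, kw):
    """Visit kernel positions so that consecutive positions are one hop apart."""
    order = []
    for p in range(kh):
        cols = range(kw) if p % 2 == 0 else range(kw - 1, -1, -1)
        order.extend((p, q) for q in cols)
    return order


def conv2d_by_shifting(data, kernel):
    """Circular 2D convolution in which only the partial sums move."""
    h, w = len(data), len(data[0])
    acc = [[0] * w for _ in range(h)]
    pos_p, pos_q = 0, 0          # offset of each partial sum from its home PME
    for p, q in snake_order(len(kernel), len(kernel[0])):
        acc = shift(acc, dx=q - pos_q, dy=p - pos_p)   # zero or one hop
        pos_p, pos_q = p, q
        weight = kernel[p][q]                           # broadcast to the plane
        acc = [[acc[y][x] + weight * data[y][x] for x in range(w)]
               for y in range(h)]
    # Bring results home; one call here, several single hops in hardware.
    return shift(acc, dx=-pos_q, dy=-pos_p)


def conv2d_reference(data, kernel):
    """Direct evaluation of the same circular convolution, for comparison."""
    h, w = len(data), len(data[0])
    return [[sum(kernel[p][q] * data[(i + p) % h][(j + q) % w]
                 for p in range(len(kernel)) for q in range(len(kernel[0])))
             for j in range(w)] for i in range(h)]


data = [[1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12]]
kernel = [[1, 0],
          [0, 1]]
result = conv2d_by_shifting(data, kernel)
```

Every step uses all PMEs of the plane at once, so the number of steps grows with the kernel size rather than with the image size.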

A further feature of the present invention is that the two-dimensional convolution data is transferred upward, from the lowest plane of the TAP that holds two-dimensionally convolved input data to the top plane. Each PME adds the convolution result received from the plane directly below to its own data and transfers the sum to the PME on the plane directly above.

Finally, the two-dimensional array of processors on the top plane holds the 2.5-dimensional convolution result, which is the sum of the two-dimensional convolution results of all the two-dimensional arrays.
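The upward accumulation that yields the 2.5-dimensional result can be sketched as a running sum passed from the bottom plane to the top plane. The list layout (index 0 is the top plane) and the name accumulate_upward are assumptions for illustration.

```python
def accumulate_upward(planes_2d):
    """Sum per-plane 2D convolution results along the Z axis.

    planes_2d[0] is the top plane and planes_2d[-1] the bottom plane. Each
    plane adds the value arriving from below to its own result and passes
    the sum up; the return value is what the top plane ends up holding.
    """
    h, w = len(planes_2d[0]), len(planes_2d[0][0])
    carried = [[0] * w for _ in range(h)]
    for plane in reversed(planes_2d):      # start at the bottom plane
        carried = [[plane[y][x] + carried[y][x] for x in range(w)]
                   for y in range(h)]
    return carried


planes = [
    [[1, 2], [3, 4]],            # top plane
    [[10, 20], [30, 40]],
    [[100, 200], [300, 400]],    # bottom plane
]
top = accumulate_upward(planes)
```

All PMEs in a column act in lockstep, so the number of transfer steps equals the number of planes minus one, independent of the plane size.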

As detailed later, the 2.5-dimensional convolution result obtained on the top plane is then shifted successively to the planes below.

FIG. 6 shows how the convolution operation changes the data. As shown in FIG. 6(1), input data placed in N × M × Cin PMEs is convolved to give a result held in N × M × Cout PMEs. The size of Cout depends on the number of feature weights (kernels).

This result is the convolution output of one layer of the CNN. Next, as shown in FIG. 6(2), the pooling layer computation shrinks the layer to make it easier to handle. The result of this computation becomes the input to the next layer.

To make the convolution processing performed by the data processing device of the present invention easier to understand, FIGS. 7A to 7M schematically show how the data on each plane changes.

In FIGS. 7A to 7M, an active plane is a plane on which input data is placed. FIG. 7A shows the initial state. Solid cuboids represent PMEs on active planes and dashed cuboids represent PMEs on inactive planes; the same applies to FIGS. 7B to 7M. Gray circles represent the input data Din. FIG. 7B shows each plane performing the two-dimensional convolution for the first filter. Arrows indicate the direction of data transfer between the PMEs on each plane, and black circles indicate computed results; the same applies to FIGS. 7C to 7M.

FIG. 7C shows the step in which each plane receives the two-dimensional convolution data from the plane below and adds it to its own two-dimensional convolution result, yielding the 2.5-dimensional convolution result.

FIG. 7D shows this 2.5-dimensional convolution result being shifted to the plane below in preparation for the convolution with the second filter. The shift also passes through inactive planes.

FIG. 7E shows each plane performing the two-dimensional convolution for the second filter. FIG. 7F shows the two-dimensional convolution data for the second filter being transferred from the plane below and added to each plane's own result, yielding the 2.5-dimensional convolution result.

Similarly, FIGS. 7G to 7L show the convolution for the third and fourth filters. FIG. 7G shows the first and second 2.5-dimensional convolution results (white circles) shifted from the upper plane to the lower plane, as indicated by the arrows.

FIG. 7H shows the two-dimensional convolution for the third filter. FIG. 7I shows the two-dimensional convolution data for the third filter being transferred from the plane below and added to each plane's own result, yielding the 2.5-dimensional convolution result.

FIG. 7J shows the first to third 2.5-dimensional convolution results (white circles) shifted from the upper planes to the lower planes. Data is shifted into the inactive planes as well.

FIG. 7K shows the two-dimensional convolution for the fourth filter. FIG. 7L, like FIG. 7I, shows the two-dimensional convolution data for the fourth filter being transferred from the plane below and added to each plane's own result, yielding the 2.5-dimensional convolution result.

FIG. 7M shows the final result of the convolution layer computed with the four filters; this result becomes the input to the next layer.

Here, the convolution operation is expressed by equation (1) below, where b₀ is a constant bias term. The bias b₀ is used to raise or lower the result of the convolution by a fixed amount. Both the bias b₀ and the filter weights ω are determined automatically when the CNN is trained.

Here, s is a natural number representing the stride of the convolution calculation. For simplicity, this description covers only the case where the stride is 1, but this does not preclude applying the present invention when the stride is 2 or more.

In equation (1), the weight term is the weight of the o-th filter of the l-th layer, and the input term is the input data of the l-th layer.

The PMEs on each plane calculate the latter part of equation (1). The PMEs on the topmost plane of the TAP calculate the two-dimensional convolution for C = C^l, and the PMEs on the plane immediately below calculate the two-dimensional convolution for C = C^l - 1. Adding together the results of this latter part calculated on each plane gives equation (1) as a whole.
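As a rough illustration of this decomposition, the following Python sketch computes a two-dimensional product-sum for each plane and then sums the planes with a single bias. It assumes stride 1, zero padding at the borders, and no kernel flip (the usual CNN convention); the function names `conv2d_same` and `conv25d` are hypothetical and do not appear in the source, and the sketch does not model the PME network itself.

```python
def conv2d_same(x, w):
    """Stride-1 product-sum of one channel, zero padding assumed at the borders."""
    rows, cols = len(x), len(x[0])
    kh, kw = len(w), len(w[0])
    out = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            total = 0
            for p in range(kh):
                for q in range(kw):
                    r, c = i + p - kh // 2, j + q - kw // 2
                    if 0 <= r < rows and 0 <= c < cols:
                        total += w[p][q] * x[r][c]
            out[i][j] = total
    return out


def conv25d(xs, ws, b0):
    """Sum the per-plane 2D results and add the bias b0 once."""
    planes = [conv2d_same(x, w) for x, w in zip(xs, ws)]
    rows, cols = len(planes[0]), len(planes[0][0])
    return [[b0 + sum(p[i][j] for p in planes) for j in range(cols)]
            for i in range(rows)]


# Two planes (channels) with illustrative values, not taken from the figures.
xs = [[[1, 2], [3, 4]], [[1, 0], [0, 1]]]
ws = [[[0, 0, 0], [0, 1, 0], [0, 0, 0]],   # passes the input through
      [[1, 1, 1], [1, 1, 1], [1, 1, 1]]]   # sums the in-bounds neighbourhood
result = conv25d(xs, ws, b0=1)
```

Each call to `conv2d_same` corresponds to the work of one plane, and the sum inside `conv25d` corresponds to adding the per-plane results together.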

FIG. 8 illustrates the above convolution operation with concrete numerical values, taking the case of two planes, upper and lower, as an example.

On one plane (Ch1), the input x 60 and the kernel weight w 61 are multiplied while moving in the arrow direction; at the same time, on the lower plane (Ch2), the input x 62 and the kernel weight w 63 are multiplied while moving in the arrow direction. This yields a two-dimensional convolution result for each plane (64 for Ch1 and 65 for Ch2).

The product-sum of the input 60 and the weight 61 on the one plane (Ch1) at the initial position is as follows.

(-3 * 1) + (-2 * 2) + (1 * 2) + (3 * 2) = 1

Next, the product-sum of the input 60 and the weight 61 after a shift of one position in the arrow direction is as follows.

(1 * 1) + (-3 * 2) + (-2 * 3) + (2 * 2) + (1 * 2) + (3 * 1) = -2

These are as shown in the two-dimensional convolution result 64.

Meanwhile, the product-sum of the input 62 and the weight 63 on the lower plane (Ch2) at the initial position is as follows.

(-2 * 2) + (3 * 3) + (-3 * 1) + (1 * 3) = 5

Next, the product-sum of the input 62 and the weight 63 after a shift of one position in the arrow direction is as follows.

(2 * 2) + (-2 * 3) + (3 * 3) + (1 * 1) + (-3 * 3) + (1 * 2) = 1

These are as shown in the two-dimensional convolution result 65.

Then, the one plane (Ch1) holds its own two-dimensional convolution result 64 and receives the two-dimensional convolution result 65 transferred from the lower plane (Ch2). The two-dimensional convolution results 64 and 65 and the bias b₀ = 1 are added together to obtain the 2.5-dimensional convolution result 66.
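The arithmetic of this example can be checked with a short Python sketch. The factor pairs below are copied from the product-sums listed in the text for the first two positions; the variable names are hypothetical, and the values computed for result 66 follow only from the stated rule that results 64 and 65 and the bias b₀ = 1 are added, since the text does not list the entries of result 66.

```python
# Factor pairs for the first two kernel positions, as listed in the text.
ch1_terms = [
    [(-3, 1), (-2, 2), (1, 2), (3, 2)],
    [(1, 1), (-3, 2), (-2, 3), (2, 2), (1, 2), (3, 1)],
]
ch2_terms = [
    [(-2, 2), (3, 3), (-3, 1), (1, 3)],
    [(2, 2), (-2, 3), (3, 3), (1, 1), (-3, 3), (1, 2)],
]
b0 = 1  # bias stated in the text


def product_sum(pairs):
    """Multiply each pair and add the products."""
    return sum(a * b for a, b in pairs)


result_64 = [product_sum(p) for p in ch1_terms]   # plane Ch1
result_65 = [product_sum(p) for p in ch2_terms]   # plane Ch2
# 2.5D result: per-plane results plus the bias, position by position.
result_66 = [r1 + r2 + b0 for r1, r2 in zip(result_64, result_65)]
print(result_64, result_65, result_66)  # [1, -2] [5, 1] [7, 0]
```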

As described above, the two-dimensional convolution result is obtained by shifting the weights over the input data a predetermined number of positions at a time and repeating multiplication and addition.

At this time, based on the instructions stored in the instruction memory 21, the direction of data transfer from each PME is controlled so that the data moves along a single-stroke path. This eliminates wasted transfers, ensures that no PME receives multiple data transfers at once, and avoids data collisions.

In the example shown in FIG. 8, the feature weights are 3 × 3, that is, w × w with equal height and width. This can be generalized to unequal height and width, written as w_1 × w_2.

FIG. 9 examines the shift direction over the data when the height and width of the feature weights differ. In FIG. 9(1), the height and width are the same odd number, and the shift can follow a single stroke that heads toward the center. In FIG. 9(2), either the height or the width is even and, as in FIG. 9(1), there is a Hamiltonian path that passes through every point exactly once. In contrast, FIG. 9(3) is a case where both the height and the width are odd; one PME is passed through twice, so a wasted transfer occurs.
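The single-stroke idea of FIG. 9(1) can be illustrated with a small Python sketch that generates an inward spiral over a w × w grid of kernel offsets. This is only an assumed reading of the figure: the function name `spiral_path` is hypothetical, the sketch covers just the square case, and it does not model the constraint that leads to the repeated visit in FIG. 9(3).

```python
def spiral_path(w):
    """Inward spiral over a w x w grid, starting at the top-left corner."""
    top, bottom, left, right = 0, w - 1, 0, w - 1
    path = []
    while top <= bottom and left <= right:
        path += [(top, c) for c in range(left, right + 1)]          # rightwards
        path += [(r, right) for r in range(top + 1, bottom + 1)]    # downwards
        if top < bottom:
            path += [(bottom, c) for c in range(right - 1, left - 1, -1)]  # leftwards
        if left < right:
            path += [(r, left) for r in range(bottom - 1, top, -1)]        # upwards
        top, bottom, left, right = top + 1, bottom - 1, left + 1, right - 1
    return path


def is_single_stroke(path):
    """True if every cell appears once and each move goes to a 4-neighbour."""
    steps_ok = all(abs(r1 - r0) + abs(c1 - c0) == 1
                   for (r0, c0), (r1, c1) in zip(path, path[1:]))
    return steps_ok and len(set(path)) == len(path)


path3 = spiral_path(3)
```

For odd w, the spiral visits all w × w offsets exactly once and ends at the center cell, which matches the description of a stroke heading toward the center.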

Here, the 2.5-dimensional convolution result is obtained as described above, and this data is held by the PMEs on the topmost plane of the TAP. In the algorithm according to the present invention, with the bottom plane of the TAP taken as Z = 1 and the top plane as Z = C_S, the 2.5-dimensional convolution result obtained with the o-th filter of the l-th layer is placed on the PMEs of the corresponding plane in preparation for the convolution calculation of the next layer. To this end, each time a calculation result is obtained on the top plane, the 2.5-dimensional convolution results held by the PMEs are shifted once toward the lower planes (see FIG. 7B).

FIG. 10 is a time chart showing the 2.5-dimensional convolution operation according to the present invention. The time chart shows the operation of PME(i,j,1), PME(i,j,2), …, PME(i,j,Cs). In this figure, Cin planes are active.

In the time chart, the two-dimensional convolution is performed in the blacked-out portion A of each plane. In the step after this calculation finishes, the result is added to the bias B and the data is transferred to the PME on the plane above (upward arrow).

The PME on the next plane adds the data transferred from the PME on the plane below to its own two-dimensional convolution result and transfers the sum C to the PME on the plane one above; this is repeated plane by plane. Finally, the result at the PME on the topmost plane is the 2.5-dimensional convolution result D.

Next, each time the 2.5-dimensional convolution for one set of weights finishes, the 2.5-dimensional convolution result D is shifted to the plane below, in the direction of the downward arrow. A PME that receives data shifted from the plane above only stores it and performs no other processing. Only one shift is performed for each 2.5-dimensional convolution.

The above operation is repeated Cout times, so that the 2.5-dimensional convolution results are held on the Cout planes counted from the topmost plane.
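The upward accumulation and downward shift described for the time chart can be sketched for a single (i, j) column of PMEs as follows. This is a minimal Python model under stated assumptions: the per-plane two-dimensional results are given as plain numbers, the bias is added at the lowest active plane, and earlier results are shifted down just before a new result is stored on the top plane. The function name `run_column` and the sample values are hypothetical.

```python
def run_column(partials, biases, num_planes):
    """Simulate one (i, j) column of PMEs; list index 0 is the bottom plane.

    partials[o] lists the 2D convolution results of the active planes for
    filter o, ordered from the lowest active plane to the top plane.
    """
    held = [None] * num_planes  # 2.5D results stored on each plane
    for o, (per_plane, bias) in enumerate(zip(partials, biases)):
        # Upward pass: the lowest active plane adds the bias, and each plane
        # above adds its own 2D result to the value received from below.
        acc = bias
        for value in per_plane:
            acc += value
        # Downward shift of earlier results (assumed to happen before the
        # new result is stored on the top plane).
        if o > 0:
            held = held[1:] + [None]
        held[-1] = acc
    return held


# Cs = 4 planes, Cin = 2 active planes, Cout = 3 filters (illustrative values).
held = run_column(partials=[[5, 1], [1, -2], [2, 2]],
                  biases=[1, 1, 0],
                  num_planes=4)
print(held)  # [None, 7, 0, 4]
```

In this model the results of the Cout filters end up on the top Cout planes, with the result of the first filter on the lowest of them.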

In each layer of the CNN, after the 2.5-dimensional convolution has been performed as described above, a pooling operation is performed, and its output becomes the input data for the next layer.

In the pooling operation, each PME receives the data held by the surrounding PMEs in the same manner as in the two-dimensional convolution, and the maximum value is calculated by equation (2) below.

Here, s' is a natural number of 2 or more and represents the stride. Further, act is an activation function.

In pooling, data is transferred toward the upper side as in the two-dimensional convolution, but each PME performs its calculation using its own data x and the value y_in received from the adjacent PME on the lower side.

While this calculation is performed, the data is transferred as described earlier with reference to FIG. 9, whereby the maximum value of the data held by the surrounding PMEs is obtained.
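A running-maximum form of this pooling step can be sketched in Python as follows. The update rule `y = max(x, y_in)` is an assumption based on the statement that the maximum of the surrounding data is obtained, since the formula itself is not reproduced in the text; the function name `max_pool` and the sample data are hypothetical, and the activation function act is omitted.

```python
def max_pool(x, size, stride):
    """Max pooling where each window is reduced by a running maximum."""
    rows, cols = len(x), len(x[0])
    out = []
    for i in range(0, rows - size + 1, stride):
        out_row = []
        for j in range(0, cols - size + 1, stride):
            y = x[i][j]  # start from the PME's own data
            for p in range(size):
                for q in range(size):
                    # Assumed update: combine own data x with incoming y_in.
                    y = max(x[i + p][j + q], y)
            out_row.append(y)
        out.append(out_row)
    return out


data = [[1, 3, 2, 0],
        [4, -1, 5, 6],
        [0, 2, 1, 1],
        [7, 8, -2, 3]]
pooled = max_pool(data, size=2, stride=2)
print(pooled)  # [[4, 6], [8, 3]]
```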

As described above, when the data processing device according to the present invention is used as a data processing device for a CNN, the structure of the CNN (the filter size and the number of filters of each layer, obtained by training) is given as input. The weights of each filter are supplied to the filter memory of each plane of the TAP. In addition, the main input data of the CNN is placed in the PMEs of the TAP.

Each PME on a plane multiplies its own data by the weights to produce two-dimensional convolution data, adds the data transferred from its neighbors, and sends the result to the PME on the plane above. The PMEs on the topmost plane therefore add up the two-dimensional convolution results of all the planes below, so that a 2.5-dimensional convolution result is obtained efficiently and at high speed while parallelism is maintained.

PME memory element
10 multiplication and addition function block
11 MAX (maximum value) operation function block
20 control processor
21 instruction memory
FM1-FMS filter memory

Claims

A data processing device having a plurality of processing elements having multiplication and addition functions in a three-dimensional direction.
The plurality of processing elements are arranged on each of the plurality of two-dimensional planes laminated in the Z-axis direction.
The processing elements arranged on the two-dimensional plane are connected to the torus network, and the processing elements adjacent to the upper and lower surfaces in the Z-axis direction are connected in both directions by the network.
A processing element arranged on the two-dimensional plane corresponding to each of the plurality of two-dimensional planes has a filter memory for broadcasting feature weights.
The input data is sequentially arranged from the two-dimensional surface of the upper surface in the Z-axis direction to the lower surface.
Each of the processing elements arranged on one two-dimensional plane multiplies the input data by the feature weight using the multiplication function, adds the data transferred from an adjacent processing element on the two-dimensional plane to the result of the multiplication, and then transfers the result of the addition to a processing element adjacent on the opposite side of the two-dimensional surface.
The multiplication, addition, and transfer are repeated to perform two-dimensional convolution of the input data, and further,
The processing element of each two-dimensional surface adds its own data and the convolution data from the surface one below in the Z-axis direction and transfers the sum to the surface one above in the Z-axis direction, whereby the data obtained by the two-dimensional convolution is transferred upward from the lowermost surface to the uppermost surface in the Z-axis direction.
A data processing device characterized by that.

In claim 1,
The processing element on the uppermost surface in the Z-axis direction calculates the 2.5-dimensional convolution data by adding the two-dimensional convolution data transferred from the plurality of processing elements on the lower surface and its own data.
A data processing device characterized by that.

In claim 2,
The 2.5-dimensional convolution data is sequentially shifted from the processing element on the uppermost surface to the processing element on the lower surface.
A data processing device characterized by that.

In any one of claims 1 to 3,
The feature weights are divided according to the number of two-dimensional surfaces on which the input data is arranged and are placed in the filter memories corresponding to those surfaces, and each filter memory broadcasts the feature weights placed in it to all processing elements on the corresponding surface.
A data processing device characterized by that.

A data processing method in a data processing apparatus having a plurality of processing elements having multiplication and addition functions in a three-dimensional direction.
In the data processing device, a plurality of processing elements are arranged on each of a plurality of two-dimensional planes stacked in the Z-axis direction.
The processing elements arranged on the two-dimensional plane are connected to the torus network, and the processing elements adjacent to the upper and lower surfaces in the Z-axis direction are connected in both directions by the network.
A processing element arranged on the two-dimensional plane corresponding to each of the plurality of two-dimensional planes has a filter memory for broadcasting feature weights.
The process of sequentially arranging the input data from the two-dimensional surface of the upper surface in the Z-axis direction to the lower surface, and
With each of the processing elements placed on one two-dimensional plane,
A step of multiplying the input data and the feature weight by the multiplication function,
The step of adding the transfer data from the processing elements adjacent to each other on the two-dimensional plane and the result of the multiplication, and
a step of then transferring the result of the addition to a processing element adjacent on the opposite side of the two-dimensional surface are performed.
The step of multiplying, the step of adding, and the step of transferring are repeated to perform two-dimensional convolution of the input data, and further,
The processing element of each two-dimensional surface adds its own data and the convolution data from the surface one below in the Z-axis direction and transfers the sum to the surface one above in the Z-axis direction, whereby the data obtained by the two-dimensional convolution is transferred upward from the lowermost surface to the uppermost surface in the Z-axis direction.
A data processing method characterized by that.

In claim 5,
The method has a step in which the processing element on the uppermost surface in the Z-axis direction calculates 2.5-dimensional convolution data by adding the two-dimensional convolution data transferred from the plurality of processing elements on the lower surfaces to its own data.
A data processing method characterized by that.

In claim 6,
The method has a step of sequentially shifting the 2.5-dimensional convolution data from the processing element on the uppermost surface to the processing elements on the lower surfaces.
A data processing method characterized by that.