JP7675480B2 - Calculation device, recognition device and control device

Info

Publication number: JP7675480B2
Application number: JP2021085284A
Authority: JP
Inventors: 真岸本; 晃北山; 理宇平井; 浩朗伊藤
Original assignee: Astemo Ltd
Current assignee: Astemo Ltd
Priority date: 2021-05-20
Filing date: 2021-05-20
Publication date: 2025-05-14
Anticipated expiration: 2041-05-20
Also published as: DE112022001612T5; WO2022244331A1; JP2022178465A

Description

本発明は、入力データに基づく演算を実行する演算装置およびその演算方法に関する。また、本発明は、当該演算装置を用いて、入力データを認識する認識装置や入力データに応じた制御を行う制御装置にも関する。これらの中でも特に、カメラやLIDAR(Light Detection and Ranging)などセンサを用いて外界情報を収集し、その情報から物体の種類や存在する座標を検出する技術に関する。 The present invention relates to a calculation device that performs calculations based on input data and a calculation method thereof. The present invention also relates to a recognition device that uses the calculation device to recognize input data and a control device that performs control according to the input data. In particular, the present invention relates to a technology for collecting external information using sensors such as cameras and LIDAR (Light Detection and Ranging) and detecting the type of object and its coordinates from that information.

近年は、交通事故が社会問題となっており、車両による移動時の安全性に対する要求が高まっている。その要求に対応するため、様々な自動運転や運転支援向けの技術が提案されている。それらの技術の中で特にDNN(Deep Neural Network)のひとつであるCNN(Convolutional Neural Network)を用いた物体認識手法や行動予測手法は高い認識性能を有することが知られており、自動運転への適用が進展している。 In recent years, traffic accidents have become a social issue, and there is an increasing demand for safety when traveling by vehicle. To meet these demands, various technologies for autonomous driving and driving assistance have been proposed. Among these technologies, object recognition and behavior prediction methods using CNN (Convolutional Neural Network), a type of DNN (Deep Neural Network), are known to have high recognition performance, and their application to autonomous driving is progressing.

CNNは、外界情報である画像データを入力とし、複数の畳み込み層から構成され、縦続的に接続されているニューラルネットワークである。ここで畳み込み層とは積和演算と活性化関数演算により構成され、入力データ内の画素と対応する重みパラメータの乗算を行い、その結果を一定回数累積加算して出力データを作成し、その後活性化関数演算を行い、その結果を出力する一連の演算のことである。 CNN is a neural network that takes image data, which is information about the outside world, as input and is composed of multiple convolutional layers connected in cascade. Here, a convolutional layer is a series of operations that consists of product-sum operations and activation function operations, multiplying the pixels in the input data by the corresponding weight parameters, accumulating the results a certain number of times to create output data, and then performing activation function operations to output the results.
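The convolutional layer described above (multiply each input pixel by its weight, accumulate, then apply an activation function) can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the stride of 1 without padding, and the choice of ReLU as the activation function are all assumptions, since the text only says "activation function".

```python
from typing import List

Matrix = List[List[float]]


def relu(x: float) -> float:
    # Assumed activation function; the source does not name one.
    return x if x > 0.0 else 0.0


def conv_layer(inputs: List[Matrix], filters: List[List[Matrix]]) -> List[Matrix]:
    """inputs: [channel][row][col]; filters: [filter][channel][k_row][k_col].

    Stride 1 and no padding are assumed for simplicity.
    """
    height, width = len(inputs[0]), len(inputs[0][0])
    outputs = []
    for kernel in filters:
        k = len(kernel[0])  # square kernel size
        plane_out = []
        for r in range(height - k + 1):
            row = []
            for c in range(width - k + 1):
                acc = 0.0  # product-sum (multiply-accumulate) over channels and kernel window
                for ch, plane in enumerate(inputs):
                    for dr in range(k):
                        for dc in range(k):
                            acc += plane[r + dr][c + dc] * kernel[ch][dr][dc]
                row.append(relu(acc))  # activation after accumulation
            plane_out.append(row)
        outputs.append(plane_out)
    return outputs
```

Each filter produces one output plane, so the number of filters in a layer becomes the number of input channels of the next layer.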

Applying the convolutional layers to image data yields the type and coordinates of specific objects in the input image. The specific configuration is as follows. In the recognition device described in this patent, the first layer performs a product-sum operation on the input image data and the first layer's convolution weight parameters, and outputs the convolution result. The jth convolutional layer of the neural network is called the jth layer (j is an integer satisfying 1≦j≦L); it outputs its result from the output data of the (j-1)th layer and the jth layer's convolution weight parameters. With the final layer denoted the Lth layer, the Lth layer takes the output data of the preceding (L-1)th layer and the Lth layer's convolution weight parameters as input, and outputs the type of object and its coordinates. Each convolutional layer performs a convolution operation using its input data and weight parameters, then performs an activation function operation and outputs the result.

When a computationally intensive process such as a CNN, which consists mainly of product-sum (multiply-accumulate) operations, is implemented in an in-vehicle ECU (Electronic Control Unit), it must achieve a throughput equivalent to the processing speed of the external world information acquisition device and a latency no longer than the vehicle control cycle. Conventionally, a CNN is implemented using a GPU (Graphics Processing Unit) or FPGA (Field Programmable Gate Array) designed for edge devices. However, devices for in-vehicle use have only limited computing performance because of restrictions on operating conditions, implementation cost, and mountable device size, so the required computing performance may not be met. It is therefore common to reach the target processing speed by improving the device itself or by using a computing unit specialized for CNN operations.

次に、CNNを演算装置としてハードウェアに実装した場合の従来技術について述べる。外部からの入力はカメラやLIDARなどの外界情報取得装置を用いて取得する。取得した情報はメモリに格納される。演算装置は、メモリ、モデル保存部、パラメータ格納部、複数の畳み込み演算部で構成され、物体の種類や存在する座標などの認識結果を出力するものである。 Next, we will describe the conventional technology when CNN is implemented in hardware as a computing device. External input is acquired using an external information acquisition device such as a camera or LIDAR. The acquired information is stored in memory. The computing device is composed of a memory, a model storage unit, a parameter storage unit, and multiple convolution calculation units, and outputs recognition results such as the type of object and its coordinates.

The external world information stored in the memory is sent to the convolution calculation units. The model storage unit holds data trained in advance and saves the trained data in the parameter storage unit. From the trained data it receives, the parameter storage unit selects the weight parameters, the number of channels, and the number of filters for each layer, and sends them to the convolution calculation units of the 1st through Lth layers. The convolution calculation unit of the 1st layer takes as input the input data from the memory and the weight parameters, number of channels, and number of filters of the 1st layer, and outputs its calculation result to the 2nd layer. The convolution calculation units are connected in cascade. The convolution calculation unit of the jth layer takes as input the output of the convolution calculation unit of the (j-1)th layer and the weight parameters, number of channels, and number of filters of the jth layer, and outputs its calculation result to the (j+1)th layer.

また、畳み込み演算部は、メモリから送信された入力データと、パラメータ格納部から送信された重みパラメータとチャネル数とフィルタ数をもとに畳み込み演算を行った後、活性化関数演算を行い、演算結果を次層に出力する働きを行う。 The convolution calculation unit performs a convolution calculation based on the input data sent from the memory and the weight parameters, number of channels, and number of filters sent from the parameter storage unit, then performs an activation function calculation and outputs the calculation result to the next layer.

しかしながら、演算器の演算性能が限られる車載用演算デバイスでは、演算デバイスの演算単位と、畳み込み演算に使用するフィルタやチャネルの数の不整合により、演算デバイスを十分活用できず、処理速度が低下するという課題があった。 However, in the case of in-vehicle computing devices, where the computing performance of the computing unit is limited, there is an issue that the computing device cannot be fully utilized due to a mismatch between the computing unit of the computing device and the number of filters and channels used for the convolution calculation, resulting in a decrease in processing speed.
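The mismatch described above can be made concrete with a small calculation. The sketch below assumes a hypothetical computing device that processes `unit_width` filters in parallel per pass; both the function name and this execution model are illustrative assumptions, not details taken from the source.

```python
import math


def utilization(num_filters: int, unit_width: int) -> float:
    """Fraction of the computing unit's lanes doing useful work.

    Assumes (hypothetically) that the device processes `unit_width`
    filters per pass and that a partly filled final pass costs as much
    as a full one.
    """
    passes = math.ceil(num_filters / unit_width)
    return num_filters / (passes * unit_width)


# 16 filters on a 16-wide unit fill it exactly; 17 filters need a second,
# almost empty pass, so utilization drops to roughly half.
print(utilization(16, 16))  # 1.0
print(utilization(17, 16))  # 0.53125
```

Under this assumed model, a layer with one filter too many costs a whole extra pass, which is the kind of loss in processing speed the paragraph above refers to.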

そこで、CNNに関する演算で、限られた演算性能の中でCNN演算を効率的に実行できる手法及び装置として、特許文献1の方式が提案されている。特許文献1の構成を示す。まず、特許文献1に記載の演算装置は、分割器、演算器、生成器から構成される。次に、特許文献1の接続関係を示す。 In response to this, the method of Patent Document 1 has been proposed as a method and device for efficiently executing CNN calculations within limited calculation performance in CNN-related calculations. The configuration of Patent Document 1 is shown below. First, the calculation device described in Patent Document 1 is composed of a divider, a calculator, and a generator. Next, the connections in Patent Document 1 are shown.

演算器には、分割器からの出力が入力される。演算器は生成器へ出力を行い、生成器は演算器からの入力を受け付ける。 The output from the divider is input to the calculator. The calculator outputs to the generator, and the generator accepts input from the calculator.

Next, the operation of the calculation device proposed in Patent Document 1 is described. First, the divider divides the weight parameters of a selected layer of the convolutional neural network in at least one of the depth dimension and the kernel-count dimension, yielding a calculation parameter array that contains a plurality of calculation parameters. Next, the calculator executes the calculation of the selected layer using each calculation parameter in the array, yielding an array of partial calculation results. Finally, the generator produces the output of the layer from the array of partial calculation results. In this scheme, the divider splits the weight parameters to match the size of work the calculator can process at once before sending them to the calculator. This improves the operating efficiency or utilization rate of the calculator and avoids hardware being constrained by the size of the parameters.
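The divide, calculate, and generate flow attributed to Patent Document 1 can be sketched as follows. This is an illustrative reconstruction under simplifying assumptions, not code from that document: a 1x1 convolution at a single pixel stands in for the layer, and all function names are hypothetical.

```python
from typing import List


def layer_output(pixel: List[float], kernels: List[List[float]]) -> List[float]:
    """1x1 convolution at one pixel: one dot product over depth per kernel."""
    return [sum(x * w for x, w in zip(pixel, k)) for k in kernels]


def split_by_kernels(pixel: List[float], kernels: List[List[float]], unit: int) -> List[float]:
    # Divider: split along the kernel-count dimension into groups of `unit`.
    groups = [kernels[i:i + unit] for i in range(0, len(kernels), unit)]
    # Calculator: one partial result per group.
    partial = [layer_output(pixel, g) for g in groups]
    # Generator: concatenate the partial results.
    return [v for part in partial for v in part]


def split_by_depth(pixel: List[float], kernels: List[List[float]], unit: int) -> List[float]:
    partial = []
    # Divider: split input and weights along the depth (channel) dimension.
    for start in range(0, len(pixel), unit):
        sub_pixel = pixel[start:start + unit]
        sub_kernels = [k[start:start + unit] for k in kernels]
        partial.append(layer_output(sub_pixel, sub_kernels))  # calculator
    # Generator: element-wise sum of the partial results.
    return [sum(vals) for vals in zip(*partial)]
```

Both splits reproduce the undivided result, but each needs an extra recombination step (concatenation or summation), which is the overhead the following paragraphs criticize.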

Patent Document 1: JP 2019-82996 A

In Patent Document 1, the weight parameters to be input to the calculator are divided in advance along at least one of the depth direction of the CNN calculation and the kernel-count dimension, and the divided data is transferred from the divider to the calculator for the convolution calculation. In this way, a calculator of limited size performs the convolution calculation using weight parameters that have been divided into sizes it can process.

しかし、特許文献1では、事前に演算器外の分割器で重みパラメータの部分配列を得るための分割演算、及び、分割した重みパラメータの部分配列演算を再度結合するための生成器での演算が必要である。さらに、畳み込み演算では、分割前後での演算の整合性を保つ必要がある。このため、重みパラメータの分割方法や、分割後の重みパラメータのサイズに制約があり、演算器を十全に活用することができない課題がある。 However, in Patent Document 1, a division operation is required in advance to obtain a partial array of weight parameters using a divider outside the arithmetic unit, and then a calculation is required in a generator to recombine the partial array calculation of the divided weight parameters. Furthermore, in the convolution calculation, it is necessary to maintain the consistency of the calculation before and after the division. For this reason, there are restrictions on the method of dividing the weight parameters and the size of the weight parameters after division, which poses the problem that the arithmetic unit cannot be fully utilized.

上記の課題を解決するために、本発明では、畳み込み演算における感度および演算方向（実行順）に基づいてフィルタ情報の削除を行う。 To solve the above problem, in the present invention, filter information is deleted based on the sensitivity and calculation direction (execution order) of the convolution calculation.

More specifically, the present invention is a computing device that performs CNN computation based on input data, comprising: a model storage unit that stores a model used in the CNN computation; a CNN computation unit that performs the CNN computation by executing a convolution computation for each of a plurality of convolution layers using the model; a weight parameter acquisition unit that acquires, from the model storage unit, the weight parameters of the convolution filters used in the convolution computation; a channel information acquisition unit that acquires channel information for each of the plurality of convolution layers from the model stored in the model storage unit; a filter information acquisition unit that acquires filter information for each of the plurality of convolution layers from the model storage unit; a computation allocation unit that assigns the combinations of weight information and computations required for the CNN computation and transmits them to the CNN computation unit; and a deletion index determination unit that deletes part of the filter information used in the convolution computation based on the maximum number of computations the CNN computation unit can perform, the execution order of the CNN computation, the channel information for each convolution layer acquired from the channel information acquisition unit, the weight information acquired from the weight parameter acquisition unit, and the filter information for each of the plurality of convolution layers acquired from the filter information acquisition unit.

本発明の代表的な実施によれば、演算器の活用率を向上できる。なお、前述した以外の課題、構成及び効果は、以下の実施例の説明により明らかにされる。 A typical implementation of the present invention can improve the utilization rate of the computing unit. Problems, configurations, and effects other than those described above will become clear from the explanation of the following embodiment.

The drawings show, in order:
A functional block diagram of the recognition device, which is a computing device, in the first embodiment.
An explanatory diagram of the deletion channel index information in the first embodiment.
An explanatory diagram of the deletion filter index information in the first embodiment.
A diagram showing the internal configuration of the CNN calculation unit in the first embodiment.
A diagram showing the internal configuration of the deletion index determination unit in the first embodiment.
A diagram showing the internal configuration of the convolution calculation unit in the first embodiment.
A flowchart showing the processing flow of the deletion filter determination unit in the first embodiment.
A flowchart showing the processing flow of the deletion channel determination unit in the first embodiment.
A flowchart showing the processing flow of the calculation allocation unit in the first embodiment.
A diagram illustrating an example of a stride and deletable filters in the first embodiment.
An explanatory diagram illustrating deleted filters in the first embodiment.
A functional block diagram of the recognition device, which is a computing device, in the second embodiment.
A diagram showing the internal configuration of the deletion index determination unit in the second embodiment.
An explanatory diagram showing how filters are stored when the calculator performs calculations in the second embodiment.
A functional block diagram of the control device in the third embodiment.

図１は、実施例１における認識装置１０００の機能ブロック図である。認識装置１０００は、演算装置の一種であり、カメラやLIDARからの外界情報を用いて、外界の状況を認識する装置である。 Figure 1 is a functional block diagram of a recognition device 1000 in the first embodiment. The recognition device 1000 is a type of computing device, and is a device that recognizes the situation of the outside world using outside world information from a camera or LIDAR.

図１の認識装置１０００は、外界情報取得装置１０１と接続する。そして、CNN演算部１０２と、モデル保存部１０４と、チャネル情報取得部１０５と、重みパラメータ取得部１０６と、フィルタ情報取得部１０７と削除インデックス決定部１０８と、演算割り当て部１０９から構成される。この結果、認識装置１０００は、認識結果１０３を出力する。 The recognition device 1000 in FIG. 1 is connected to an external world information acquisition device 101. It is composed of a CNN calculation unit 102, a model storage unit 104, a channel information acquisition unit 105, a weight parameter acquisition unit 106, a filter information acquisition unit 107, a deletion index determination unit 108, and a calculation allocation unit 109. As a result, the recognition device 1000 outputs a recognition result 103.

ここで、各構成要素について説明する。まず、外界情報取得装置１０１は、外界情報を取得し、これをCNN演算部１０２に取得情報を送信する。この外界情報取得装置１０１は、カメラやLIDARといったセンサで実現できる。また、チャネル情報取得部１０５は、モデル保存部１０４からの出力であるモデル情報１１０を受け取る。また、重みパラメータ取得部１０６は、モデル保存部１０４からの出力であるモデル情報１１０を受け取る。 Here, each component will be described. First, the external world information acquisition device 101 acquires external world information and transmits the acquired information to the CNN calculation unit 102. This external world information acquisition device 101 can be realized by a sensor such as a camera or LIDAR. Furthermore, the channel information acquisition unit 105 receives model information 110 which is output from the model storage unit 104. Furthermore, the weight parameter acquisition unit 106 receives model information 110 which is output from the model storage unit 104.

また、フィルタ情報取得部１０７は、モデル保存部１０４からの出力であるモデル情報１１０を受け取る。また、削除インデックス決定部１０８は、チャネル情報取得部１０５から、各層のチャネル数を示すチャネル情報１１１と受け取る。さらに、削除インデックス決定部１０８は、重みパラメータ取得部１０６の出力である各層の重みパラメータを示す重みパラメータ情報１１２を受け取る。またさらに、削除インデックス決定部１０８は、フィルタ情報取得部１０７の出力である各層のフィルタ数を示すフィルタ情報１１３を受け取る。 The filter information acquisition unit 107 also receives model information 110, which is an output from the model storage unit 104. The deletion index determination unit 108 also receives channel information 111 indicating the number of channels in each layer from the channel information acquisition unit 105. The deletion index determination unit 108 also receives weight parameter information 112 indicating the weight parameters of each layer, which is an output from the weight parameter acquisition unit 106. The deletion index determination unit 108 also receives filter information 113 indicating the number of filters in each layer, which is an output from the filter information acquisition unit 107.

The calculation allocation unit 109 receives the model information 110, which is the output of the model storage unit 104. It also receives the deletion filter index information 114 and the deletion channel index information 116 for each layer, which are outputs of the deletion index determination unit 108.

また、CNN演算部１０２は、外界情報取得装置１０１の外界情報である入力データ１１７と、演算割り当て部１０９の出力であるCNN演算制御信号１１５と、チャネル情報１１１およびフィルタ情報１１３を受け取る。そして、CNN演算部１０２は、これらを用いて認識結果１０３を出力する。 The CNN calculation unit 102 also receives input data 117, which is external world information from the external world information acquisition device 101, a CNN calculation control signal 115, which is the output of the calculation allocation unit 109, channel information 111, and filter information 113. The CNN calculation unit 102 then uses these to output the recognition result 103.

次に、認識装置１０００の動作および信号の流れを説明する。まず、外界情報取得装置１０１からCNN演算部１０２へ、外界情報を送信する。次に、CNN演算部１０２は、モデル保存部１０４のモデル情報１１０および演算割り当て部１０９のCNN演算制御信号１１５、チャネル情報１１１およびフィルタ情報１１３をもとに演算を行う。なお、図１では、入力されるチャネル情報１１１およびフィルタ情報１１３を示す矢印は省略する。そして、CNN演算部１０２は、これらを用いて、認識結果１０３を算出し出力する。 Next, the operation of the recognition device 1000 and the signal flow will be described. First, external world information is transmitted from the external world information acquisition device 101 to the CNN calculation unit 102. Next, the CNN calculation unit 102 performs calculations based on the model information 110 of the model storage unit 104 and the CNN calculation control signal 115, channel information 111, and filter information 113 of the calculation allocation unit 109. Note that in FIG. 1, the arrows indicating the input channel information 111 and filter information 113 are omitted. Then, the CNN calculation unit 102 uses these to calculate and output the recognition result 103.

次に、チャネル情報取得部１０５は、モデル保存部１０４のモデル情報から各層のチャネル数を抽出し、チャネル情報１１１を削除インデックス決定部１０８に出力する。また、重みパラメータ取得部１０６は、モデル保存部１０４から受け取ったモデル情報１１０から各層の重みパラメータを抽出し、重みパラメータ情報１１２を削除インデックス決定部１０８に出力する。 Next, the channel information acquisition unit 105 extracts the number of channels for each layer from the model information in the model storage unit 104, and outputs channel information 111 to the deletion index determination unit 108. In addition, the weight parameter acquisition unit 106 extracts weight parameters for each layer from the model information 110 received from the model storage unit 104, and outputs weight parameter information 112 to the deletion index determination unit 108.

また、フィルタ情報取得部１０７は、モデル保存部１０４のモデル情報１１０から各層のフィルタ数を取得し、これを含む各層のフィルタ情報１１３を削除インデックス決定部１０８に出力する。 The filter information acquisition unit 107 also acquires the number of filters for each layer from the model information 110 in the model storage unit 104, and outputs the filter information 113 for each layer including this to the deletion index determination unit 108.

また、削除インデックス決定部１０８は、チャネル情報１１１、重みパラメータ情報１１２およびフィルタ情報１１３から演算を削除するフィルタ及びチャネルのインデックスを算出する。そして、削除インデックス決定部１０８は、これらインデックスを含む削除フィルタインデックス情報１１４及び削除チャネルインデックス情報１１６を、演算割り当て部１０９に出力する。 The deletion index determination unit 108 also calculates the indexes of the filters and channels for which calculations are to be deleted from the channel information 111, weight parameter information 112, and filter information 113. The deletion index determination unit 108 then outputs deletion filter index information 114 and deletion channel index information 116, which include these indexes, to the calculation allocation unit 109.

また、演算割り当て部１０９では、削除フィルタインデックス情報１１４と各層ごとの削除チャネルインデックス情報１１６とモデル情報１１０をもとに、演算順の決定とモデルから削除するフィルタやチャネルを特定する。ここで、演算割り当て部１０９は、削除の対象として、フィルタおよびチャネルの少なくとも一方を特定すればよい。特に、削除の対象を、フィルタとすることが望ましい。そして、演算割り当て部１０９は、CNN演算部１０２に、特定されたフィルタやチャネルを含むCNN演算制御信号１１５を送信する。 The computation allocation unit 109 also determines the computation order and identifies filters and channels to be deleted from the model based on the deleted filter index information 114, deleted channel index information 116 for each layer, and model information 110. Here, the computation allocation unit 109 only needs to identify at least one of a filter and a channel as the target for deletion. In particular, it is desirable to select a filter as the target for deletion. Then, the computation allocation unit 109 transmits a CNN computation control signal 115 including the identified filter and channel to the CNN computation unit 102.

ここで、削除チャネルインデックス情報１１６ついては、図２を用いて説明する。また、削除フィルタインデックス情報１１４については図３を用いて説明する。さらに、削除インデックス決定部１０８の動作の詳細については、図５を用いて説明する。なお、本実施例の認識装置は、いわゆるコンピュータでも実現できる。この場合、各部の機能をプログラムに従ってＣＰＵのような処理装置で実行することになる。また、このプログラムは記憶媒体に格納される。また、各部は、ＦＰＧＡ（Field Programmable Gate Array）のような専用ハードウェアや専用回路でも実現できる。 The deletion channel index information 116 will be described with reference to FIG. 2. The deletion filter index information 114 will be described with reference to FIG. 3. The operation of the deletion index determination unit 108 will be described in detail with reference to FIG. 5. The recognition device of this embodiment can also be realized by a so-called computer. In this case, the functions of each unit are executed by a processing device such as a CPU in accordance with a program. The program is stored in a storage medium. Each unit can also be realized by dedicated hardware such as an FPGA (Field Programmable Gate Array) or a dedicated circuit.

図２は、削除チャネルインデックス情報１１６を説明するための説明図である。まず、本実施例のCNNの層はL層存在して、それぞれ1層目からＬ層目まで従属接続されている。また、各層ごとにチャネル２５１が存在しており、ある1層のチャネル２５１は次の層のすべてのチャネル２５１に対して接続されている。ここで、各層ごとにチャネルに対してそれぞれ異なるインデックスをあたえ、削除チャネルインデックス情報１１６では、各層ごとに削除するべきチャネルインデックスを指定する。この指定されたインデックス番号を、以下、削除チャネルインデックスと呼ぶ。 Figure 2 is an explanatory diagram for explaining the deletion channel index information 116. First, there are L layers of CNN in this embodiment, and each layer is connected in a cascade manner from the 1st layer to the Lth layer. Also, there are channels 251 in each layer, and the channel 251 in one layer is connected to all the channels 251 in the next layer. Here, a different index is given to the channel in each layer, and the deletion channel index information 116 specifies the channel index to be deleted for each layer. Hereinafter, this specified index number will be referred to as the deletion channel index.
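One possible in-memory shape for the deletion channel index information 116 is a mapping from layer number to the set of channel indices to delete. The sketch below is only an assumed representation: the type alias, the function name, and the convention that channels are indexed from 0 within each layer are hypothetical and not stated in the source.

```python
from typing import Dict, List, Set

# Hypothetical representation of deletion channel index information 116:
# layer number (1..L) -> channel indices to delete in that layer.
DeletionChannelIndexInfo = Dict[int, Set[int]]


def remaining_channels(info: DeletionChannelIndexInfo, layer: int, num_channels: int) -> List[int]:
    """Return the channel indices of `layer` that are still computed."""
    deleted = info.get(layer, set())  # layers without an entry delete nothing
    return [ch for ch in range(num_channels) if ch not in deleted]


info: DeletionChannelIndexInfo = {1: {0, 3}, 2: set()}
print(remaining_channels(info, 1, 4))  # [1, 2]
```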

次に、図３は、削除フィルタインデックス情報１１４を説明するための説明図である。まず、CNNでは１層目からL層目までそれぞれフィルタ３５１が存在している。図３において、あるj層（１≦j≦L）については、図３では３×３のフィルタを例示している。なお、フィルタの数については、あくまでも一例であり、これに限定されるものではない。 Next, FIG. 3 is an explanatory diagram for explaining the deletion filter index information 114. First, in CNN, filters 351 exist in each of the 1st to Lth layers. In FIG. 3, for a certain jth layer (1≦j≦L), a 3×3 filter is shown as an example. Note that the number of filters is merely an example and is not limited to this.

ここで、本実施例では、各層ごとに一つの層のすべてのフィルタに同じインデックス番号３５２を与える。この削除フィルタインデックス情報１１４では、各層ごとに削除するべきフィルタインデックス３５２を指定する。この指定されたインデックス番号を以下で削除フィルタインデックスと呼ぶ。 In this embodiment, the same index number 352 is assigned to all filters in one layer for each layer. This deletion filter index information 114 specifies the filter index 352 to be deleted for each layer. Hereinafter, this specified index number will be referred to as the deletion filter index.

Figure 4 is a diagram showing the internal configuration of the CNN calculation unit 102. In Figure 4, the CNN calculation unit 102 has a memory 201, a parameter storage unit 202, and convolution calculation units 203. The CNN calculation unit 102 handles the input data 117, the CNN calculation control signal 115, the model information 110, the per-layer CNN calculation control signals 204, the weight parameters 205 for all layers, the number of filters 206 for all layers, the number of channels 207 for all layers, and, for each individual layer, the individual weight parameters 208, the individual number of filters 209, and the individual number of channels 210. The CNN calculation unit 102 uses these to output the recognition result 103.

次に、図４に示すCNN演算部１０２の接続関係を説明する。メモリ２０１には、入力データ１１７が入力される。この入力データ１１７は、上述の外界情報である。また、パラメータ格納部２０２は、モデル情報１１０が保持される。 Next, the connections of the CNN calculation unit 102 shown in FIG. 4 will be described. Input data 117 is input to the memory 201. This input data 117 is the external world information described above. In addition, the parameter storage unit 202 holds the model information 110.

The convolution calculation unit 203-1 of the 1st layer receives the following signals or information:
The individual (1st-layer) CNN calculation control signal 204-1 branched from the CNN calculation control signal 115
The saved data 211 (the input data 117) held in the memory 201
The individual weight parameters 208-1 branched from the weight parameters 205 output from the parameter storage unit 202
The individual number of filters 209-1 branched from the number of filters 206 output from the parameter storage unit 202
The individual number of channels 210-1 branched from the number of channels 207 output from the parameter storage unit 202
For the 2nd and subsequent layers, the convolution calculation unit 203-j of the jth layer receives the following signals or information:
The individual (jth-layer) CNN calculation control signal 204-j branched from the CNN calculation control signal 115
The output data 212-j of the convolution calculation unit of the (j-1)th layer
The individual weight parameters 208-j branched from the weight parameters 205 output from the parameter storage unit 202
The individual number of filters 209-j branched from the number of filters 206 output from the parameter storage unit 202
The individual number of channels 210-j branched from the number of channels 207 output from the parameter storage unit 202
The convolution calculation unit 203-L of the final (Lth) layer receives the following signals or information:
The individual CNN calculation control signal 204-L branched from the CNN calculation control signal 115
The output data 212-L of the convolution calculation unit 203-(L-1) of the (L-1)th layer
The individual weight parameters 208-L branched from the weight parameters 205 output from the parameter storage unit 202
The individual number of filters 209-L branched from the number of filters 206 output from the parameter storage unit 202
The individual number of channels 210-L branched from the number of channels 207 output from the parameter storage unit 202
The convolution calculation unit 203-L of the Lth layer then uses these to output the recognition result 103.

Next, the operation of the CNN calculation unit 102 is described in detail. The CNN calculation unit 102 performs convolution calculations based on the input data 117 and the model information 110, and outputs the recognition result 103. In doing so, each convolution calculation unit 203 receives the CNN calculation control signal 204 for its layer, branched from the CNN calculation control signal 115, and skips the calculations for some filters and channels. This skipping is one example of "deletion" in the present application; deletion here also covers passive handling such as omitting or ignoring a calculation. The set of deleted (skipped) filters and channels stays the same until the model information 110 is changed by an external model update. As noted above, only one of filters and channels may be skipped. In other words, in this embodiment, at least one of the filter information and the channel information used for convolution is deleted.
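The skipping behavior described above can be sketched as follows. This is a minimal illustration under stated assumptions: a 1x1 convolution at a single pixel, ReLU as the activation function, zero-based indices for filters and channels, and a skipped filter producing no output value at all. None of these details are specified in the source, and the function name is hypothetical.

```python
from typing import List, Set


def conv1x1_with_skips(
    pixel: List[float],
    kernels: List[List[float]],
    deleted_filters: Set[int],
    deleted_channels: Set[int],
) -> List[float]:
    """Compute one layer at one pixel, skipping deleted filters and channels."""
    outputs = []
    for f, kernel in enumerate(kernels):
        if f in deleted_filters:
            continue  # whole filter skipped: no multiply-accumulate, no output
        acc = 0.0
        for ch, (x, w) in enumerate(zip(pixel, kernel)):
            if ch in deleted_channels:
                continue  # this channel's product is never computed
            acc += x * w
        outputs.append(max(acc, 0.0))  # assumed ReLU activation
    return outputs


# Filter 1 and channel 2 are skipped, so only two outputs are produced.
print(conv1x1_with_skips([1.0, 2.0, 3.0],
                         [[1.0, 1.0, 1.0], [2.0, 2.0, 2.0], [1.0, -1.0, 1.0]],
                         {1}, {2}))  # [3.0, 0.0]
```

Because the skip sets are fixed until the model is updated, they could be computed once and reused for every input frame.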

また、メモリ２０１は、入力データ１１７を格納する。そして、畳み込み演算部２０３へ入力データ１１７である保存データ２１１が出力される。次に、畳み込み演算部２０３は、入力された保存データ２１１（出力データ２１２）と個別重みパラメータ２０８をもとに畳み込み演算を行う。この畳み込み演算の結果、演算順やデータ読み込みを制御するための信号であるCNN演算制御信号２０４と、個別フィルタ数２０９と個別チャネル数２１０により演算回数や出力データ数、演算器へのデータの割り当てが決定される。畳み込み演算部２０３は、このように決定された演算結果である出力データ２１２を出力する。これを第１層から第Ｌ層まで繰り返す。 The memory 201 stores the input data 117 and outputs it to the convolution calculation unit 203 as saved data 211. The convolution calculation unit 203 then performs a convolution calculation based on the input saved data 211 (output data 212) and the individual weight parameters 208. In this convolution calculation, the number of calculations, the number of output data items, and the allocation of data to the calculation units are determined by the CNN calculation control signal 204, which controls the calculation order and data reading, together with the number of individual filters 209 and the number of individual channels 210. The convolution calculation unit 203 outputs the result of the calculation determined in this way as output data 212. This is repeated from the first layer to the Lth layer.

また、パラメータ格納部２０２での受け取ったモデル情報１１０から各層の重みパラメータ２０５、各層のフィルタ数２０６、各層のチャネル数２０７を分割し畳み込み演算部２０３へ出力される。 The model information 110 received by the parameter storage unit 202 is divided into weight parameters 205 for each layer, the number of filters 206 for each layer, and the number of channels 207 for each layer, and output to the convolution calculation unit 203.
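The division of the model information into per-layer items can be pictured with a small Python sketch. The `LayerParams` class, the `split_model_info` function, and the nested-list weight layout `[filter][channel][tap]` are assumptions made for illustration; the specification does not define a data format for the model information 110.

```python
from dataclasses import dataclass


@dataclass
class LayerParams:
    weights: list      # individual weight parameters 208 for this layer
    num_filters: int   # individual number of filters 209
    num_channels: int  # individual number of channels 210


def split_model_info(model_info):
    """Split per-layer weights (assumed layout [filter][channel][tap]) into per-layer records."""
    return [LayerParams(w, len(w), len(w[0])) for w in model_info]


# Two layers of 3x3 filters: 4 filters over 3 channels, then 8 filters over 4 channels.
model_info = [
    [[[0.0] * 9 for _ in range(3)] for _ in range(4)],
    [[[0.0] * 9 for _ in range(4)] for _ in range(8)],
]
layers = split_model_info(model_info)
```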

以上で、CNN演算部１０２の説明を終わり、次に、削除インデックス決定部１０８について、説明する。図５は、削除インデックス決定部１０８の内部構成を示す図である。 This concludes the explanation of the CNN calculation unit 102. Next, we will explain the deletion index determination unit 108. Figure 5 is a diagram showing the internal configuration of the deletion index determination unit 108.

図５において、削除インデックス決定部１０８は、チャネル情報１１１と、重みパラメータ情報１１２と、フィルタ情報１１３を入力として受け取る。そして、削除インデックス決定部１０８は、これらの入力に基づいて、削除チャネルインデックス情報１１６および削除フィルタインデックス情報１１４を出力する。このために、削除インデックス決定部１０８は、目標処理速度格納部３０１と、最大演算数保存部３０２と、演算速度解析部３０３と、感度情報解析部３０４と、削除チャネル決定部３０５と、削除フィルタ決定部３０６と、削除チャネルインデックス出力部３０７と、削除フィルタインデックス出力部３０８を有する。 In FIG. 5, the deletion index determination unit 108 receives channel information 111, weight parameter information 112, and filter information 113 as inputs. Based on these inputs, the deletion index determination unit 108 outputs deletion channel index information 116 and deletion filter index information 114. To this end, the deletion index determination unit 108 has a target processing speed storage unit 301, a maximum number of operations storage unit 302, a calculation speed analysis unit 303, a sensitivity information analysis unit 304, a deletion channel determination unit 305, a deletion filter determination unit 306, a deletion channel index output unit 307, and a deletion filter index output unit 308.

次に、図５を用いて、削除インデックス決定部１０８の接続関係を示す。演算速度解析部３０３は、目標処理速度格納部３０１の出力である目標処理速度３１０と最大演算数保存部３０２の出力である演算器当たりの最大演算可能な演算数３１５と、チャネル情報１１１と、フィルタ情報１１３を入力として受け取る。 Next, the connection relationship of the deletion index determination unit 108 is shown using FIG. 5. The calculation speed analysis unit 303 receives as input the target processing speed 310 which is the output of the target processing speed storage unit 301, the maximum number of calculations that can be calculated per calculation unit 315 which is the output of the maximum number of calculations storage unit 302, the channel information 111, and the filter information 113.

また、感度情報解析部３０４は、チャネル情報１１１と、重みパラメータ情報１１２と、フィルタ情報１１３を入力として受け取る。また、削除チャネル決定部３０５は、演算速度解析部３０３の結果である削除チャネル数３１１と感度情報解析部３０４の結果である感度情報３１３を入力として受け取る。 The sensitivity information analysis unit 304 receives as input the channel information 111, the weight parameter information 112, and the filter information 113. The deletion channel determination unit 305 receives as input the number of channels to be deleted 311, which is the result of the calculation speed analysis unit 303, and the sensitivity information 313, which is the result of the sensitivity information analysis unit 304.

また、削除フィルタ決定部３０６は、演算速度解析部３０３の出力である削除フィルタ数３１２と感度情報解析部３０４の出力である感度情報３１６を入力として受け取る。また、削除チャネルインデックス出力部３０７は、削除チャネル決定部３０５の出力である優先度情報３１７と、最大演算数保存部３０２の出力である演算数３１５を入力として受け取り、各層ごとの削除チャネルインデックス情報１１６を出力する。 The deletion filter determination unit 306 receives as input the number of deletion filters 312 output from the calculation speed analysis unit 303 and the sensitivity information 316 output from the sensitivity information analysis unit 304. The deletion channel index output unit 307 receives as input the priority information 317 output from the deletion channel determination unit 305 and the number of operations 315 output from the maximum number of operations storage unit 302, and outputs deletion channel index information 116 for each layer.

またさらに、削除フィルタインデックス出力部３０８は、削除フィルタ決定部３０６の出力３１８と、最大演算数保存部３０２の出力である演算数３１５を入力として受け取り、各層の削除フィルタインデックス情報１１４を出力する。 Furthermore, the deletion filter index output unit 308 receives as input the output 318 of the deletion filter determination unit 306 and the number of operations 315, which is the output of the maximum number of operations storage unit 302, and outputs the deletion filter index information 114 for each layer.

次に、図５を用いて、削除インデックス決定部１０８の動作について説明する。まず、演算速度解析部３０３は、入力されたチャネル情報１１１と、フィルタ情報１１３と、演算器１つ当たりの最大演算可能な演算数３１５をもとに、各層ごとに畳み込み演算にかかる時間と目標処理速度と畳み込み演算にかかる時間の差分を試算する。 Next, the operation of the deletion index determination unit 108 will be described with reference to FIG. 5. First, the calculation speed analysis unit 303 estimates the time required for the convolution calculation for each layer and the difference between the target processing speed and the time required for the convolution calculation, based on the input channel information 111, filter information 113, and the maximum number of calculations that can be performed per calculation unit 315.
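The estimate described above can be sketched in Python under an assumed cost model. The specification does not give a formula, so the model used here (one calculation unit consumes up to `max_ops_per_unit` channels per cycle) and all names and numbers are hypothetical.

```python
import math


def layer_cycles(channels, filters, kernel_taps, out_positions, max_ops_per_unit):
    """Assumed cost model: one unit consumes up to max_ops_per_unit channels per cycle."""
    channel_passes = math.ceil(channels / max_ops_per_unit)
    return channel_passes * filters * kernel_taps * out_positions


def time_gaps(layers, max_ops_per_unit, target_cycles):
    """Return per-layer (cycles, cycles - target); a positive gap means the layer is too slow."""
    result = []
    for layer in layers:
        cycles = layer_cycles(layer["channels"], layer["filters"],
                              layer["kernel_taps"], layer["out_positions"],
                              max_ops_per_unit)
        result.append((cycles, cycles - target_cycles))
    return result


layers = [
    {"channels": 16, "filters": 8, "kernel_taps": 9, "out_positions": 100},
    {"channels": 20, "filters": 8, "kernel_taps": 9, "out_positions": 100},
]
gaps = time_gaps(layers, max_ops_per_unit=8, target_cycles=15000)
```

Under this model the 20-channel layer needs three passes of 8 channels where the 16-channel layer needs two, which is one way to see why the remaining channel count is later aligned to the maximum operable number.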

また、感度情報解析部３０４は、チャネル情報１１１と、重みパラメータ情報１１２と、フィルタ情報１１３から、各層のチャネル全ての認識精度に対する感度情報と、各層のフィルタすべての認識精度に対する感度情報を見積もる。 The sensitivity information analysis unit 304 also estimates sensitivity information for the recognition accuracy of all channels in each layer and sensitivity information for the recognition accuracy of all filters in each layer from the channel information 111, weight parameter information 112, and filter information 113.

また、削除チャネル決定部３０５は、演算速度解析部３０３から入力された各層の演算速度及び目標処理速度との差分情報と、感度情報解析部３０４から出力された各層ごとの認識精度への感度情報３１３と、層内のチャネルの感度情報３１３を受け付ける。そして、削除チャネル決定部３０５は、これらに基づいて、認識精度への感度が低いチャネルから順に目標処理速度との差分が０となるように、削除するチャネルの優先度を設定する。なお、削除チャネル決定部３０５での動作の詳細は、図８を用いて、後述する。 The deletion channel determination unit 305 receives the calculation speed of each layer and its difference from the target processing speed, input from the calculation speed analysis unit 303, together with the sensitivity information 313 on recognition accuracy for each layer and the sensitivity information 313 of the channels within each layer, output from the sensitivity information analysis unit 304. Based on these, the deletion channel determination unit 305 sets the deletion priority of the channels, starting from the channel with the lowest sensitivity to recognition accuracy, so that the difference from the target processing speed becomes 0. The operation of the deletion channel determination unit 305 is described in detail later with reference to FIG. 8.

また、削除フィルタ決定部３０６では、演算速度解析部３０３からの各層の処理速度と目標処理速度との差分情報と、フィルタの感度情報３１６をもとに各層ごとに削除を行うフィルタの順位付けを行う。 The deletion filter determination unit 306 also prioritizes the filters to be deleted for each layer based on the difference information between the processing speed of each layer and the target processing speed from the calculation speed analysis unit 303 and the filter sensitivity information 316.

また、削除チャネルインデックス出力部３０７では、削除チャネル決定部３０５で決められた各層の削除を行うチャネルの優先度情報３１７と、最大演算数保存部３０２から出力される演算器の最大演算可能数の情報を受け付ける。そして、削除チャネルインデックス出力部３０７では、これらに基づいて、残るチャネルの数が、演算器の最大演算可能な演算数の倍数もしくは1以外の約数となるようにインデックス番号を設定し、これを出力する。 The deleted channel index output unit 307 also receives priority information 317 of the channels to be deleted for each layer determined by the deleted channel determination unit 305, and information on the maximum number of operations that the calculator can perform output from the maximum number of operations storage unit 302. Based on these, the deleted channel index output unit 307 sets an index number so that the number of remaining channels is a multiple of the maximum number of operations that the calculator can perform or a divisor other than 1, and outputs this.

ここで、削除チャネルインデックス出力部３０７は、チャネルの最大演算可能数をＭc、あるj層（1≦j≦L）のチャネル数をＣ、j層の削除するチャネル数をＣｄとしたとき、以下の（数１）に従った処理を行う。つまり、削除チャネルインデックス出力部３０７は、（数１）を満たすＣｄを設定し、優先度情報が低いものから順番にＣｄ個のチャネルの演算順をｊ層の削除チャネルインデックス情報１１６に設定する。 Here, the deletion channel index output unit 307 performs processing according to the following (Equation 1), where Mc is the maximum number of channels that can be calculated, C is the number of channels in a certain jth layer (1≦j≦L), and Cd is the number of channels to be deleted from the jth layer. In other words, the deletion channel index output unit 307 sets Cd that satisfies (Equation 1), and sets the calculation order of the Cd channels in ascending order of priority information to the deletion channel index information 116 of the jth layer.
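(Equation 1) itself does not appear in this text. Read against the preceding paragraph, which requires the number of remaining channels to be a multiple of the maximum operable number or a divisor of it other than 1, one plausible form of the condition is the following. This is a reconstruction from the prose, not the equation as filed.

```latex
(C - C_d) \bmod M_c = 0
\quad\text{or}\quad
\bigl( M_c \bmod (C - C_d) = 0 \ \text{and}\ C - C_d \neq 1 \bigr),
\qquad 0 \le C_d < C
```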

また、削除フィルタインデックス出力部３０８は、削除フィルタ決定部３０６で設定した削除を行うフィルタの優先順位をもとに、残るフィルタの数が、演算器の最大演算可能な演算数の倍数もしくは1以外の約数となるようにインデックス情報を設定する。そして、削除フィルタインデックス出力部３０８は、これを出力する。なお、削除フィルタ決定部３０６でのフィルタの優先順位の特定については、図７を用いて後述する。 The deletion filter index output unit 308 sets index information based on the priority of the filters to be deleted set by the deletion filter determination unit 306 so that the number of remaining filters is a multiple of the maximum number of operations that the calculator can perform or a divisor other than 1. The deletion filter index output unit 308 then outputs this. Note that the determination of the filter priority by the deletion filter determination unit 306 will be described later with reference to FIG. 7.

ここで、削除フィルタインデックス出力部３０８は、フィルタの最大演算可能数をMf、あるj層のフィルタ数をF、j層の削除するフィルタ数をFdとしたとき、以下の（数２）に従った処理を行う。つまり、削除フィルタインデックス出力部３０８は、（数２）を満たすFdを設定し、優先度情報が低いものから順番にFd個のフィルタの演算順に対してｊ層の削除フィルタインデックス情報１１４を設定する。 Here, the deletion filter index output unit 308 performs processing according to the following (Equation 2) where Mf is the maximum number of filters that can be calculated, F is the number of filters in a certain jth layer, and Fd is the number of filters to be deleted from the jth layer. In other words, the deletion filter index output unit 308 sets Fd that satisfies (Equation 2), and sets the deletion filter index information 114 for the jth layer for the calculation order of the Fd filters in ascending order of priority information.
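The selection of the number of items to delete can be sketched as follows. Because (Equation 1) and (Equation 2) are not reproduced in this text, the condition is taken from the prose (the remaining count is a multiple of the maximum operable number, or a divisor of it other than 1). The function names and the convention that `order` lists indices with the first to be deleted first are assumptions of this sketch.

```python
def remaining_ok(remaining, max_ops):
    """Remaining count is a multiple of max_ops, or a divisor of max_ops other than 1."""
    if remaining <= 0:
        return False
    if remaining % max_ops == 0:
        return True
    return remaining != 1 and max_ops % remaining == 0


def choose_delete_count(total, needed, max_ops):
    """Smallest delete count >= needed whose remainder satisfies remaining_ok.

    Returns None if no such count exists.
    """
    for delete_count in range(max(needed, 0), total):
        if remaining_ok(total - delete_count, max_ops):
            return delete_count
    return None


def delete_indices(order, delete_count):
    """order lists indices with the first to be deleted first (assumed convention)."""
    return sorted(order[:delete_count])


fd = choose_delete_count(total=20, needed=2, max_ops=8)   # 16 items remain
index_info = delete_indices([5, 2, 7, 0, 11], 2)
```

The same helper applies to channels (C, Cd, Mc) and filters (F, Fd, Mf), since the two conditions are described in identical terms.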

次に、畳み込み演算部２０３について、説明する。図６は、本実施例における畳み込み演算部２０３の内部構成を示す図である。 Next, the convolution calculation unit 203 will be described. Figure 6 is a diagram showing the internal configuration of the convolution calculation unit 203 in this embodiment.

まず、畳み込み演算部２０３の構成について、説明する。ここでは、第ｊ層を例に挙げて説明する。畳み込み演算部２０３は、CNN演算制御信号２０４、個別チャネル数２１０、個別フィルタ数２０９、前層の出力データ２１２－ｊ、個別重みパラメータ２０８を入力として受け取る。そして、畳み込み演算部２０３は、これらを用いて、演算結果である出力データ２１２－ｊ＋１を出力する。このために、演算制御部４０１、入力データ一時保存部４０２、演算部４０３、出力データ一時格納部４０４を有する。 First, the configuration of the convolution calculation unit 203 will be explained, taking the jth layer as an example. The convolution calculation unit 203 receives as input the CNN calculation control signal 204, the number of individual channels 210, the number of individual filters 209, the output data 212-j of the previous layer, and the individual weight parameters 208. Using these, the convolution calculation unit 203 outputs the output data 212-j+1, which is the result of the calculation. For this purpose, it has a calculation control unit 401, an input data temporary storage unit 402, a calculation unit 403, and an output data temporary storage unit 404.

次に、畳み込み演算部２０３の内部構成の接続関係を示す。演算制御部４０１は、CNN演算制御信号２０４と個別チャネル数２１０と個別フィルタ数２０９を入力として受け取る。また、入力データ一時保存部４０２は、前層の出力データ２１２－ｊと個別重みパラメータ２０８と、演算制御部４０１から出力される、保存されたデータの出力順及び出力タイミングを制御する制御信号４１０を入力として受け取る。 The following shows the connections within the convolution calculation unit 203. The calculation control unit 401 receives as input the CNN calculation control signal 204, the number of individual channels 210, and the number of individual filters 209. The input data temporary storage unit 402 receives as input the output data 212-j of the previous layer, the individual weighting parameters 208, and a control signal 410 output from the calculation control unit 401 that controls the output order and output timing of the stored data.

また、演算部４０３は、演算順及び演算開始、演算停止の演算停止制御信号４１２と演算に必要となる入力データ４１１を入力として受け取る。また、出力データ一時格納部４０４は、演算部４０３の演算結果４１３と格納データの出力タイミングを制御する制御信号４１４を入力として受け取り、演算結果である出力データ２１２－ｊ+1を出力する。 The calculation unit 403 also receives as input a calculation stop control signal 412 that controls the calculation order, the start of calculation, and the stop of calculation, as well as input data 411 required for the calculation. The output data temporary storage unit 404 also receives as input a calculation result 413 of the calculation unit 403 and a control signal 414 that controls the output timing of the stored data, and outputs output data 212-j+1, which is the calculation result.

以下で、畳み込み演算部２０３の動作を説明する。まず、入力データ一時保存部４０２に、前層の出力データ２１２－ｊと個別重みパラメータ２０８を格納する。そして、演算制御部４０１は、CNN演算制御信号２０４、個別チャネル数２１０、個別フィルタ数２０９の情報をもとに、演算を実行させる順番に合わせ、データ出力タイミング、出力をスキップするチャネル及びフィルタの情報を出力する。また、演算制御部４０１は、演算部４０３に対しては演算回数、演算開始タイミング、終了タイミングを制御する演算停止制御信号４１２を出力する。 The operation of the convolution calculation unit 203 is explained below. First, the output data 212-j of the previous layer and the individual weight parameters 208 are stored in the input data temporary storage unit 402. Then, the calculation control unit 401 outputs the data output timing and information on the channels and filters for which output is to be skipped, in accordance with the order in which the calculations are to be performed, based on the information of the CNN calculation control signal 204, the number of individual channels 210, and the number of individual filters 209. In addition, the calculation control unit 401 outputs a calculation stop control signal 412 to the calculation unit 403, which controls the number of calculations, the calculation start timing, and the end timing.

そして、出力データ一時格納部４０４では、演算後のデータを出力するタイミングを制御する制御信号４１４を保持し、他の部位で利用することを可能とする。また、演算部４０３では、入力されたデータから、畳み込み演算と活性化関数処理を行い、出力データ一時格納部４０４へ演算結果４１３を出力する。出力データ一時格納部４０４では、演算制御部４０１からの制御信号４１４に合わせて演算結果である出力データ２１２－ｊ＋１を順次出力する。CNN演算制御信号２０４、個別チャネル数２１０、個別フィルタ数２０９については、モデル情報１１０が更新されない限り、演算制御部４０１でデータを保持し続ける。 Then, the output data temporary storage unit 404 holds a control signal 414 that controls the timing of outputting the data after the calculation, allowing it to be used in other parts. The calculation unit 403 also performs a convolution calculation and activation function processing on the input data, and outputs the calculation result 413 to the output data temporary storage unit 404. The output data temporary storage unit 404 sequentially outputs the calculation result, output data 212-j+1, in accordance with the control signal 414 from the calculation control unit 401. As for the CNN calculation control signal 204, the number of individual channels 210, and the number of individual filters 209, the calculation control unit 401 continues to hold the data unless the model information 110 is updated.

次に、削除フィルタ決定部３０６の動作の詳細について、説明する。図７は、削除フィルタ決定部３０６での削除するフィルタの優先度を設定する動作についてのフローチャートである。 Next, the operation of the deletion filter determination unit 306 will be described in detail. Figure 7 is a flowchart showing the operation of the deletion filter determination unit 306 to set the priority of the filter to be deleted.

まず、削除フィルタ決定部３０６は、演算速度解析部３０３の出力である削除フィルタ数３１２と感度情報解析部３０４の出力である感度情報３１６が入力されると、動作を開始する(ステップS1001)。 First, the deletion filter determination unit 306 starts operating when the number of filters to be deleted 312, which is the output of the calculation speed analysis unit 303, and the sensitivity information 316, which is the output of the sensitivity information analysis unit 304, are input (step S1001).

次に、削除フィルタ決定部３０６は、演算速度解析部３０３の出力をもとに目標処理速度以下の処理速度の層をｎ層分(nはn>=0を満たす整数)抽出する。そして、削除フィルタ決定部３０６は、処理速度が遅い順に並べる(ステップS1002)。 Next, the deletion filter determination unit 306 extracts n layers (n is an integer that satisfies n>=0) with processing speeds equal to or lower than the target processing speed based on the output of the calculation speed analysis unit 303. The deletion filter determination unit 306 then sorts the layers in order of slowest processing speed (step S1002).

次に、削除フィルタ決定部３０６は、処理速度以下の層を並べた場合に、現在対象にしている層を表すパラメータとしてiを設定し、i=nと設定する(ステップS1003)。 Next, with the layers at or below the target processing speed arranged in this order, the deletion filter determination unit 306 introduces i as a parameter indicating the layer currently being processed and sets i=n (step S1003).

次に、削除フィルタ決定部３０６は、i=0かどうか判定する(ステップS1004)。この結果、i=0の場合はステップS1006に進む。iが0ではない場合はステップS1005に進む。 Next, the deletion filter determination unit 306 determines whether i=0 (step S1004). If the result is that i=0, the process proceeds to step S1006. If i is not 0, the process proceeds to step S1005.

また、削除フィルタ決定部３０６は、認識精度への感度情報をもとにフィルタの大きさごとに、感度が低いものから順に削除優先度を設定する(ステップS1005)。例えば、３×３＝９個の重みパラメータで構成されるフィルタの場合には、９個の重みパラメータの内、削除されたときに認識精度の劣化量が小さいものから順番に優先度が設定される。なお、フィルタの数については、この例に限定されない。また、削除の基準は、劣化量に限定されず、所定規則を満たすものであればよい。また、認識精度の劣化以外にも、演算精度の劣化量を用いてもよい。 The deletion filter determination unit 306 also sets the deletion priority for each filter size, starting from the lowest sensitivity based on the sensitivity information to recognition accuracy (step S1005). For example, in the case of a filter consisting of 3 x 3 = 9 weight parameters, the priority is set in order of the nine weight parameters that will cause the least degradation in recognition accuracy when deleted. Note that the number of filters is not limited to this example. Also, the criterion for deletion is not limited to the amount of degradation, and may be any criterion that satisfies a predetermined rule. In addition to the degradation in recognition accuracy, the amount of degradation in calculation accuracy may also be used.

また、削除フィルタ決定部３０６は、iを一つ減らしてステップS1004へ戻る(ステップS1007)。 The deletion filter determination unit 306 also decrements i by one and returns to step S1004 (step S1007).

そして、最終的に削除フィルタ決定部３０６は、n層分個別に算出したフィルタの削除優先度情報を出力する(ステップS1006)。 Finally, the deletion filter determination unit 306 outputs the deletion priority information of the filters calculated individually for each n layer (step S1006).
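The loop of steps S1002 to S1007 can be sketched in Python as follows. The dictionary-based inputs, the function name, and the use of a plain sort on sensitivity values are assumptions made for illustration; the flowchart itself does not prescribe data structures.

```python
def filter_deletion_priorities(layer_speeds, target_speed, sensitivities):
    """Sketch of steps S1002 to S1007.

    layer_speeds:  {layer_id: processing speed} (higher is faster)
    sensitivities: {layer_id: [sensitivity per weight position in the filter]}
    Returns {layer_id: [positions, least sensitive first]}.
    """
    # S1002: layers at or below the target speed, slowest first
    slow = sorted((name for name, s in layer_speeds.items() if s <= target_speed),
                  key=lambda name: layer_speeds[name])
    priorities = {}
    i = len(slow)            # S1003: i = n
    while i != 0:            # S1004: leave the loop when i == 0
        layer = slow[i - 1]
        sens = sensitivities[layer]
        # S1005: the least sensitive position gets the highest deletion priority
        priorities[layer] = sorted(range(len(sens)), key=lambda p: sens[p])
        i -= 1               # S1007
    return priorities        # S1006


layer_speeds = {"conv1": 30.0, "conv2": 12.0, "conv3": 18.0}
sensitivities = {"conv2": [0.5, 0.1, 0.9], "conv3": [0.2, 0.3, 0.1]}
result = filter_deletion_priorities(layer_speeds, 20.0, sensitivities)
```

Layers faster than the target (here `conv1`) are left out entirely, so no deletion priority is produced for them.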

以上で、削除フィルタ決定部３０６での削除するフィルタの優先度を設定する動作についての説明を終わる。 This concludes the explanation of the operation of setting the priority of filters to be deleted in the deletion filter determination unit 306.

次に、削除チャネル決定部３０５の動作について説明する。図８は、削除チャネル決定部３０５の動作を示すフローチャートである。 Next, the operation of the deletion channel determination unit 305 will be described. Figure 8 is a flowchart showing the operation of the deletion channel determination unit 305.

まず、削除チャネル決定部３０５は、演算速度解析部３０３の出力である削除チャネル数３１１と感度情報解析部３０４の出力である感度情報３１３が入力されると、動作を開始する(ステップS2001)。 First, the deletion channel determination unit 305 starts operation when the number of deletion channels 311, which is the output of the calculation speed analysis unit 303, and the sensitivity information 313, which is the output of the sensitivity information analysis unit 304, are input (step S2001).

次に、削除チャネル決定部３０５は、演算速度解析部の出力である削除チャネル数３１１をもとに処理速度が目標以下の層をｎ層分(nはn>=0を満たす整数)抽出し処理速度が遅い順に並べる(ステップS2002)。 Next, the deletion channel determination unit 305 extracts n layers (n is an integer that satisfies n>=0) whose processing speed is below the target based on the number of deletion channels 311, which is the output of the calculation speed analysis unit, and sorts them in order of slowest processing speed (step S2002).

次に、削除チャネル決定部３０５は、処理速度以下の層を並べた場合に現在対象にしている層を表すパラメータとしてiを設定し、i=nと設定する(ステップS2003)。 Next, with the layers at or below the target processing speed arranged in this order, the deletion channel determination unit 305 introduces i as a parameter indicating the layer currently being processed and sets i=n (step S2003).

次に、削除チャネル決定部３０５は、i=0かどうか判定する(ステップS2004)。この結果、i=0の場合は、ステップS2006に進む。iが0でない場合は、ステップS2005に進む。 Then, the deletion channel determination unit 305 determines whether i=0 (step S2004). If the result is that i=0, the process proceeds to step S2006. If i is not 0, the process proceeds to step S2005.

また、削除チャネル決定部３０５は、i層のチャネルについて感度情報解析部３０４の出力である感度情報３１３と演算速度解析部３０３の出力である削除チャネル数３１１の情報をもとに、削除が必要なチャネル数を算出する。算出される削除に必要なチャネル数は、i層の処理速度が目標処理速度を上回るまで算出することが望ましい。そして、削除チャネル決定部３０５は、認識精度への感度が小さい順に削除優先度設定を行う(ステップS2005)。 For the channels of layer i, the deletion channel determination unit 305 calculates the number of channels that need to be deleted, based on the sensitivity information 313 output from the sensitivity information analysis unit 304 and the number of channels to be deleted 311 output from the calculation speed analysis unit 303. The number of channels to delete is preferably increased until the processing speed of layer i exceeds the target processing speed. The deletion channel determination unit 305 then sets the deletion priorities starting from the channel with the lowest sensitivity to recognition accuracy (step S2005).

最後に、削除チャネル決定部３０５は、n層分個別に算出したチャネルの削除優先度情報を出力する(ステップS2006)。 Finally, the deletion channel determination unit 305 outputs the deletion priority information of the channels calculated individually for each n layer (step S2006).
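Step S2005 for a single layer can be sketched as below. The sketch assumes, purely for illustration, that layer time scales linearly with the number of channels still computed, and it stops deleting once the estimated time no longer exceeds the target; neither the linear model nor the names are taken from the specification.

```python
def channels_to_delete(channel_sens, layer_time, target_time):
    """Sketch of step S2005 for one layer.

    channel_sens: sensitivity of recognition accuracy per channel
    Returns the channel indices to delete, least sensitive first.
    """
    total = len(channel_sens)
    order = sorted(range(total), key=lambda c: channel_sens[c])
    deleted = 0
    # Delete until the estimated time fits the target (always keep one channel).
    while deleted < total - 1 and layer_time * (total - deleted) / total > target_time:
        deleted += 1
    return order[:deleted]


to_delete = channels_to_delete([0.9, 0.1, 0.4, 0.05], layer_time=10.0, target_time=6.0)
```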

次に、演算割り当て部１０９での動作について説明する。図９は、演算割り当て部１０９での動作についてのフローチャートである。以下処理フローについて説明する。 Next, the operation of the computation allocation unit 109 will be described. Figure 9 is a flowchart showing the operation of the computation allocation unit 109. The processing flow will be described below.

まず、演算割り当て部１０９は、削除インデックス決定部１０８からの削除チャネルインデックス情報１１６と削除フィルタインデックス情報１１４が入力されると、動作を開始する(ステップS3001)。次に、演算割り当て部１０９は、入力された削除チャネルインデックス情報を格納する(ステップS3002)。 First, the computation allocation unit 109 starts operation when the deletion channel index information 116 and deletion filter index information 114 are input from the deletion index determination unit 108 (step S3001). Next, the computation allocation unit 109 stores the input deletion channel index information (step S3002).

次に、演算割り当て部１０９は、削除フィルタインデックス情報および削除優先度を格納する(ステップS3003)。次に、演算割り当て部１０９は、削除を行うフィルタが削除フィルタインデックス情報のストライド方向から見て両端にあるかどうか判定する(ステップS3004)。なお、ストライド方向や両端に関しては、図１０を用いて後述する。両端にあると判定された場合には、ステップS3007へ進む。両端でないと判定された場合には、ステップS3005へ進む。 Next, the computation allocation unit 109 stores the deletion filter index information and the deletion priority (step S3003). Next, the computation allocation unit 109 determines whether the filter to be deleted is located at either end as viewed along the stride direction of the deletion filter index information (step S3004). The stride direction and the two ends are described later with reference to FIG. 10. If the filter is determined to be at either end, the process proceeds to step S3007. Otherwise, the process proceeds to step S3005.

また、演算割り当て部１０９は、ストライド方向の変更が実装上可能かどうか判定する(ステップS3005)。このために、演算割り当て部１０９は、実装上の制約条件に基づいて判定することが望ましい。この判定の結果、変更可能な場合はステップS3009へ進む。変更不可能な場合はステップS3006へ進む。 The computation allocation unit 109 then determines whether changing the stride direction is feasible in the implementation (step S3005). This determination is preferably made on the basis of implementation constraints. If the stride direction can be changed, the process proceeds to step S3009. If it cannot be changed, the process proceeds to step S3006.

次に、演算割り当て部１０９は、削除フィルタインデックスの優先度を両端の次点で優先度が高いものに変更する(ステップS3006)。次に、演算割り当て部１０９は、削除チャネルインデックス情報を出力する(ステップS3007)。 Next, the computation allocation unit 109 changes the priority of the deletion filter index to the candidate with the next-highest priority among those at both ends (step S3006). Next, the computation allocation unit 109 outputs the deletion channel index information (step S3007).

そして、演算割り当て部１０９は、削除フィルタインデックス情報及びストライドの方向を出力する(ステップS3008)。 Then, the computation allocation unit 109 outputs the deletion filter index information and the stride direction (step S3008).

また、演算割り当て部１０９は、ステップS3005において変更可能と判定された場合に、ストライドの方向を９０度変更させる(ステップS3009)。そして、ステップS3007に進む。 If it is determined in step S3005 that the stride direction can be changed, the computation allocation unit 109 rotates the stride direction by 90 degrees (step S3009). The process then proceeds to step S3007.
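The branching of FIG. 9 can be sketched in Python, following the branch conditions given for steps S3004 and S3005 (a changeable stride leads to S3009, an unchangeable one to S3006). The `(row, col)` position encoding, the string stride labels, and the function names are hypothetical. As in the description, the sketch does not re-check the position after rotating the stride, so a candidate at the center of the filter is not handled specially.

```python
def at_ends(pos, stride, size=3):
    """True if pos lies at either end of the filter as seen along the stride direction."""
    row, col = pos
    idx = col if stride == "horizontal" else row
    return idx in (0, size - 1)


def allocate(candidates, stride, can_change_stride, size=3):
    """candidates: weight positions (row, col), highest deletion priority first.

    Returns (position to delete, stride direction to use).
    """
    first = candidates[0]
    if at_ends(first, stride, size):              # S3004: already at an end
        return first, stride
    if can_change_stride:                         # S3005: implementation allows a change
        rotated = "vertical" if stride == "horizontal" else "horizontal"
        return first, rotated                     # S3009: rotate by 90 degrees
    for pos in candidates[1:]:                    # S3006: next candidate at the ends
        if at_ends(pos, stride, size):
            return pos, stride
    return None, stride


kept = allocate([(1, 0), (0, 1)], "horizontal", can_change_stride=False)
rotated = allocate([(0, 1), (1, 2)], "horizontal", can_change_stride=True)
fallback = allocate([(0, 1), (1, 2)], "horizontal", can_change_stride=False)
```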

次に、ストライドおよび削除可能なフィルタについて説明する。図１０は、ストライドおよび削除可能なフィルタの例を説明するための図である。畳み込み演算では、図１０（ａ）（ｂ）の中間データ８０１上を、畳み込みフィルタ８０３およびフィルタ８０５がストライド方向８０２及び８０４の方向へ移動しながら当該畳み込み演算がされる。ここではフィルタを３×３としているが、フィルタの大きさはこれに限定されない。 Next, the stride and the filters that can be removed will be described. FIG. 10 is a diagram for explaining an example of the stride and the filters that can be removed. In the convolution operation, the convolution operation is performed while the convolution filter 803 and the filter 805 move in the stride directions 802 and 804 on the intermediate data 801 in FIG. 10(a) and (b). Here, the filter is 3×3, but the size of the filter is not limited to this.

フィルタの両端とは、図１０（ａ）の畳み込みフィルタ８０３及び図１０（ｂ）のフィルタ８０５において灰色で示すフィルタのことである。ここで、畳み込みフィルタ８０３はストライド方向８０２が左右のため、両端のフィルタは左右となる。また、フィルタ８０５はストライド方向８０４が上下のため、両端のフィルタは上下となる。 The two ends of the filter refer to the filters shown in gray in the convolution filter 803 in FIG. 10(a) and the filter 805 in FIG. 10(b). Here, the stride direction 802 of the convolution filter 803 is left and right, so the filters at both ends are left and right. Also, the stride direction 804 of the filter 805 is up and down, so the filters at both ends are up and down.

次に、図１１に、３×３のフィルタについて削除を行った例を示す。ここでは、最大演算可能数が８とした場合における実際に削除した例を示す。最大演算数は、ここでは例として８としているが、これに限定されず２^ｎ（ｎ≧１）であればよい。このように、最大演算可能数が８の場合には、削除するフィルタの数は１つとなる。そのため、図１１に示した灰色の領域の内1つのフィルタを削除する。削除済みのフィルタ９１３が、横方向のストライド９１２で削除を行った後のフィルタである。また、削除済みのフィルタ９１４が、縦方向のストライド９１５で削除を行った後のフィルタである。以上で、実施例１の説明を終わる。 Next, FIG. 11 shows an example in which deletion is applied to a 3×3 filter, for the case where the maximum operable number is 8. The value 8 is only an example; the maximum operable number may be any 2^n (n≧1). A 3×3 filter consists of nine weight parameters, so with a maximum operable number of 8 the number of filters to be deleted is one (9 − 1 = 8). Accordingly, one filter in the gray area shown in FIG. 11 is deleted. The deleted filter 913 is the filter after deletion with the horizontal stride 912, and the deleted filter 914 is the filter after deletion with the vertical stride 915. This concludes the description of the first embodiment.
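The 3×3 case can be sketched in Python. The sketch assumes that the two ends are the outer columns for a horizontal stride and the outer rows for a vertical stride, and it picks the least sensitive end position for deletion; the sensitivity values, the use of `None` to mark a deleted weight, and the function names are all hypothetical.

```python
def end_positions(stride, size=3):
    """Weight positions at both ends of the filter as seen along the stride direction."""
    last = size - 1
    if stride == "horizontal":
        return [(r, c) for r in range(size) for c in (0, last)]
    return [(r, c) for r in (0, last) for c in range(size)]


def prune_one(filter_weights, stride, sensitivity):
    """Mark as deleted (None) the least sensitive weight among the end positions."""
    size = len(filter_weights)
    target = min(end_positions(stride, size), key=lambda p: sensitivity[p[0]][p[1]])
    pruned = [row[:] for row in filter_weights]
    pruned[target[0]][target[1]] = None
    return pruned, target


weights = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
sensitivity = [[0.5, 0.9, 0.4], [0.3, 0.8, 0.05], [0.7, 0.6, 0.2]]
pruned_h, target_h = prune_one(weights, "horizontal", sensitivity)
pruned_v, target_v = prune_one(weights, "vertical", sensitivity)
remaining_h = sum(w is not None for row in pruned_h for w in row)
```

After one deletion, eight weights remain, which matches a maximum operable number of 8.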

次に、実施例２について説明する。実施例２は、実施例１の認識装置１０００を、車両の走行の際における外界の認識に適用した例である。このため、実施例２では、速度無制限道路走行の際と、通常の高速道走行の際や市街地走行の際など、走行する速度及び演算処理に要求される速度に応じて畳み込み演算の処理速度を変更する。 Next, a second embodiment will be described. The second embodiment is an example in which the recognition device 1000 of the first embodiment is applied to recognition of the outside world while a vehicle is traveling. For this reason, in the second embodiment, the processing speed of the convolution calculation is changed according to the traveling speed and the speed required for the calculation processing, such as when traveling on a road with no speed limit, when traveling on a normal highway, or when traveling in an urban area.

ここで、通常の高速道路走行の際や市街地走行の際と、速度無制限道路走行の際では、ECUで必要とされる処理速度が異なる。また、走行する路面の幅や周辺状況も異なる。そこで、本実施例では走行速度の変化を観測し、一つの演算器で複数の演算を行う演算数を増やすことで、要求処理速度の増加に対応させている例を示す。 The processing speed required by the ECU differs when driving on normal highways or urban areas, and when driving on roads with no speed limit. The width of the road surface and surrounding conditions also differ. Therefore, this embodiment shows an example of responding to the increase in required processing speed by observing changes in driving speed and increasing the number of calculations that perform multiple calculations in one computing unit.

また、実施例２では、走行速度情報を取得し、単位時間当たりの平均時速Vがあらかじめ設計時に定めた走行速度上限値Xを上回ったときに、目標処理速度を変更し、削除インデックスの数を増加させる。このことで、演算数を増加させ処理速度を向上させるための実施例である。なお実施例１と共通部分については図に同一符号を付し、その説明を省略する。 In addition, in the second embodiment, driving speed information is acquired, and when the average speed V per unit time exceeds the upper driving speed limit X determined in advance at the time of design, the target processing speed is changed and the number of deletion indexes is increased. This is an embodiment for increasing the number of calculations and improving the processing speed. Note that parts common to the first embodiment are given the same reference numerals in the figures and their explanations are omitted.

図１２は、実施例２における演算装置である認識装置１０００の機能ブロック図である。ここで、図１２を用いて、実施例１との差異について説明する。実施例２の認識装置１０００では、走行速度取得部９０１が、追加されている。また、本実施例では、実施例１とは異なる構成である削除インデックス決定部９０３を有する。つづいて、本実施例２の認識装置１０００の接続関係について説明する。まず、走行速度取得部９０１は、削除インデックス決定部９０３へ走行速度を出力する。削除インデックス決定部９０３は、走行速度取得部９０１、チャネル情報取得部１０５、重みパラメータ取得部１０６、フィルタ情報取得部１０７の出力を受け取る。 FIG. 12 is a functional block diagram of the recognition device 1000, which is the calculation device of the second embodiment. The differences from the first embodiment are described with reference to FIG. 12. In the recognition device 1000 of the second embodiment, a driving speed acquisition unit 901 is added. In addition, this embodiment has a deletion index determination unit 903 whose configuration differs from that of the first embodiment. Next, the connection relationships in the recognition device 1000 of the second embodiment are described. First, the driving speed acquisition unit 901 outputs the driving speed to the deletion index determination unit 903. The deletion index determination unit 903 receives the outputs of the driving speed acquisition unit 901, the channel information acquisition unit 105, the weight parameter acquisition unit 106, and the filter information acquisition unit 107.

次に、実施例２の認識装置１０００の動作のうち、実施例１との差分について説明する。まず、走行速度取得部９０１で車両走行速度を監視しており、現在の走行速度を継続的に削除インデックス決定部９０３へ出力する。 Next, the operation of the recognition device 1000 of the second embodiment is described where it differs from the first embodiment. First, the driving speed acquisition unit 901 monitors the driving speed of the vehicle and continuously outputs the current driving speed to the deletion index determination unit 903.

なお、本実施例の認識装置１０００も、いわゆるコンピュータでも実現できる。この場合、各部の機能をプログラムに従ってＣＰＵのような処理装置で実行することになる。また、このプログラムは記憶媒体に格納される。また、各部は、ＦＰＧＡ（Field Programmable Gate Array）のような専用ハードウェアや専用回路でも実現できる。 The recognition device 1000 of this embodiment can also be realized by a so-called computer. In this case, the functions of each part are executed by a processing device such as a CPU according to a program. The program is stored in a storage medium. Each part can also be realized by dedicated hardware such as an FPGA (Field Programmable Gate Array) or a dedicated circuit.

図１３は、実施例２における削除インデックス決定部９０３の内部構成を示す図である。ここで、図１３を用いて、削除インデックス決定部９０３の構成について、削除インデックス決定部１０８との差分について説明する。 Figure 13 is a diagram showing the internal configuration of the deletion index determination unit 903 in the second embodiment. Here, the differences in the configuration of the deletion index determination unit 903 from the deletion index determination unit 108 will be described with reference to Figure 13.

削除インデックス決定部９０３は、実施例１の削除インデックス決定部１０８に加えて、目標処理速度判定部９０２と演算速度解析部９０５をさらに有する。ここで、目標処理速度判定部９０２は、走行速度取得部９０１から出力された走行速度情報を追加の入力とする。また、演算速度解析部９０５は、最大演算数保存部３０２からの出力９１６と目標処理速度判定部からの出力である目標処理速度３１０を追加の入力とし、最大演算数９０６を出力する。 In addition to the configuration of the deletion index determination unit 108 of the first embodiment, the deletion index determination unit 903 further has a target processing speed determination unit 902 and a calculation speed analysis unit 905. The target processing speed determination unit 902 takes as an additional input the driving speed information output from the driving speed acquisition unit 901. The calculation speed analysis unit 905 takes as additional inputs the output 916 from the maximum number of operations storage unit 302 and the target processing speed 310 output from the target processing speed determination unit 902, and outputs the maximum number of operations 906.

Next, the connections within the deletion index determination unit 903 in the second embodiment are described. The travel speed information is input to the target processing speed determination unit 902. The calculation speed analysis unit 905 receives the output 916 of the maximum number of operations storage unit 302, the target processing speed 310 output by the target processing speed determination unit 902, the channel information 111, and the filter information 113. The calculation speed analysis unit 905 then outputs the number of channels to be deleted 311, the number of filters to be deleted 312, and the changed maximum number of operations 906. The changed maximum number of operations 906 is input to the deletion channel index output unit 307 and the deletion filter index output unit 308.

The operation of the deletion index determination unit 903 in the second embodiment is described below. The target processing speed determination unit 902 calculates the travel speed V per unit time from the input travel speed information. If V<X is satisfied, the target processing speed determination unit 902 maintains the existing target processing speed G. If V>X is satisfied, it changes the target processing speed from G to 2G and transmits the changed target processing speed to the calculation speed analysis unit 905.
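The threshold rule above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `target_processing_speed` and its parameter names are hypothetical, and because the text does not say what happens when V equals X, the sketch assumes the existing target G is kept in that case.

```python
def target_processing_speed(v: float, x: float, g: float) -> float:
    """Return the target processing speed for travel speed v.

    x is the speed threshold (X in the text) and g is the existing
    target processing speed (G in the text).
    Assumption: the text leaves v == x undefined; this sketch keeps g there.
    """
    if v > x:
        return 2 * g  # faster travel: double the target processing speed
    return g          # v < x (and, by assumption, v == x): keep the target


# Example with made-up numbers: threshold 60, baseline target 30.
low = target_processing_speed(50, 60, 30)
high = target_processing_speed(80, 60, 30)
```

Doubling the target when the vehicle is fast is the only adjustment the text describes; a real system might use more than two levels, but that is not stated here.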

Furthermore, when the target processing speed is increased in the calculation speed analysis unit 905, the target processing speed determination unit 902 changes the maximum operable number from 2^n to 2^(n-1). When the target processing speed is decreased, it returns the maximum operable number to 2^n. Next, the target processing speed determination unit 902 determines the numbers of filters and channels to be deleted based on the changed maximum operable number, and transmits the changed maximum number of operations 906 to the deletion channel index output unit 307 and the deletion filter index output unit 308. The remaining operations are the same as in FIG. 5, so their description is omitted.

Next, FIG. 14 is an explanatory diagram showing how filters are stored when the calculator performs a calculation. FIG. 14(a) shows an example before the maximum number of operations is reduced, and FIG. 14(b) shows an example after it is reduced. Both consist of a calculator 151 and weight parameters 152.

First, the case where the maximum number of operations is 8 (n=3) and the filter size is 9 is described with reference to FIG. 14(a). With a maximum number of operations of 8, one index is deleted from the filter, and the remaining weight parameters 152 are set in the calculator 151 for calculation. Next, an example in which the travel speed increases and the maximum number of operations becomes 4 is described with reference to FIG. 14(b). With a maximum number of operations of 4, five indices are deleted from the filter before it is set in the calculator 151. In this case, twice as many weight parameters 152 can be placed in the calculator 151 as in FIG. 14(a). This concludes the description of the second embodiment.
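The arithmetic in this example can be checked with a short sketch. The function names are hypothetical. The calculator width of 8 lanes used for `filters_per_pass` is an assumption made only to illustrate the "twice as many" remark; the text does not state the width of the calculator 151.

```python
def max_operations(n: int, speed_increased: bool) -> int:
    """Maximum number of operations: 2^n normally, 2^(n-1) after a speed increase."""
    return 2 ** (n - 1) if speed_increased else 2 ** n


def indices_to_delete(filter_size: int, max_ops: int) -> int:
    """Number of filter indices to delete so the filter fits within max_ops."""
    return max(filter_size - max_ops, 0)


def filters_per_pass(lanes: int, max_ops: int) -> int:
    """Filters that fit in one pass, assuming a calculator with `lanes` slots."""
    return lanes // max_ops


FILTER_SIZE = 9   # filter size from the text
LANES = 8         # assumed calculator width, not given in the text

before = max_operations(3, speed_increased=False)   # FIG. 14(a) case
after = max_operations(3, speed_increased=True)     # FIG. 14(b) case
deleted_before = indices_to_delete(FILTER_SIZE, before)
deleted_after = indices_to_delete(FILTER_SIZE, after)
```

Under the assumed 8-lane calculator, one pruned filter fits per pass in the first case and two fit in the second, which matches the doubling described for FIG. 14(b).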

The third embodiment applies the recognition device 1000 of the first and second embodiments to a control device 2000. FIG. 15 is a functional block diagram of the control device 2000 in the third embodiment. The control device 2000 is implemented as, for example, an ECU (Electronic Control Unit).

In FIG. 15, the control device 2000 has the recognition device 1000 and a control signal generation unit 2001. The recognition device 1000, which executes the processing of the first or second embodiment, transmits its recognition result 103 to the control signal generation unit 2001. The control signal generation unit 2001 then generates a control signal 2002 according to the recognition result 103, and the control target 3000 is controlled based on this signal.

When the control device 2000 is implemented as an ECU, the control target 3000 is a vehicle. In this case, automated driving and driving assistance for the vehicle can be realized based on the processing described in each embodiment.

This concludes the description of the embodiments, but various modifications and applications of each embodiment are conceivable. For example, each embodiment was described using the recognition device 1000 as an example, but a calculation device that performs calculations not limited to recognition also falls within the scope of each embodiment.

In addition, according to each embodiment, the utilization rate of the calculator is about 50% when calculations are performed on general image data, and the calculation deletion mechanism of the present invention can be expected to raise that utilization rate to nearly 100%.
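One way to see why deletion can raise utilization is the following sketch. It rests on an assumed model that is not taken from the text: the calculator processes a fixed number of multiply-accumulate slots per pass, and a filter that does not fill the last pass leaves the remaining slots idle. The function name and the 8-slot width are hypothetical, and the figures it produces illustrate the effect rather than reproduce the roughly 50% figure quoted above.

```python
import math


def utilization(num_weights: int, width: int) -> float:
    """Fraction of calculator slots doing useful work.

    Assumed model: the calculator runs ceil(num_weights / width) passes of
    `width` slots each, and unused slots in the last pass are wasted.
    """
    passes = math.ceil(num_weights / width)
    return num_weights / (passes * width)


# A 9-weight filter on an assumed 8-slot calculator needs two passes.
unpruned = utilization(9, 8)   # 9 useful slots out of 16
# Deleting one index lets the filter fit in exactly one pass.
pruned = utilization(8, 8)     # 8 useful slots out of 8
```

In this model the unpruned filter uses 9 of 16 slots and the pruned filter uses all 8, so aligning the filter to the calculation unit removes the idle slots.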

Furthermore, in these embodiments, some of the filters and channels used in the convolution operation of a convolutional neural network are deleted. To this end, deletion index information, which consists of numbers indicating the positions or order of the filters and channels and differs from layer to layer, is given to the calculation control unit. In accordance with the calculation unit of the calculator, reading of part of the input data and part of the weight parameters of the convolution operation is skipped, and part of the convolution operation is deleted. This allows the calculator to be used efficiently on a device with limited computing performance.
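The skipping behavior described above can be sketched for a single output position as follows. This is a simplified illustration under stated assumptions, not the device's actual data path: it uses a 1x1 (pointwise) convolution for brevity, and the function name, argument layout, and dictionary return type are all hypothetical.

```python
def pruned_pointwise_conv(inputs, weights, deleted_channels, deleted_filters):
    """Compute one output position of a 1x1 convolution with deletions.

    inputs:  one value per input channel.
    weights: weights[f][c] is the weight of filter f for channel c.
    Deleted filters are skipped entirely, so their weights are never read.
    Deleted channels are skipped inside each remaining filter, so neither
    the input value nor the weight for that channel is read.
    """
    skip_channels = set(deleted_channels)
    skip_filters = set(deleted_filters)
    outputs = {}
    for f, filter_weights in enumerate(weights):
        if f in skip_filters:
            continue  # whole filter deleted: no reads, no multiply-accumulate
        acc = 0
        for c, (x, w) in enumerate(zip(inputs, filter_weights)):
            if c in skip_channels:
                continue  # deleted channel: skip both the input and the weight
            acc += x * w
        outputs[f] = acc
    return outputs


# Three channels, three filters; delete channel 2 and filter 1.
result = pruned_pointwise_conv(
    inputs=[1, 2, 3],
    weights=[[1, 1, 1], [2, 2, 2], [3, 3, 3]],
    deleted_channels=[2],
    deleted_filters=[1],
)
```

Here the per-layer deletion index information corresponds to the `deleted_channels` and `deleted_filters` lists, which would differ from layer to layer.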

101 External world information acquisition device
102 CNN calculation unit
104 Model storage unit
105 Channel information acquisition unit
106 Weight parameter acquisition unit
107 Filter information acquisition unit
108 Deletion index determination unit
109 Calculation allocation unit
111 Channel information
112 Weight parameter information
113 Filter information
114 Deletion filter index information
115 CNN calculation control signal
116 Deletion channel index information
117 Input data
202 Parameter storage unit
203 Convolution calculation unit
204 Branched CNN calculation control signal
208 Individual weight parameter
209 Number of individual filters
210 Number of individual channels
211 Saved data
301 Target processing speed storage unit
302 Maximum number of operations storage unit
303 Calculation speed analysis unit
304 Sensitivity information analysis unit
305 Deletion channel determination unit
306 Deletion filter determination unit
401 Calculation control unit
412 Calculation stop control signal
801 Intermediate data
802 Stride direction
803 Convolution filter
901 Travel speed acquisition unit
902 Target processing speed determination unit

Claims

1. A computing device that performs a CNN calculation based on input data, comprising:
a model storage unit that stores a model used in the CNN calculation;
a CNN calculation unit that performs the CNN calculation by executing a convolution calculation for each of a plurality of convolution layers using the model;
a weight parameter acquisition unit that acquires weight parameters of a convolution filter used in the convolution calculation from the model storage unit;
a channel information acquisition unit that acquires channel information for each of the plurality of convolution layers from the model stored in the model storage unit;
a filter information acquisition unit that acquires filter information for each of the plurality of convolution layers from the model storage unit;
a calculation allocation unit that allocates a combination of weight information and calculations required for the CNN calculation and transmits the combination to the CNN calculation unit; and
a deletion index determination unit that deletes a part of the filter information used in the convolution calculation, based on the maximum number of calculations possible for the CNN calculation unit, an execution order of the CNN calculation, the channel information for each convolution layer of the CNN calculation unit acquired from the channel information acquisition unit, the weight information acquired from the weight parameter acquisition unit, and the filter information for each of the plurality of convolution layers acquired from the filter information acquisition unit.

2. The computing device according to claim 1,
wherein the deletion index determination unit further deletes a part of the channel information used in the convolution calculation.

3. The computing device according to claim 1,
wherein the deletion index determination unit determines filter information to be deleted in response to a change in the weight information.

4. The computing device according to claim 2,
wherein the deletion index determination unit determines channel information to be deleted in response to a change in the weight information.

5. The computing device according to claim 1,
wherein the filter indicated by the filter information is composed of a plurality of weight parameters, and
the deletion index determination unit determines a weight parameter that satisfies a predetermined rule from among the plurality of weight parameters.

6. The computing device according to claim 5,
wherein the deletion index determination unit determines a weight parameter to be deleted using, as the predetermined rule, an amount of deterioration in calculation accuracy.

7. The computing device according to claim 1,
wherein the deletion index determination unit determines filter information to be deleted in accordance with a processing speed.

8. The computing device according to claim 7,
wherein the deletion index determination unit determines filter information to be deleted based on sensitivity information indicating sensitivities of the plurality of convolution layers and on the size of the filter.

9. A recognition device comprising the computing device according to any one of claims 1 to 8,
wherein external world information acquired from an external world information acquisition device is used as the input data, and
the recognition device recognizes an external situation using the external world information.

10. A control device comprising the recognition device according to claim 9,
wherein the result of the calculation in the recognition device is output as a control signal for an object in accordance with the recognized external situation.