JP7716068B2 - Image object detection device, image object detection method, and program

Info

Publication number: JP7716068B2
Application number: JP2022078792A
Authority: JP
Inventors: 泳青孫; 幸浩坂東; 祐介日和▲崎▼; 弘劉; 真一佐藤
Original assignee: Nippon Telegraph and Telephone Corp; Inter University Research Institute Corp Research Organization of Information and Systems; NTT Inc USA
Current assignee: Inter University Research Institute Corp Research Organization of Information and Systems; NTT Inc; NTT Inc USA
Priority date: 2022-05-12
Filing date: 2022-05-12
Publication date: 2025-07-31
Anticipated expiration: 2042-05-12
Also published as: JP2023167528A

Description

The present invention relates to an image object detection device, an image object detection method, and a program.

In recent years, many techniques have been proposed that use deep learning to detect objects in images. Such object detection techniques are expected to be used in the field of autonomous driving. The techniques proposed so far are reported to detect objects well in images captured in good weather, such as clear skies. To be applied to autonomous driving, however, a technique must detect objects under a variety of weather conditions as well as it does in good weather.

Non-Patent Document 1: Claudio Michaelis et al., "Benchmarking Robustness in Object Detection: Autonomous Driving when Winter is Coming", [online], 31 March 2020, arXiv preprint, arXiv:1907.07484, [retrieved March 17, 2022], Internet <URL: https://arxiv.org/pdf/1907.07484.pdf>

However, in severe weather, in poorly lit places, or at night, the sensor noise of the camera's image sensor increases, so the quality of captured images degrades. As a result, when the object detection techniques proposed so far are applied to images captured in poor weather such as rain or fog, detection accuracy is lower than for images captured in good weather (see, for example, Non-Patent Document 1).

An object of the present invention is to provide a technique that can detect objects in images captured in poor weather with accuracy equivalent to that achieved in good weather.

One aspect of the present invention is an image object detection device including: a weather feature model generation unit that generates a weather feature model representing the features of weather shown in images; a feature map generation unit that generates a feature map from image data subject to object detection; a combination unit that generates a feature map in which the influence of weather is suppressed, by combining the weather feature model generated by the weather feature model generation unit with the feature map generated by the feature map generation unit through causal intervention; and an object detection unit that performs object detection on the feature map generated by the combination unit.

One aspect of the present invention is an image object detection method including: a weather feature model generation step of generating a weather feature model representing the features of weather shown in images; a feature map generation step of generating a feature map from image data subject to object detection; a combination step of generating a feature map in which the influence of weather is suppressed, by combining the weather feature model generated in the weather feature model generation step with the feature map generated in the feature map generation step through causal intervention; and an object detection step of performing object detection on the feature map generated in the combination step.

One aspect of the present invention is a program for causing a computer to function as: weather feature model generation means for generating a weather feature model representing the features of weather shown in images; feature map generation means for generating a feature map from image data subject to object detection; combination means for generating a feature map in which the influence of weather is suppressed, by combining the weather feature model generated by the weather feature model generation means with the feature map generated by the feature map generation means through causal intervention; and object detection means for performing object detection on the feature map generated by the combination means.

The present invention makes it possible to detect objects in images captured in poor weather with accuracy equivalent to that achieved in good weather.

FIG. 1 is a block diagram showing the configuration of an image object detection device according to a first embodiment.
FIG. 2 is a block diagram showing the detailed configuration of the image object detection device according to the first embodiment.
FIG. 3 is a diagram showing the flow of processing by the weather feature model generation unit of the first embodiment.
FIG. 4 is a diagram showing an outline of processing by the weather feature model generation unit of the first embodiment.
FIG. 5 is a diagram showing the flow of object detection processing in the first embodiment.
FIG. 6 is a diagram showing an outline of processing by the combination unit of the first embodiment.
FIG. 7 is a block diagram showing the configuration of an image object detection device according to a second embodiment.
FIG. 8 is a diagram showing the flow of processing by the weather feature model generation unit of the second embodiment.

(First Embodiment)
Embodiments of the present invention are described below with reference to the drawings. FIG. 1 is a block diagram showing the configuration of image object detection device 1 according to the first embodiment. Image object detection device 1 includes weather feature model generation unit 11, weather feature model storage unit 12, image data storage unit 13, feature map generation unit 14, combination unit 15, and object detection unit 16.

Weather feature model generation unit 11 generates a weather feature model representing the features of weather shown in images. Weather feature model storage unit 12 stores the weather feature model generated by weather feature model generation unit 11. Image data storage unit 13 stores the image data subject to object detection. Feature map generation unit 14 generates a feature map from the image data stored in image data storage unit 13. Combination unit 15 combines the feature map generated by feature map generation unit 14 with the weather feature model stored in weather feature model storage unit 12, thereby generating a feature map in which the influence of weather is suppressed. The combination of the feature map and the weather feature model performed by combination unit 15 is a combination by causal intervention, that is, intervention in a so-called structural causal model. As a result of this processing, if the image data corresponding to the feature map was captured in rain, for example, the influence of the rain on the image data is suppressed. Object detection unit 16 performs object detection processing that detects, from the feature map generated by combination unit 15, the type, position, and extent of each object contained in the image data corresponding to that feature map.

FIG. 2 is a block diagram showing an example of the detailed configuration of image object detection device 1, in which the Mask R-CNN (Region-based Convolutional Neural Network) method is applied as the object detection method. Image data storage unit 13 stores, for example, the image data subject to object detection in advance. The image data is, for example, color image data: three-dimensional array data in which three same-sized two-dimensional images, one for each of the R (Red), G (Green), and B (Blue) channels, are stacked. Here, size means the dimensions determined by the number of pixels along the height and width of the two-dimensional image data, in other words, by the number of elements along the height and width of the two-dimensional array.

Feature map generation unit 14 includes CNN (Convolutional Neural Network) layer 21 and generates feature map 50 by applying the convolution operation of CNN layer 21 to the image data stored in image data storage unit 13. Feature map 50 is defined by the following equation (1).

$$F \in \mathbb{R}^{d \times c} \tag{1}$$

In equation (1), "F" denotes feature map 50, and "R" is the mathematical symbol for the real space. In the superscript of R, "d" is the number of dimensions and "c" is the number of channels. That is, equation (1) states that feature map 50 is expressed as an element of a d × c-dimensional real space. Because CNN layer 21 performs the convolution operation on data in which three two-dimensional images stored in image data storage unit 13 are stacked in the channel direction, the number of dimensions "d" of the resulting feature map 50 is d = 2. The number of channels "c" takes a value that depends on, among other things, the number of channels of the filters applied in CNN layer 21.

More specifically, CNN layer 21 may be a deep neural network consisting only of CNNs, such as an FCN (Fully Convolutional Network), or a deep neural network that combines CNNs with neural networks other than CNNs, such as VGG (Visual Geometry Group) or ResNet (Residual Network). Feature map 50 obtained by the convolution operation of CNN layer 21 is three-dimensional array data in which "c" two-dimensional arrays of the same size are stacked in the channel direction. The stacked two-dimensional arrays have identical vertical and horizontal element counts. The size of each two-dimensional array is a reduced version of the size of the image data that feature map generation unit 14 reads from image data storage unit 13, and depends on that image size and on the filter size of CNN layer 21.
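
How the image size and the filter size determine the size of the feature map can be illustrated with a single convolution layer. The following is a minimal sketch in plain Python, not the network described here: the unpadded stride-1 layout, the 6 × 6 input, and the two 3 × 3 averaging filters are all assumptions chosen for illustration.

```python
def conv2d_valid(image, kernels, stride=1):
    """Slide each k x k x C_in kernel over an H x W x C_in image (no padding).

    Returns an H' x W' x c nested list, where c = len(kernels) and
    H' = (H - k) // stride + 1 (likewise for W'). As in most deep learning
    libraries, the kernel is not flipped (cross-correlation).
    """
    height, width = len(image), len(image[0])
    k = len(kernels[0])
    out_h = (height - k) // stride + 1
    out_w = (width - k) // stride + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            pixel = []
            for kernel in kernels:
                total = 0.0
                for di in range(k):
                    for dj in range(k):
                        src = image[i * stride + di][j * stride + dj]
                        for ch, weight in enumerate(kernel[di][dj]):
                            total += weight * src[ch]
                pixel.append(total)
            row.append(pixel)
        output.append(row)
    return output


# Hypothetical input: a 6 x 6 RGB image of ones and c = 2 averaging filters.
image = [[[1.0, 1.0, 1.0] for _ in range(6)] for _ in range(6)]
box = [[[1.0 / 27.0] * 3 for _ in range(3)] for _ in range(3)]
feature_map = conv2d_valid(image, [box, box])
shape = (len(feature_map), len(feature_map[0]), len(feature_map[0][0]))
```

With these inputs, `shape` is `(4, 4, 2)`: each spatial side shrinks from 6 to 6 - 3 + 1 = 4, and the channel count equals the number of filters. Real backbones such as VGG or ResNet add padding, strides, and pooling, so their reduction factors differ.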

Object detection unit 16 includes RPN (Region Proposal Network) layer 22 and BoxHead unit 23. RPN layer 22 is a neural network that detects the position and extent of objects in feature map 50. BoxHead unit 23 includes RoI (Region of Interest)-Align layer 31, fully connected layers 32, 33-1, and 33-2, and CNN layers 34 and 35. RoI-Align layer 31 is a neural network that, based on feature map 51 generated by combination unit 15 (that is, the feature map in which the influence of weather is suppressed) and the output of RPN layer 22, extracts the portions of feature map 51 where objects exist, applies pooling to the extracted data, and outputs data representing the pooling result.

Fully connected layers 32, 33-1, and 33-2 are fully connected neural networks. When the data output by RoI-Align layer 31 is given to fully connected layer 32, fully connected layers 32 and 33-1 calculate, from that data, class data indicating the type of the object. The output stage of fully connected layer 33-1 is a softmax function, so fully connected layers 32 and 33-1 calculate, as the data indicating the class of the object, probabilities that the object belongs to each of a plurality of predetermined classes. When the data output by RoI-Align layer 31 is given to fully connected layer 32, fully connected layers 32 and 33-2 calculate, from that data, data indicating the position and extent of the object, that is, the position and extent of the so-called bounding box.
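
The softmax output stage can be illustrated in a few lines. This is a generic sketch of the softmax function, not the trained classifier head; the class names and the score values are hypothetical.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class_names = ["car", "pedestrian", "bicycle"]  # hypothetical classes
logits = [2.0, 0.5, -1.0]                       # hypothetical raw scores
probs = softmax(logits)
predicted = class_names[probs.index(max(probs))]
```

Here `predicted` is `"car"`, the class with the largest score, and the entries of `probs` sum to 1, which is what allows them to be read as class membership probabilities.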

CNN layers 34 and 35 are neural networks that perform convolution operations. When the data output by RoI-Align layer 31 is given to CNN layer 34, CNN layers 34 and 35 output, from that data, data indicating the type of mask to be applied to the object portion. Before object detection processing is performed, trained weights and biases are applied to the neurons of the neural networks constituting CNN layer 21, RPN layer 22, fully connected layers 32, 33-1, and 33-2, and CNN layers 34 and 35.

Weather feature model generation unit 11 includes weather image data storage unit 41, feature map generation unit 42, classification unit 43, detection unit 44, and synthesis unit 45. Weather image data storage unit 41 stores in advance a plurality of image data captured under various weather conditions. Like the image data stored in image data storage unit 13, the image data stored in weather image data storage unit 41 is RGB color image data.

Like feature map generation unit 14, feature map generation unit 42 includes a CNN layer such as an FCN, VGG, or ResNet, and generates a feature map from each of the plurality of image data stored in weather image data storage unit 41. Classification unit 43 clusters the feature maps generated by feature map generation unit 42. Detection unit 44 detects the central feature vector of each cluster obtained by classification unit 43. Synthesis unit 45 synthesizes the central feature vectors detected by detection unit 44 to generate weather feature model 70, a model that collectively represents the features of various kinds of weather. Weather feature model 70 is defined by the following equation (2).

$$W \in \mathbb{R}^{d \times h} \tag{2}$$

In equation (2), "W" denotes weather feature model 70, and the equation states that weather feature model 70 is expressed as an element of a d × h-dimensional real space. As with feature map 50, the number of dimensions "d" in equation (2) is d = 2. Synthesis unit 45 generates weather feature model 70 by synthesizing the central feature vectors so that the result is three-dimensional array data in which "h" two-dimensional arrays, each the same size as one channel of feature map 50, are stacked. The number of channels "h" is the number of clusters obtained by the clustering in classification unit 43, that is, the number of weather types.

(Processing by the Weather Feature Model Generation Unit of the First Embodiment)
Processing by weather feature model generation unit 11 of the first embodiment is described with reference to FIGS. 3 and 4. Before the processing shown in FIG. 3 starts, a plurality of image data captured under a plurality of predetermined types of weather is written to weather image data storage unit 41. Here, as an example and as shown in FIG. 4, weather image data storage unit 41 is assumed to hold rain image data 52-1, fog image data 52-2, snow image data 52-3, cloudy image data 52-4, lightly cloudy image data 52-5, and so on, captured under "rain," "fog," "snow," "cloudy," and "lightly cloudy" conditions, respectively. Although FIG. 4 shows only one item of rain image data 52-1 for rainy weather, a plurality of image data is assumed to be stored in weather image data storage unit 41 for each weather type.

Rain image data 52-1, fog image data 52-2, snow image data 52-3, cloudy image data 52-4, lightly cloudy image data 52-5, and so on are captured at arbitrary locations. The more the capture locations differ across the image data, the more general the resulting weather feature model 70 becomes. Some of the image data may nevertheless have been captured at the same location.

Feature map generation unit 42 reads rain image data 52-1, fog image data 52-2, snow image data 52-3, cloudy image data 52-4, lightly cloudy image data 52-5, and so on from weather image data storage unit 41 one at a time, and generates a feature map by applying a convolution operation to each item read. That is, as shown in FIG. 4, feature map generation unit 42 generates feature map 53-1 for rain image data 52-1. It likewise generates the corresponding feature maps 53-2, 53-3, 53-4, 53-5, and so on for fog image data 52-2, snow image data 52-3, cloudy image data 52-4, lightly cloudy image data 52-5, and so on. Feature map generation unit 42 outputs the generated feature maps 53-1, 53-2, 53-3, 53-4, 53-5, and so on to classification unit 43 (step Sa1).

Classification unit 43 receives feature maps 53-1, 53-2, 53-3, 53-4, 53-5, and so on output by feature map generation unit 42 and clusters them using, for example, a Gaussian mixture model (GMM). For example, classification unit 43 plots feature maps 53-1, 53-2, 53-3, 53-4, 53-5, and so on in a vector space, as shown in scatter plot 60 of FIG. 4. In scatter plot 60, the marks "〇", "☆", "◇", "□", and "△" are the plotted feature maps.

As described above, a plurality of image data is stored in weather image data storage unit 41 for each weather type, so there are, for example, multiple rain feature maps in addition to feature map 53-1. In scatter plot 60, the position of each rain feature map, including feature map 53-1, is shown with a "〇" mark. Similarly, the position of each fog feature map, including feature map 53-2, is shown with a "☆" mark; each snow feature map, including feature map 53-3, with a "◇" mark; each cloudy feature map, including feature map 53-4, with a "□" mark; and each lightly cloudy feature map, including feature map 53-5, with a "△" mark.

At the moment they are plotted in the vector space, the points are not yet grouped by weather type as the marks "〇", "☆", "◇", "□", and "△" suggest; it is the clustering performed by classification unit 43 that groups the plotted points by weather type. For ease of understanding, scatter plot 60 in FIG. 4 shows clustering in a two-dimensional vector space, but the feature maps 53-1, 53-2, 53-3, 53-4, 53-5, and so on being clustered are three-dimensional array data. Clustering by classification unit 43 may therefore be performed in a vector space with more than two dimensions. Classification unit 43 outputs data indicating the clustering result to detection unit 44 (step Sa2).
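
The clustering in step Sa2 can be sketched with a small expectation-maximization (EM) fit of a Gaussian mixture. This is an illustrative sketch only: the isotropic (spherical) covariance, the hand-picked initial means, and the two-dimensional toy points standing in for flattened feature maps are all assumptions, and the function name is hypothetical.

```python
import math


def fit_spherical_gmm(points, init_means, iterations=30):
    """Fit a mixture of isotropic Gaussians by EM; return (means, labels)."""
    dim = len(points[0])
    k = len(init_means)
    means = [list(m) for m in init_means]
    variances = [1.0] * k
    weights = [1.0 / k] * k
    resp = []
    for _ in range(iterations):
        # E-step: responsibility of each component for each point.
        resp = []
        for p in points:
            log_probs = []
            for j in range(k):
                sq = sum((a - b) ** 2 for a, b in zip(p, means[j]))
                log_probs.append(
                    math.log(weights[j])
                    - 0.5 * dim * math.log(2.0 * math.pi * variances[j])
                    - 0.5 * sq / variances[j]
                )
            top = max(log_probs)
            unnorm = [math.exp(lp - top) for lp in log_probs]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])
        # M-step: re-estimate mean, variance, and weight of each component.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            means[j] = [
                sum(r[j] * p[d] for r, p in zip(resp, points)) / nj
                for d in range(dim)
            ]
            sq_sum = sum(
                r[j] * sum((a - b) ** 2 for a, b in zip(p, means[j]))
                for r, p in zip(resp, points)
            )
            variances[j] = max(sq_sum / (nj * dim), 1e-6)
            weights[j] = nj / len(points)
    labels = [r.index(max(r)) for r in resp]
    return means, labels


# Toy stand-ins for flattened feature maps of two weather types.
points = [(0.0, 0.1), (0.2, -0.1), (-0.1, 0.0),
          (5.0, 5.1), (4.9, 4.8), (5.2, 5.0)]
means, labels = fit_spherical_gmm(points, init_means=[points[0], points[3]])
```

The points carry no weather labels when they enter the function; the grouping into two clusters comes out of the fit. In practice a library implementation with full covariance matrices and several random restarts would be a more robust choice than this hand-rolled loop.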

Detection unit 44 receives the data indicating the clustering result output by classification unit 43 and, based on that data, detects central feature vectors 54-1, 54-2, 54-3, 54-4, and 54-5 of the respective clusters. Detection unit 44 outputs data indicating each of the detected central feature vectors 54-1, 54-2, 54-3, 54-4, and 54-5 to synthesis unit 45 (step Sa3). Synthesis unit 45 receives that data and synthesizes it to generate weather feature model 70, three-dimensional array data that collectively represents the features of the five weather types "rain," "fog," "snow," "cloudy," and "lightly cloudy." The synthesis of the data indicating central feature vectors 54-1, 54-2, 54-3, 54-4, and 54-5 is, for example, processing that calculates the inner products of the central feature vectors (step Sa4). Synthesis unit 45 writes the data of the generated weather feature model 70 to weather feature model storage unit 12 for storage (step Sa5).
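
Steps Sa3 and Sa4 can be pictured with the following sketch. It rests on two assumptions that are not stated in the text: the central feature vector of a cluster is taken to be the mean of its members, and synthesis is shown simply as stacking each flattened centre as one channel, which matches the layout described for weather feature model 70 (h same-sized two-dimensional arrays) but does not reproduce the inner-product calculation mentioned as an example. The function names and toy data are hypothetical.

```python
def cluster_centers(vectors, labels, num_clusters):
    """Mean vector of each cluster (assumed notion of 'central feature vector')."""
    dim = len(vectors[0])
    sums = [[0.0] * dim for _ in range(num_clusters)]
    counts = [0] * num_clusters
    for vec, label in zip(vectors, labels):
        counts[label] += 1
        for d in range(dim):
            sums[label][d] += vec[d]
    return [[s / counts[j] for s in sums[j]] for j in range(num_clusters)]


def stack_as_channels(centers, rows, cols):
    """Arrange h vectors of length rows * cols as a rows x cols x h nested list."""
    return [
        [[center[r * cols + c] for center in centers] for c in range(cols)]
        for r in range(rows)
    ]


# Toy data: four flattened 2 x 2 maps already assigned to two clusters.
vectors = [[1.0, 1.0, 0.0, 0.0], [3.0, 1.0, 0.0, 0.0],
           [0.0, 0.0, 2.0, 4.0], [0.0, 0.0, 4.0, 4.0]]
labels = [0, 0, 1, 1]
centers = cluster_centers(vectors, labels, 2)
weather_model = stack_as_channels(centers, rows=2, cols=2)
```

The result is a 2 × 2 × 2 array whose channel index corresponds to the cluster, so the channel count h equals the number of weather types, as stated for equation (2).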

(Object Detection Processing in the First Embodiment)
Object detection processing by image object detection device 1 of the first embodiment is described with reference to FIGS. 5 and 6. Before the processing shown in FIG. 5 starts, weather feature model generation unit 11 generates weather feature model 70, and the data of that model is written to weather feature model storage unit 12. Trained weights and biases are applied to the neurons of the neural networks constituting CNN layer 21, RPN layer 22, fully connected layers 32, 33-1, and 33-2, and CNN layers 34 and 35.

Feature map generation unit 14 reads image data from image data storage unit 13 and generates feature map 50 by applying the convolution operation of CNN layer 21 to it. Feature map generation unit 14 outputs the generated feature map 50 to combination unit 15 and object detection unit 16 (step Sb1). Combination unit 15 receives feature map 50 output by feature map generation unit 14 and reads the data of weather feature model 70 from weather feature model storage unit 12. Combination unit 15 then combines feature map 50 with weather feature model 70, that is, it performs the causal intervention combination expressed by the following equation (3).

$$\hat{F} = \mathrm{softmax}\left(F^{T} W\right) \times F \tag{3}$$

In equation (3), F with a circumflex (F̂) on the left-hand side denotes feature map 51 generated by combination unit 15. The first term on the right-hand side applies the softmax function to the product of the transpose of feature map 50 (F^T) and weather feature model 70 (W), that is, to a product of two three-dimensional arrays. Transposing feature map 50 means rearranging its elements as follows. Suppose feature map 50 is X × Y × c three-dimensional array data, and each element of the X-by-Y two-dimensional array of each of the c channels is denoted (x, y), where x = 1 to X and y = 1 to Y. Transposing feature map 50 means moving the (x, y) element of each channel's two-dimensional array to the (y, x) position while leaving the channel direction unchanged.
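
The per-channel transposition described above can be written directly for a nested-list array. This is a minimal sketch with a hypothetical 2 × 3 × 2 feature map; the function name is illustrative.

```python
def transpose_per_channel(feature_map):
    """Move element (x, y) to (y, x) in every channel of an X x Y x c array."""
    x_size, y_size = len(feature_map), len(feature_map[0])
    return [
        [list(feature_map[x][y]) for x in range(x_size)]  # channel vector kept as is
        for y in range(y_size)
    ]


# Hypothetical X = 2, Y = 3, c = 2 feature map; the innermost lists are channels.
feature_map = [[[1, 10], [2, 20], [3, 30]],
               [[4, 40], [5, 50], [6, 60]]]
transposed = transpose_per_channel(feature_map)
```

The result has shape 3 × 2 × 2, and the element that was at position (x, y) = (0, 2), namely `[3, 30]`, now sits at (2, 0) with its two channel values unchanged.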

When h = c, in other words when the number of channels h of weather feature model 70 (W) equals the number of channels c of feature map 50 (F), the first term on the right-hand side of equation (3) yields three-dimensional array data with the same size and the same number of channels as feature map 50, in which, in each channel, elements whose quality has been degraded by weather take large values. For example, as shown in FIG. 6, suppose feature map 50 is three-dimensional array data in which "c" 4 × 4 two-dimensional arrays are stacked, and weather feature model 70 is three-dimensional array data in which "h" 4 × 4 two-dimensional arrays are stacked.

When h = c, the combining unit 15 calculates the product of the transposed feature map 50 and the weather feature model 70, which yields a three-dimensional array with the same size and number of channels as the feature map 50, that is, a 4×4×c array. The combining unit 15 then applies a softmax function to the 16 element values of each channel of this array and replaces those 16 values with the softmax outputs. As a result, the 16 values in each channel sum to 1, and c two-dimensional arrays are obtained in which the elements of the feature map 50 degraded by weather take large values. The combining unit 15 stacks these c two-dimensional arrays in the channel direction to generate the three-dimensional array data 71, which has the same size and number of channels as the feature map 50.

The "×" on the right side of equation (3) denotes a channel-wise Hadamard product: for each channel, the combining unit 15 multiplies the values of the elements at corresponding positions in the three-dimensional array data 71 obtained from the first term and in the feature map 50. In other words, the combining unit 15 multiplies the value of the element at row x, column y of the z-th channel of the array data 71 by the value of the element at row x, column y of the z-th channel of the feature map 50, and uses the product as the value of the pixel at row x, column y of the z-th channel of the feature map 51. Here x, y, and z are positive integers; in the example shown in FIG. 6, x = 1 to 4, y = 1 to 4, and z = 1 to c. As a result, the values of the elements of the feature map 50 whose quality has been degraded by weather are emphasized, and the feature map 51, in which the influence of weather is suppressed, is obtained.
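The h = c procedure described above can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: equation (3) is not reproduced in this text, so the sketch assumes that the "product" is an ordinary matrix product taken channel by channel on square arrays, and that arrays are stored channel-first as nested lists. The function names (`transpose`, `matmul`, `softmax_2d`, `combine_same_channels`) are hypothetical.

```python
import math


def transpose(channel):
    """Swap element (x, y) with (y, x) inside one 2-D channel."""
    return [list(row) for row in zip(*channel)]


def matmul(a, b):
    """Plain matrix product of two 2-D lists (assumed reading of 'product')."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def softmax_2d(channel):
    """Softmax over all elements of one channel, so the elements sum to 1."""
    peak = max(v for row in channel for v in row)  # subtract the max for stability
    exps = [[math.exp(v - peak) for v in row] for row in channel]
    total = sum(v for row in exps for v in row)
    return [[v / total for v in row] for row in exps]


def combine_same_channels(F, W):
    """F, W: lists of c channels, each an n x n list of lists (h = c case)."""
    if len(F) != len(W):
        raise ValueError("this sketch only covers the h = c case")
    out = []
    for f, w in zip(F, W):
        weights = softmax_2d(matmul(transpose(f), w))  # plays the role of data 71
        # Channel-wise Hadamard product with the original feature map.
        out.append(
            [[a * v for a, v in zip(arow, frow)] for arow, frow in zip(weights, f)]
        )
    return out
```

With an all-zero weather channel the softmax weights are uniform, so each element of that channel is simply scaled by 1/(number of elements).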

In contrast, when the number of channels h of the weather feature model 70 is not equal to c, the product of the feature map 50 and the weather feature model 70 cannot be calculated by the above procedure. In that case, the combining unit 15 generates the feature map 51 as follows. Let WF denote the portion of each channel of the feature map 50 that contains weather features (hereinafter, the weather feature portion WF). The weather feature portion WF is a two-dimensional array defined by the following equation (4), in other words, one channel of a three-dimensional array. If the feature map 50 is a 4×4×c three-dimensional array as shown in FIG. 6, the weather feature portion WF is a 4×4×1 three-dimensional array.

The combining unit 15 calculates the weather feature portions WF for the c channels using the following equation (5).

In equation (5), j = 1 to c, and WF_j is the weather feature portion WF of the j-th channel of the feature map 50, which has c channels. The index i runs from 1 to h, m_{j,i} denotes the j-th m_i, and m_i can be calculated by the following equation (6). In equation (6), "F" on the right side denotes the j-th channel of the feature map 50 (feature map F).

In equations (5) and (6), "W_{·,i}" is the feature of the i-th channel of the weather feature model 70 (weather feature model W). Since m_i is defined by the following equation (7), m_{j,i}, that is, the matrix m, is defined by the following equation (8).

As shown in equation (8), the matrix m represents the correlation between the h features of the weather feature model 70 and the c features of the feature map 50. That is, equation (6) calculates the correlation matrix m_i as the similarity between the j-th feature of the feature map 50 (feature map F) of the image data subject to object detection and the weather feature model 70 (weather feature model W), which is obtained by combining the central feature vectors of all weather types. Calculating this similarity for each of the c channels of the feature map 50 yields the correlation matrix m_{j,i}. Using this correlation matrix m_{j,i} as the weights of the weather features, equation (5) calculates a weighted sum of the weather features to obtain WF_j, which represents the weather-affected part of the j-th feature of the feature map 50. Because the feature map 50 has c channels, the combining unit 15 calculates c instances of WF_j.

Using the feature map 50 (feature map F) and the three-dimensional array obtained by stacking the c calculated weather feature portions WF_j (j = 1 to c) in the channel direction, the combining unit 15 performs, for each of the c channels, the residual calculation shown on the right side of the following equation (9). This removes the influence of weather from the feature map 50 and generates the feature map 51 shown on the left side.

The combining unit 15 outputs the generated feature map 51 to the object detection unit 16 (step Sb2). Note that the above procedure for the case where h is not equal to c can also be applied when h = c.
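The procedure for the case where h differs from c can be sketched as below. Equations (4) to (9) are not reproduced in this text, so the sketch makes explicit assumptions that may differ from the actual formulas: the similarity m_{j,i} is taken to be a softmax (over i) of the inner product between channel j of F and channel i of W, and the residual calculation is taken to be an element-wise subtraction F_j − WF_j. All function names are hypothetical, and arrays are channel-first nested lists.

```python
import math


def _flatten(channel):
    return [v for row in channel for v in row]


def correlation_matrix(F, W):
    """m[j][i]: assumed similarity of channel j of F to channel i of W.

    Each row is softmax-normalised over the h weather channels (assumption).
    """
    m = []
    for f in F:
        fv = _flatten(f)
        scores = [sum(a * b for a, b in zip(fv, _flatten(w))) for w in W]
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        m.append([e / total for e in exps])
    return m


def weather_parts(m, W):
    """WF_j = sum over i of m[j][i] * W_i, one 2-D array per channel of F."""
    rows, cols = len(W[0]), len(W[0][0])
    return [
        [
            [sum(m_j[i] * W[i][x][y] for i in range(len(W))) for y in range(cols)]
            for x in range(rows)
        ]
        for m_j in m
    ]


def remove_weather(F, W):
    """Residual step, assumed here to be F_j - WF_j for every channel j."""
    WF = weather_parts(correlation_matrix(F, W), W)
    return [
        [[fv - wv for fv, wv in zip(frow, wrow)] for frow, wrow in zip(f, wf)]
        for f, wf in zip(F, WF)
    ]
```

Because the weighted sum runs over the h weather channels while the output is indexed by the c feature channels, the same code works whether or not h equals c, which matches the remark that this procedure also applies when h = c.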

The object detection unit 16 receives the feature map 50 output by the feature map generation unit 14 and the feature map 51 output by the combining unit 15, and performs the object detection process described below. The RPN layer 22 receives the feature map 50 and detects the positions where objects exist. The RoI-Align layer 31 receives the feature map 51 and the output of the RPN layer 22, and, based on that output, extracts the portions of the feature map 51 where objects exist. The RoI-Align layer 31 performs pooling on the extracted data and outputs the pooled data to the fully connected layer 32 and the CNN layer 34.

When the fully connected layer 32 receives the data output by the RoI-Align layer 31, the fully connected layer 33-1 outputs data indicating the class of the object contained in the region represented by that data, and the fully connected layer 33-2 outputs data indicating the position and extent of the object contained in that region. When the CNN layer 34 receives the data output by the RoI-Align layer 31, the CNN layer 35 outputs data indicating the type of mask to apply to the part of the object contained in the region represented by that data (step Sb3).
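The routing of the two feature maps through the detector can be summarised by the following data-flow sketch. It only illustrates which stage consumes which map; the stage arguments (`rpn`, `roi_align`, `class_head`, `box_head`, `mask_head`) are hypothetical callables standing in for the RPN layer 22, the RoI-Align layer 31, and the class, box, and mask output layers, and are not an API of any real library.

```python
def detect_objects(fmap_50, fmap_51, rpn, roi_align, class_head, box_head, mask_head):
    """Route feature map 50 to the RPN and feature map 51 to RoI-Align."""
    # Region proposals come from the original feature map 50.
    proposals = rpn(fmap_50)
    results = []
    # Regions are cropped and pooled from the weather-suppressed feature map 51.
    for region in roi_align(fmap_51, proposals):
        results.append(
            {
                "class": class_head(region),  # role of fully connected layer 33-1
                "box": box_head(region),      # role of fully connected layer 33-2
                "mask": mask_head(region),    # role of CNN layer 35
            }
        )
    return results
```

Passing the stages in as arguments keeps the sketch independent of any particular detector; the same routing would apply if another detector were substituted.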

The feature map generation unit 14 applies the data indicating the object class, the data indicating the position and extent of the object, and the data indicating the type of mask to be applied to the object to the image data read from the image data storage unit 13. As a result, for example, each object shown in the image data is enclosed in a bounding box, the data indicating the object class is displayed in association with the bounding box, and the region of each object is masked in a different color.

In the image object detection device 1 of the first embodiment described above, the weather feature model generation unit 11 generates the weather feature model 70, which represents the features of the weather appearing in images. The feature map generation unit 14 generates the feature map 50 from the image data subject to object detection. The combining unit 15 causally combines the weather feature model 70 generated by the weather feature model generation unit 11 with the feature map 50 generated by the feature map generation unit 14, thereby generating the feature map 51 in which the influence of weather is suppressed. The object detection unit 16 performs object detection on the feature map 51 generated by the combining unit 15. The weather feature model 70 represents the weather-specific information contained in features extracted from images captured under various weather conditions, not separately for each weather type but for all weather types collectively. In other words, it is a general-purpose model that covers a variety of weather types.

Causally combining the weather feature model 70, which has these characteristics, with the feature map 50 of the image data subject to object detection yields the feature map 51 in which the influence of weather is suppressed. By performing object detection on this feature map 51, the object detection unit 16 can detect objects with accuracy equivalent to that in good weather, even when the image data was captured in poor weather. Because the weather feature model 70 is general across weather types, the influence of any type of weather can be suppressed even when the image data has been corrupted by weather, regardless of the nature or extent of the corruption. Applying the weather feature model 70 therefore makes it possible to improve the robustness of object detection.

In the first embodiment described above, the classification unit 43 performs clustering using a Gaussian mixture model, but it may instead use another clustering method such as K-means.

In the first embodiment described above, five weather types, "rain," "fog," "snow," "cloudy," and "slightly cloudy," are given as examples for the image data stored in the weather image data storage unit 41. However, it is sufficient for the weather image data storage unit 41 to store image data for at least two different weather types. Image data captured in weather conditions other than these five may also be stored in the weather image data storage unit 41. In that case, the detection unit 44 detects the central feature vectors 54-1, 54-2, 54-3, 54-4, 54-5, ... corresponding to each of the clusters produced by the clustering performed by the classification unit 43, that is, to each of the weather types.
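As a rough illustration of how central feature vectors could be obtained when K-means is used in place of the Gaussian mixture model, the sketch below runs a plain Lloyd-style K-means on a list of feature vectors and returns one centroid per cluster. It is an assumption-laden example: the function name `kmeans_centroids`, the deterministic initialisation from the first k vectors, and the fixed iteration count are illustrative choices, not details taken from the embodiment.

```python
def kmeans_centroids(vectors, k, iterations=20):
    """Return k centroids (central feature vectors) for the given vectors."""
    if k < 2 or len(vectors) < k:
        raise ValueError("need at least two clusters and at least k vectors")
    # Deterministic initialisation (illustrative choice): the first k vectors.
    centroids = [list(v) for v in vectors[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            # Assign each vector to the nearest centroid (squared Euclidean distance).
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(v, centroids[i])),
            )
            clusters[nearest].append(v)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids
```

The returned list has one entry per cluster, so stacking the centroids gives a model with h = k channels, which in general differs from the number of channels c of the feature map.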

(Second Embodiment)
FIG. 7 is a block diagram showing the configuration of an image object detection device 1a according to the second embodiment. In the second embodiment, components identical to those of the first embodiment are given the same reference numerals, and only the differences are described below. The image object detection device 1a includes a weather feature model generation unit 11a, a weather feature model storage unit 12, an image data storage unit 13, a feature map generation unit 14, a combining unit 15, and an object detection unit 16.

The weather feature model generation unit 11 of the first embodiment generates the weather feature model 70 from image data of various weather types, that is, rain image data 52-1, fog image data 52-2, snow image data 52-3, cloudy image data 52-4, slightly cloudy image data 52-5, and so on. When an image is captured in weather such as rain or fog, the influence of the weather is observed to be distributed uniformly over the pixels of the image. Based on this observation, the weather feature model generation unit 11a of the second embodiment, instead of generating the weather feature model 70 from actually captured weather images, generates random noise images in which noise appears uniformly at every pixel and generates a weather feature model from those images. For convenience, the weather feature model of the second embodiment is given the reference numeral 70a and is hereinafter referred to as the weather feature model 70a.

(Processing by the weather feature model generation unit of the second embodiment)
FIG. 8 is a flowchart showing the flow of processing by the weather feature model generation unit 11a of the second embodiment. The weather feature model generation unit 11a generates "c" two-dimensional random noise images, one for each channel of the feature map 50, each having the same size as the two-dimensional array of one channel of the feature map 50 (step Sc1). The weather feature model generation unit 11a then stacks the "c" generated random noise images in the channel direction to produce a three-dimensional array as the data of the weather feature model 70a (step Sc2).

Note that the random noise must be distributed uniformly within each two-dimensional image. The weather feature model generation unit 11a therefore generates the two-dimensional image data so that the pixel value of each pixel is a normally distributed random number. As a specific example of the random noise, Gaussian noise, which is irregular noise following a normal distribution, can be used, and white Gaussian noise is particularly desirable. The weather feature model 70a is defined by the following equation (10).

As can be seen from equation (10), the weather feature model 70 of the first embodiment and the weather feature model 70a of the second embodiment are both elements of a real space with the same dimension d, but they are three-dimensional arrays with different numbers of channels. The weather feature model 70a of the second embodiment is a three-dimensional array, expressed as an element of a d×c-dimensional real space, with the same size and the same number of channels as the feature map 50. The weather feature model generation unit 11a writes the data of the generated weather feature model 70a to the weather feature model storage unit 12 for storage (step Sc3).
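Steps Sc1 and Sc2 can be sketched as follows. The example draws every pixel independently from a normal distribution, which corresponds to white Gaussian noise, and stacks the resulting images channel-first. The function name `make_noise_weather_model` and the parameters `sigma` and `seed` are hypothetical; in particular, the noise scale is not specified in this text and the default of 1.0 is only a placeholder.

```python
import random


def make_noise_weather_model(height, width, channels, sigma=1.0, seed=None):
    """Build a channels x height x width array of white Gaussian noise.

    Every pixel is an independent draw from N(0, sigma^2); sigma and seed
    are illustrative parameters, not values given in the embodiment.
    """
    rng = random.Random(seed)
    return [
        [[rng.gauss(0.0, sigma) for _ in range(width)] for _ in range(height)]
        for _ in range(channels)
    ]
```

For the 4×4×c feature map of FIG. 6, a call such as `make_noise_weather_model(4, 4, c)` would give an array whose per-channel size and channel count match the feature map 50.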

The object detection process in the second embodiment is the same as that of the first embodiment shown in FIG. 5, except that the weather feature model that the combining unit 15 causally combines with the feature map 50 generated by the feature map generation unit 14 is the weather feature model 70a of the second embodiment. As a result, in the second embodiment, as in the first, object detection can be performed after suppressing the influence of weather in the feature map 50 obtained from the image data subject to object detection.

In the second embodiment described above, the weather feature model generation unit 11a may generate h random noise images, where h differs from c, and generate the weather feature model 70a from those h images. In this case, the combining unit 15 generates the feature map 51 from the feature map 50 and the weather feature model 70a using the procedure for the case where h is not equal to c, as described in the first embodiment.

The first and second embodiments described above show examples in which Mask R-CNN is used as the object detection unit 16. However, other object detection methods based on deep neural networks, such as Faster R-CNN or YOLO (You Only Look Once), may be used as the object detection unit 16 instead. In that case, as in the Mask R-CNN example, the combining unit 15 is inserted at the point where a feature map corresponding to the feature map 50 is obtained as an output, and the weather feature model 70 or 70a is combined with that feature map.

In the first and second embodiments described above, the image data stored in the weather image data storage unit 41 and the image data storage unit 13 is assumed to be RGB color image data. However, the image data stored in these units may instead be CMYK color image data or grayscale image data. The image data stored in the two units may have been captured by the same camera or by different cameras. The images must nevertheless be captured so that both sets of image data belong to the same domain, avoiding a domain mismatch such as one set being RGB color image data and the other being grayscale image data.

In the first and second embodiments described above, the vertical and horizontal pixel counts of the image data stored in the image data storage unit 13 may be equal or different. In the first embodiment, the image data stored in the weather image data storage unit 41 and the image data stored in the image data storage unit 13 may be of the same size or of different sizes, and the vertical and horizontal pixel counts of the image data stored in the weather image data storage unit 41 may likewise be equal or different. However, as described above, the size of the two-dimensional array in each channel of the feature map 50 must match the size of the two-dimensional array in each channel of the weather feature models 70 and 70a. The CNN layer 21 of the feature map generation unit 14 and the weather feature model generation units 11 and 11a must therefore be configured so that these per-channel sizes match.

The image object detection devices 1 and 1a of the embodiments described above may be implemented by a computer. In that case, a program for implementing their functions may be recorded on a computer-readable recording medium, and the functions may be realized by loading the program recorded on the medium into a computer system and executing it. The term "computer system" here includes an OS and hardware such as peripheral devices. The term "computer-readable recording medium" refers to portable media such as flexible disks, magneto-optical disks, ROMs, and CD-ROMs, and to storage devices such as hard disks built into a computer system. A "computer-readable recording medium" may further include a medium that holds the program dynamically for a short time, such as a communication line used when the program is transmitted over a network such as the Internet or over a communication line such as a telephone line, and a medium that holds the program for a certain period, such as volatile memory inside the computer system acting as the server or client in that case. The program may implement only part of the functions described above, may implement those functions in combination with a program already recorded in the computer system, or may be implemented using a programmable logic device such as an FPGA (Field Programmable Gate Array).

Embodiments of the present invention have been described in detail above with reference to the drawings, but the specific configuration is not limited to these embodiments and also includes designs and the like that do not depart from the gist of the present invention.

1... Image object detection device, 11... Weather feature model generation unit, 12... Weather feature model storage unit, 13... Image data storage unit, 14... Feature map generation unit, 15... Combining unit, 16... Object detection unit

Claims

1. An image object detection device comprising:
a weather feature model generation unit that generates a weather feature model indicating features of weather appearing in an image;
a feature map generation unit that generates a feature map from image data subject to object detection;
a combining unit that generates a feature map in which the influence of weather is suppressed by causally combining the weather feature model generated by the weather feature model generation unit with the feature map generated by the feature map generation unit; and
an object detection unit that performs object detection on the feature map generated by the combining unit.

2. The image object detection device according to claim 1, wherein the weather feature model generation unit generates a feature map from each of a plurality of image data items capturing weather conditions corresponding to each of a plurality of weather types, clusters the generated feature maps, and generates the weather feature model by combining the central feature vectors of the clusters obtained by the clustering.

3. The image object detection device according to claim 1, wherein the weather feature model generation unit generates the weather feature model from an image of random noise.

4. The image object detection device according to claim 3, wherein the random noise is white Gaussian noise.

5. The image object detection device according to any one of claims 1 to 4, wherein the combining unit causally combines the weather feature model with the feature map by calculating a Hadamard product of the feature map and data obtained by applying a softmax function to the product of the transpose of the feature map and the weather feature model.

6. The image object detection device according to any one of claims 1 to 4, wherein the combining unit causally combines the weather feature model with the feature map by calculating a correlation matrix between each channel of the feature map and each channel of the weather feature model, extracting, for each channel, a portion of the feature map containing weather features based on the calculated correlation matrix and the weather feature model, and performing a residual calculation that removes the weather features from the feature map using the extracted portion containing the weather features and the feature map.

7. An image object detection method comprising:
a weather feature model generation step of generating a weather feature model indicating features of weather appearing in an image;
a feature map generation step of generating a feature map from image data subject to object detection;
a combining step of generating a feature map in which the influence of weather is suppressed by causally combining the weather feature model generated in the weather feature model generation step with the feature map generated in the feature map generation step; and
an object detection step of performing object detection on the feature map generated in the combining step.

8. A program for causing a computer to function as:
weather feature model generating means for generating a weather feature model indicating features of weather appearing in an image;
feature map generating means for generating a feature map from image data subject to object detection;
combining means for generating a feature map in which the influence of weather is suppressed by causally combining the weather feature model generated by the weather feature model generating means with the feature map generated by the feature map generating means; and
object detection means for performing object detection on the feature map generated by the combining means.