JP7663341B2

JP7663341B2 - DATA PROCESSING APPARATUS, DATA PROCESSING METHOD, DATA PROCESSING PROGRAM, AND LEARNING MODEL GENERATION METHOD

Info

Publication number: JP7663341B2
Application number: JP2020192449A
Authority: JP
Inventors: 智之吉山
Original assignee: Secom Co Ltd
Current assignee: Secom Co Ltd
Priority date: 2020-11-19
Filing date: 2020-11-19
Publication date: 2025-04-16
Anticipated expiration: 2040-11-19
Also published as: JP2022081113A

Description

本発明は、データ処理装置、データ処理方法、データ処理プログラム、及び学習モデル生成方法に係り、特に、対象物を表す対象データに対してデータ処理を行うデータ処理装置、データ処理方法、データ処理プログラム、及び学習モデル生成方法に関する。 The present invention relates to a data processing device, a data processing method, a data processing program, and a learning model generation method, and in particular to a data processing device, a data processing method, a data processing program, and a learning model generation method that perform data processing on target data that represents an object.

画像に撮影されたシーンを自動認識するなどの目的で、画像を、当該画像に撮影されている複数の物体それぞれの領域や複数の部位それぞれの領域に分割すると共に、各領域に撮影されている物体や部位を認識する技術が研究・開発されてきた。以下、撮影されている物体や部位を被写体と呼ぶ。被写体の認識を伴った領域分割はセマンティックセグメンテーションなどと称される。 For purposes such as automatically recognizing a scene captured in an image, technologies have been researched and developed that divide an image into regions for each of multiple objects or multiple body parts captured in the image, and recognize the objects or body parts captured in each region. Hereinafter, the captured objects or body parts are referred to as subjects. Region division that involves subject recognition is called semantic segmentation.

特に、近年では、学習に基づいて上記分割と認識を行う技術が盛んに研究されている。例えば、非特許文献１には、予め被写体ごとに分割された領域の画素ごとに当該被写体を表すクラスを付与した学習用画像を多数用意し、コンピュータにこれらの学習用画像を機械学習させることが記載されている。予め付与する情報はアノテーションなどと称される。この学習によって生成された学習済みモデルに任意の画像を入力すれば当該入力画像に対して画素ごとのクラスが出力される。つまり当該入力画像が被写体ごとに、クラスでラベル付けされた領域（ラベル領域）に分割される。 In particular, in recent years, there has been active research into technologies that perform the above-mentioned segmentation and recognition based on learning. For example, Non-Patent Document 1 describes a method of preparing a large number of learning images in which a class representing the subject is assigned to each pixel in an area that has been segmented in advance for each subject, and having a computer perform machine learning on these learning images. The information that is assigned in advance is called an annotation or the like. If an arbitrary image is input into a trained model generated by this learning, a class for each pixel of the input image is output. In other words, the input image is segmented into areas (label areas) labeled with a class for each subject.

“Fully Convolutional Networks for Semantic Segmentation”, Jonathan Long, Evan Shelhamer, and Trevor Darrell (Proceedings of the IEEE conference on computer vision and pattern recognition, 2015)

従来、これらの機械学習では、レンズ歪の少ない透視投影画像からなる学習用画像とそのアノテーションからなるデータセットを用いて機械学習を行う場合が多い（図１１参照）。一方で、透視投影画像は、監視用に使われるような魚眼カメラで撮影される魚眼画像（図２参照）とは物体の映り方が大きく異なっている。従って、透視投影画像によるデータセットで作成した学習済みモデルに対して魚眼画像を入力すると、特に画像の上下と実世界の上下にズレが生じていたり、魚眼画像の左右や下部で、人物を正しく検出できなかったり、床を壁と誤認識したりするなど、著しく精度が低下することがある（図１２参照）。図１２では、図１２（Ａ）に示す魚眼画像に対するセグメンテーション処理結果として、図１２（Ｂ）に示すように、誤ったラベル領域に分割された例を示している。 Conventionally, in these machine learning methods, machine learning is often performed using a dataset consisting of learning images made of perspective projection images with little lens distortion and their annotations (see FIG. 11). On the other hand, perspective projection images show objects in a significantly different way from fisheye images taken by fisheye cameras used for surveillance (see FIG. 2). Therefore, when a fisheye image is input to a trained model created with a dataset of perspective projection images, the accuracy may be significantly reduced, particularly when there is a misalignment between the top and bottom of the image and the top and bottom of the real world, people cannot be detected correctly on the left and right or at the bottom of the fisheye image, or the floor may be mistaken for a wall (see FIG. 12). FIG. 12 shows an example of segmentation into incorrectly labeled regions as shown in FIG. 12(B) as a result of segmentation processing on the fisheye image shown in FIG. 12(A).

このような精度低下を防ぐ方法として、例えば魚眼画像を透視投影画像に変換してから処理する手法が考えられる。この場合、魚眼画像１枚から複数の透視投影画像が得られるので、その全てを処理した結果を統合するために処理時間が増加し、また、結果に不整合が生じた際の統合が難しい、という欠点がある。 One way to prevent this loss of accuracy is, for example, to convert the fisheye image into a perspective projection image before processing it. In this case, multiple perspective projection images are obtained from a single fisheye image, so the processing time increases because all of these images need to be processed and then integrated, and it is also difficult to integrate the results when inconsistencies occur in the results.

また、別の手法として、データセットを３６０度あらゆる方向に回転させたものを学習時に使用することで、学習済みモデルがあらゆる向きの物体を処理できるようにし、魚眼画像を処理できるようにする手法が考えられる。しかしながら、この場合は魚眼画像全体に渡ってあらゆる向きの物体を見つけ出そうとするので、人の上下を間違えるなどの誤報が生じるようになってしまう。 Another conceivable approach is to train on the dataset rotated to arbitrary angles over the full 360 degrees, so that the trained model can handle objects in any orientation and can therefore process fisheye images. In this case, however, the model tries to find objects of every orientation across the entire fisheye image, which leads to false detections such as confusing the top and bottom of a person.

また、セグメンテーションの推定に畳み込みニューラルネットワークを用いる場合は、画像に写っている対象の傾きに応じて畳み込みに使用するフィルタを回転させることで、魚眼画像に対して高精度で重みを推定できる可能性がある。しかし、畳み込み操作で用いられるフィルタは一般的に幅や高さが小さく、この重みの配列に対して連続的な回転を定義することは難しい。 In addition, when using a convolutional neural network to estimate segmentation, it may be possible to estimate weights with high accuracy for fisheye images by rotating the filter used for convolution according to the tilt of the object in the image. However, the filters used in convolution operations generally have small widths and heights, and it is difficult to define continuous rotations for this weight array.

そこで、本発明は、対象物を表す対象データに対して、対象物の基準方向を考慮してデータ処理を行うことができるデータ処理装置、データ処理方法、データ処理プログラム、及び学習モデル生成方法を提供することを目的とする。 The present invention aims to provide a data processing device, a data processing method, a data processing program, and a learning model generation method that can process target data representing an object while taking into account the reference direction of the object.

上記の目的を達成するために本発明に係るデータ処理装置は、対象物を表す学習用データと前記学習用データ上の、前記対象物の基準方向に対応する方向とを用いて予め学習された係数を用いたフィルタを用いてデータ処理を行うデータ処理装置であって、対象物を表す対象データ、及び前記対象データ上の、前記対象物の基準方向に対応する方向を受け付ける受付部と、前記対象物の基準方向に対応する方向に基づいて、前記フィルタの係数を変更した変更後のフィルタを用いて、前記対象データに対して畳み込み処理を行ってデータ処理を行うデータ処理部と、を含んで構成されている。 In order to achieve the above object, the data processing device according to the present invention processes data using a filter that uses coefficients that have been previously learned using learning data representing an object and a direction on the learning data that corresponds to the reference direction of the object, and is configured to include a reception unit that receives object data representing an object and a direction on the object data that corresponds to the reference direction of the object, and a data processing unit that performs convolution processing on the object data using a modified filter in which the coefficients of the filter have been modified based on the direction that corresponds to the reference direction of the object, to perform data processing.

本発明に係るデータ処理装置によれば、受付部が、対象物を表す対象データ、及び前記対象データ上の、前記対象物の基準方向に対応する方向を受け付ける。そして、データ処理部が、前記対象物の基準方向に対応する方向に基づいて、前記フィルタの係数を変更した変更後のフィルタを用いて、前記対象データに対して畳み込み処理を行ってデータ処理を行う。 In the data processing device according to the present invention, the receiving unit receives target data representing an object and a direction on the target data that corresponds to a reference direction of the object. Then, the data processing unit performs data processing by performing a convolution process on the target data using a modified filter in which the coefficients of the filter are modified based on the direction that corresponds to the reference direction of the object.

このように、対象データ上の、前記対象物の基準方向に対応する方向に基づいて、前記フィルタの係数を変更した変更後のフィルタを用いて、前記対象データに対して畳み込み処理を行ってデータ処理を行うことにより、対象物を表す対象データに対して、対象物の基準方向を考慮してデータ処理を行うことができる。 In this way, by performing convolution processing on the target data using a modified filter whose coefficients have been changed based on the direction in the target data that corresponds to the reference direction of the object, data processing can be performed on the target data representing the object, taking into account the reference direction of the object.

また、前記対象データは、要素毎に対象物の基準方向が異なるデータであって、前記受付部は、前記要素毎に前記対象物の基準方向に対応する方向を格納した角度マップを受け付け、前記データ処理部は、前記角度マップに基づいて、前記フィルタの係数を変更した変更後のフィルタを用いて、前記対象データに対して畳み込み処理を行ってデータ処理を行うことができる。 The target data is data in which the reference direction of the object differs for each element, and the reception unit receives an angle map that stores a direction corresponding to the reference direction of the object for each element, and the data processing unit performs a convolution process on the target data using a modified filter in which the coefficient of the filter is modified based on the angle map, to perform data processing.

また、前記データ処理部は、前記要素毎に、前記方向に基づいて、前記フィルタの係数を変更し、前記対象データに対して、前記要素毎に、前記変更後のフィルタを用いて畳み込み処理を行ってデータ処理を行うことができる。 The data processing unit can also change the filter coefficients for each element based on the direction, and perform convolution processing on the target data using the changed filter for each element to process the data.

前記データ処理部は、前記予め学習された係数を用いたフィルタを用いて生成される、複数の特定方向の各々に対応するフィルタを、前記方向と前記特定方向との相対角度に応じて重み付けすることにより、前記フィルタの係数を変更し、変更後のフィルタを用いて、前記対象データに対して畳み込み処理を行ってデータ処理を行うことができる。 The data processing unit can change the coefficients of the filter by weighting the filters corresponding to each of a plurality of specific directions, which are generated from the filter using the pre-learned coefficients, according to the relative angle between the direction and each specific direction, and can perform data processing by performing a convolution process on the target data using the modified filter.

前記データ処理部は、前記予め学習された係数を用いたフィルタを用いて生成される、複数の特定方向の各々に対応するフィルタを用いて、前記対象データに対して畳み込み処理を行ってデータ処理を各々行い、前記複数の特定方向の各々に対応するフィルタを用いた前記データ処理の結果を、前記対象物の基準方向に対応する方向と前記特定方向との相対角度に応じて重み付けて統合することができる。 The data processing unit performs data processing by performing a convolution process on the target data using a filter corresponding to each of a plurality of specific directions, which is generated using a filter using the pre-learned coefficients, and can integrate the results of the data processing using the filter corresponding to each of the plurality of specific directions by weighting them according to the relative angle between the direction corresponding to the reference direction of the target object and the specific direction.

前記データ処理部は、畳み込みニューラルネットワークの少なくとも一部の層のフィルタとして、前記変更後のフィルタを用いて、前記対象データに対して畳み込み処理を行ってデータ処理を行うことができる。 The data processing unit can perform data processing by performing convolution processing on the target data using the modified filter as a filter for at least a portion of the layers of the convolutional neural network.

前記対象データは、魚眼カメラにより撮影された魚眼画像である、とすることができる。 The target data may be a fisheye image taken by a fisheye camera.

本発明に係るデータ処理方法は、対象物を表す学習用データと前記学習用データ上の、前記対象物の基準方向に対応する方向とを用いて予め学習された係数を用いたフィルタを用いてデータ処理を行うデータ処理方法であって、受付部が、対象物を表す対象データ、及び前記対象データ上の、前記対象物の基準方向に対応する方向を受け付け、データ処理部が、前記対象物の基準方向に対応する方向に基づいて、前記フィルタの係数を変更した変更後のフィルタを用いて、前記対象データに対して畳み込み処理を行ってデータ処理を行う。 The data processing method according to the present invention performs data processing using a filter using coefficients that have been previously trained using learning data representing an object and a direction on the learning data that corresponds to a reference direction of the object, in which a receiving unit receives object data representing an object and a direction on the object data that corresponds to a reference direction of the object, and a data processing unit performs data processing by performing a convolution process on the object data using a modified filter in which the coefficients of the filter have been modified based on the direction that corresponds to the reference direction of the object.

本発明に係るデータ処理プログラムは、対象物を表す学習用データと前記学習用データ上の、前記対象物の基準方向に対応する方向とを用いて予め学習された係数を用いたフィルタを用いてデータ処理を行うためのデータ処理プログラムであって、コンピュータを、対象物を表す対象データ、及び前記対象データ上の、前記対象物の基準方向に対応する方向を受け付ける受付部、及び前記対象物の基準方向に対応する方向に基づいて、前記フィルタの係数を変更した変更後のフィルタを用いて、前記対象データに対して畳み込み処理を行ってデータ処理を行うデータ処理部として機能させるためのプログラムである。 The data processing program according to the present invention is a data processing program for performing data processing using a filter using coefficients that have been previously trained using learning data representing an object and a direction on the learning data that corresponds to the reference direction of the object, and causes a computer to function as a reception unit that receives object data representing an object and a direction on the object data that corresponds to the reference direction of the object, and a data processing unit that performs data processing by performing a convolution process on the object data using a modified filter in which the coefficients of the filter have been modified based on the direction that corresponds to the reference direction of the object.

本発明に係る学習モデル生成方法は、学習用データ生成部が、対象物を表す学習用データと、正解のデータ処理結果と、前記学習用データ上の、前記対象物の基準方向に対応する方向とを含む訓練データセットを生成し、学習部が、データ処理のための畳み込み処理で用いるフィルタの係数を、前記方向に基づいて前記フィルタの係数を変更した変更後のフィルタを用いて前記学習用データに対して畳み込み処理を行ってデータ処理を行った結果と、前記正解のデータ処理結果とが一致するように学習する。 In the learning model generation method according to the present invention, a learning data generation unit generates a training data set including learning data representing an object, a correct data processing result, and a direction on the learning data that corresponds to a reference direction of the object, and a learning unit learns the coefficients of a filter used in convolution processing for data processing so that the result of data processing, obtained by performing convolution processing on the learning data using a modified filter whose coefficients have been changed based on the direction, matches the correct data processing result.

本発明のデータ処理装置、データ処理方法、データ処理プログラム、及び学習モデル生成方法によれば、対象物を表す対象データに対して、対象物の基準方向を考慮してデータ処理を行うことができる、という効果が得られる。 The data processing device, data processing method, data processing program, and learning model generation method of the present invention have the advantage that data processing can be performed on target data representing an object while taking into account the reference direction of the object.

本発明の実施の形態に係る画像処理システムの構成を示す概略図である。 FIG. 1 is a schematic diagram showing the configuration of an image processing system according to an embodiment of the present invention.
魚眼画像の一例を示す図である。 FIG. 2 is a diagram showing an example of a fisheye image.
本発明の実施の形態に係る学習装置の構成を示す概略図である。 FIG. 3 is a schematic diagram showing the configuration of a learning device according to the embodiment of the present invention.
本発明の実施の形態に係る学習装置の学習部の構成を示す概略図である。 FIG. 4 is a schematic diagram showing the configuration of a learning unit of the learning device according to the embodiment of the present invention.
本発明の実施の形態に係る学習装置及び画像処理装置のデータ処理部の構成を示す概略図である。 FIG. 5 is a schematic diagram showing the configuration of a data processing unit of the learning device and the image processing device according to the embodiment of the present invention.
本発明の実施の形態に係る画像処理装置の構成を示す概略図である。 FIG. 6 is a schematic diagram showing the configuration of an image processing device according to the embodiment of the present invention.
本発明の実施の形態に係る学習装置による学習処理の動作を示すフローチャートである。 FIG. 7 is a flowchart showing the operation of the learning process performed by the learning device according to the embodiment of the present invention.
本発明の実施の形態に係る画像処理装置による画像処理の動作を示すフローチャートである。 FIG. 8 is a flowchart showing the operation of the image processing performed by the image processing device according to the embodiment of the present invention.
（Ａ）魚眼画像の一例を示す図、及び（Ｂ）セグメンテーション処理結果の一例を示す図である。 FIG. 9(A) is a diagram showing an example of a fisheye image, and FIG. 9(B) is a diagram showing an example of a segmentation processing result.
変形例に係る学習装置及び画像処理装置のデータ処理部の構成を示す概略図である。 FIG. 10 is a schematic diagram showing the configuration of a data processing unit of a learning device and an image processing device according to a modified example.
透視投影画像の一例を示す図である。 FIG. 11 is a diagram showing an example of a perspective projection image.
（Ａ）魚眼画像の一例を示す図、及び（Ｂ）従来技術におけるセグメンテーション処理結果の一例を示す図である。 FIG. 12(A) is a diagram showing an example of a fisheye image, and FIG. 12(B) is a diagram showing an example of a segmentation processing result in the prior art.

以下、図面を参照して本発明の実施の形態を詳細に説明する。なお、本実施の形態では、魚眼撮影装置によって撮影された魚眼画像を被写体のクラスごとに分類してラベル領域に分割する画像処理システムに本発明を適用した場合を例に説明する。魚眼画像が、対象データの一例であり、被写体が対象物の一例である。 Below, an embodiment of the present invention will be described in detail with reference to the drawings. In this embodiment, the present invention will be described as being applied to an image processing system that classifies fisheye images taken by a fisheye photography device by subject class and divides them into label areas. A fisheye image is an example of target data, and a subject is an example of an object.

＜システム構成＞ <System Configuration>
以下、本発明を適用した画像処理システム１０００の概略構成を示した図１を参照し、本発明の実施の形態の構成を説明する。 The configuration of an embodiment of the present invention is described below with reference to FIG. 1, which shows a schematic configuration of an image processing system 1000 to which the present invention is applied.

（画像処理システム１０００） (Image Processing System 1000)
画像処理システム１０００は、魚眼撮影装置１１００、ネットワーク１２００、学習装置１００、及び画像処理装置１５０を有する。なお、画像処理装置１５０が、データ処理装置の一例である。 The image processing system 1000 includes a fisheye photography device 1100, a network 1200, a learning device 100, and an image processing device 150. The image processing device 150 is an example of a data processing device.

（魚眼撮影装置１１００） (Fisheye photography device 1100)

魚眼撮影装置１１００は、図２に示すような、魚眼画像を撮影する。具体的には、魚眼撮影装置１１００は、所定の監視空間を監視する目的で設置される監視カメラであり、例えば、天井などに設置され、光学中心が略鉛直下向きで下方を撮影する。魚眼画像は、画素毎に被写体の基準方向（例えば、上方向）が異なり、画素毎の被写体の基準方向が予め定められている。魚眼撮影装置１１００で撮影した魚眼画像は、画像処理装置１５０に送信される。なお、画素が、要素の一例である。 The fisheye photography device 1100 captures a fisheye image as shown in FIG. 2. Specifically, the fisheye photography device 1100 is a surveillance camera installed for the purpose of monitoring a specified surveillance space, and is installed, for example, on a ceiling, with its optical center facing approximately vertically downward to capture images downward. In a fisheye image, the reference direction of the subject (e.g., the upward direction) differs for each pixel, and the reference direction of the subject for each pixel is determined in advance. The fisheye image captured by the fisheye photography device 1100 is transmitted to the image processing device 150. Note that a pixel is an example of an element.

（ネットワーク１２００） (Network 1200)
ネットワーク１２００は、魚眼撮影装置１１００、画像処理装置１５０、及び学習装置１００の間でデータの送受信を行なうために利用される回線である。ＬＡＮ（ＬｏｃａｌＡｒｅａＮｅｔｗｏｒｋ）や、インターネット等の公衆回線が本発明のネットワーク１２００として利用できる。ネットワーク１２００上の電文については、公知のＶＰＮ技術等を用いて、電文を暗号化する等の安全措置が講じられることが望ましい。 The network 1200 is a line used for transmitting and receiving data between the fisheye photography device 1100, the image processing device 150, and the learning device 100. A local area network (LAN) or a public line such as the Internet can be used as the network 1200 of the present invention. It is desirable to apply security measures to messages on the network 1200, such as encrypting them using known VPN technology or the like.

（学習装置１００） (Learning Device 100)
画像処理装置１５０で用いるモデルのパラメータは、学習装置１００によって学習される。 The parameters of the model used in the image processing device 150 are learned by the learning device 100.

学習装置１００は、ＣＰＵ、ＧＰＵ、ＭＰＵ、周辺回路、端子、各種メモリなどから構成され、学習用データセットを受け付け、画像の被写体のクラスごとに分類してラベル領域に分割するセグメンテーション処理を行うためのモデルのパラメータを学習する。なお、セグメンテーション処理が、データ処理の一例である。 The learning device 100 is composed of a CPU, GPU, MPU, peripheral circuits, terminals, various memories, and the like. It accepts learning data sets and learns the parameters of a model for performing segmentation processing, which classifies an image by subject class and divides it into label regions. The segmentation processing is an example of data processing.

図３に示すように、学習装置１００は、機能的には、受付部１０及び演算部２０を備えている。 As shown in FIG. 3, the learning device 100 functionally comprises a reception unit 10 and a calculation unit 20.

受付部１０は、複数の学習用データセットを受け付ける。例えば、学習用データセットには、透視投影画像の学習用画像とセグメンテーション処理用の正解ラベルとのセットが含まれている。ここで、透視投影画像では、被写体の基準方向は、上方向となっている。また、セグメンテーション処理用の正解ラベルとは、学習用画像と同じ幅と高さを持つ配列で、各要素が対応する画像の位置のセグメンテーションラベルを示しているものである。 The receiving unit 10 receives a plurality of training data sets. For example, the training data set includes a set of training images of perspective projection images and correct answer labels for segmentation processing. Here, in the perspective projection images, the reference direction of the subject is the upward direction. The correct answer labels for segmentation processing are an array with the same width and height as the training images, with each element indicating the segmentation label of the corresponding image position.

図３に示すように、演算部２０は、学習用画像生成部２２、学習部２４、及びモデル記憶部２６を備えている。 As shown in FIG. 3, the calculation unit 20 includes a learning image generation unit 22, a learning unit 24, and a model storage unit 26.

以下、演算部２０を構成する学習用画像生成部２２、学習部２４、及びモデル記憶部２６の各部について、詳細に説明する。 The learning image generation unit 22, learning unit 24, and model storage unit 26 that make up the calculation unit 20 are described in detail below.

学習用画像生成部２２は、学習用画像と正解ラベルとを、ある角度θ_ａｕｇだけ回転させる。このときθ_ａｕｇは０°≦θ_ａｕｇ＜３６０°から無作為に選択する。また、学習用画像生成部２２は、学習用画像と同じ幅と高さを持つ配列である角度マップを作成し、全ての要素の値をθ_ａｕｇとしたものを出力する。 The training image generation unit 22 rotates the training image and the correct labels by an angle θ_aug, where θ_aug is selected at random from the range 0° ≤ θ_aug < 360°. The training image generation unit 22 also creates an angle map, which is an array with the same width and height as the training image, and outputs it with every element set to θ_aug.
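The augmentation step described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the helper names `rotate_nearest` and `augment`, the nearest-neighbour resampling, the sign convention of the rotation, and the `ignore_label` value used for pixels rotated in from outside the image are all hypothetical.

```python
import math
import random


def rotate_nearest(grid, angle_deg, fill):
    """Rotate a 2D list about its centre using nearest-neighbour sampling."""
    h, w = len(grid), len(grid[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = x - cx, y - cy
            # Inverse mapping: locate the source pixel for each output pixel.
            sx = round(c * dx + s * dy + cx)
            sy = round(-s * dx + c * dy + cy)
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = grid[sy][sx]
    return out


def augment(image, labels, rng, ignore_label=-1):
    """Rotate image and labels by one random angle and build a constant angle map."""
    theta_aug = rng.random() * 360.0  # uniform in [0, 360)
    rotated_image = rotate_nearest(image, theta_aug, 0)
    # Labels use the same angle so they stay aligned with the image
    # (ignore_label is an assumed marker for pixels with no source).
    rotated_labels = rotate_nearest(labels, theta_aug, ignore_label)
    # Every element of the angle map holds the same value theta_aug.
    angle_map = [[theta_aug] * len(image[0]) for _ in image]
    return rotated_image, rotated_labels, angle_map, theta_aug
```

Because the whole perspective image is rotated rigidly, a single angle describes the reference direction at every pixel, which is why the angle map is constant during training.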

学習部２４は、図４に示すように、データ処理部３０、誤差計算部４０、及び更新部５０を備えている。 As shown in FIG. 4, the learning unit 24 includes a data processing unit 30, an error calculation unit 40, and an update unit 50.

以下、学習部２４を構成するデータ処理部３０、誤差計算部４０、及び更新部５０の各部について、詳細に説明する。 The data processing unit 30, error calculation unit 40, and update unit 50 that make up the learning unit 24 are described in detail below.

データ処理部３０は、学習用画像と角度マップとを用いて、学習用画像に対してセグメンテーション処理を行う。ここで、データ処理部３０は、畳み込みニューラルネットワークで構成されているものとし、角度マップに基づいて、フィルタの係数を変更した変更後のフィルタを用いた畳み込みニューラルネットワークに、学習用画像を入力して、学習用画像に対して畳み込み処理を行ってセグメンテーション処理を行い、学習用画像を被写体のクラスごとに分類してラベル領域に分割したセグメンテーション結果を出力する。 The data processing unit 30 performs a segmentation process on the training image using the training image and the angle map. Here, the data processing unit 30 is assumed to be configured with a convolutional neural network, and inputs the training image to a convolutional neural network using a modified filter whose coefficients have been changed based on the angle map, performs a convolution process on the training image to perform a segmentation process, and outputs a segmentation result in which the training image is classified by subject class and divided into label regions.

以下、図５に示すデータ処理部３０を構成するフィルタ合成部３２及び畳み込み計算部３４の各部について、詳細に説明する。 The following provides a detailed explanation of each of the filter synthesis unit 32 and the convolution calculation unit 34 that constitute the data processing unit 30 shown in FIG. 5.

データ処理部３０では、被写体の上方向が画像の上方向である領域を対象とするフィルタｗ_↑と、被写体の下方向が画像の上方向である領域を対象とするフィルタｗ_↓と、被写体の右方向が画像の上方向である領域を対象とするフィルタｗ_→と、被写体の左方向が画像の上方向である領域を対象とするフィルタｗ_←とを、基底フィルタとして保持している。この基底フィルタを角度マップに基づいてフィルタ合成部３２において合成する。なお、上方向、下方向、右方向、及び左方向が、特定方向の一例である。また、図５では、Ｎ個の基底フィルタを角度マップに基づいて合成する例を示している。 The data processing unit 30 holds four basis filters: a filter w_↑ for regions where the subject's upward direction coincides with the image's upward direction, a filter w_↓ for regions where the subject's downward direction coincides with the image's upward direction, a filter w_→ for regions where the subject's rightward direction coincides with the image's upward direction, and a filter w_← for regions where the subject's leftward direction coincides with the image's upward direction. The filter synthesis unit 32 combines these basis filters based on the angle map. The upward, downward, rightward, and leftward directions are examples of the specific directions. FIG. 5 shows the general case in which N basis filters are combined based on the angle map.

フィルタ合成部３２は、画素毎に、角度マップの値に基づいて、フィルタの係数を変更する。 The filter synthesis unit 32 changes the filter coefficients for each pixel based on the angle map values.

具体的には、学習用画像での画素ｉにおける角度マップの値をθ_ｉとすると、合成フィルタｗ_ｉを以下のように求める。 Specifically, if the angle map value at pixel i in the learning image is θ_i, the synthesis filter w_i is calculated as follows.

・・・（１）
...(1)

上記（１）式のように、データ処理部３０は、複数の特定方向の各々に対応する基底フィルタを、角度マップの値が示す方向と特定方向との相対角度に応じて重み付けすることにより、合成フィルタを求め、フィルタの係数を変更することで、角度マップの値に応じて画像内で変化するフィルタを作成することができる。 As shown in the above formula (1), the data processing unit 30 obtains a composite filter by weighting the basis filters corresponding to each of the multiple specific directions according to the relative angle between the direction indicated by the angle map value and the specific direction, and by changing the filter coefficients, it is possible to create a filter that changes within the image according to the angle map value.
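Formula (1) itself is not reproduced in this text, so the weighting function cannot be recovered from it. The sketch below shows one possible way to blend basis filters by relative angle, purely as an illustration: the clipped-cosine weight, the normalization to a sum of one, and the names `blend_weights` and `synthesize_filter` are assumptions and should not be read as the formula of the embodiment.

```python
import math


def blend_weights(theta_deg, directions_deg):
    """Assumed weighting: clipped cosine of the relative angle, normalized to sum to 1.

    Assumes the specific directions cover the circle densely enough that at
    least one of them lies within 90 degrees of theta_deg.
    """
    raw = [max(0.0, math.cos(math.radians(theta_deg - d))) for d in directions_deg]
    total = sum(raw)
    return [r / total for r in raw]


def synthesize_filter(theta_deg, basis_filters, directions_deg):
    """Weighted sum of basis filters (2D lists of equal size) for one pixel."""
    weights = blend_weights(theta_deg, directions_deg)
    rows, cols = len(basis_filters[0]), len(basis_filters[0][0])
    return [
        [sum(a * f[r][c] for a, f in zip(weights, basis_filters)) for c in range(cols)]
        for r in range(rows)
    ]
```

With four specific directions at 0°, 90°, 180°, and 270°, an angle map value of 45° gives equal weight to the two nearest basis filters and zero weight to the other two under this assumed weighting.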

なお、畳み込みに使用するフィルタの回転は９０度、１８０度、２７０度の回転であれば、配列の転置と上下左右の反転を用いて定義することができる。よって、例えば、データ処理部３０は、ｗ_↑のみを基底フィルタとし、ｗ_←、ｗ_↓、ｗ_→についてはｗ_↑を反転、転置することで求めることもできる。この場合、学習パラメータの増加を防ぐことができるほか、基底フィルタ間の関係に自然な拘束を与えることができるようになる。 Rotations of a convolution filter by 90, 180, or 270 degrees can be defined exactly using array transposition together with vertical and horizontal flips. Therefore, for example, the data processing unit 30 can hold only w_↑ as a basis filter and obtain w_←, w_↓, and w_→ by flipping and transposing w_↑. This prevents an increase in the number of learned parameters and also imposes a natural constraint on the relationship between the basis filters.
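The exact rotations by multiples of 90 degrees mentioned above can be written with nothing more than a transpose and two flips. The following plain-Python sketch uses hypothetical helper names; which rotation corresponds to which of w_←, w_↓, and w_→ depends on the image axis convention and is deliberately left open here.

```python
def transpose(w):
    """Swap rows and columns of a 2D list."""
    return [list(row) for row in zip(*w)]


def flip_vertical(w):
    """Reverse the order of the rows (up-down flip)."""
    return [list(row) for row in reversed(w)]


def flip_horizontal(w):
    """Reverse each row (left-right flip)."""
    return [list(reversed(row)) for row in w]


def rot90_ccw(w):
    """Rotate 90 degrees counterclockwise: transpose, then flip vertically."""
    return flip_vertical(transpose(w))


def rot180(w):
    """Rotate 180 degrees: flip vertically and horizontally."""
    return flip_horizontal(flip_vertical(w))


def rot270_ccw(w):
    """Rotate 270 degrees counterclockwise: transpose, then flip horizontally."""
    return flip_horizontal(transpose(w))
```

Since every derived filter is a permutation of the coefficients of w_↑, a gradient update to w_↑ automatically updates all four directional filters consistently.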

畳み込み計算部３４は、各層において、画素毎に、フィルタ合成部３２で当該画素について求めた合成フィルタを元に、学習用画像に畳み込み処理を行い、次の層に結果を出力する。 For each pixel in each layer, the convolution calculation unit 34 performs convolution processing on the learning image based on the synthesis filter calculated for that pixel by the filter synthesis unit 32, and outputs the results to the next layer.
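A convolution whose filter changes from pixel to pixel can be sketched as below. This is a single-channel illustration in plain Python under assumptions not stated in the text: zero padding at the border, the cross-correlation form commonly used in convolutional neural networks, square odd-sized filters, and a hypothetical callback `filter_for_pixel(y, x)` that returns the synthesized filter for that pixel.

```python
def conv_per_pixel(image, filter_for_pixel):
    """Apply a possibly different k x k filter at every pixel of a 2D list."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            k = filter_for_pixel(y, x)  # filter synthesized for this pixel
            r = len(k) // 2             # filter radius (k is assumed odd-sized)
            acc = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy, xx = y + dy, x + dx
                    # Zero padding: positions outside the image contribute nothing.
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += k[dy + r][dx + r] * image[yy][xx]
            out[y][x] = acc
    return out
```

A practical implementation would vectorize this loop and handle multiple channels; the nested loops are kept here only to make the per-pixel filter choice explicit.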

畳み込み計算部３４は、最終的に各画素における各クラスの確度を、セグメンテーション処理結果として出力する。 The convolution calculation unit 34 finally outputs the confidence of each class at each pixel as the segmentation processing result.

誤差計算部４０は、各画素における各クラスの確度と、回転済み正解ラベルとを元に、各画素におけるクロスエントロピー誤差を計算する。 The error calculation unit 40 calculates the cross-entropy error at each pixel based on the confidence of each class at each pixel and the rotated correct labels.
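The per-pixel cross-entropy error can be illustrated as follows. The sketch assumes that the class confidences at each pixel are already normalized probabilities (for example, the output of a softmax) and that labels are integer class indices; the function names and the small `eps` guard against log(0) are hypothetical.

```python
import math


def pixelwise_cross_entropy(probs, labels, eps=1e-12):
    """Return a 2D list of -log(p[correct class]) for every pixel.

    probs[y][x] is a list of class probabilities; labels[y][x] is the
    index of the correct class at that pixel.
    """
    return [
        [-math.log(max(probs[y][x][labels[y][x]], eps)) for x in range(len(labels[0]))]
        for y in range(len(labels))
    ]


def mean_error(errors):
    """Average the per-pixel errors into a single scalar loss."""
    flat = [e for row in errors for e in row]
    return sum(flat) / len(flat)
```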

更新部５０は、計算された誤差を元に、畳み込みニューラルネットワークのフィルタの係数を含むパラメータを更新する。 The update unit 50 updates parameters including the filter coefficients of the convolutional neural network based on the calculated error.

学習部２４は、データ処理部３０、誤差計算部４０、及び更新部５０の各処理を繰り返すことで、計算される誤差を最小化し、データ処理部３０を構成する畳み込みニューラルネットワークのパラメータを学習する。 The learning unit 24 minimizes the calculated error by repeating the processes of the data processing unit 30, the error calculation unit 40, and the update unit 50, and learns the parameters of the convolutional neural network that constitutes the data processing unit 30.

なお、本実施の形態では、複数の学習用データセットについて誤差を計算し、その誤差を平均したものを用いて、パラメータの更新量を求める。学習用画像生成部２２において学習用画像を回転させる角度は、学習用データセット各々で異なっていて良い。 In this embodiment, the error is calculated for multiple learning data sets, and the average of the errors is used to determine the amount of parameter update. The angle by which the learning image is rotated in the learning image generation unit 22 may be different for each learning data set.

モデル記憶部２６は、学習部２４によって学習されたモデルのパラメータを記憶している。 The model storage unit 26 stores the parameters of the model learned by the learning unit 24.

（画像処理装置１５０） (Image Processing Device 150)
画像処理装置１５０は、ＣＰＵ、ＧＰＵ、ＭＰＵ、周辺回路、端子、各種メモリなどから構成され、魚眼撮影装置１１００が送信した魚眼画像を受信し、魚眼画像を被写体のクラスごとに分類してラベル領域に分割するセグメンテーション処理を行う。 The image processing device 150 is composed of a CPU, GPU, MPU, peripheral circuits, terminals, various memories, and the like. It receives the fisheye images sent by the fisheye photography device 1100 and performs segmentation processing that classifies each fisheye image by subject class and divides it into label regions.

図６に示すように、画像処理装置１５０は、機能的には、受付部６０、データ処理部７０、及び出力部８０を備えている。 As shown in FIG. 6, the image processing device 150 functionally comprises a reception unit 60, a data processing unit 70, and an output unit 80.

受付部６０は、受信した魚眼画像を受け付ける。また、画素毎に、魚眼画像に写る被写体の基準方向（上方向）は既知であるとし、受付部６０は、画素毎に、魚眼画像に写る被写体の基準方向を格納した角度マップを受け付ける。 The reception unit 60 receives the received fisheye image. In addition, the reference direction (upward) of the subject in the fisheye image is assumed to be known for each pixel, and the reception unit 60 receives an angle map that stores the reference direction of the subject in the fisheye image for each pixel.

上記図２に示すような魚眼画像を受け付けるため、角度マップの画素ｉの値は、画像中心をｃとしたときに、以下の式に従って設定される。 To accept a fisheye image such as that shown in Figure 2 above, the value of pixel i in the angle map is set according to the following formula, where c is the center of the image.

ここでａｒｃｔａｎ２は以下を満たす関数であるとする。 Here, arctan2 is a function that satisfies the following:
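As an illustration of how such an angle map might be built, the sketch below assigns to each pixel the angle of its offset from the image center c, computed with the two-argument arctangent. The formulas themselves are not reproduced in this text, so the axis convention, the zero direction, and the mapping to the range [0°, 360°) are assumptions, as is the function name `radial_angle_map`.

```python
import math


def radial_angle_map(height, width):
    """Angle (degrees, in [0, 360)) of each pixel's offset from the image centre.

    Assumed convention: x grows to the right, y grows downward, and the angle
    is atan2(y - cy, x - cx). The centre pixel itself gets 0 by the definition
    of atan2(0, 0).
    """
    cy = (height - 1) / 2.0
    cx = (width - 1) / 2.0
    return [
        [math.degrees(math.atan2(y - cy, x - cx)) % 360.0 for x in range(width)]
        for y in range(height)
    ]
```

Unlike the constant angle map used during training, this map varies across the image, so a different filter is synthesized at each pixel of the fisheye image.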

データ処理部７０には、学習装置１００で学習された畳み込みニューラルネットワークが実装されており、角度マップに基づいて、フィルタの係数を変更した変更後のフィルタを用いた畳み込みニューラルネットワークに、魚眼画像を入力して、魚眼画像に対して畳み込み処理を行ってセグメンテーション処理を行い、魚眼画像を被写体のクラスごとに分類してラベル領域に分割したセグメンテーション結果を、出力部８０により出力する。 The data processing unit 70 is equipped with a convolutional neural network trained by the learning device 100, and a fisheye image is input to the convolutional neural network using a modified filter whose filter coefficients have been changed based on the angle map. A convolution process is performed on the fisheye image to perform segmentation processing, and the segmentation results in which the fisheye image is classified by subject class and divided into label regions are output by the output unit 80.

具体的には、データ処理部７０は、上記図５に示すように、フィルタ合成部７２及び畳み込み計算部７４を備えている。 Specifically, as shown in FIG. 5 above, the data processing unit 70 includes a filter synthesis unit 72 and a convolution calculation unit 74.

フィルタ合成部７２は、フィルタ合成部３２と同様に、魚眼画像の画素毎に、角度マップの値に基づいて、フィルタの係数を変更する。 Similar to the filter synthesis unit 32, the filter synthesis unit 72 changes the filter coefficients for each pixel of the fisheye image based on the angle map values.

具体的には、魚眼画像での画素ｉにおける角度マップの値をθｉとし、合成フィルタｗｉを、上記（１）式に従って求める。 Specifically, the angle map value at pixel i in the fisheye image is θ_i, and the synthesis filter w_i is calculated according to formula (1) above.

畳み込み計算部７４は、畳み込み計算部３４と同様に、各層において、画素毎に、フィルタ合成部７２で求めた合成フィルタを元に、魚眼画像に畳み込み処理を行い、次の層に結果を出力する。 Similar to the convolution calculation unit 34, the convolution calculation unit 74 performs convolution processing on the fisheye image for each pixel in each layer based on the synthesis filter obtained by the filter synthesis unit 72, and outputs the results to the next layer.

畳み込み計算部７４は、最終的に各画素において、最も確度が大きいクラスを、当該画素のクラスとし、セグメンテーション処理結果を、出力部８０により出力する。 Finally, for each pixel, the convolution calculation unit 74 takes the class with the highest confidence as the class of that pixel, and the segmentation processing result is output by the output unit 80.
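Selecting the most confident class at each pixel is a per-pixel argmax. A minimal sketch, with the hypothetical name `label_map` and ties resolved in favor of the lowest class index (an assumption, since the text does not address ties):

```python
def label_map(confidences):
    """Return the index of the highest-confidence class at every pixel.

    confidences[y][x] is a list with one confidence value per class.
    """
    return [
        [max(range(len(px)), key=px.__getitem__) for px in row]
        for row in confidences
    ]
```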

＜画像処理システムの動作＞ <Operation of the Image Processing System>
以下、図７、図８に示したフローチャートを参照しつつ、本発明を適用した画像処理システム１０００の動作を説明する。なお、学習装置１００に、受付部１０により、予め用意された複数の学習用データセットが入力されている場合を例に説明する。 The operation of the image processing system 1000 to which the present invention is applied is described below with reference to the flowcharts shown in FIG. 7 and FIG. 8. The description takes as an example a case in which a plurality of learning data sets prepared in advance have been input to the learning device 100 through the reception unit 10.

図７に示す学習装置１００の学習処理は事前に実行される。学習処理では、学習用画像生成部２２は、学習用データセット毎に、学習用画像と正解ラベルとを、無作為に選択される角度θ_ａｕｇだけ回転させると共に、各画素の値を角度θ_ａｕｇとした角度マップを生成する（ステップＳ１００）。 The learning process of the learning device 100 shown in FIG. 7 is executed in advance. In the learning process, the learning image generation unit 22 rotates, for each learning data set, the learning image and the correct label by a randomly selected angle θ_aug, and generates an angle map in which every pixel has the value θ_aug (step S100).
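Step S100 can be illustrated with the sketch below. To stay exact without an image library, it restricts the random angle to multiples of 90 degrees, which is a simplification and not something the description requires. The function names and the sign convention of the angle are likewise assumptions.

```python
import math
import random

def rot90_ccw(grid):
    """Rotate a 2D list by 90 degrees counterclockwise."""
    return [list(row) for row in zip(*grid)][::-1]

def augment(image, label, rng):
    """Rotate image and label by the same random angle and build the angle map.

    Simplification: theta_aug is drawn from {0, 90, 180, 270} degrees.
    """
    quarter_turns = rng.randrange(4)
    theta_aug = quarter_turns * math.pi / 2.0
    for _ in range(quarter_turns):
        image, label = rot90_ccw(image), rot90_ccw(label)
    # Every pixel of the angle map stores the same value theta_aug.
    angle_map = [[theta_aug] * len(image[0]) for _ in image]
    return image, label, angle_map, theta_aug
```

Because the whole learning image is rotated rigidly, a constant angle map is sufficient during learning, whereas a fisheye image at inference time needs a value that varies from pixel to pixel.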

データ処理部３０には、学習用データセット毎に、学習用画像と角度マップとを用いて、学習用画像に対してセグメンテーション処理を行う。ここで、データ処理部３０は、角度マップに基づいて、フィルタの係数を変更した変更後のフィルタを用いた畳み込みニューラルネットワークに、学習用画像を入力して、学習用画像に対して畳み込み処理を行ってセグメンテーション処理を行い、セグメンテーション結果を出力する（ステップＳ１０２）。 For each training data set, the data processing unit 30 performs segmentation processing on the training images using the training images and the angle map. Here, the data processing unit 30 inputs the training images to a convolutional neural network using a modified filter whose filter coefficients have been changed based on the angle map, performs convolution processing on the training images to perform segmentation processing, and outputs the segmentation results (step S102).

そして、誤差計算部４０は、学習用データセット毎に、各画素における各クラスの確度と、回転済み正解ラベルとを元に、各画素におけるクロスエントロピー誤差を計算する（ステップＳ１０４）。 Then, for each learning data set, the error calculation unit 40 calculates the cross-entropy error at each pixel based on the probability of each class at that pixel and the rotated correct label (step S104).
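A minimal sketch of the per-pixel cross-entropy error of step S104 is given below. It assumes that the class values at each pixel are already normalized to sum to 1 and that the correct label stores one class index per pixel. The clamp value `eps` is an added safeguard, not part of the description.

```python
import math

def pixelwise_cross_entropy(probs, labels, eps=1e-12):
    """Return the cross-entropy error at every pixel.

    probs[y][x][c]: predicted value for class c at pixel (x, y).
    labels[y][x]:   index of the correct class at pixel (x, y).
    """
    return [[-math.log(max(probs[y][x][labels[y][x]], eps))
             for x in range(len(labels[y]))]
            for y in range(len(labels))]
```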

更新部５０は、学習用データセット毎に計算された誤差を元に、畳み込みニューラルネットワークのフィルタの係数を含む各種パラメータを更新する（ステップＳ１０６）。 The update unit 50 updates various parameters, including the filter coefficients of the convolutional neural network, based on the error calculated for each learning dataset (step S106).

そして、予め定められた反復終了条件を満たしたか否かを判定する（ステップＳ１０８）。予め定められた反復終了条件を満たさない場合には、ステップＳ１０２へ戻り、一方、予め定められた反復終了条件を満たす場合には、ステップＳ１１０へ移行する。 Then, it is determined whether or not a predetermined iteration end condition is satisfied (step S108). If the predetermined iteration end condition is not satisfied, the process returns to step S102, whereas if the predetermined iteration end condition is satisfied, the process proceeds to step S110.

なお、反復終了条件としては、計算される誤差の値が収束したことや、反復回数が上限に到達したことなどを用いればよい。 The condition for ending the iterations can be when the calculated error value has converged or when the number of iterations has reached an upper limit.
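The two example termination conditions could be checked as in the sketch below. The tolerance `tol` and the choice of comparing only the last two error values are illustrative assumptions.

```python
def should_stop(error_history, max_iterations, tol=1e-4):
    """Decide whether the learning loop of steps S102 to S108 should end.

    error_history holds one error value per completed iteration.
    """
    if len(error_history) >= max_iterations:      # iteration limit reached
        return True
    if len(error_history) >= 2:
        # Treat a negligible change between iterations as convergence.
        return abs(error_history[-1] - error_history[-2]) < tol
    return False
```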

そして、最終的に学習された畳み込みニューラルネットワークのパラメータをモデル記憶部２６に格納し（ステップＳ１１０）、学習処理を終了する。 Then, the final trained parameters of the convolutional neural network are stored in the model storage unit 26 (step S110), and the training process is terminated.

そして、学習装置１００によって学習された、畳み込みニューラルネットワークのパラメータが、画像処理装置１５０のデータ処理部７０に設定される。 The parameters of the convolutional neural network learned by the learning device 100 are then set in the data processing unit 70 of the image processing device 150.

魚眼撮影装置１１００から魚眼画像を受信した画像処理装置１５０は、図８に示す画像処理を行う。なお、以下に説明する図８の画像処理装置１５０の動作は、魚眼撮影装置１１００によって撮影された魚眼画像を１枚受信するごとに実行される。 The image processing device 150 that receives the fisheye image from the fisheye photography device 1100 performs the image processing shown in FIG. 8. Note that the operation of the image processing device 150 in FIG. 8 described below is executed each time a fisheye image captured by the fisheye photography device 1100 is received.

まず、受付部６０は、受信した魚眼画像を受け付ける。また、受付部６０は、画素毎に、魚眼画像に写る被写体の基準方向を格納した角度マップを受け付ける（ステップＳ１１０）。 First, the reception unit 60 receives the received fisheye image. The reception unit 60 also receives an angle map that stores the reference direction of the subject in the fisheye image for each pixel (step S110).

フィルタ合成部７２は、魚眼画像の画素毎に、角度マップの値に基づいて、フィルタの係数を変更する（ステップＳ１１２）。 The filter synthesis unit 72 changes the filter coefficients for each pixel of the fisheye image based on the angle map values (step S112).

畳み込み計算部７４は、各層において、画素毎に、フィルタ合成部７２で求めたフィルタを元に、魚眼画像に畳み込み処理を行い、次の層に結果を出力する。畳み込み計算部７４は、最終的に各画素において、最も確度が大きいクラスを、当該画素のクラスとし、セグメンテーション処理結果を、出力部８０により出力する（ステップＳ１１４）。 For each layer, the convolution calculation unit 74 performs convolution processing on the fisheye image based on the filter calculated by the filter synthesis unit 72 for each pixel, and outputs the results to the next layer. Finally, for each pixel, the convolution calculation unit 74 determines the class with the highest probability as the class of that pixel, and outputs the segmentation processing results from the output unit 80 (step S114).

以上説明してきたように、本発明の実施の形態に係る画像処理装置１５０では、魚眼画像上の、被写体の基準方向に対応する方向に基づいて、フィルタの係数を変更した変更後のフィルタを用いて、魚眼画像に対して畳み込み処理を行ってセグメンテーション処理を行うことにより、魚眼画像に対して、被写体の基準方向を考慮してセグメンテーション処理を行うことができる。例えば、図９に示すように、魚眼画像に対して、正しくセグメンテーション処理を行うことができる。これにより魚眼画像を複数の画像に展開する処理が不要になる。図９では、図９（Ａ）に示す魚眼画像に対するセグメンテーション処理結果として、図９（Ｂ）に示すように、正しくラベル領域に分割された例を示している。 As described above, the image processing device 150 according to the embodiment of the present invention performs segmentation processing by convolving the fisheye image with a modified filter whose coefficients have been changed based on the direction on the fisheye image that corresponds to the reference direction of the subject. The segmentation processing can therefore take the reference direction of the subject into account. For example, as shown in FIG. 9, segmentation processing can be performed correctly on the fisheye image, which makes it unnecessary to expand the fisheye image into multiple images. FIG. 9 shows an example in which the fisheye image of FIG. 9(A) is correctly divided into label regions, as shown in FIG. 9(B).

また、魚眼画像上の、被写体の基準方向に対応する方向に基づいて、畳み込み処理で用いるフィルタの係数を変更することにより、画像内に写っている被写体の向きに応じて処理を変更する機械学習モデルを用いることができる。 In addition, a machine learning model can be used that changes the processing depending on the orientation of the subject in the image by changing the filter coefficients used in the convolution process based on the direction that corresponds to the reference direction of the subject in the fisheye image.

＜変形例＞ <Modification>
以上、本発明の好適な実施形態について説明してきたが、本発明はこれらの実施形態に限定されるものではない。例えば、本実施形態では、上記図５に示すようにデータ処理部３０、７０を構成する場合を例に説明したが、これに限定されない。例えば、データ処理部３０、７０の代わりに、図１０のように、データ処理部２３０、２７０を構成するようにしてもよい。図１０では、学習装置１００のデータ処理部２３０が、ニューラルネットワークの各層においてＮ個の畳み込み計算部２３４及び出力統合部２３６を備えており、画像処理装置１５０のデータ処理部２７０が、ニューラルネットワークの各層においてＮ個の畳み込み計算部２７４及び出力統合部２７６を備えている例を示している。 Although the preferred embodiments of the present invention have been described above, the present invention is not limited to these embodiments. For example, in the present embodiment, the data processing units 30 and 70 are configured as shown in FIG. 5, but the present invention is not limited to this. For example, instead of the data processing units 30 and 70, data processing units 230 and 270 may be configured as shown in FIG. 10. FIG. 10 shows an example in which the data processing unit 230 of the learning device 100 includes N convolution calculation units 234 and an output integration unit 236 in each layer of the neural network, and the data processing unit 270 of the image processing device 150 includes N convolution calculation units 274 and an output integration unit 276 in each layer of the neural network.

学習装置１００のデータ処理部２３０の各層におけるＮ個の畳み込み計算部２３４は、複数の特定方向の各々に対応する基底フィルタを用いて、画像に対して畳み込み処理を行う。例えば、１番目の畳み込み計算部２３４は、画素毎に、１番目の基底フィルタを元に、学習用画像に畳み込み処理を行い、対応する出力統合部２３６に結果を出力する。同様に、Ｎ番目の畳み込み計算部２３４は、画素毎に、Ｎ番目の基底フィルタを元に、学習用画像に畳み込み処理を行い、対応する出力統合部２３６に結果を出力する。また、画像処理装置１５０の各層におけるＮ個の畳み込み計算部２７４は、学習装置１００の畳み込み計算部２３４と同様に、魚眼画像に対して畳み込み処理を行う。 The N convolution calculation units 234 in each layer of the data processing unit 230 of the learning device 100 perform convolution processing on an image using basis filters corresponding to each of a plurality of specific directions. For example, the first convolution calculation unit 234 performs convolution processing on the learning image based on the first basis filter for each pixel, and outputs the result to the corresponding output integration unit 236. Similarly, the Nth convolution calculation unit 234 performs convolution processing on the learning image based on the Nth basis filter for each pixel, and outputs the result to the corresponding output integration unit 236. Furthermore, the N convolution calculation units 274 in each layer of the image processing device 150 perform convolution processing on a fisheye image, similar to the convolution calculation unit 234 of the learning device 100.

学習装置１００の出力統合部２３６は、画素毎に、対応するＮ個の畳み込み計算部２３４から得られる畳み込み処理の結果を、当該画素の被写体の基準方向と特定方向との相対角度に応じて重み付けて統合する。例えば、出力統合部２３６は、上記（１）式と同様に、畳み込み処理の結果を統合し、統合した結果を次の層に出力する。最終的に最後の層における統合の結果に基づいて各画素における各クラスの確度を、セグメンテーション処理結果として出力する。また、画像処理装置１５０の出力統合部２７６は、学習装置１００の出力統合部２３６と同様に、Ｎ個の畳み込み計算部２７４にて行われた畳み込み処理の結果を統合する。このような処理によって、上記図５の構成と同じ結果を出力することができる。この構成では、畳み込み操作を行う際にフィルタが位置によって変化しないため、ＧＰＵなどを用いた計算を高速に行うことが可能になる。 For each pixel, the output integration unit 236 of the learning device 100 integrates the convolution results obtained from the corresponding N convolution calculation units 234, weighting them according to the relative angle between the reference direction of the subject at that pixel and each specific direction. For example, the output integration unit 236 integrates the convolution results in the same manner as formula (1) above and outputs the integrated result to the next layer. Finally, based on the integration result in the last layer, the probability of each class at each pixel is output as the segmentation processing result. Likewise, the output integration unit 276 of the image processing device 150 integrates the results of the convolution processing performed by the N convolution calculation units 274, in the same way as the output integration unit 236 of the learning device 100. This processing produces the same result as the configuration of FIG. 5 above. In this configuration, the filter does not vary with position during the convolution operation, so the calculation can be performed at high speed on a GPU or similar hardware.
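The statement that both configurations give the same result follows from the linearity of convolution, which the following sketch checks numerically. It is illustrative only: it assumes 3x3 filters, zero padding, a single channel, and per-pixel weights supplied directly in `weight_map` rather than derived from formula (1). All function names are hypothetical.

```python
def conv_at(image, kernel, y, x):
    """Apply a 3x3 kernel to the zero-padded image at pixel (x, y)."""
    h, w = len(image), len(image[0])
    total = 0.0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            yy, xx = y + dy, x + dx
            if 0 <= yy < h and 0 <= xx < w:
                total += kernel[dy + 1][dx + 1] * image[yy][xx]
    return total

def synthesize_then_convolve(image, basis, weight_map):
    """FIG. 5 style: build one filter per pixel, then convolve with it."""
    out = []
    for y in range(len(image)):
        row = []
        for x in range(len(image[0])):
            wts = weight_map[y][x]
            kernel = [[sum(a * b[r][c] for a, b in zip(wts, basis))
                       for c in range(3)] for r in range(3)]
            row.append(conv_at(image, kernel, y, x))
        out.append(row)
    return out

def convolve_then_integrate(image, basis, weight_map):
    """FIG. 10 style: N position-independent convolutions, then a weighted sum."""
    h, w = len(image), len(image[0])
    responses = [[[conv_at(image, b, y, x) for x in range(w)]
                  for y in range(h)] for b in basis]
    return [[sum(a * r[y][x] for a, r in zip(weight_map[y][x], responses))
             for x in range(w)] for y in range(h)]
```

The second function applies each basis filter uniformly over the image, which is the access pattern that standard convolution routines are optimized for; only the inexpensive weighted sum depends on the pixel position.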

また、データ処理部３０、７０で用いる基底フィルタの数は３以下であってもよいし、５つ以上あっても良い。この場合、例えば基底フィルタの向きが等間隔になるように配置し、任意の角度において、その角度に最も近い２つの基底フィルタの重み付き和として基底フィルタを定義することができる。また、基底フィルタの数を４の倍数にすることによって、９０度間隔の基底フィルタの組を作成することができ、その組の中で基底フィルタの転置や反転を用いて基底フィルタの回転関係を定義することができる。 The number of basis filters used in the data processing units 30 and 70 may be three or fewer, or five or more. In this case, for example, the basis filters may be arranged so that their orientations are equally spaced, and the filter for an arbitrary angle may be defined as the weighted sum of the two basis filters closest to that angle. In addition, by making the number of basis filters a multiple of four, sets of basis filters spaced at 90-degree intervals can be created, and within each set the rotational relationship between the basis filters can be defined by transposing or flipping the basis filters.
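The multiple-of-four arrangement can be sketched as follows. Starting from a few hypothetical "seed" filters whose directions lie within the first 90 degrees, the remaining filters are produced by exact quarter-turn rotations, each implemented as a transpose followed by a flip. The names `expand_basis` and `seeds` and the ordering of the output are assumptions for illustration.

```python
def transpose(kernel):
    """Swap rows and columns of a square 2D kernel."""
    return [list(row) for row in zip(*kernel)]

def flip_rows(kernel):
    """Reverse the order of the rows (a vertical flip)."""
    return [list(row) for row in kernel[::-1]]

def rot90_ccw(kernel):
    """A 90-degree counterclockwise rotation is a transpose followed by a flip."""
    return flip_rows(transpose(kernel))

def expand_basis(seeds):
    """Return 4 * len(seeds) filters ordered by direction.

    seeds[j] is assumed to point at j * 90 / len(seeds) degrees. The filter
    at index q * len(seeds) + j is seeds[j] rotated by q quarter turns.
    """
    current = [[list(row) for row in s] for s in seeds]
    basis = []
    for _ in range(4):
        basis.extend(current)
        current = [rot90_ccw(k) for k in current]
    return basis
```

Because quarter turns only permute coefficients, the rotated filters need no interpolation and share the learned coefficients of their seed.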

また、角度マップを用いて畳み込み操作は、畳み込みニューラルネットワークの全層で用いても良いし、１層目のみ、前半のみ、後半のみなどその一部だけで用いても良い。 In addition, the convolution operation using the angle map may be used in all layers of the convolutional neural network, or in only a part of it, such as only the first layer, only the first half, or only the second half.

また、上記の実施の形態で説明した技術は畳み込み操作に関するものであり、セグメンテーション処理以外のデータ処理にも用いることができる。例えば姿勢推定やディテクションに応用しても良い。 The technology described in the above embodiment relates to convolution operations and can be used for data processing other than segmentation processing. For example, it can be applied to pose estimation and detection.

また、学習用画像に用いる画像の一部または全部は魚眼画像でも良い。この際、角度マップは、画像処理装置１５０における魚眼画像に対するデータ処理で用いたものを入力する。 In addition, some or all of the images used for learning may be fisheye images. In this case, the angle map used in the data processing of the fisheye images in the image processing device 150 is input.

また、上記の実施の形態で説明した技術は、３次元の畳み込みにも応用することができる。この場合、例えば基底フィルタはｘ，ｙ，ｚ軸の各正負方向の６つ取り、角度マップは単位球上の点の座標として与えることができる。 The technology described in the above embodiment can also be applied to three-dimensional convolution. In this case, for example, six basis filters can be taken for the positive and negative directions of the x, y, and z axes, and the angle map can be given as the coordinates of a point on a unit sphere.
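The description does not state how the six basis filters would be weighted in the three-dimensional case. Purely as an illustration, the sketch below assumes that each axis direction is weighted by the square of the corresponding positive component of the unit direction vector, so that the six weights sum to 1. This weighting rule and the name `axis_weights` are assumptions, not the patented method.

```python
import math

def axis_weights(direction):
    """Return weights for basis filters ordered as +x, -x, +y, -y, +z, -z.

    direction is a 3D vector taken from the angle map; it is normalized
    here so that it lies on the unit sphere.
    """
    dx, dy, dz = direction
    norm = math.sqrt(dx * dx + dy * dy + dz * dz)
    if norm == 0.0:
        raise ValueError("direction must be a nonzero vector")
    weights = []
    for comp in (dx / norm, dy / norm, dz / norm):
        weights.append(max(comp, 0.0) ** 2)    # positive axis direction
        weights.append(max(-comp, 0.0) ** 2)   # negative axis direction
    return weights
```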

３次元の畳み込みは例えば動画などの時系列画像の解析に応用される。ｘ，ｙ方向に画像を配置し、ｚ方向に時間軸をおいて３次元的に時系列画像を配置するような時空間を考える。この場合、対象物が移動すると、時空間において様々な向きや歪みで対象物が存在することになる。この時空間内での対象物の向きは、時系列画像の撮影間隔や連続する２つの画像内での移動距離を算出することで求められる。この向きを角度マップとして入力して３次元畳み込みを行うことで、より高精度な時系列画像の分析が可能になる。また、学習時には１枚の静止画を繰り返し入力して学習し、データ処理時に角度マップを変化させることで時系列画像に対応することも可能になる。 Three-dimensional convolution is applied to the analysis of time-series images such as videos. Consider a space-time in which images are arranged in the x and y directions, and the time axis is set in the z direction, so that the time-series images are arranged three-dimensionally. In this case, when an object moves, the object will exist in various orientations and distortions in the space-time. The orientation of the object in this space-time can be found by calculating the interval between the capture of the time-series images and the distance traveled between two consecutive images. By inputting this orientation as an angle map and performing three-dimensional convolution, it becomes possible to analyze time-series images with higher accuracy. In addition, during learning, a single still image is repeatedly input and learned, and by changing the angle map during data processing, it becomes possible to respond to time-series images.
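One way to obtain such a space-time orientation is sketched below. It treats the displacement of the object between two consecutive frames and the capture interval as the components of a 3D vector and normalizes it. How the time axis is scaled relative to the pixel axes is not specified in the description, so the parameter `time_scale` and the function name are assumptions.

```python
import math

def spacetime_direction(pos_prev, pos_next, frame_interval, time_scale=1.0):
    """Return a unit vector (x, y, z) for the object's direction in space-time.

    pos_prev, pos_next: (x, y) positions in two consecutive images.
    frame_interval:     capture interval between the two images (must be > 0).
    time_scale:         hypothetical factor converting time to z-axis units.
    """
    dx = pos_next[0] - pos_prev[0]
    dy = pos_next[1] - pos_prev[1]
    dz = frame_interval * time_scale
    norm = math.sqrt(dx * dx + dy * dy + dz * dz)
    return (dx / norm, dy / norm, dz / norm)
```

A stationary object yields a vector along the time axis, while faster motion tilts the vector toward the image plane.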

また、本実施形態では、画像を対象データとしてデータ処理する場合を例に説明したが、これに限定されない。例えばＬｉＤＡＲ（Light Detection and Ranging）を用いて計測された点群データを対象データとしてデータ処理してもよい。例えば、上記実施形態の場合と同様に、被写体の基準方向を格納した角度マップを用いることにより、魚眼レンズが装着されたＬｉＤＡＲからの点群データから被写体を識別することが可能となる。 In addition, in this embodiment, an example has been described in which an image is used as the target data for data processing, but the present invention is not limited to this. For example, point cloud data measured using LiDAR (Light Detection and Ranging) may be processed as the target data. For example, as in the above embodiment, by using an angle map that stores the reference direction of the subject, it is possible to identify the subject from point cloud data from a LiDAR equipped with a fisheye lens.

また、学習部２４が、画像処理装置１５０とは別の学習装置１００ではなく、画像処理装置１５０に設けられてもよい。 In addition, the learning unit 24 may be provided in the image processing device 150, rather than in a learning device 100 separate from the image processing device 150.

以上のように、当業者は本発明の範囲内で、実施される形態に合わせて様々な変更を行うことができる。 As described above, those skilled in the art can make various modifications to suit the implementation form within the scope of the present invention.

１０、６０受付部 10, 60 Reception unit
２０演算部 20 Calculation unit
２２学習用画像生成部 22 Learning image generation unit
２４学習部 24 Learning unit
２６モデル記憶部 26 Model storage unit
３０、７０、２３０、２７０データ処理部 30, 70, 230, 270 Data processing unit
３２、７２フィルタ合成部 32, 72 Filter synthesis unit
３４、７４、２３４、２７４畳み込み計算部 34, 74, 234, 274 Convolution calculation unit
８０出力部 80 Output unit
１００学習装置 100 Learning device
１５０画像処理装置 150 Image processing device
２３６、２７６出力統合部 236, 276 Output integration unit
１０００画像処理システム 1000 Image processing system
１１００魚眼撮影装置 1100 Fisheye photography device
１２００ネットワーク 1200 Network

Claims

1. A data processing device that performs data processing using a filter that uses coefficients learned in advance using learning data representing an object and a direction on the learning data that corresponds to a reference direction of the object, the data processing device comprising:
a reception unit that receives target data representing an object and a direction on the target data that corresponds to a reference direction of the object; and
a data processing unit that performs data processing by performing a convolution process on the target data using a modified filter obtained by modifying a coefficient of the filter based on the direction corresponding to the reference direction of the object,
wherein the data processing unit changes the coefficients of the filters corresponding to each of a plurality of specific directions, which are generated using the filter that uses the pre-learned coefficients, by weighting those filters according to the relative angle between the specific directions and the direction corresponding to the reference direction of the object, and performs the data processing by performing the convolution process on the target data using the changed filters.

2. A data processing device that performs data processing using a filter that uses coefficients learned in advance using learning data representing an object and a direction on the learning data that corresponds to a reference direction of the object, the data processing device comprising:
a reception unit that receives target data representing an object and a direction on the target data that corresponds to a reference direction of the object; and
a data processing unit that performs data processing by performing a convolution process on the target data using a modified filter obtained by modifying a coefficient of the filter based on the direction corresponding to the reference direction of the object,
wherein the data processing unit performs the data processing by performing a convolution process on the target data using a filter corresponding to each of a plurality of specific directions, the filter being generated using the filter that uses the pre-learned coefficients, and
weights and integrates the results of the data processing using the filters corresponding to each of the plurality of specific directions in accordance with the relative angle between the direction corresponding to the reference direction of the object and the specific directions.

3. The data processing device according to claim 1, wherein
the target data is data in which a reference direction of an object differs for each element,
the reception unit receives an angle map that stores a direction corresponding to the reference direction of the object for each of the elements, and
the data processing unit performs data processing by performing a convolution process on the target data using a filter in which a coefficient of the filter is changed based on the angle map.

4. The data processing device according to claim 3, wherein
the data processing unit changes a coefficient of the filter for each of the elements based on the direction, and
performs a convolution process on the target data for each of the elements using the changed filter.

5. The data processing device according to any one of claims 1 to 4, wherein the data processing unit performs data processing by performing convolution processing on the target data using the changed filter as a filter for at least a portion of the layers of a convolutional neural network.

6. The data processing device according to any one of claims 1 to 5, wherein the target data is a fisheye image taken by a fisheye camera.

7. A data processing method for performing data processing using a filter that uses coefficients learned in advance using learning data representing an object and a direction on the learning data that corresponds to a reference direction of the object, the data processing method comprising:
a reception unit receiving target data representing an object and a direction on the target data corresponding to a reference direction of the object; and
a data processing unit performing data processing by performing a convolution process on the target data using a modified filter in which a coefficient of the filter is modified based on the direction corresponding to the reference direction of the object,
wherein, in performing the data processing, the coefficients of the filters corresponding to each of a plurality of specific directions, the filters being generated using the filter that uses the pre-learned coefficients, are changed by weighting the filters according to the relative angle between the specific directions and the direction corresponding to the reference direction of the object, and the convolution process is performed on the target data using the changed filters.

8. A data processing method for performing data processing using a filter that uses coefficients learned in advance using learning data representing an object and a direction on the learning data that corresponds to a reference direction of the object, the data processing method comprising:
a reception unit receiving target data representing an object and a direction on the target data corresponding to a reference direction of the object; and
a data processing unit performing data processing by performing a convolution process on the target data using a modified filter obtained by modifying a coefficient of the filter based on the direction corresponding to the reference direction of the object,
wherein the data processing unit performs the data processing by performing a convolution process on the target data using a filter corresponding to each of a plurality of specific directions, the filter being generated using the filter that uses the pre-learned coefficients, and
weights and integrates the results of the data processing using the filters corresponding to each of the plurality of specific directions according to the relative angle between the direction corresponding to the reference direction of the object and the specific directions.

9. A data processing program for performing data processing using a filter that uses coefficients learned in advance using learning data representing an object and a direction on the learning data that corresponds to a reference direction of the object, the data processing program causing a computer to function as:
a reception unit that receives target data representing an object and a direction on the target data that corresponds to a reference direction of the object; and
a data processing unit that performs data processing by performing a convolution process on the target data using a modified filter in which a coefficient of the filter is changed based on the direction that corresponds to the reference direction of the object,
wherein the data processing unit changes the coefficients of the filters corresponding to each of a plurality of specific directions, which are generated using the filter that uses the pre-learned coefficients, by weighting the filters according to the relative angle between the specific directions and the direction corresponding to the reference direction of the object, and performs the data processing by performing the convolution process on the target data using the changed filters.

10. A data processing program for performing data processing using a filter that uses coefficients learned in advance using learning data representing an object and a direction on the learning data that corresponds to a reference direction of the object, the data processing program causing a computer to function as:
a reception unit that receives target data representing an object and a direction on the target data that corresponds to a reference direction of the object; and
a data processing unit that performs data processing by performing a convolution process on the target data using a modified filter in which a coefficient of the filter is changed based on the direction that corresponds to the reference direction of the object,
wherein the data processing unit performs the data processing by performing a convolution process on the target data using a filter corresponding to each of a plurality of specific directions, the filter being generated using the filter that uses the pre-learned coefficients, and
integrates the results of the data processing using the filters corresponding to each of the plurality of specific directions by weighting them according to the relative angle between the direction corresponding to the reference direction of the object and the specific directions.

11. A learning model generation method in which:
a learning data generation unit generates a learning data set including learning data representing an object, a correct data processing result, and a direction on the learning data corresponding to a reference direction of the object;
a learning unit performs convolution processing on the learning data using a modified filter, the coefficients of which are changed based on the direction corresponding to the reference direction of the object, and learns the coefficients so that a result of data processing matches the correct data processing result; and
when performing the data processing, the learning unit changes the coefficients of the filters generated using the filter that uses the coefficients, by weighting the filters corresponding to each of a plurality of specific directions according to the relative angle between the direction corresponding to the reference direction of the object and the specific directions, and performs the data processing by performing a convolution process on the learning data using the changed filters.

12. A learning model generation method in which:
a learning data generation unit generates a learning data set including learning data representing an object, a correct data processing result, and a direction on the learning data corresponding to a reference direction of the object;
a learning unit performs convolution processing on the learning data using a modified filter, the coefficients of which are changed based on the direction corresponding to the reference direction of the object, and learns the coefficients so that a result of data processing matches the correct data processing result; and
when performing the data processing, the learning unit performs the data processing by performing a convolution process on the learning data using a filter corresponding to each of a plurality of specific directions, the filter being generated using the filter that uses the coefficients, and weights and integrates the results of the data processing using the filters corresponding to each of the plurality of specific directions according to the relative angle between the direction corresponding to the reference direction of the object and the specific directions.