JP7749449B2

JP7749449B2 - Learning data generation device, automatic door system, learning data generation method, trained model generation method, control program, and recording medium

Info

Publication number: JP7749449B2
Application number: JP2021208570A
Authority: JP
Inventors: 龍一福田
Original assignee: Optex Co Ltd
Current assignee: Optex Co Ltd
Priority date: 2021-12-22
Filing date: 2021-12-22
Publication date: 2025-10-06
Anticipated expiration: 2041-12-22
Also published as: WO2023119989A1; JP2023093134A

Description

本発明は、機械学習で用いられる学習データを生成する学習用データ生成装置、学習用データ生成装置の制御プログラム、該制御プログラムを記録した記録媒体、前記学習用データ生成装置により生成された学習データを用いた学習済みモデル生成方法、および該学習済みモデルを用いて移動物体の検知を行う検知部を含む自動ドアシステムに関する。 The present invention relates to a training data generation device that generates training data used in machine learning, a control program for the training data generation device, a recording medium on which the control program is recorded, a trained model generation method using training data generated by the training data generation device, and an automatic door system that includes a detection unit that detects moving objects using the trained model.

ディープラーニングなどを用いた機械学習において、学習を効率的に行うためには大量の学習データか必要である。しかし、例えば、移動物体を検出するセンサに用いる学習済みモデルのための学習データを作成する場合、撮像画像に映った移動物体を手作業でラベリングする等を行うことにより、学習データを作成していた。よって、相当な労力が必要であった。そこで、学習データを効率的に作成する方法が各種提案されている。 In machine learning using deep learning and other methods, large amounts of training data are required for efficient learning. However, when creating training data for a trained model used in a sensor that detects moving objects, for example, the training data was previously created by manually labeling moving objects captured in captured images. This required a considerable amount of effort. As a result, various methods for efficiently creating training data have been proposed.

特許文献１には、実物体を撮像し、実物体に対応する３Ｄモデルを用いて、実物体の姿勢を導出し、実物体の外観情報と３Ｄモデルから得られる２Ｄモデルとを対応付ける、コンピュータプログラムが記載されている。 Patent document 1 describes a computer program that captures an image of a real object, derives the orientation of the real object using a 3D model corresponding to the real object, and associates the appearance information of the real object with a 2D model obtained from the 3D model.

特許文献２には、属性が関連付けられた３次元モデルと背景とを仮想空間にモデリングした３次元空間を２次元平面に投影し、３次元モデルが投影された２次元物体のラベルを生成し、２次元画像とラベルとを対応付ける学習データ生成装置が記載されている。 Patent document 2 describes a training data generation device that models a 3D model with associated attributes and a background in a virtual space, projects the 3D space onto a 2D plane, generates labels for the 2D objects onto which the 3D model is projected, and associates the 2D images with the labels.

特許文献３には、特定物体モデルを特定色とした学習データ生成用シーンデータを生成し、特定物体モデルの領域を設定し、シーンデータ用画像と特定物体領域とを対応付ける学習データ生成装置が記載されている。 Patent document 3 describes a learning data generation device that generates scene data for generating learning data in which a specific object model has a specific color, sets an area for the specific object model, and associates an image for the scene data with the specific object area.

特許文献４には、作成した学習画像を固有空間上の特徴点に投影し、入力された画像に最も近い特徴点の情報を出力する物体認識装置が記載されている。 Patent document 4 describes an object recognition device that projects a created training image onto feature points in an eigenspace and outputs information about the feature points that are closest to the input image.

特開２０１８－５１４１０号公報 JP 2018-51410 A
国際公開ＷＯ２０２０／１８３５９８号公報 International Publication No. WO2020/183598
特開２０１９－２３８５８号公報 JP 2019-23858 A
特開２０１０－２１１７３２号公報 JP 2010-211732 A

しかし、上述した従来技術に対しては、改良の余地があり、さらに効率的に、かつ信頼性の高い学習データを作成することが可能である。本発明の一態様は、上記課題に鑑みてなされたものであり、その目的は、効率的かつ信頼性の高い学習データを生成する学習用データ生成装置を実現することにある。 However, the conventional techniques described above leave room for improvement: training data could be created more efficiently and with higher reliability. One aspect of the present invention was made in view of this problem, and its purpose is to realize a training data generation device that efficiently generates highly reliable training data.

前記の課題を解決するために、本発明の一態様に係る学習用データ生成装置は、検知対象の３次元モデルが仮想空間内で動作する動画像データを作成する動画像データ作成部と、前記動画像データを投影した２次元画像における輝度変化データまたは差分データを作成する変化データ作成部と、前記輝度変化データまたは前記差分データにおける前記検知対象に対応する領域に、当該検知対象であることを示すラベルを付与するラベリング部と、前記輝度変化データ、または前記差分データと、前記検知対象に対応する領域、および前記ラベリング部が付与したラベルとの組を学習用データとして生成する学習用データ生成部と、を備える。 To solve the above-mentioned problems, a training data generation device according to one aspect of the present invention comprises: a video data creation unit that creates video data in which a three-dimensional model of a detection target moves in a virtual space; a variation data creation unit that creates brightness change data or difference data in a two-dimensional image onto which the video data is projected; a labeling unit that assigns, to the area corresponding to the detection target in the brightness change data or the difference data, a label indicating that the area is the detection target; and a training data generation unit that generates, as training data, a set of the brightness change data or the difference data, the area corresponding to the detection target, and the label assigned by the labeling unit.

本発明の一態様に係る学習用データ生成装置は、動画像データを出力した２次元画像における輝度変化データまたは差分データを用いて、２次元画像における検知対象領域および当該検知対象領域に付与されたラベルを推定する学習済みモデルを学習するための学習用データを生成するものである。 A training data generation device according to one aspect of the present invention generates training data for training a trained model that uses brightness change data or difference data in a two-dimensional image, to which the video data is output, to estimate detection target regions in the two-dimensional image and the labels assigned to those regions.

前記の構成によれば、作成した動画像データであって、検知対象を含む動画像データを用いるので、検知対象領域を自動的に判別でき、当該検知対象領域を自動的にラベリングすることが可能となる。また、動画像データを出力した２次元画像における輝度変化データまたは差分データを用いるので、動画像データをそのまま用いる場合と比較して、処理負荷を軽減できる。よって、効率的に学習用データを作成することができる。また、生成した動画像データを用いてラベリングを行うので、正確にラベリングを行うことができ、信頼性の高い学習用データを作成できる。 With the above configuration, the video data used is video data that was created to include the detection target, so the detection target area can be identified automatically and labeled automatically. Furthermore, because brightness change data or difference data in the two-dimensional image to which the video data is output is used, the processing load can be reduced compared to using the video data as is. Learning data can therefore be created efficiently. In addition, because labeling is performed using the generated video data, the labeling is accurate and highly reliable learning data can be created.

また、前記の学習用データ生成装置で作成した学習用データを用いた学習モデルは、輝度データまたは差分データを入力として、推定結果を得ることができるので、学習モデルの推定処理における処理負荷を軽減することができる。 In addition, a learning model using the learning data created by the learning data generation device can obtain estimation results using brightness data or difference data as input, thereby reducing the processing load in the estimation process of the learning model.

本発明の一態様に係る学習用データ生成装置では、前記２次元画像を表示する表示部は、該表示部における各画素に、それぞれの画素における輝度値が第１閾値を上回った場合、または第２閾値を下回った場合、イベントを発生させるイベント発生部を備えるとともに、該イベント発生部が前記イベントを発生した画素の前記表示部における位置を出力する位置出力部を備え、前記変化データ作成部は、前記位置出力部が出力した前記位置を用いて、前記輝度変化データを作成するものであってもよい。 In a learning data generation device according to one aspect of the present invention, the display unit that displays the two-dimensional image may include an event generation unit that generates an event for each pixel on the display unit when the luminance value of that pixel exceeds a first threshold value or falls below a second threshold value, and a position output unit that outputs the position on the display unit of the pixel at which the event generation unit generated the event, and the variation data creation unit may create the luminance variation data using the position output by the position output unit.

第１閾値は、現行の輝度値よりも大きい値を示すものであり、第２閾値は、現行の輝度値よりも小さい値を示すものである。前記の構成によれば、輝度値が第１閾値を上回った画素、または輝度値が第２閾値を下回った画素を当該画素の位置とともに出力されるので、所定の範囲を超えて輝度値に変化があった位置を容易に認識でき、輝度変化データを容易に作成できる。 The first threshold is a value greater than the current brightness value, and the second threshold is a value less than the current brightness value. With this configuration, pixels whose brightness values exceed the first threshold or fall below the second threshold are output along with their positions, so positions where the brightness value has changed beyond a specified range can be identified easily and brightness change data can be created easily.

本発明の一態様に係る学習用データ生成装置では、前記動画像データ作成部が作成する動画像データには背景データが含まれていないものであってもよい。 In a learning data generation device according to one aspect of the present invention, the video data created by the video data creation unit may not include background data.

前記の構成によれば、背景データが含まれないので、動画像データの容量を軽くすることができる。なお、自動ドアの検知センサに学習用データを用いる場合、検知対象である通行者を検知できればよく、背景は不要なので、背景データがなくても不都合は生じない。 With the above configuration, background data is not included, which reduces the volume of video data. Furthermore, when using learning data for automatic door detection sensors, it is sufficient to be able to detect passersby, the target of detection, and background data is not required, so there is no problem if background data is not included.

本発明の一態様に係る学習用データ生成装置では、前記動画像データ作成部が作成する前記動画像データに含まれる前記検知対象は移動体であってもよい。 In the learning data generation device according to one aspect of the present invention, the detection target included in the video data created by the video data creation unit may be a moving object.

前記の構成によれば、検知対象として移動体を検知対象とすることができる。 With the above configuration, moving objects can be detected.

本発明の一態様に係る学習用データ生成装置では、前記動画像データ作成部は、前記検知対象の進行方向、および進行速度の少なくとも何れかが異なる複数の動画像データを生成するものであってもよい。 In a learning data generation device according to one aspect of the present invention, the video data creation unit may generate multiple video data sets in which at least one of the traveling direction and traveling speed of the detection target differs.

前記の構成によれば、様々な方向に、検知対象として、様々な速度で移動する移動体の検知が可能となる。 The above configuration makes it possible to detect moving objects moving in various directions and at various speeds as detection targets.

本発明の一態様に係る学習用データ生成装置では、前記動画像データ作成部は、前記検知対象が存在する環境が異なる複数の動画像データを作成するものであってもよい。 In the learning data generation device according to one aspect of the present invention, the video data creation unit may create multiple video data sets that differ in the environment in which the detection target exists.

前記の構成によれば、検知対象がどのような環境の元にあっても、検知対象を検知すること可能となる。ここで、環境とは、雨、風、霧、雪等のような天候に関するもの、風等により揺れる草木等、天候の影響を受けている物体、蛍光灯、ＬＥＤ、自動車のヘッドライト等の人工的な照明、タバコの煙、湯気等の煙等を含む。 The above configuration makes it possible to detect the detection target regardless of the environment in which it is located. Here, the environment includes weather such as rain, wind, fog, and snow; objects affected by the weather, such as plants swaying in the wind; artificial lighting such as fluorescent lamps, LEDs, and automobile headlights; and smoke or vapor such as cigarette smoke and steam.

本発明の一態様に係る学習用データ生成装置では、前記動画像データ作成部が作成する前記動画像データには、前記検知対象が複数含まれるものであってもよい。 In the learning data generation device according to one aspect of the present invention, the video data created by the video data creation unit may include a plurality of the detection targets.

前記の構成によれば、複数の検知対象を検知することが可能となる。 The above configuration makes it possible to detect multiple detection targets.

前記課題を解決するために、本発明の一態様に係る自動ドアシステムは、前記学習用データ生成装置により生成された学習用データを用いて機械学習した学習モデルを用いて、前記検知対象を検知するセンサと、前記センサの検知結果に基づいて開閉される自動ドアと、を含む。 To solve the above problem, an automatic door system according to one aspect of the present invention includes a sensor that detects the detection target using a learning model trained by machine learning using the learning data generated by the learning data generation device, and an automatic door that opens and closes based on the detection results of the sensor.

前記の構成によれば、自動ドアの開けるための検知センサにおける検知の精度を高めることができる。また、処理負荷の軽い学習モデルを用いて検知処理を行うことができる。 This configuration improves the detection accuracy of the detection sensor used to open automatic doors. Furthermore, detection processing can be performed using a learning model with a light processing load.

前記課題を解決するために、本発明の一態様に係る学習用データ生成方法は、検知対象の３次元モデルが仮想空間内で動作する動画像データを作成する動画像データ作成ステップと、前記動画像データを投影した２次元画像における輝度変化データまたは差分データを作成する変化データ作成ステップと、前記輝度変化データまたは前記差分データにおける前記検知対象に対応する領域に、当該検知対象であることを示すラベルを付与するラベリングステップと、前記輝度変化データまたは前記差分データと、前記検知対象に対応する領域および前記ラベリングステップで付与したラベルとの組を学習用データとして生成する学習用データ生成ステップと、を含む。これにより、上述した効果を奏することができる。 To solve the above problem, a training data generation method according to one aspect of the present invention includes: a video data creation step of creating video data in which a three-dimensional model of a detection target moves in a virtual space; a variation data creation step of creating brightness change data or difference data in a two-dimensional image onto which the video data is projected; a labeling step of assigning, to the area corresponding to the detection target in the brightness change data or the difference data, a label indicating that the area is the detection target; and a training data generation step of generating, as training data, a set of the brightness change data or the difference data, the area corresponding to the detection target, and the label assigned in the labeling step. This makes it possible to achieve the above-mentioned effects.

前記課題を解決するために、本発明の一態様に係る学習済みモデル生成方法は、前記学習用データ生成方法により学習用データを生成する学習用データ生成ステップと、前記学習用データ生成ステップで生成された前記学習用データを用いて機械学習を行う機械学習ステップと、を含む。 To solve the above problem, one aspect of the present invention provides a trained model generation method including a training data generation step of generating training data using the training data generation method, and a machine learning step of performing machine learning using the training data generated in the training data generation step.

本発明の各態様に係る学習用データ生成装置は、コンピュータによって実現してもよく、この場合には、コンピュータを前記学習用データ生成装置が備える各部（ソフトウェア要素）として動作させることにより前記学習用データ生成装置をコンピュータにて実現させる学習用データ生成装置の制御プログラム、およびそれを記録したコンピュータ読み取り可能な記録媒体も、本発明の範疇に入る。 The training data generation device according to each aspect of the present invention may be realized by a computer. In this case, the control program for the training data generation device, which causes the computer to operate as each unit (software element) of the training data generation device, thereby realizing the training data generation device on the computer, and the computer-readable recording medium on which the control program is recorded, also fall within the scope of the present invention.

本発明の一態様によれば、自動的にラベリングすることが可能で、かつ、処理負荷を軽減できるので、効率的に学習用データを作成できるという効果を奏する。また、生成した動画像データを用いてラベリングを行うので、信頼性の高い学習用データを作成できるという効果を奏する。また、処理負荷の軽い学習モデルを作成することができる。 One aspect of the present invention enables automatic labeling and reduces the processing load, resulting in the efficient creation of training data. Furthermore, since labeling is performed using generated video data, highly reliable training data can be created. Furthermore, a learning model with a light processing load can be created.

本発明の実施形態に係る学習用データ生成装置の要部構成を示す機能ブロック図である。 FIG. 1 is a functional block diagram showing the configuration of the main part of a learning data generation device according to an embodiment of the present invention.
学習用データ生成装置で用いる３次元モデルの例を示す図である。 FIG. 2 is a diagram showing an example of a three-dimensional model used in the learning data generation device.
３次元モデルを仮想空間に配置し、２次元画像とした例を示す図である。 FIG. 3 is a diagram showing an example in which a three-dimensional model is placed in a virtual space and converted into a two-dimensional image.
２次元画像の輝度変化データの例を示す図である。 FIG. 4 is a diagram showing an example of brightness change data of a two-dimensional image.
輝度変化データにおける３次元モデルの領域の設定例を示す図である。 FIG. 5 is a diagram showing an example of setting the area of a three-dimensional model in brightness change data.
学習用データの例を示す図である。 FIG. 6 is a diagram showing an example of learning data.
学習装置の要部構成を示す図である。 FIG. 7 is a diagram showing the configuration of the main part of a learning device.
自動ドアシステムの要部構成を示す図である。 FIG. 8 is a diagram showing the configuration of the main part of an automatic door system.
学習用データ生成装置における処理の流れを示すフローチャートである。 FIG. 9 is a flowchart showing the flow of processing in the learning data generation device.
動画像データの変形例を示す図である。 FIG. 10 is a diagram showing a modified example of video data.
動画像データの変形例を示す図である。 FIG. 11 is a diagram showing a modified example of video data.
動画像データの変形例を示す図である。 FIG. 12 is a diagram showing a modified example of video data.
動画像データの変形例を示す図である。 FIG. 13 is a diagram showing a modified example of video data.

以下、本発明の一実施形態について、詳細に説明する。本実施形態に係る学習用データ生成装置１００は、学習済みモデル５の学習に用いられる学習用データを生成するものである。本実施形態では、自動ドアシステム１に備えられた検知部２による検知対象の検知に用いられる学習済みモデル５の学習に用いる学習用データを作成する例を挙げて説明するが、学習用データ生成装置１００が生成する学習用データはこれに限られるものではない。自動ドアに限られず、シャッター等のセンサの学習モデルに用いるものであってもよい。 One embodiment of the present invention will be described in detail below. The learning data generation device 100 according to this embodiment generates learning data used to train the trained model 5. This embodiment describes an example of creating learning data for training the trained model 5 that the detection unit 2 of the automatic door system 1 uses to detect a detection target, but the learning data generated by the learning data generation device 100 is not limited to this use. The learning data is not limited to automatic doors and may also be used for the learning model of a sensor for a shutter or the like.

図１は、本実施形態に係る学習用データ生成装置１００の構成例を示す機能ブロック図である。図１に示すように、学習用データ生成装置１００は、動画像データ作成部１０、変化データ作成部２０、ラベリング部３０、学習用データ生成部４０、および表示部５０を含む。なお、表示部５０は、学習用データ生成装置１００の外部に備えられていてもよい。 Figure 1 is a functional block diagram showing an example configuration of a training data generation device 100 according to this embodiment. As shown in Figure 1, the training data generation device 100 includes a video data creation unit 10, a variation data creation unit 20, a labeling unit 30, a training data generation unit 40, and a display unit 50. Note that the display unit 50 may be provided external to the training data generation device 100.

動画像データ作成部１０は、検知対象である３次元モデルが仮想空間内で動作する動画像データを作成する。より詳細には、動画像データ作成部１０は、まず、検知対象の３次元モデルを作成する。３次元モデルは公知の技術を用いて作成可能であり、例えば、人物や車両等の移動体である。３次元モデルには、属性を表す情報が関連付けられている。属性の例としては、人であれば性別、年齢、車両であれば種類、色等が挙げられる。 The video data creation unit 10 creates video data in which a 3D model of the detection target moves in a virtual space. More specifically, the video data creation unit 10 first creates a 3D model of the detection target. The 3D model can be created using known technology and is, for example, a moving object such as a person or a vehicle. Information representing attributes is associated with the 3D model. Examples of attributes include gender and age for people, and type and color for vehicles.

図２に、作成する３次元モデルの例を示す。図２では成人男性２０１の例を示している。上述したように３次元モデルは公知の技術を用いて作成可能であり、各種パラメータ（位置、回転、拡大縮小、寸法等）を設定して作成される。 Figure 2 shows an example of a 3D model to be created. Figure 2 shows an example of an adult male 201. As mentioned above, 3D models can be created using known techniques, and are created by setting various parameters (position, rotation, scaling, dimensions, etc.).

そして、動画像データ作成部１０は、作成した３次元モデルを３次元の仮想空間に配置し、３次元モデルの動きを設定する。３次元モデルの動きは、ユーザにより設定される。これにより、３次元の仮想空間内で３次元モデルが移動する動画像データが作成される。なお、動画像データ作成部１０は、３次元モデルの進行方向、および進行速度が異なる複数の動画像データを作成してもよい。また、３次元モデルの周囲状況が異なる動画像データを作成してもよい。周囲状況が異なるとは、３次元モデルの周囲に何もない状況、３次元モデル以外の物により３次元モデルの一部が隠れている状況等のことを言う。 The video data creation unit 10 then places the created 3D model in a 3D virtual space and sets the movement of the 3D model. The movement of the 3D model is set by the user. This creates video data in which the 3D model moves within the 3D virtual space. Note that the video data creation unit 10 may create multiple video data sets in which the 3D model moves in different directions and at different speeds. It may also create video data sets in which the surrounding conditions of the 3D model are different. Different surrounding conditions refer to situations in which there is nothing around the 3D model, situations in which part of the 3D model is hidden by something other than the 3D model, etc.
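The idea of creating several videos that differ in travel direction and speed can be sketched as a simple parameter grid. This is only an illustration: the function name `motion_variations`, the dictionary keys, and the use of degrees are assumptions that do not appear in the patent, which leaves the motion settings to the user.

```python
from itertools import product
from typing import Dict, Iterable, List


def motion_variations(
    directions_deg: Iterable[float], speeds: Iterable[float]
) -> List[Dict[str, float]]:
    """Enumerate every (direction, speed) combination for the 3D model's motion.

    Each returned dict is a hypothetical motion setting for one video.
    """
    return [
        {"direction_deg": d, "speed": s}
        for d, s in product(directions_deg, speeds)
    ]


# Three directions and two speeds give six motion settings, one per video.
variations = motion_variations([0, 90, 180], [0.5, 1.5])
```

Each entry would then drive one run of the video data creation unit 10; surrounding conditions such as occlusion could be added as a further axis of the grid in the same way.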

なお、動画像データ作成部１０は、３次元の仮想空間における背景となる背景データを作成してもよいし、作成しなくてもよい。背景とは、仮想空間における３次元モデル以外の部分を言う。 The video data creation unit 10 may or may not create background data that serves as the background in the three-dimensional virtual space. The background refers to the parts of the virtual space other than the three-dimensional model.

変化データ作成部２０は、動画像データを投影した２次元画像における輝度変化データ、または差分データを作成する。より詳細には、変化データ作成部２０は、まず、３次元の仮想空間における３次元モデルを２次元平面に投影して、２次元物体を描画する。２次元物体の描画が公知の方法を用いて可能であり、どのような方法を用いてもよい。 The variation data creation unit 20 creates brightness variation data or difference data for a two-dimensional image onto which moving image data is projected. More specifically, the variation data creation unit 20 first projects a three-dimensional model in a three-dimensional virtual space onto a two-dimensional plane to render a two-dimensional object. Any known method can be used to render a two-dimensional object.

図３に、２次元平面に２次元物体が描画された２次元画像の例を示す。図３に示す２次元画像の例では、作成された３次元モデルの成人男性２０１の２次元画像２１１が示されている。 Figure 3 shows an example of a two-dimensional image in which a two-dimensional object is drawn on a two-dimensional plane. The example two-dimensional image shown in Figure 3 shows a two-dimensional image 211 of a created three-dimensional model of an adult male 201.

そして、変化データ作成部２０は、３次元モデルが２次元平面に投影された２次元画像２１１における輝度変化データを作成する。図４に輝度変化データの例を示す。図４は、一定時間の画素ごとの輝度変化データを積算して二次元画像化したものである。輝度変化データの詳細については後述するが、図４に示すように、輝度変化データでは、輝度値が第１閾値を上回った画素、および第２閾値を下回った画素の位置が認識可能となっている。なお、輝度変化データに代えて、フレーム間の差分データを作成してもよい。 The variation data creation unit 20 then creates brightness variation data for a two-dimensional image 211 in which the three-dimensional model is projected onto a two-dimensional plane. Figure 4 shows an example of brightness variation data. Figure 4 shows a two-dimensional image created by accumulating brightness variation data for each pixel over a certain period of time. Details of the brightness variation data will be described later, but as shown in Figure 4, the brightness variation data makes it possible to identify the positions of pixels whose brightness values exceed a first threshold and pixels whose brightness values fall below a second threshold. Note that difference data between frames may be created instead of brightness variation data.
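For the alternative mentioned at the end of this paragraph, inter-frame difference data, a minimal sketch might look as follows. It assumes grayscale frames stored as nested lists of luminance values and uses the absolute per-pixel difference; the patent does not specify the frame format or the exact difference operation, and the name `frame_difference` is hypothetical.

```python
from typing import List

Frame = List[List[int]]  # grayscale frame: rows of luminance values (assumed format)


def frame_difference(prev: Frame, curr: Frame) -> Frame:
    """Return the per-pixel absolute luminance difference between two frames."""
    same_shape = len(prev) == len(curr) and all(
        len(p) == len(c) for p, c in zip(prev, curr)
    )
    if not same_shape:
        raise ValueError("frames must have the same shape")
    return [
        [abs(c - p) for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev, curr)
    ]


diff = frame_difference([[0, 10], [20, 30]], [[5, 10], [0, 30]])
```

Pixels that did not change between the two frames come out as zero, so only the moving parts of the scene remain in the difference data.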

ラベリング部３０は、輝度変化データまたは差分データにおける検知対象に対応する領域に、当該検知対象であることを示すラベルを付与する。より詳細には、ラベリング部３０は、描画された２次元物体が存在する領域を算出し、当該物体に関連付けられている属性を用いて、当該領域をラベリングする。例えば、属性が男性となっている３次元モデルを投影した２次元物体であれば、当該２次元物体の存在する領域に「男性」とラベリングする。ラベリングする属性は、対応する３次元モデルに関連付けられている属性の全部であってもよいし、一部であってもよい。また、２次元物体が存在する領域は、当該２次元物体に外接する矩形領域であってもよい。図５に、ラベリング部３０がラベリングする領域を示す。図５に示す例では、成人男性２２１に外接する矩形領域２３１がラベリングされる対象領域として設定される。ここでは、矩形領域２３１が「男性」とラベリングされる。 The labeling unit 30 assigns a label indicating that the region in the brightness change data or difference data corresponding to the detection target is the detection target. More specifically, the labeling unit 30 calculates the region in which the drawn two-dimensional object exists and labels the region using the attribute associated with the object. For example, if the two-dimensional object is a projection of a three-dimensional model with the attribute "male," the region in which the two-dimensional object exists is labeled "male." The attribute to be labeled may be all or some of the attributes associated with the corresponding three-dimensional model. The region in which the two-dimensional object exists may also be a rectangular region circumscribing the two-dimensional object. Figure 5 shows the region to be labeled by the labeling unit 30. In the example shown in Figure 5, a rectangular region 231 circumscribing an adult male 221 is set as the target region to be labeled. Here, rectangular region 231 is labeled "male."
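The labeling step described above can be sketched as computing the rectangle that circumscribes the projected object and attaching a subset of the model's attributes. The boolean silhouette mask, the function names, and the attribute keys are assumptions for illustration; the patent only states that the region may be a circumscribing rectangle and that all or some attributes may be used as the label.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Rect = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive


def circumscribed_rect(mask: List[List[bool]]) -> Optional[Rect]:
    """Return the rectangle circumscribing the True pixels, or None if there are none."""
    xs = [x for row in mask for x, on in enumerate(row) if on]
    ys = [y for y, row in enumerate(mask) if any(row)]
    if not xs:
        return None
    return (min(xs), min(ys), max(xs), max(ys))


def label_region(
    mask: List[List[bool]],
    attributes: Dict[str, str],
    keys: Sequence[str] = ("gender",),
) -> Optional[Dict[str, object]]:
    """Pair the circumscribing rectangle with the selected attributes of the 3D model."""
    rect = circumscribed_rect(mask)
    if rect is None:
        return None
    return {"region": rect, "label": {k: attributes[k] for k in keys}}


# Hypothetical silhouette of the projected object on a 4x4 image.
mask = [
    [False, False, False, False],
    [False, True, False, False],
    [False, False, True, False],
    [False, False, False, False],
]
labeled = label_region(mask, {"gender": "male", "age": "adult"})
```

Because the silhouette comes from the rendered 3D model rather than from a photograph, the rectangle is known exactly and no manual annotation is needed.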

学習用データ生成部４０は、輝度変化データ、または差分データと、検知対象を示す領域、およびラベリング部が付与したラベルとの組を学習用データとして生成する。 The learning data generation unit 40 generates, as learning data, a set of brightness change data or difference data, an area indicating the detection target, and a label assigned by the labeling unit.

図６に、学習用データの例を示す。図６に示すように、学習用データ４００は、輝度変化データまたは差分データと、検知対象領域と、ラベルとの組となっており、輝度変化データまたは差分データが入力データ、検知対象領域とラベルとが出力データとなる学習モデルの学習用データとなる。 Figure 6 shows an example of learning data. As shown in Figure 6, the learning data 400 is a set of brightness change data or difference data, a detection target region, and a label, and serves as learning data for a learning model in which the brightness change data or difference data is the input data and the detection target region and the label are the output data.
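One way to picture a single item of the learning data 400 is as a record with one input field and two output fields. The class name `TrainingSample`, its field names, and the rectangle encoding are assumptions made for this sketch and are not defined in the patent.

```python
from dataclasses import dataclass
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), assumed encoding


@dataclass
class TrainingSample:
    """One learning-data item: change data as input, region and label as output."""

    change_data: List[List[int]]  # brightness change data or difference data
    region: Rect                  # detection target region
    label: str                    # label assigned by the labeling unit

    def as_pair(self) -> Tuple[List[List[int]], Tuple[Rect, str]]:
        """Split the sample into (model input, expected model output)."""
        return self.change_data, (self.region, self.label)


sample = TrainingSample(change_data=[[0, 255], [255, 0]], region=(0, 0, 1, 1), label="male")
model_input, expected_output = sample.as_pair()
```

Keeping the input and the expected output in one record makes it straightforward to hand the data to whatever supervised learning framework the learning device uses.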

表示部５０は、２次元画像を表示する表示装置である。表示部５０にはイベント発生部５１、および位置出力部５２が含まれる。ここでは、表示部５０は、学習用データ生成装置１００に備えられているものとして記載しているが、上述したように、表示部５０は学習用データ生成装置１００の外部にあってもよい。 The display unit 50 is a display device that displays two-dimensional images. The display unit 50 includes an event generation unit 51 and a position output unit 52. Here, the display unit 50 is described as being provided in the training data generation device 100, but as mentioned above, the display unit 50 may be external to the training data generation device 100.

イベント発生部５１は、表示部５０の各画素において、輝度値が第１閾値を上回った場合、または輝度値が第２閾値を下回った場合に、イベントを発生させる。第１閾値、および第２閾値は、基準値に対する割合で設定される。例えば、第１閾値は基準値の＋２０％、第２閾値は基準値の－１５％というように設定される。また、基準値は現在の輝度値であり、イベントが発生するとイベント後の輝度値が基準値となる。 The event generation unit 51 generates an event when the brightness value of each pixel on the display unit 50 exceeds a first threshold value or falls below a second threshold value. The first and second threshold values are set as percentages of a reference value. For example, the first threshold value is set to +20% of the reference value, and the second threshold value is set to -15% of the reference value. The reference value is the current brightness value, and when an event occurs, the brightness value after the event becomes the reference value.

以上のように、２次元画像を表示する表示部５０は、該表示部５０における各画素に、それぞれの画素における輝度値が第１閾値を上回った場合、または第２閾値を下回った場合、イベントを発生させるイベント発生部５１を備えるとともに、該イベント発生部５１が前記イベントを発生した画素の前記表示部５０における位置を出力する位置出力部５２を備える。 As described above, the display unit 50 that displays a two-dimensional image includes an event generation unit 51 that generates an event for each pixel on the display unit 50 when the luminance value of that pixel exceeds a first threshold value or falls below a second threshold value, and a position output unit 52 that outputs the position on the display unit 50 of the pixel at which the event generation unit 51 generated the event.

位置出力部５２は、イベント発生部５１がイベントを発生させた画素の表示部５０における位置を出力する。位置出力部５２は、位置とともに、イベントが発生した時刻を出力してもよい。変化データ作成部２０は、この位置出力部５２が出力した位置を用いて、輝度変化データを作成する。 The position output unit 52 outputs the position on the display unit 50 of the pixel at which the event generation unit 51 generated an event. The position output unit 52 may also output the time the event occurred along with the position. The change data creation unit 20 creates brightness change data using the position output by the position output unit 52.
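The event generation and position output described above can be sketched as follows, using the example thresholds from the text (+20% and -15% of the reference value). The frame format, the function names, the use of the frame index as the event time, and the polarity field are assumptions for illustration; the patent only requires that the position, and optionally the time, of each event be output.

```python
from typing import List, Tuple

Frame = List[List[float]]            # per-pixel luminance values (assumed format)
Event = Tuple[int, int, int, int]    # (x, y, frame_index, polarity): hypothetical layout


def generate_events(frames: List[Frame], up: float = 0.20, down: float = 0.15) -> List[Event]:
    """Emit an event when a pixel rises above ref*(1+up) or falls below ref*(1-down)."""
    if not frames:
        return []
    # Each pixel's reference value starts as its luminance in the first frame.
    reference = [row[:] for row in frames[0]]
    events: List[Event] = []
    for t, frame in enumerate(frames[1:], start=1):
        for y, row in enumerate(frame):
            for x, lum in enumerate(row):
                ref = reference[y][x]
                if lum > ref * (1.0 + up):
                    polarity = 1       # exceeded the first threshold
                elif lum < ref * (1.0 - down):
                    polarity = -1      # fell below the second threshold
                else:
                    continue
                events.append((x, y, t, polarity))
                reference[y][x] = lum  # the post-event luminance becomes the new reference
    return events


def accumulate(events: List[Event], width: int, height: int) -> List[List[int]]:
    """Count events per pixel to form a 2D brightness-change image."""
    image = [[0] * width for _ in range(height)]
    for x, y, _t, _polarity in events:
        image[y][x] += 1
    return image


frames = [[[100.0, 100.0]], [[130.0, 110.0]], [[130.0, 80.0]]]
events = generate_events(frames)
change_image = accumulate(events, width=2, height=1)
```

In this example the first pixel rises from 100 to 130 and fires once, after which 130 becomes its reference; the second pixel ignores the small rise to 110 and fires only when it drops to 80.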

以上のように、本実施形態に係る学習用データ生成装置１００は、検知対象の３次元モデルが仮想空間内で動作する動画像データを作成する動画像データ作成部１０と、前記動画像データを投影した２次元画像における輝度変化データ、または差分データを作成する変化データ作成部２０と、前記輝度変化データまたは前記差分データにおける前記検知対象に対応する領域に、当該検知対象であることを示すラベルを付与するラベリング部３０と、前記輝度変化データ、または前記差分データと、前記検知対象を示す領域、および前記ラベリング部が付与したラベルとの組を学習用データとして生成する学習用データ生成部４０と、を備える。 As described above, the learning data generation device 100 according to this embodiment comprises: a video data creation unit 10 that creates video data in which a three-dimensional model of a detection target moves in a virtual space; a variation data creation unit 20 that creates brightness change data or difference data in a two-dimensional image onto which the video data is projected; a labeling unit 30 that assigns, to the area corresponding to the detection target in the brightness change data or the difference data, a label indicating that the area is the detection target; and a learning data generation unit 40 that generates, as learning data, a set of the brightness change data or the difference data, the area indicating the detection target, and the label assigned by the labeling unit.

〔学習装置の構成〕 [Configuration of the learning device]
次に、図７を参照して、学習装置２００について説明する。図７に示すように、学習装置２００は、学習部２１０を備える。学習部２１０は、学習用データ生成装置１００が生成した学習用データ４００を用いて、機械学習を行い、学習済みモデル５を生成する。学習済みモデル５は、輝度変化データまたは差分データを入力とし、検知対象領域およびラベルを出力する学習済みモデルである。 Next, the learning device 200 will be described with reference to Figure 7. As shown in Figure 7, the learning device 200 includes a learning unit 210. The learning unit 210 performs machine learning using the learning data 400 generated by the learning data generation device 100 to generate a trained model 5. The trained model 5 is a trained model that receives brightness change data or difference data as input and outputs a detection target region and a label.

換言すれば、学習装置２００による学習済みモデル生成方法は、学習用データ生成装置１００による学習用データ生成ステップで生成された学習用データ４００を用いて機械学習を行う機械学習ステップを含む。 In other words, the trained model generation method by the training device 200 includes a machine learning step of performing machine learning using the training data 400 generated in the training data generation step by the training data generation device 100.

〔自動ドアシステムの構成〕 [Configuration of the automatic door system]
次に、図８を参照して、自動ドアシステム１について説明する。図８に示す自動ドアシステム１は、検知部２による検知対象の検知処理に、学習用データ生成装置１００で作成した学習用データ４００を用いて学習した学習済みモデル５を用いるものである。なお、自動ドアシステム１の駆動等については、公知の技術を用いて可能であるので、ここでは、詳細な説明は割愛する。 Next, the automatic door system 1 will be described with reference to Figure 8. The automatic door system 1 shown in Figure 8 uses, for the detection processing of the detection target by the detection unit 2, a trained model 5 trained with the training data 400 created by the training data generation device 100. Note that driving of the automatic door system 1 and the like can be achieved using known technology, so a detailed description is omitted here.

図８に示すように、自動ドアシステム１は、検知部（センサ）２、駆動部３、自動ドア４、および学習済みモデル５を含む。 As shown in Figure 8, the automatic door system 1 includes a detection unit (sensor) 2, a drive unit 3, an automatic door 4, and a trained model 5.

検知部２は、所定領域における人の存在を検知し、人が存在する場合に、駆動指示を駆動部３に送信するものである。上述したように、本実施形態では、検知部２は、学習済みモデル５を用いて、所定領域に人が存在するか否かを検知する処理を行う。 The detection unit 2 detects the presence of a person in a specified area and, if a person is present, transmits a drive instruction to the drive unit 3. As described above, in this embodiment, the detection unit 2 uses the trained model 5 to perform processing to detect whether or not a person is present in the specified area.

駆動部３は、検知部２の指示により自動ドア４を開ける動作をする。 The drive unit 3 operates to open the automatic door 4 in response to instructions from the detection unit 2.

自動ドア４は、駆動部３により開閉される自動ドアである。 Automatic door 4 is an automatic door that is opened and closed by drive unit 3.

なお、ここでは、学習済みモデル５は、自動ドアシステム１に含まれるものとして記載したが、学習済みモデル５を自動ドアシステム１に含めず、検知部２が外部の学習済みモデル５にアクセスして、学習済みモデル５を用いた推定結果を取得する構成であってもよい。 Note that, although the trained model 5 has been described here as being included in the automatic door system 1, the trained model 5 may not be included in the automatic door system 1, and the detection unit 2 may access an external trained model 5 to obtain estimation results using the trained model 5.

〔学習用データ生成装置における処理の流れ〕 [Processing flow in the learning data generation device]
図９は、学習用データ生成装置１００における処理の流れを示すフローチャートである。図９に示すように、学習用データ生成装置１００では、まず、動画像データ作成部１０が３次元モデルを作成する（Ｓ１０１）。 Figure 9 is a flowchart showing the flow of processing in the learning data generation device 100. As shown in Figure 9, in the learning data generation device 100, the video data creation unit 10 first creates a three-dimensional model (S101).

次に、動画像データ作成部１０は、作成した３次元モデルを用いて、検知対象の３次元モデルが仮想空間内で動作する動画像データを作成する（Ｓ１０２、動画像データ作成ステップ）。その後、変化データ作成部２０は、３次元の仮想空間における３次元モデルを２次元平面に投影して、２次元物体を描画し、２次元画像を作成する（Ｓ１０３）。そして、動画像データを投影した２次元画像における輝度変化データ、または差分データを作成する（Ｓ１０４、変化データ作成ステップ）。 Next, the video data creation unit 10 uses the created 3D model to create video data in which the 3D model of the detection target moves in a virtual space (S102, video data creation step). The variation data creation unit 20 then projects the 3D model in the 3D virtual space onto a 2D plane, rendering the 2D object and creating a 2D image (S103). Then, it creates brightness variation data or difference data for the 2D image onto which the video data is projected (S104, variation data creation step).

The labeling unit 30 then assigns a label based on an attribute to the region of the brightness variation data or difference data that corresponds to the detection target (S105, labeling step). Finally, the learning data generation unit 40 generates, as learning data, a set consisting of the brightness variation data or difference data, the region indicating the detection target, and the label assigned by the labeling unit 30 (S106, learning data generation step).

The above is the flow of processing by which the learning data generation device 100 generates learning data.
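
Steps S105 and S106 can be sketched as follows, assuming the difference data is a binary grid in which every non-zero pixel belongs to a single detection target and that the region is represented by the rectangle circumscribing those pixels. The `LearningSample` record, the helper names, and the label string are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Assumption: a region is an inclusive rectangle (x_min, y_min, x_max, y_max).
Rect = Tuple[int, int, int, int]


@dataclass
class LearningSample:
    """One set of change data, target region, and label (S106)."""
    change_data: List[List[int]]
    region: Rect
    label: str


def circumscribed_rect(change_data: List[List[int]]) -> Optional[Rect]:
    """Return the rectangle enclosing all non-zero pixels, or None if there are none."""
    points = [
        (x, y)
        for y, row in enumerate(change_data)
        for x, value in enumerate(row)
        if value
    ]
    if not points:
        return None
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


def make_sample(change_data: List[List[int]], label: str) -> Optional[LearningSample]:
    """Attach a region and label (S105) and bundle them with the data (S106)."""
    region = circumscribed_rect(change_data)
    if region is None:
        return None  # nothing moved, so there is nothing to label
    return LearningSample(change_data, region, label)


sample = make_sample([[0, 0, 0, 0], [0, 1, 1, 0], [0, 0, 1, 0]], "person")
```

Because the three-dimensional model and its attributes are known when the video data is created, the label can be assigned automatically rather than by hand.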

[Modification]
Next, modified examples of the video data used to create the learning data 400 will be described with reference to FIGS. 10 to 13. In the example described above, the video data was generated by placing the created three-dimensional model in a virtual space. As shown in FIG. 10, there may be a plurality of three-dimensional models. In the example of FIG. 10, three adult males, denoted 202 to 204, are placed in the virtual space as three-dimensional models. The three-dimensional model is not limited to an adult male and may be a woman or a child.

When the labeling unit 30 performs labeling, it treats the rectangular regions 231 to 234 circumscribing the adult males 202 to 204 as the regions to be labeled.

In this way, the video data created by the video data creation unit 10 may contain a plurality of three-dimensional models.
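
When several three-dimensional models appear in one frame, each needs its own circumscribing rectangle. The sketch below assumes, purely for illustration, that the renderer can also output an identifier map in which each pixel holds the identifier of the model covering it (0 for none). The identifier map, the function name `rects_per_model`, and the use of reference numerals as identifiers are assumptions and are not described in the embodiment.

```python
from typing import Dict, List, Tuple

# Assumption: a region is an inclusive rectangle (x_min, y_min, x_max, y_max).
Rect = Tuple[int, int, int, int]


def rects_per_model(id_map: List[List[int]]) -> Dict[int, Rect]:
    """Return one circumscribing rectangle per non-zero model identifier."""
    rects: Dict[int, Rect] = {}
    for y, row in enumerate(id_map):
        for x, model_id in enumerate(row):
            if model_id == 0:
                continue  # pixel not covered by any model
            if model_id in rects:
                x0, y0, x1, y1 = rects[model_id]
                rects[model_id] = (min(x0, x), min(y0, y), max(x1, x), max(y1, y))
            else:
                rects[model_id] = (x, y, x, y)
    return rects


# Two models occupying separate parts of a 5 x 3 frame.
example_map = [
    [202, 202, 0, 0, 0],
    [202, 202, 0, 203, 0],
    [0, 0, 0, 203, 203],
]
example_rects = rects_per_model(example_map)
```

Keeping the rectangles keyed by model identifier makes it straightforward to attach a different attribute label to each model in the same frame.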

Also, as shown in FIG. 11, the three-dimensional model may be a vehicle or the like rather than a person. In the example of FIG. 11, the three-dimensional model is a forklift 205. In this case, the labeling unit 30 treats the rectangular region 235 circumscribing the forklift 205 as the region to be labeled. A trained model trained on learning data that uses a forklift or the like as the three-dimensional model can be used in an open/close sensor for a shutter or similar opening through which forklifts pass.

Furthermore, as shown in FIGS. 12 and 13, various weather conditions may be set for the video data. FIG. 12 shows an example in which a three-dimensional model 206 is placed in rain, and FIG. 13 shows an example in which a three-dimensional model 207 is placed in wind. Placing three-dimensional models in various weather conditions in this way makes it possible to recognize the region in which the three-dimensional model exists under those conditions.

In this way, the video data creation unit 10 may create a plurality of video data sets that differ in the weather in which the three-dimensional model is placed. The video data creation unit 10 may also create video data in which the three-dimensional model is placed in various environments other than weather. Here, the environment includes not only naturally occurring conditions such as the weather described above but also artificially generated ones. Examples include light from fluorescent lamps, LEDs, and automobile headlights, and smoke or vapor such as cigarette smoke and steam. It also includes natural and artificial light that is transmitted, reflected, refracted, or scattered.

In this way, the environment includes both naturally occurring and artificially generated elements. Put another way, the environment can be described as anything that affects the three-dimensional model, such as (1) solids such as snow and vegetation, (2) liquids such as rain, and (3) gases such as wind, fog, and smoke.
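
One way to picture the creation of many video data sets under different conditions is to enumerate every combination of scenario parameters and render one video per combination. The sketch below assumes a scenario is described by weather, lighting, traveling direction, and traveling speed. The `Scenario` record, its field names, and the example values are hypothetical; the rendering itself is outside the scope of the sketch.

```python
from dataclasses import dataclass
from itertools import product
from typing import Iterator, Sequence


@dataclass(frozen=True)
class Scenario:
    """Hypothetical description of one video data set to be rendered."""
    weather: str
    lighting: str
    direction_deg: float
    speed_m_per_s: float


def enumerate_scenarios(
    weathers: Sequence[str],
    lightings: Sequence[str],
    directions: Sequence[float],
    speeds: Sequence[float],
) -> Iterator[Scenario]:
    """Yield one scenario for every combination of the given parameters."""
    for weather, lighting, direction, speed in product(
        weathers, lightings, directions, speeds
    ):
        yield Scenario(weather, lighting, direction, speed)


scenarios = list(
    enumerate_scenarios(
        weathers=["rain", "wind", "clear"],
        lightings=["fluorescent", "headlight"],
        directions=[0.0, 90.0],
        speeds=[0.8, 1.5],
    )
)
```

Enumerating the full product keeps coverage of the condition space explicit; in practice a sampled subset may be preferable when the number of combinations grows large.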

[Software implementation example]
The functions of the learning data generation device 100 (hereinafter referred to as the "device") can be realized by a program for causing a computer to function as the device, specifically a program that causes the computer to function as each control block of the device (in particular, the video data creation unit 10, the variation data creation unit 20, the labeling unit 30, and the learning data generation unit 40).

In this case, the device includes, as hardware for executing the program, a computer having at least one control device (for example, a processor) and at least one storage device (for example, a memory). The functions described in the above embodiments are realized when the control device and the storage device execute the program.

The program may be recorded on one or more non-transitory, computer-readable recording media. The device may or may not include these recording media. In the latter case, the program may be supplied to the device via any wired or wireless transmission medium.

Some or all of the functions of the control blocks can also be realized by logic circuits. For example, an integrated circuit in which logic circuits functioning as the control blocks are formed is also included in the scope of the present invention. The functions of the control blocks can also be realized by, for example, a quantum computer.

The present invention is not limited to the embodiments described above, and various modifications are possible within the scope of the claims. Embodiments obtained by appropriately combining technical means disclosed in different embodiments are also included in the technical scope of the present invention.

1 Automatic door system
2 Detection unit (sensor)
3 Drive unit
4 Automatic door
5 Trained model
10 Video data creation unit
20 Variation data creation unit
30 Labeling unit
40 Learning data generation unit
50 Display unit
51 Event generation unit
52 Position output unit
100 Learning data generation device
200 Learning device
210 Learning unit
400 Learning data

Claims

a moving image data creating unit that creates moving image data in which a three-dimensional model of the detection target moves in a virtual space;
a variation data generating unit that generates brightness variation data or difference data between frames obtained by comparing brightness values with a threshold value in a two-dimensional image projected from the moving image data;
a labeling unit that assigns a label indicating that the detection target is a region in the brightness change data or the difference data that corresponds to the detection target;
a learning data generation unit that generates, as learning data, a set of the brightness change data or the difference data, an area corresponding to the detection object, and a label assigned by the labeling unit.

the display unit that displays the two-dimensional image includes an event generation unit that generates an event in each pixel on the display unit when a luminance value of the pixel exceeds a first threshold value or falls below a second threshold value, and a position output unit that outputs a position on the display unit of the pixel at which the event generation unit has generated the event;
The learning data generating device according to claim 1, wherein the variation data generating unit generates the luminance variation data using the positions output by the position output unit.

The learning data generation device according to claim 1 or 2, wherein the video data created by the video data creation unit does not include background data.

The learning data generation device according to any one of claims 1 to 3, wherein the detection target included in the video data created by the video data creation unit is a moving object.

The learning data generation device of claim 4, wherein the video data creation unit generates multiple video data sets in which at least one of the traveling direction and traveling speed of the detection target differs.

The learning data generation device according to claim 1 or 2, wherein the video data creation unit creates multiple video data sets in different environments in which the detection target exists.

The learning data generation device according to any one of claims 1 to 6, wherein the video data created by the video data creation unit includes a plurality of the detection targets.

a sensor that detects the detection target using a learning model that is machine-learned using the learning data generated by the learning data generation device according to any one of claims 1 to 7;
and an automatic door that opens and closes based on the detection results of the sensor.

a moving image data creation step of creating moving image data in which the three-dimensional model of the detection target moves in a virtual space;
a variation data creation step of creating brightness variation data or difference data between frames obtained by comparing brightness values with a threshold value in a two-dimensional image projected from the moving image data;
a labeling step of assigning a label indicating that the detection target is a region in the brightness change data or the difference data corresponding to the detection target;
a learning data generation step of generating, as learning data, a set of the brightness change data or the difference data, an area corresponding to the detection target, and a label assigned in the labeling step.

a learning data generating step of generating learning data by the learning data generating method according to claim 9;
a machine learning step of performing machine learning using the learning data generated in the learning data generation step.

A control program for causing a computer to function as the learning data generation device described in claim 1, the control program causing the computer to function as the video data creation unit, the variation data creation unit, the labeling unit, and the learning data generation unit.

A computer-readable recording medium having the control program described in claim 11 recorded thereon.

a moving image data creating unit that creates moving image data in which a three-dimensional model of the detection target moves in a virtual space;
a variation data generating unit that generates brightness variation data or difference data in a two-dimensional image projected from the moving image data;
a labeling unit that assigns a label indicating that the detection target is a region in the brightness change data or the difference data that corresponds to the detection target;
a learning data generation unit that generates, as learning data, a set of the luminance change data or the difference data, an area corresponding to the detection target, and a label assigned by the labeling unit;
the display unit that displays the two-dimensional image includes an event generation unit that generates an event in each pixel on the display unit when a luminance value of the pixel exceeds a first threshold value or falls below a second threshold value, and a position output unit that outputs a position on the display unit of the pixel at which the event generation unit has generated the event;
The variation data creation unit creates the luminance variation data using the position output by the position output unit.