JP7823451B2 - Image generation device, image generation method, and image generation program

Info

Publication number: JP7823451B2
Application number: JP2022045669A
Authority: JP
Inventors: 重男安藤
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2022-03-22
Filing date: 2022-03-22
Publication date: 2026-03-04
Anticipated expiration: 2042-03-22
Also published as: JP2023139901A

Description

This disclosure relates to an image generation device, an image generation method, and an image generation program that generate an image from a camera image and information from an object sensor.

In recent years, monitoring systems that use camera images to watch people, vehicles, and the like have suffered from the weak environmental robustness of camera images: for example, recognition accuracy for monitored objects drops when rain or snow degrades visibility. To compensate, sensor fusion technology is used, which obtains more reliable recognition results by combining data from other sensors with camera image data.

For example, Patent Document 1 discloses a technique that improves object recognition accuracy by recognizing objects with both a camera, which offers excellent horizontal angular resolution, and a millimeter-wave radar, which offers excellent recognition accuracy in bad weather.

Patent Document 1: WO2020-116194

However, the technique described in Patent Document 1 identifies the position of a monitored object by enclosing it in a frame in the camera image. Consequently, when the camera image is in poor condition, for example, very little visual information is available, the type and state of the monitored object cannot be determined, and effective monitoring is difficult.

This disclosure has been made to solve the above problem, and aims to obtain an image that simulates an object even when the camera image is in poor condition.

To solve the above problem, an image generation device according to this disclosure includes a camera image acquisition unit, a sensor information acquisition unit, a trained model storage unit, and an image generation unit, and further has a sharpness determination unit and an output switching unit. The camera image acquisition unit acquires a camera image, which is visual information about a subject. The sensor information acquisition unit acquires sensor information obtained by detecting the subject with an object sensor. The trained model storage unit stores a trained model for generating, from the camera image and the sensor information, a pseudo camera image, which is a composite image in which the camera image is superimposed with a sensor image, that is, visual information that simulates the subject. The image generation unit uses the trained model to output the pseudo camera image from the camera image acquired by the camera image acquisition unit and the sensor information acquired by the sensor information acquisition unit. The sharpness determination unit determines the sharpness of the camera image acquired by the camera image acquisition unit. The output switching unit outputs either the camera image or the pseudo camera image based on the sharpness.
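The switching between the camera image and the pseudo camera image could be sketched as follows. The text above does not say how sharpness is measured, so the variance of a 4-neighbour Laplacian and the `threshold` parameter used here are assumptions chosen for illustration, as are the function names. Images are plain nested lists of grayscale values.

```python
def laplacian_variance(image):
    """Assumed sharpness measure: variance of a 4-neighbour Laplacian.

    `image` is a list of rows of grayscale values and must be at least
    3 x 3 so that it has interior pixels.
    """
    height, width = len(image), len(image[0])
    responses = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            responses.append(
                image[y - 1][x] + image[y + 1][x]
                + image[y][x - 1] + image[y][x + 1]
                - 4 * image[y][x]
            )
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


def select_output(camera_image, pseudo_camera_image, threshold):
    """Return the camera image if it is sharp enough, else the pseudo image.

    `threshold` is a hypothetical tuning parameter, not a value from the text.
    """
    if laplacian_variance(camera_image) >= threshold:
        return camera_image
    return pseudo_camera_image
```

A flat, featureless frame yields a variance of zero and would be replaced by the pseudo camera image, while a frame with strong edges would be passed through unchanged.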

According to this disclosure, an image that simulates an object can be obtained even when the camera image is in poor condition.

Figure 1 is an overall diagram showing the configuration of a monitoring system according to Embodiment 1.
Figure 2 is a diagram showing the configuration of an image generation device according to Embodiment 1.
Figure 3 is a diagram schematically illustrating the generation of a pseudo camera image according to Embodiment 1.
Figure 4 is a diagram schematically illustrating the generation of a pseudo camera image according to Embodiment 1 in a case where training is insufficient.
Figure 5 is a diagram showing a configuration example of a learning unit according to Embodiment 1.
Figure 6 is a diagram showing an example of the data flow in the learning unit according to Embodiment 1.
Figure 7 is a diagram showing an example of the data flow when the discrimination unit is trained in the learning unit according to Embodiment 1.
Figure 8 is a diagram showing an example of the data flow when the training generation unit is trained in the learning unit according to Embodiment 1.
Figure 9 is a flowchart showing the processing operation of the image generation device according to Embodiment 1.
Figure 10 is a diagram showing an example of the hardware configuration of the image generation device according to Embodiment 1.
Figure 11 is a diagram showing another example of the configuration of the image generation device according to Embodiment 1.
Figure 12 is an overall diagram showing another example of the configuration of the monitoring system according to Embodiment 1.
Figure 13 is a diagram showing the configuration of an image generation device according to Embodiment 2.
Figure 14 is a diagram showing an example of the data flow in a learning unit according to Embodiment 2.
Figure 15 is a flowchart showing the processing operation of the image generation device according to Embodiment 2.
Figure 16 is a diagram showing another example of the configuration of the image generation device according to Embodiment 2.

Embodiment 1.
An image generation device 1 according to Embodiment 1 is described below. Figure 1 shows an example of a monitoring system 100 that includes the image generation device 1 according to Embodiment 1.

The monitoring system 100 consists of the image generation device 1, a monitoring unit 2, and a display unit 3.

The monitoring unit 2 includes a camera 4 and an object sensor 5. It observes a monitoring area containing subjects such as people and vehicles with the camera 4 and the object sensor 5, and outputs the resulting camera image 6 and sensor information 7 to the image generation device 1. The camera 4 and the object sensor 5 are arranged so that their detection ranges overlap at least partially, and the overlapping range is the monitoring area of the monitoring system 100.

The camera 4 may be, for example, a digital still camera or a video camera. The object sensor 5 is a non-contact sensor that uses electromagnetic waves, ultrasonic waves, or the like. In the following description, a millimeter-wave sensor is used as the object sensor 5. A millimeter-wave sensor is robust to its environment: it can detect objects regardless of poor visibility caused by weather or backlighting caused by illumination. Performing sensor fusion with camera image information therefore makes it possible to obtain information about the subject even when the conditions under which the camera image is captured are poor. LiDAR (Laser Imaging Detection and Ranging) may also be used as another example of the object sensor 5. The image generation device 1 receives the camera image 6 and the sensor information 7 from the monitoring unit 2, generates a pseudo camera image 14 (described below), and outputs it to the display unit 3.

The display unit 3 displays the pseudo camera image 14 generated by the image generation device 1. The display unit 3 is, for example, a liquid crystal display or an organic EL (Organic Electro-Luminescence) display. A user of the monitoring system 100 can grasp the situation in the monitoring area by, for example, viewing the pseudo camera image 14 displayed on the screen of the display unit 3.

The configuration of the image generation device 1 is described below. Figure 2 is a diagram showing the configuration of the image generation device 1 according to Embodiment 1. The image generation device 1 consists of an image generation unit 10, a camera image acquisition unit 11, a sensor information acquisition unit 12, and a trained model storage unit 13.

The camera image acquisition unit 11 acquires the camera image 6, which is visual information obtained by photographing a subject such as a person or a vehicle with the camera. Visual information is information expressed in a form that the user of the monitoring system 100 can see, and in particular refers to information expressed as an image. By viewing the visual information displayed on the display unit 3, the user can identify the outline, color, and other features of the subject.

The sensor information acquisition unit 12 acquires the sensor information 7 obtained by detecting a subject such as a person or a vehicle with the object sensor 5. The sensor information 7 is, for example, information indicating the position of a detected object. Specifically, it may be the direction, the distance, or coordinates within the detection range, or it may be such information processed into visual information and rendered as an image. Imaging by the camera 4 and detection by the object sensor 5 are performed simultaneously, but a temporal or spatial offset between them is acceptable as long as it does not affect the processing of the image generation device 1. The camera image 6 and the sensor information 7 are aligned in time and position based on temporal or spatial offset information acquired in advance.
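The alignment step could look roughly like the sketch below, which assumes that the offsets acquired in advance are a single constant time offset and a constant 2-D position offset. The function names, the nearest-timestamp pairing rule, and the `tolerance` parameter are illustrative assumptions, not details taken from the text.

```python
from bisect import bisect_left


def pair_by_time(camera_times, sensor_times, sensor_offset, tolerance):
    """Pair each camera frame with the nearest offset-corrected sensor reading.

    `sensor_times` must be sorted in ascending order. `sensor_offset` is the
    known delay of the sensor clock relative to the camera clock. Camera
    frames with no sensor reading within `tolerance` are skipped.
    Returns a list of (camera_index, sensor_index) pairs.
    """
    corrected = [t - sensor_offset for t in sensor_times]
    pairs = []
    for cam_index, cam_time in enumerate(camera_times):
        pos = bisect_left(corrected, cam_time)
        candidates = [i for i in (pos - 1, pos) if 0 <= i < len(corrected)]
        if not candidates:
            continue
        best = min(candidates, key=lambda i: abs(corrected[i] - cam_time))
        if abs(corrected[best] - cam_time) <= tolerance:
            pairs.append((cam_index, best))
    return pairs


def shift_points(points, dx, dy):
    """Apply a known spatial offset to detected (x, y) positions."""
    return [(x + dx, y + dy) for x, y in points]
```

With timestamps in milliseconds, for instance, a sensor that lags the camera by 30 ms would be paired using `sensor_offset=30`, and a reading far from any frame would simply be left unpaired.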

Using the trained model 13a described below, the image generation unit 10 outputs, as a composite image, a pseudo camera image 14 that simulates the subject, from the camera image 6 acquired by the camera image acquisition unit 11 and the sensor information 7 acquired by the sensor information acquisition unit 12. The image generation unit 10 can output the pseudo camera image 14 even when the camera image 6 lacks some or all of the visual information about the subject. An image that lacks some or all of the visual information about the subject is, for example, an image taken under adverse conditions such as insufficient ambient illumination, backlighting, rain, or fog, in which the subject is unclear and hard to distinguish from the background, so that the outline, color, and other features of the subject are difficult to identify from the image alone.

The trained model storage unit 13 stores the trained model 13a, which has been trained in advance to output, from the camera image 6 and the sensor information 7, the pseudo camera image 14, a composite image that simulates the subject. The trained model 13a is a learning model that generates, from the camera image 6 and the sensor information 7, a sensor image, which is visual information that simulates the subject within the field of view of the camera image 6. The sensor image is, for example, an image representing features such as the outline of the subject, inferred from the visual information about the subject that remains in the camera image 6 and the position information about the subject contained in the sensor information 7. The sensor information 7 itself does not allow visual information such as the outline of the subject to be recognized. However, by training on it together with camera images 6 in which the same subject appears relatively clearly, a trained model 13a that generates a sensor image simulating the visual information of the object can be obtained. The pseudo camera image 14 is the camera image 6 with this sensor image superimposed on it.

Figure 3 schematically illustrates the generation of the pseudo camera image 14 when the subject in the camera image 6 is unclear because of lighting or similar effects. When the camera image 6 and the sensor information 7 from that moment are input to the image generation device 1, the pseudo camera image 14 is generated. Object A in the camera image 6 represents a person who is the subject. The points B in the sensor information 7 are detections by the object sensor 5 and represent the positions at which the person was detected. Using these two pieces of information and the trained model 13a, the image generation device 1 outputs a pseudo camera image 14 that simulates the person's appearance. Object C in the pseudo camera image 14 is the sensor image created from the sensor information 7 using the trained model 13a, and superimposing it on the camera image 6 produces the pseudo camera image 14.
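The final superimposition step can be illustrated with a small sketch. The text does not specify how the sensor image is superimposed on the camera image, so the per-pixel alpha blend below, along with the `alpha` map and the function name, is an assumption; images are nested lists of grayscale values of equal size.

```python
def superimpose(camera_image, sensor_image, alpha):
    """Blend a sensor image onto a camera image pixel by pixel.

    `alpha` holds a weight in [0, 1] for every pixel: 0 keeps the camera
    pixel, 1 takes the sensor-image pixel. This blending rule is an
    illustrative assumption, not the method stated in the text.
    """
    return [
        [
            round((1 - a) * cam + a * sen)
            for cam, sen, a in zip(cam_row, sen_row, alpha_row)
        ]
        for cam_row, sen_row, alpha_row in zip(camera_image, sensor_image, alpha)
    ]
```

In this picture, the region where the model has drawn the simulated subject would carry a high alpha, and the rest of the frame would stay as captured.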

Figure 3 shows an example in which part of the subject in the camera image 6 cannot be recognized, but the subject may also be invisible or only faintly visible in the camera image 6, for instance because the surroundings are dark. In such cases, the trained model 13a needs to have been trained with camera images 6 and sensor information 7 corresponding to each situation. Also, although the pseudo camera image 14 in Figure 3 shows a clear human figure, the output may be an image in which only the contour is emphasized, as shown in Figure 4, when the trained model 13a has not been trained sufficiently or when the conditions of the camera image 6 differ from those covered in training.

In particular, when the trained model 13a is trained with a Generative Adversarial Network (GAN), a well-known technique, it can output a pseudo camera image 14 in which the user can recognize the subject more clearly. The following describes the case where the trained model 13a is obtained with a GAN.

Figure 5 shows the configuration of a learning unit 20 that obtains the trained model 13a with a GAN. The learning unit 20 consists of a discrimination unit 21, a training generation unit 22, a discriminative model storage unit 23, a generative model storage unit 24, a training data acquisition unit 25, and a degradation processing unit 26. In this disclosure, the discrimination unit 21 is trained as the "Discriminator" neural network of the GAN, and the training generation unit 22 is trained as the "Generator" neural network of the GAN.

The training data acquisition unit 25 acquires training image data from an external storage unit (not shown). The training image data includes a training camera image 27, which is a camera image in which the subject is captured well, and training sensor information 28, which is sensor information obtained by detecting the subject at the same time as the training camera image 27 was captured. The training image data is used for training in the learning unit 20.

The degradation processing unit 26 degrades the image quality of the training camera image 27 included in the training image data and generates a degraded training camera image 27a. Processing and degrading an image in which the subject is captured well in this way produces an image that simulates one taken under adverse conditions. The degradation may be, for example, lowering the brightness, lowering the resolution, lowering the contrast, or adding random noise. The degradation is not limited to any particular process, as long as at least part of the visual information about the subject in the training camera image 27 is lost and the subject becomes harder for the user to recognize. The degradation may be applied to the entire training camera image 27 or only to the region of the subject. The degraded training camera image 27a generated by the degradation processing unit 26 and the training sensor information 28 are output to the training generation unit 22, and the training camera image 27 is output to the discrimination unit 21.
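The four degradations named above can be sketched in a few lines each. The code below works on 8-bit grayscale images stored as nested lists; the function names, the block-averaging approach to lowering resolution, the Gaussian noise model, and all parameter values are illustrative assumptions.

```python
import random


def lower_brightness(image, factor):
    """Scale every pixel by `factor` (0 < factor < 1 darkens the image)."""
    return [[int(p * factor) for p in row] for row in image]


def lower_contrast(image, factor, mid=128):
    """Pull every pixel toward the mid-gray level `mid`."""
    return [[int(mid + (p - mid) * factor) for p in row] for row in image]


def lower_resolution(image, block):
    """Replace each `block` x `block` tile with its mean value."""
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    for y0 in range(0, height, block):
        for x0 in range(0, width, block):
            ys = range(y0, min(y0 + block, height))
            xs = range(x0, min(x0 + block, width))
            mean = sum(image[y][x] for y in ys for x in xs) // (len(ys) * len(xs))
            for y in ys:
                for x in xs:
                    out[y][x] = mean
    return out


def add_random_noise(image, sigma, rng):
    """Add Gaussian noise with standard deviation `sigma`, clipped to 0..255."""
    return [
        [min(255, max(0, int(p + rng.gauss(0, sigma)))) for p in row]
        for row in image
    ]


def degrade(image, rng):
    """Chain the degradations; the order and parameters are arbitrary choices."""
    image = lower_brightness(image, 0.4)
    image = lower_contrast(image, 0.5)
    image = lower_resolution(image, 2)
    return add_random_noise(image, 10.0, rng)
```

Passing a seeded `random.Random` instance as `rng` keeps the generated training pairs reproducible. Restricting the degradation to the subject region would only require applying the same functions to a cropped sub-image and writing the result back.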

The discrimination unit 21 classifies the training camera image 27, a camera image captured in relatively good condition, as "true" and the training pseudo camera image 29 output by the training generation unit 22 as "false." The training generation unit 22 generates the training pseudo camera image 29 from the degraded training camera image 27a, obtained by degrading the training camera image 27, and the training sensor information 28.

The discriminative model storage unit 23 stores a discriminative model 23a that the discrimination unit 21 uses to distinguish the training camera image 27 from the training pseudo camera image 29 generated by the training generation unit 22. The generative model storage unit 24 stores a generative model 24a. From the degraded training camera image 27a and the training sensor information 28, the generative model 24a generates a training sensor image, which is pseudo visual information that simulates the visual information about the subject lost when the training camera image 27 was degraded, and outputs the training pseudo camera image 29, a composite image in which the training sensor image is superimposed on the degraded training camera image 27a.

Next, the training process by which the learning unit 20 generates the trained model 13a with a GAN is described. In GAN training, the discrimination unit 21, which is the "Discriminator," and the training generation unit 22, which is the "Generator," are trained in competition with each other.

When the discrimination unit 21 is trained, the discriminative model 23a is trained to improve the accuracy with which it distinguishes the training camera image 27 from the training pseudo camera image 29. In the following description, the training camera image 27 is "true" and the training pseudo camera image 29 is "false," and the discriminative model 23a is trained to determine whether an input image is "true" or "false." The discriminative model 23a may instead be configured to receive the training camera image 27 and the training pseudo camera image 29 at the same time and determine which is "true" and which is "false."

When the training generation unit 22 is trained, the generative model 24a is trained so that the training generation unit 22 becomes better at generating training pseudo camera images 29 that the discrimination unit 21 mistakenly classifies as "true."

Figure 6 schematically shows the data flow when the learning unit 20 performs GAN training. The learning unit 20 improves the accuracy with which the discrimination unit 21 distinguishes training camera images 27, which are images in good condition, from training pseudo camera images 29, and also improves the ability of the training pseudo camera images 29 output by the training generation unit 22 to mislead the discrimination unit 21 into judging them "true." The degradation processing unit 26 is used to degrade the images. By training the discrimination unit 21 and the training generation unit 22 in competition in this way, the training generation unit 22 can produce a generative model 24a that outputs training pseudo camera images 29 closer to good camera images.

The competitive training of the discrimination unit 21 and the training generation unit 22 is carried out, for example, by training either the discriminative model 23a or the generative model 24a depending on how the discrimination unit 21 judges the training pseudo camera image 29.

Figure 7 shows the data flow when the discrimination unit 21 judges the training pseudo camera image 29 to be "true." If the discrimination unit 21 judges as "true" a training pseudo camera image 29 that the training generation unit 22 generated from the degraded training camera image 27a and the training sensor information 28, the discrimination unit 21 has judged incorrectly, and the training generation unit 22 has succeeded in fooling the discrimination unit 21. In this case, therefore, the discriminative model 23a of the discrimination unit 21, which judged incorrectly, is trained. The training is performed, for example, by labeling the training camera image 27, a camera image in good condition, as "true" and the training pseudo camera image 29 output by the training generation unit 22 as "false," and reflecting the result in the discriminative model 23a of the discrimination unit 21.

Figure 8 shows the data flow when the discrimination unit 21 judges the training pseudo camera image 29 to be "false." If the discrimination unit 21 judges as "false" a training pseudo camera image 29 that the training generation unit 22 generated from the degraded training camera image 27a and the training sensor information 28, the discrimination unit 21 has judged correctly, and the training generation unit 22 has failed to fool the discrimination unit 21. In this case, therefore, the generative model 24a of the training generation unit 22, which failed to fool the discrimination unit, is trained. For example, the training pseudo camera image 29 output by the training generation unit 22 is input to the discrimination unit 21 with its weights fixed, and the training generation unit 22 is trained so that the discrimination unit 21 judges the image "true," that is, so that it generates a training pseudo camera image 29 closer to the training camera image 27. The training result is then reflected in the generative model 24a of the training generation unit 22.

As described above, either the discriminative model 23a or the generative model 24a is trained depending on how the discrimination unit 21 judges the training pseudo camera image 29. Repeating this trains the discrimination unit 21 and the training generation unit 22 in competition. As a result, the training generation unit 22 produces a generative model 24a that outputs, from the degraded training camera image 27a and the training sensor information 28, a training pseudo camera image 29 closer to the training camera image 27. The image generation device 1 is configured by using this generative model 24a as the trained model 13a. In other words, the trained generative model 24a is stored in the trained model storage unit 13 and used in the image generation device 1 as the trained model 13a.
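The rule for choosing which model to train in each iteration can be sketched as follows. The two classes are scalar toy stand-ins for the neural networks, introduced here purely so the control flow can run: images are reduced to a single "quality" number, and all names, update rules, and parameter values are hypothetical.

```python
class ToyDiscriminator:
    """Judges a sample "true" when its quality reaches a threshold."""

    def __init__(self, threshold):
        self.threshold = threshold

    def judge(self, sample):
        return sample >= self.threshold

    def update(self, real, fake):
        # Label `real` as true and `fake` as false by moving the decision
        # boundary to the midpoint between them.
        self.threshold = (real + fake) / 2


class ToyGenerator:
    """Restores quality by adding a learned correction where the sensor fires."""

    def __init__(self, step):
        self.correction = 0.0
        self.step = step

    def generate(self, degraded, sensor):
        return degraded + self.correction * sensor

    def update(self):
        # A real generator would follow gradients through the discriminator
        # with the discriminator weights held fixed; here it takes a fixed step.
        self.correction += self.step


def training_step(generator, discriminator, degraded, sensor, real):
    """Train exactly one of the two models, chosen by the discriminator's verdict."""
    fake = generator.generate(degraded, sensor)
    if discriminator.judge(fake):
        # The fake was judged "true": the discriminator was fooled, so train it.
        discriminator.update(real, fake)
        return "discriminator"
    # The fake was judged "false": the generator failed, so train it.
    generator.update()
    return "generator"
```

Calling `training_step` repeatedly alternates between the two models as each overtakes the other, which is the competitive behavior described above. A practical implementation would replace the toy classes with neural networks and a loss-driven optimizer.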

In the above description, either the discriminative model 23a or the generative model 24a is trained based on the judgment of the training pseudo camera image 29. Alternatively, both the training camera image 27 and the training pseudo camera image 29 may be input to the discrimination unit 21, which determines which image is "true," and either the discriminative model 23a or the generative model 24a may be trained based on the result.

Also, in the above description, the training generation unit 22 generates the training pseudo camera image 29 from the degraded training camera image 27a and the training sensor information 28 and trains the generative model 24a, which is a learning model. However, the training camera image 27 may also be used as training data. For example, the training camera image 27 is stored in the generative model storage unit 24 as part of the generative model 24a. Hereinafter, a training camera image 27 stored as training data is referred to as a stored training camera image 27b. The training generation unit 22 generates the training pseudo camera image 29 using the degraded training camera image 27a, the training sensor information 28, and the stored training camera images 27b held in the generative model storage unit 24. As a result, if a stored training camera image 27b similar to the subject exists in the generative model storage unit 24, the generative model 24a can be trained to generate a sharper training pseudo camera image 29.

When the training camera image 27 is used as training data, the generative model 24a is trained to generate the training pseudo camera image 29 using the degraded training camera image 27a, the training sensor information 28, and the stored training camera images 27b. The contents of the generative model storage unit 24, which holds both the stored training camera images 27b and the generative model 24a that uses them for image generation, are carried over to the trained model storage unit 13, and the model is used in the image generation device 1 as a trained model 13a that uses the stored training camera images 27b to generate the pseudo camera image 14.

次に、実施の形態１に係る画像生成装置１において、カメラ画像取得部１１がカメラ画像６、センサ情報取得部１２がセンサ情報７を、それぞれ取得してから画像生成部１０が疑似カメラ画像１４を出力するまでの処理動作を、図９のフローチャートを用いて説明する。 Next, the processing operations in the image generation device 1 according to embodiment 1, from when the camera image acquisition unit 11 acquires the camera image 6 and the sensor information acquisition unit 12 acquires the sensor information 7, until when the image generation unit 10 outputs the pseudo camera image 14, will be described using the flowchart in Figure 9.

Because the camera 4 and the object sensor 5 acquire camera images 6 and sensor information 7 at predetermined intervals, the camera image acquisition unit 11 and the sensor information acquisition unit 12 repeatedly check for new data until a camera image 6 and sensor information 7 have been input (S10: NO). When the camera image acquisition unit 11 has received the camera image 6 and the sensor information acquisition unit 12 has received the sensor information 7 (S10: YES), the camera image 6 and sensor information 7 are input to the image generation unit 10 (S11). The image generation unit 10 then uses the trained model 13a to create a pseudo camera image 14 from the camera image 6 and sensor information 7 (S12), and outputs the pseudo camera image 14 to the display unit 3 (S13).
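The flow of steps S10 to S13 can be illustrated with a minimal sketch. The function name `run_generation_loop`, the `generate` and `display` callables, and the use of `None` to represent data that has not yet arrived are assumptions made for illustration; the disclosure does not specify an implementation.

```python
from typing import Callable, Iterable, Optional, Tuple

# (camera image 6, sensor information 7); None means "not yet received" (assumption)
Frame = Tuple[Optional[object], Optional[object]]


def run_generation_loop(
    frames: Iterable[Frame],
    generate: Callable[[object, object], object],
    display: Callable[[object], None],
) -> int:
    """Process polled frames; return how many pseudo camera images were output."""
    shown = 0
    for camera_image, sensor_info in frames:
        # S10: keep checking until both inputs have arrived.
        if camera_image is None or sensor_info is None:
            continue
        # S11 and S12: hand both inputs to the generator backed by the trained model.
        pseudo_image = generate(camera_image, sensor_info)
        # S13: output the pseudo camera image to the display unit.
        display(pseudo_image)
        shown += 1
    return shown


if __name__ == "__main__":
    polled = [(None, None), ("img0", None), ("img1", "radar1"), ("img2", "radar2")]
    outputs = []
    count = run_generation_loop(polled, lambda img, s: f"pseudo({img},{s})", outputs.append)
    print(count, outputs)
```

Passing the generator and display as callables keeps the sketch independent of any particular model or display hardware.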

次に、実施の形態１に係る画像生成装置１のハードウェア構成の一例を説明する。図１０は、画像生成装置１のハードウェア構成の一例を示す図である。画像生成装置１は、プロセッサ１０１と、メモリ１０２とを備えるコンピュータを含む。 Next, an example of the hardware configuration of the image generation device 1 according to embodiment 1 will be described. Figure 10 is a diagram showing an example of the hardware configuration of the image generation device 1. The image generation device 1 includes a computer having a processor 101 and a memory 102.

プロセッサ１０１、およびメモリ１０２は、例えば、バス１０３によって互いに情報の送受信が可能である。プロセッサ１０１は、メモリ１０２に記憶されたプログラムを読みだして実行することによって、画像生成部１０、学習部２０、学習用生成部２２、劣化処理部２６といった機能を実行する。プロセッサ１０１は、例えば、処理回路の一例であり、ＣＰＵ（ＣｅｎｔｒａｌＰｒｏｃｅｓｓｉｎｇＵｎｉｔ）、ＤＳＰ（ＤｉｇｉｔａｌＳｉｇｎａｌＰｒｏｃｅｓｓｏｒ）、ＬＳＩ（ＬａｒｇｅＳｃａｌｅＩｎｔｅｇｒａｔｉｏｎ）、及びＧＰＵ（ＧｒａｐｈｉｃｓＰｒｏｃｅｓｓｉｎｇＵｎｉｔ）のうち一つ以上を含む。 The processor 101 and memory 102 can send and receive information to and from each other via, for example, a bus 103. The processor 101 reads and executes programs stored in the memory 102 to perform functions such as the image generation unit 10, learning unit 20, learning generation unit 22, and degradation processing unit 26. The processor 101 is, for example, an example of a processing circuit, and includes one or more of a CPU (Central Processing Unit), a DSP (Digital Signal Processor), an LSI (Large Scale Integration), and a GPU (Graphics Processing Unit).

メモリ１０２は、ＲＡＭ（ＲａｎｄｏｍＡｃｃｅｓｓＭｅｍｏｒｙ）、ＲＯＭ（ＲｅａｄＯｎｌｙＭｅｍｏｒｙ）、フラッシュメモリ、ＥＰＲＯＭ（ＥｒａｓａｂｌｅＰｒｏｇｒａｍｍａｂｌｅＲｅａｄＯｎｌｙＭｅｍｏｒｙ）、ＥＥＰＲＯＭ（登録商標）（ＥｌｅｃｔｒｉｃａｌｌｙＥｒａｓａｂｌｅＰｒｏｇｒａｍｍａｂｌｅＲｅａｄＯｎｌｙＭｅｍｏｒｙ）のうち一つ以上を含む。また、メモリ１０２は、コンピュータが読取可能なプログラムが記録された記録媒体を含む。かかる記録媒体は、不揮発性または揮発性の半導体メモリ、磁気ディスク、及び光ディスクなどを含む。なお、画像生成装置１は、ＡＳＩＣ（ＡｐｐｌｉｃａｔｉｏｎＳｐｅｃｉｆｉｃＩｎｔｅｇｒａｔｅｄＣｉｒｃｕｉｔ）およびＦＰＧＡ（ＦｉｅｌｄＰｒｏｇｒａｍｍａｂｌｅＧａｔｅＡｒｒａｙ）などの集積回路を含んでいてもよい。 Memory 102 includes one or more of RAM (Random Access Memory), ROM (Read Only Memory), flash memory, EPROM (Erasable Programmable Read Only Memory), and EEPROM (Electrically Erasable Programmable Read Only Memory). Memory 102 also includes a recording medium on which a computer-readable program is recorded. Such recording media include non-volatile or volatile semiconductor memory, magnetic disks, optical disks, etc. The image generation device 1 may also include integrated circuits such as an ASIC (Application Specific Integrated Circuit) and an FPGA (Field Programmable Gate Array).

以上のように、実施の形態１に係る画像生成装置１は、カメラ画像取得部１１と、センサ情報取得部１２と、学習済みモデル記憶部１３と、画像生成部１０とを備える。カメラ画像取得部１１は、撮影対象の視覚情報であるカメラ画像６を取得する。センサ情報取得部１２は、撮影対象を物体センサによって検知して得られたセンサ情報７を取得する。学習済みモデル記憶部１３は、カメラ画像６と、センサ情報７とから、カメラ画像６と、撮影対象の視覚情報を疑似的に表した視覚情報であるセンサ画像とを重畳した合成画像である疑似カメラ画像１４を生成するための学習済みモデル１３ａを記憶し、画像生成部１０は、カメラ画像取得部１１で取得されたカメラ画像６と、センサ情報取得部１２で取得されたセンサ情報７とから、学習済みモデル１３ａを用いて疑似カメラ画像１４を出力する。これにより、カメラ画像６が欠損した状態においても、物体を疑似的に表した画像を得ることができる。また、撮影対象の視覚情報を疑似的に表した疑似カメラ画像１４を表示部３に表示することにより、使用者は撮影対象の種別や状態などが判別することができ、有効な監視を行うことができる。 As described above, the image generation device 1 according to embodiment 1 includes a camera image acquisition unit 11, a sensor information acquisition unit 12, a trained model storage unit 13, and an image generation unit 10. The camera image acquisition unit 11 acquires a camera image 6, which is visual information of the subject being photographed. The sensor information acquisition unit 12 acquires sensor information 7 obtained by detecting the subject using an object sensor. The trained model storage unit 13 stores a trained model 13a for generating a pseudo camera image 14, which is a composite image obtained by superimposing the camera image 6 and the sensor image, which is visual information that simulates the visual information of the subject, from the camera image 6 and the sensor information 7. The image generation unit 10 outputs the pseudo camera image 14 using the trained model 13a from the camera image 6 acquired by the camera image acquisition unit 11 and the sensor information 7 acquired by the sensor information acquisition unit 12. This makes it possible to obtain an image that simulates an object even when the camera image 6 is missing. Additionally, by displaying a pseudo camera image 14, which simulates the visual information of the subject, on the display unit 3, the user can determine the type and condition of the subject, enabling effective monitoring.

なお、実施の形態１では、カメラ画像６に含まれる撮影対象が１つの場合を例に挙げて説明したが、カメラ画像６に含まれる撮影対象は複数でもよい。例えば、画像生成部１０は、センサ情報７を用いてカメラ画像６に含まれる撮影対象の数を判定し、１つの画像に１つの撮影対象が含まれるように分割したカメラ画像６を用いて疑似カメラ画像を生成し、生成した複数の疑似カメラ画像１４を１つの画像に合成して出力してもよい。また、実施の形態１では、劣化処理部２６で劣化学習用カメラ画像２７ａを生成する構成について説明したが、予め用意しておいた学習用カメラ画像２７と劣化学習用カメラ画像２７ａを学習用データ取得部２５が取得する構成でもよい。 In the first embodiment, the camera image 6 contains one photographed object, but the camera image 6 may contain multiple photographed objects. For example, the image generation unit 10 may use the sensor information 7 to determine the number of photographed objects contained in the camera image 6, generate pseudo camera images using camera images 6 divided so that each image contains one photographed object, and combine the generated multiple pseudo camera images 14 into a single image for output. In the first embodiment, the degradation processing unit 26 generates the degradation learning camera image 27a, but the learning data acquisition unit 25 may acquire the previously prepared learning camera image 27 and degradation learning camera image 27a.

以下に、実施の形態１に係る画像生成装置１の変形例を示す。変形例に係る画像生成装置１ａは、カメラ画像６に写った撮影対象の鮮明度を判定する鮮明度判定部１５と、鮮明度に基づいてカメラ画像６か、疑似カメラ画像１４のいずれかを表示部３に出力する出力切替部１６を備える点が画像生成装置１と異なる。以下、実施の形態１と異なる点を中心に説明する。 Below, a modified example of the image generating device 1 according to embodiment 1 is described. The image generating device 1a according to this modified example differs from the image generating device 1 in that it includes a sharpness determination unit 15 that determines the sharpness of the subject captured in the camera image 6, and an output switching unit 16 that outputs either the camera image 6 or the pseudo camera image 14 to the display unit 3 based on the sharpness. The following description will focus on the differences from embodiment 1.

図１１は、画像生成装置１ａの構成を表す図である。画像生成装置１ａは、画像生成部１０、カメラ画像取得部１１、センサ情報取得部１２、学習済みモデル記憶部１３、鮮明度判定部１５、および出力切替部１６から構成される。鮮明度判定部１５は、カメラ画像取得部１１で取得したカメラ画像６の鮮明度を算出する。鮮明度は、例えば、カメラ画像６全体の高周波成分、または、エッジ成分とする。また、鮮明度は、カメラ画像６のうち、センサ情報７で物体が検出された位置に対応する領域の高周波成分としてもよい。更に、撮影対象が該当の対象であると認識できるか否か、すなわち、物体検出の成否を鮮明度の判定に加えても良い。 Figure 11 is a diagram showing the configuration of image generation device 1a. Image generation device 1a is composed of image generation unit 10, camera image acquisition unit 11, sensor information acquisition unit 12, trained model storage unit 13, sharpness determination unit 15, and output switching unit 16. The sharpness determination unit 15 calculates the sharpness of the camera image 6 acquired by the camera image acquisition unit 11. The sharpness may be, for example, the high-frequency components or edge components of the entire camera image 6. Alternatively, the sharpness may be the high-frequency components of an area of the camera image 6 corresponding to the position where an object was detected in the sensor information 7. Furthermore, whether or not the photographed subject can be recognized as the relevant subject, i.e., whether or not object detection was successful, may also be added to the determination of sharpness.
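One possible way to compute such a sharpness value is sketched below, under stated assumptions: the image is a grayscale list of rows, the high-frequency measure is the mean absolute 4-neighbour Laplacian, and the optional `region` stands in for the area where the sensor information 7 reports an object. None of these choices comes from the disclosure, which leaves the measure open.

```python
from typing import List, Optional, Tuple

Image = List[List[float]]           # grayscale pixels, row-major (assumption)
Region = Tuple[int, int, int, int]  # (top, left, bottom, right), end-exclusive


def sharpness(image: Image, region: Optional[Region] = None) -> float:
    """Mean absolute 4-neighbour Laplacian, a simple high-frequency measure."""
    height, width = len(image), len(image[0])
    top, left, bottom, right = region if region else (0, 0, height, width)
    # Skip border pixels, which lack a full neighbourhood.
    top, left = max(top, 1), max(left, 1)
    bottom, right = min(bottom, height - 1), min(right, width - 1)
    total, count = 0.0, 0
    for y in range(top, bottom):
        for x in range(left, right):
            lap = (image[y - 1][x] + image[y + 1][x]
                   + image[y][x - 1] + image[y][x + 1] - 4 * image[y][x])
            total += abs(lap)
            count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    flat = [[5] * 4 for _ in range(4)]
    checker = [[(x + y) % 2 for x in range(4)] for y in range(4)]
    print(sharpness(flat), sharpness(checker))
```

A uniform image scores zero, while an image with strong pixel-to-pixel variation scores high, which is the behaviour a threshold-based judgment would rely on.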

The output switching unit 16 switches the image output to the display unit 3 based on the sharpness. If the sharpness satisfies a predetermined condition, the unit judges that the subject in the camera image 6 is clear and outputs the camera image 6 received from the camera image acquisition unit 11 to the display unit 3. If the sharpness does not satisfy the condition, the unit judges that the subject in the camera image 6 is unclear and outputs the pseudo camera image 14 generated by the image generation unit 10 to the display unit 3. Thus, in the monitoring system 100, when the subject is not clearly captured in the camera image 6, the pseudo camera image 14 with the subject synthesized into it is displayed on the display unit 3; when the subject is clearly captured, the camera image 6 is displayed as is. As a result, whichever of the camera image 6 and the pseudo camera image 14 makes the subject easier to recognize is displayed.
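The switching rule can be written as a short function. This is a sketch only: the "predetermined condition" is assumed here to be a simple lower bound on a scalar sharpness value, and `select_output` and `threshold` are illustrative names.

```python
def select_output(camera_image, pseudo_image, sharpness_value: float, threshold: float):
    """Return the image the output switching unit would send to the display unit."""
    # Assumption: the predetermined condition is "sharpness at or above a threshold".
    if sharpness_value >= threshold:
        return camera_image   # subject judged clear: show the camera image as is
    return pseudo_image       # subject judged unclear: show the pseudo camera image


if __name__ == "__main__":
    print(select_output("camera", "pseudo", 0.8, 0.5))
    print(select_output("camera", "pseudo", 0.2, 0.5))
```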

Alternatively, the sharpness determination unit 15 and the output switching unit 16 may be provided outside the image generation device 1, as shown in Figure 12. In this configuration, when the sharpness determination unit 15 judges that the subject in the camera image 6 is not clear, the camera image 6 and sensor information 7 are output to the image generation unit 10; when it judges that the subject is clear, the camera image 6 is output directly to the display unit 3. This makes it possible to decide whether to output the pseudo camera image 14 according to the sharpness of the subject without changing the design of the image generation device 1.

Embodiment 2.
The image generation device according to embodiment 2 differs from the image generation device 1 according to embodiment 1 in that it includes a learning unit that trains the trained model. In the following, components with the same functions as in embodiment 1 are given the same reference signs and their description is omitted; the description focuses on the differences from the image generation device 1 of embodiment 1.

The configuration of the image generation device 1b according to embodiment 2 is described below. Figure 13 shows this configuration. The image generation device 1b includes a learning unit 20a that trains the trained model 13a used by the image generation unit 10 to create pseudo camera images 14. As a result, when the state of the camera image 6 is poor, the visual information captured immediately before the deterioration is available to the trained model, so the image generation device 1b can generate pseudo camera images 14 that are closer to the actual subject. In addition, because the image generation device 1b outputs pseudo camera images 14 while training the trained model 13a in real time, it can learn efficiently in a way suited to its environment.

実施の形態２における画像生成装置１ｂに含まれる学習部２０ａは、実施の形態１における学習済みモデル１３ａを生成する学習部２０と異なり、学習用データ取得部２５が、画像生成装置１ｂにおけるカメラ画像取得部１１が取得したカメラ画像６と、センサ情報取得部１２が取得したセンサ情報７とを取得し、学習に用いる。また、学習部２０ａは、実施の形態１に係る学習部２０等で予め学習されたモデルを適用し、その一部、もしくは全部を追加で学習する構成としても良い。 The learning unit 20a included in the image generation device 1b in embodiment 2 differs from the learning unit 20 that generates the trained model 13a in embodiment 1 in that the learning data acquisition unit 25 acquires the camera images 6 acquired by the camera image acquisition unit 11 and the sensor information 7 acquired by the sensor information acquisition unit 12 in the image generation device 1b and uses these for learning. Furthermore, the learning unit 20a may be configured to apply a model that has been trained in advance by the learning unit 20 in embodiment 1 or the like, and additionally train part or all of the model.

As described above, the acquired camera images 6 and sensor information 7 are used by the image generation unit 10 to output pseudo camera images 14, and at the same time the learning unit 20a uses them to train the trained model 13a cumulatively. However, because the camera images 6 used for training must be clear, not every acquired pair of camera image 6 and sensor information 7 is used for training. Instead, for example, the learning unit 20a trains the trained model 13a only when a camera image 6 judged to be sufficiently clear has been acquired.

Whether a camera image 6 is sufficiently clear to be used for training may be judged, for example, by applying thresholds to parameters such as image luminance, high-frequency components, or edge components, or training may be switched on and off by time of day. Alternatively, an operator may switch between a training state and a non-training state. Furthermore, whether the subject can be recognized as the relevant object, that is, whether object detection succeeds, may be added to the sharpness judgment.
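A hedged sketch of such a training gate follows. The text presents these criteria as alternatives; combining all of them in one policy, the field names, and every default threshold below are assumptions chosen only to make the example concrete.

```python
from dataclasses import dataclass
from datetime import time


@dataclass
class FrameStats:
    mean_luminance: float   # 0..255 (assumed scale)
    high_frequency: float   # e.g. a Laplacian-based measure
    detection_ok: bool      # did object detection recognise the subject?
    captured_at: time


@dataclass
class GatePolicy:
    # All defaults are illustrative values, not taken from the disclosure.
    min_luminance: float = 60.0
    min_high_frequency: float = 5.0
    day_start: time = time(7, 0)
    day_end: time = time(17, 0)
    operator_enabled: bool = True   # manual switch between training / no training


def should_train(stats: FrameStats, policy: GatePolicy) -> bool:
    """Return True only when the frame is judged clear enough to train on."""
    if not policy.operator_enabled:
        return False
    if not (policy.day_start <= stats.captured_at <= policy.day_end):
        return False
    if stats.mean_luminance < policy.min_luminance:
        return False
    if stats.high_frequency < policy.min_high_frequency:
        return False
    return stats.detection_ok


if __name__ == "__main__":
    noon = FrameStats(120.0, 9.0, True, time(12, 0))
    print(should_train(noon, GatePolicy()))
```

In practice any single criterion could be used on its own by relaxing the others, for example by setting a threshold to zero.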

実施の形態２においても同様に、ＧＡＮを用いた学習が有効である。図１４は、ＧＡＮを用いる場合の学習部２０ａの学習におけるデータの流れを表した図である。識別部２１は、カメラ画像取得部１１が取得したカメラ画像６を「正」、学習用生成部２２が出力する学習用疑似カメラ画像２９を「偽」と判断する。また、学習用生成部２２は、カメラ画像取得部１１が取得したカメラ画像６を劣化させた劣化カメラ画像６ａと、センサ情報取得部１２が取得したセンサ情報７とを入力し、生成モデル２４ａを用いて学習用疑似カメラ画像２９を生成する。学習法は実施の形態１に係る学習部２０と同様である。 Learning using a GAN is also effective in embodiment 2. Figure 14 is a diagram showing the flow of data during learning by the learning unit 20a when a GAN is used. The identification unit 21 determines that the camera image 6 acquired by the camera image acquisition unit 11 is "true" and that the training pseudo camera image 29 output by the training generation unit 22 is "false." The training generation unit 22 also receives as input a degraded camera image 6a obtained by degrading the camera image 6 acquired by the camera image acquisition unit 11 and sensor information 7 acquired by the sensor information acquisition unit 12, and generates the training pseudo camera image 29 using the generative model 24a. The learning method is the same as that of the learning unit 20 in embodiment 1.
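The opposing objectives of the identification unit 21 and the training generation unit 22 can be illustrated with the standard binary cross-entropy GAN losses. The disclosure does not state which loss is used, so the formulation below (including the non-saturating generator loss) is an assumption; the inputs are the probabilities that the identification unit assigns to "true".

```python
import math
from typing import Sequence


def bce(prob: float, target: float, eps: float = 1e-12) -> float:
    """Binary cross-entropy for a single predicted probability."""
    prob = min(max(prob, eps), 1.0 - eps)
    return -(target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob))


def discriminator_loss(p_real: Sequence[float], p_fake: Sequence[float]) -> float:
    """Identification unit 21: camera images should score 1 ("true"), pseudo images 0 ("false")."""
    real = sum(bce(p, 1.0) for p in p_real) / len(p_real)
    fake = sum(bce(p, 0.0) for p in p_fake) / len(p_fake)
    return real + fake


def generator_loss(p_fake: Sequence[float]) -> float:
    """Training generation unit 22: low loss when pseudo images are scored as "true"."""
    return sum(bce(p, 1.0) for p in p_fake) / len(p_fake)


if __name__ == "__main__":
    # Scores the identification unit gave to real and generated images (made-up values).
    print(discriminator_loss([0.9, 0.8], [0.2, 0.1]))
    print(generator_loss([0.2, 0.1]))
```

Training alternates between lowering `discriminator_loss` with respect to the identification model 23a and lowering `generator_loss` with respect to the generative model 24a.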

Furthermore, for the learning unit 20a, using the camera image 6 as training data in addition to the degraded camera image 6a and sensor information 7 is more valuable than in embodiment 1. This is because, when the camera image 6 of the subject deteriorates, a camera image 6 captured immediately beforehand, in which the subject is clear, can be used to generate the pseudo camera image 14.

The processing of the learning unit 20a when a camera image 6 is used as training data is as follows. The training data acquisition unit 25 acquires a clear camera image 6 and sensor information 7, and the degradation processing unit 26 generates a degraded camera image 6a. The degraded camera image 6a and sensor information 7 are then input to the training generation unit 22, and the camera image 6 is stored as training data in the generative model storage unit 24 of the training generation unit 22. Hereinafter, a camera image 6 stored as training data is referred to as a stored camera image 6b. The training generation unit 22 generates a training pseudo camera image 29 using the degraded camera image 6a and sensor information 7 together with the stored camera image 6b and the generative model 24a held in the generative model storage unit 24. This makes it possible to generate a clearer training pseudo camera image 29 based on past camera images 6.

カメラ画像６を学習用データとして用いることは、カメラ画像取得部１１が取得したカメラ画像６の撮影対象と同じ撮影対象を写した記憶済みカメラ画像６ｂが生成モデル記憶部２４に存在する場合に特に効果が高い。同じ撮影対象を撮影したカメラ画像６を記憶した生成モデル記憶部２４が学習済みモデル記憶部１３に反映され、疑似カメラ画像１４の生成に同じ撮影対象を写したカメラ画像６を用いることができるためである。このことから、カメラ画像６の状態が突発的に悪くなった場合において、画像の状態が悪くなる直前のカメラ画像６の情報が学習済みモデル記憶部１３に記憶され、これを疑似カメラ画像１４の生成に使えるため、より撮影対象の実物に近い疑似カメラ画像１４を生成することができる。 Using camera images 6 as learning data is particularly effective when stored camera images 6b of the same subject as the subject of the camera image 6 acquired by the camera image acquisition unit 11 are present in the generative model storage unit 24. This is because the generative model storage unit 24 storing camera images 6 of the same subject is reflected in the trained model storage unit 13, and camera images 6 of the same subject can be used to generate pseudo camera images 14. As a result, if the condition of the camera image 6 suddenly deteriorates, information on the camera image 6 immediately before the image condition deteriorates is stored in the trained model storage unit 13 and can be used to generate the pseudo camera image 14, making it possible to generate pseudo camera images 14 that are closer to the actual subject.

Alternatively, the pseudo camera image 14 may be generated using a stored camera image 6b only when the trained model storage unit 13 contains a stored camera image 6b showing the same subject as the one within the shooting range of the camera image 6 acquired by the camera image acquisition unit 11. In this case, for example, during training of the training generation unit 22, each camera image 6 is stored in the generative model storage unit 24 in association with the sensor information 7 acquired at the same time. Hereinafter, sensor information 7 stored in the generative model storage unit 24 is referred to as stored sensor information 7b. The image generation unit 10 considers factors such as the positional continuity between the stored sensor information 7b and the sensor information 7 acquired by the sensor information acquisition unit 12, and the capture times. If it finds that a stored camera image 6b showing the same subject exists in the trained model storage unit 13, it uses that stored camera image 6b to generate the pseudo camera image 14.
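The matching based on positional continuity and capture time could look like the sketch below. The record layout, the maximum-speed model, and the `max_speed`, `max_age`, and `slack` parameters are all hypothetical; the disclosure only says that such factors are considered.

```python
from dataclasses import dataclass
from math import hypot
from typing import List, Optional


@dataclass
class StoredRecord:
    image_id: str     # stands in for a stored camera image 6b
    x: float          # position from stored sensor information 7b (metres, assumed)
    y: float
    timestamp: float  # capture time in seconds


def find_stored_image(records: List[StoredRecord], x: float, y: float, now: float,
                      max_speed: float = 15.0, max_age: float = 5.0,
                      slack: float = 1.0) -> Optional[StoredRecord]:
    """Return the most recent record that plausibly shows the same subject, else None."""
    best = None
    for rec in records:
        age = now - rec.timestamp
        if age < 0 or age > max_age:
            continue  # too old (or from the future): not treated as the same subject
        # Positional continuity: the subject cannot have moved farther than
        # max_speed * age, plus a small allowance for sensor noise.
        if hypot(x - rec.x, y - rec.y) > max_speed * age + slack:
            continue
        if best is None or rec.timestamp > best.timestamp:
            best = rec
    return best


if __name__ == "__main__":
    history = [StoredRecord("6b-001", 0.0, 0.0, 10.0), StoredRecord("6b-002", 100.0, 100.0, 11.0)]
    print(find_stored_image(history, 5.0, 0.0, 12.0))
```

When the function returns `None`, generation would proceed without a stored camera image 6b.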

However, in a configuration that uses the stored camera image 6b to generate the pseudo camera image 14 only when the trained model storage unit 13 contains a stored camera image 6b showing the same subject as the one within the shooting range of the camera image 6 acquired by the camera image acquisition unit 11, the training generation unit 22 must be trained for both cases: generating a training pseudo camera image 29 with the stored camera image 6b, and generating one without it.

次に、実施の形態２に係る画像生成装置１ｂにおいて、カメラ画像取得部１１がカメラ画像６、センサ情報取得部１２がセンサ情報７を、それぞれ取得してから学習部２０ａが学習済みモデル１３ａに学習を反映させ、さらに画像生成部１０が疑似カメラ画像１４を出力するまでの処理動作を、図１５のフローチャートを用いて説明する。 Next, the processing operations in the image generation device 1b according to embodiment 2 will be described using the flowchart in Figure 15, from when the camera image acquisition unit 11 acquires the camera image 6 and the sensor information acquisition unit 12 acquires the sensor information 7, until when the learning unit 20a reflects the learning in the trained model 13a, and until when the image generation unit 10 outputs the pseudo camera image 14.

カメラ４と物体センサ５は、所定の間隔でカメラ画像６とセンサ情報７を取得するため、カメラ画像取得部１１およびセンサ情報取得部１２は、カメラ４と物体センサ５で取得されたカメラ画像６とセンサ情報７が入力されるまでデータの取得の確認を繰り返す（Ｓ２０：ＮＯ）。カメラ画像取得部１１がカメラ画像６を、センサ情報取得部１２がセンサ情報７をそれぞれ受け取った場合（Ｓ２０：ＹＥＳ）、カメラ画像６およびセンサ情報７が画像生成部１０に入力される（Ｓ２１）。画像生成部１０は、カメラ画像６とセンサ情報７とを受信した後、学習済みモデル１３ａを用いて疑似カメラ画像１４を作成する（Ｓ２２）。その後、疑似カメラ画像１４を表示部３に出力する（Ｓ２３）。 The camera 4 and object sensor 5 acquire camera images 6 and sensor information 7 at predetermined intervals, so the camera image acquisition unit 11 and sensor information acquisition unit 12 repeatedly check data acquisition until the camera images 6 and sensor information 7 acquired by the camera 4 and object sensor 5 are input (S20: NO). When the camera image acquisition unit 11 receives the camera image 6 and the sensor information acquisition unit 12 receives the sensor information 7 (S20: YES), the camera image 6 and sensor information 7 are input to the image generation unit 10 (S21). After receiving the camera image 6 and sensor information 7, the image generation unit 10 creates a pseudo camera image 14 using the trained model 13a (S22). The pseudo camera image 14 is then output to the display unit 3 (S23).

In addition, the learning unit 20a determines whether the acquired camera image 6 is clear (S30). If it determines that the camera image 6 is clear (S30: YES), the learning unit 20a trains the model using the camera image 6 and sensor information 7 (S31). When a GAN is used, the identification unit 21 and the training generation unit 22 are trained at this point. The training result is then reflected in the trained model 13a (S32), and training ends. When a GAN is used, the trained generative model 24a of the training generation unit 22 is reflected in the trained model 13a. If the learning unit 20a determines that the camera image 6 is not clear (S30: NO), it ends without performing any training.

Although the description here has the learning unit 20a perform training after the image generation unit 10 creates the pseudo camera image 14, the learning unit 20a may instead train first and the image generation unit 10 may create the pseudo camera image 14 afterward, or training and image generation may be performed simultaneously.

An example of the hardware configuration of the image generation device 1b according to embodiment 2 is the same as the hardware configuration of the image generation device 1 shown in Figure 10. The processor 101 executes the functions of the image generation unit 10 and the learning unit 20a by reading and executing programs stored in the memory 102.

As described above, the image generation device 1b according to embodiment 2 includes, in addition to the configuration of the image generation device 1 according to embodiment 1, a learning unit 20a that trains the trained model 13a. As a result, if the state of the camera image 6 suddenly deteriorates, for example because surrounding lighting fails, the device can draw on training from good-quality camera images 6 of the subject captured immediately beforehand, and can therefore output a pseudo camera image 14 that is closer to the real subject. Furthermore, the image generation device 1b can train the trained model 13a in real time while the image generation unit 10 is operating. Because it learns information about people and vehicles in the environment of the monitored location, it can also output pseudo camera images 14 whose information is closer to that of camera images 6 captured in good conditions.

Variation 1 of the image generation device 1b according to embodiment 2 is described below. The image generation device 1c according to variation 1 differs from the image generation device 1b in that it includes a sharpness determination unit 15, an output switching unit 16, and an input switching unit 17 that inputs the camera image 6 and sensor information 7 to either the learning unit 20a or the image generation unit 10 based on the sharpness. The following description focuses on the differences from embodiment 2.

Figure 16 shows the configuration of the image generation device 1c. The image generation device 1c is composed of the image generation unit 10, camera image acquisition unit 11, sensor information acquisition unit 12, trained model storage unit 13, sharpness determination unit 15, output switching unit 16, input switching unit 17, and learning unit 20a. The camera image acquisition unit 11 outputs the camera image 6 to the sharpness determination unit 15, the input switching unit 17, and the output switching unit 16, and the sensor information acquisition unit 12 inputs the sensor information 7 to the input switching unit 17.

The input switching unit 17 switches the destination of the camera image 6 and sensor information 7 based on the sharpness. If the sharpness satisfies a predetermined condition, the unit judges that the subject in the camera image 6 is clear and inputs the camera image 6 and sensor information 7 to the learning unit 20a. The learning unit 20a then trains the trained model 13a. Because the camera image 6 and sensor information 7 are not input to the image generation unit 10, no pseudo camera image 14 is generated. If the sharpness does not satisfy the condition, the unit judges that the subject in the camera image 6 is unclear and inputs the camera image 6 and sensor information 7 to the image generation unit 10. The image generation unit 10 then generates a pseudo camera image 14. Because the camera image 6 and sensor information 7 are not input to the learning unit 20a, the trained model 13a is not trained.

出力切替部１６は、実施の形態１に係る変形例と同様である。これにより、監視システム１００において、撮影対象がカメラ画像６に鮮明に写っていない場合には、撮影対象を疑似的に合成した疑似カメラ画像１４が表示部３に表示され、鮮明に写っている場合には、カメラ画像６をそのまま表示部３に表示させると同時に、鮮明なカメラ画像６を用いて学習済みモデル１３ａを学習させることができる。その結果、カメラ画像６と疑似カメラ画像１４のうち、撮影対象をより認識しやすい画像を表示すると同時に、効率的に学習済みモデル１３ａを学習させることができる。 The output switching unit 16 is the same as in the modified example of embodiment 1. As a result, in the surveillance system 100, if the subject is not clearly visible in the camera image 6, a pseudo camera image 14 that is a composite image of the subject is displayed on the display unit 3, and if the subject is clearly visible, the camera image 6 is displayed as is on the display unit 3, while the clear camera image 6 is used to train the trained model 13a. As a result, it is possible to display the image that makes it easier to recognize the subject, out of the camera image 6 and the pseudo camera image 14, and simultaneously train the trained model 13a efficiently.
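The combined behaviour of the input switching unit 17 and the output switching unit 16 can be sketched as a single routing function. As before, treating the predetermined condition as a threshold on a scalar sharpness value is an assumption, and `route_inputs`, `train`, and `generate` are illustrative names rather than elements of the disclosure.

```python
from typing import Callable


def route_inputs(camera_image, sensor_info, sharpness_value: float, threshold: float,
                 train: Callable[[object, object], None],
                 generate: Callable[[object, object], object]):
    """Route one frame to training or generation; return the image to display."""
    if sharpness_value >= threshold:
        # Clear subject: train on this frame and display the camera image as is.
        train(camera_image, sensor_info)
        return camera_image
    # Unclear subject: skip training and display the generated pseudo camera image.
    return generate(camera_image, sensor_info)


if __name__ == "__main__":
    trained = []
    shown = route_inputs("cam", "radar", 0.9, 0.5,
                         lambda img, s: trained.append((img, s)),
                         lambda img, s: "pseudo")
    print(shown, trained)
```

Because each frame goes to exactly one of the two paths, the model is never updated with an unclear image and never asked to regenerate a clear one.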

以上の実施の形態に示した構成は、一例を示すものであり、別の公知の技術と組み合わせることも可能であるし、実施の形態同士を組み合わせることも可能であるし、要旨を逸脱しない範囲で、構成の一部を省略、変更することも可能である。 The configurations shown in the above embodiments are merely examples, and may be combined with other known technologies, or different embodiments may be combined with each other. Parts of the configuration may also be omitted or modified without departing from the spirit of the invention.

１，１ａ，１ｂ，１ｃ画像生成装置、２監視部、３表示部、４カメラ、５物体センサ、６カメラ画像、６ａ劣化カメラ画像、６ｂ記憶済みカメラ画像、７センサ情報、７ｂ記憶済みセンサ情報、１０画像生成部、１１カメラ画像取得部、１２センサ情報取得部、１３学習済みモデル記憶部、１３ａ学習済みモデル、１４疑似カメラ画像、１５鮮明度判定部、１６出力切替部、１７入力切替部、２０，２０ａ学習部、２１識別部、２２学習用生成部、２３識別モデル記憶部、２３ａ識別モデル、２４生成モデル記憶部、２４ａ生成モデル、２５学習用データ取得部、２６劣化処理部、２７学習用カメラ画像、２７ａ劣化学習用カメラ画像、２７ｂ記憶済み学習用カメラ画像、２８学習用センサ情報、２９学習用疑似カメラ画像、１００，１００ａ監視システム、１０１プロセッサ、１０２メモリ、１０３バス。 1, 1a, 1b, 1c Image generation device, 2 Monitoring unit, 3 Display unit, 4 Camera, 5 Object sensor, 6 Camera image, 6a Degraded camera image, 6b Stored camera image, 7 Sensor information, 7b Stored sensor information, 10 Image generation unit, 11 Camera image acquisition unit, 12 Sensor information acquisition unit, 13 Learned model storage unit, 13a Learned model, 14 Pseudo camera image, 15 Sharpness determination unit, 16 Output switching unit, 17 Input switching unit, 20, 20a Learning unit, 21 Identification unit, 22 Learning generation unit, 23 Identification model storage unit, 23a Identification model, 24 Generative model storage unit, 24a Generative model, 25 Learning data acquisition unit, 26 Degradation processing unit, 27 Learning camera image, 27a Degraded learning camera image, 27b Stored learning camera image, 28 Learning sensor information, 29 Pseudo camera image for training, 100, 100a surveillance system, 101 processor, 102 memory, 103 bus.

Claims

An image generation device comprising:
a camera image acquisition unit that acquires a camera image, which is visual information of a subject;
a sensor information acquisition unit that acquires sensor information obtained by detecting the subject using an object sensor;
a trained model storage unit that stores a trained model for generating, from the camera image and the sensor information, a pseudo camera image, which is a composite image obtained by superimposing the camera image and a sensor image that is visual information simulating the visual information of the subject;
an image generation unit that uses the trained model to output the pseudo camera image from the camera image acquired by the camera image acquisition unit and the sensor information acquired by the sensor information acquisition unit;
a sharpness determination unit that determines the sharpness of the camera image acquired by the camera image acquisition unit; and
an output switching unit that outputs either the camera image or the pseudo camera image based on the sharpness.

The image generation device according to claim 1, wherein the trained model storage unit stores, as the trained model, a generative model that has been trained in a generative adversarial network in which: an identification unit, which identifies learning camera images (images in which the subject is clearly captured) as genuine and the pseudo camera images as fake, trains an identification model, which is the learning model of the identification unit, so as to improve the accuracy of the identification; and a learning generation unit, which is a generator for learning that outputs the pseudo camera image from the camera image and the sensor information, trains the generative model, which is the learning model of the learning generation unit, so that the pseudo camera image causes the identification unit to misjudge the identification.

The image generation device according to claim 2, characterized in that the learning generation unit generates the pseudo camera image from the sensor information and a degraded learning camera image obtained by processing the learning camera image so that at least a portion of the visual information of the subject contained in the learning camera image is missing.

The image generation device according to claim 1, further comprising a learning unit that acquires the camera image acquired by the camera image acquisition unit and the sensor information acquired by the sensor information acquisition unit as learning data, and uses the learning data to train the trained model for generating the pseudo camera image.

The image generation device according to claim 4, wherein
the learning unit is a learning device using a generative adversarial network comprising:
an identification unit that identifies the camera images acquired as the learning data as genuine and the pseudo camera images as fake; and
a learning generation unit that is a generator for learning that outputs the pseudo camera image from the sensor information and the camera image acquired as the learning data;
the identification unit trains an identification model, which is the learning model of the identification unit, so as to improve the accuracy of the identification;
the learning generation unit trains a generative model, which is the learning model of the learning generation unit, so as to generate the pseudo camera image that causes the identification unit to misjudge the identification; and
the trained generative model is stored in the trained model storage unit as the trained model.

The image generation device according to claim 5, characterized in that the learning generation unit generates the pseudo camera image from the sensor information and a degraded camera image obtained by processing the camera image acquired as the learning data so that at least a portion of the visual information of the subject contained in the camera image is missing.

The image generation device according to claim 5 or 6, wherein
the learning unit stores the camera images acquired as the learning data, as camera images for image generation, in a generative model storage unit that stores the generative model; and
the learning generation unit trains the generative model to generate the pseudo camera image from the learning data and the camera images for image generation.

The image generation device according to any one of claims 4 to 7, further comprising
an input switching unit that inputs the camera image and the sensor information to either the image generation unit or the learning unit based on the sharpness, wherein
the image generation unit generates the pseudo camera image when the input switching unit inputs the camera image and the sensor information to the image generation unit;
the learning unit trains the trained model when the input switching unit inputs the camera image and the sensor information to the learning unit; and
the output switching unit outputs the pseudo camera image when the input switching unit inputs the camera image and the sensor information to the image generation unit, and outputs the camera image when the input switching unit inputs the camera image and the sensor information to the learning unit.

The image generation device according to claim 1, wherein the sensor information acquisition unit acquires millimeter wave sensor information obtained by detecting the subject using a millimeter wave sensor.

A computer-implemented image generation method comprising:
a camera image acquisition step of acquiring a camera image, which is visual information of a subject;
a sensor information acquisition step of acquiring sensor information obtained by detecting the subject using an object sensor;
an image generation step of outputting a pseudo camera image from the camera image acquired in the camera image acquisition step and the sensor information acquired in the sensor information acquisition step, using a trained model for generating, from the camera image and the sensor information, the pseudo camera image, which is a composite image obtained by superimposing the camera image and a sensor image, which is data that simulates visual information of the subject;
a sharpness determination step of determining the sharpness of the camera image acquired in the camera image acquisition step; and
an output switching step of outputting either the camera image or the pseudo camera image based on the sharpness.

An image generation program that causes a computer to execute:
a camera image acquisition step of acquiring a camera image, which is visual information of a subject;
a sensor information acquisition step of acquiring sensor information obtained by detecting the subject using an object sensor;
an image generation step of outputting a pseudo camera image from the camera image acquired in the camera image acquisition step and the sensor information acquired in the sensor information acquisition step, using a trained model for generating, from the camera image and the sensor information, the pseudo camera image, which is a composite image obtained by superimposing the camera image and a sensor image, which is data that simulates visual information of the subject;
a sharpness determination step of determining the sharpness of the camera image acquired in the camera image acquisition step; and
an output switching step of outputting either the camera image or the pseudo camera image based on the sharpness.