JP7699464B2 - Object detection device

Info

Publication number: JP7699464B2
Application number: JP2021081250A
Authority: JP
Inventors: 隆瀧本
Original assignee: Chubu Electric Power Co Inc
Current assignee: Chubu Electric Power Co Inc
Priority date: 2021-05-12
Filing date: 2021-05-12
Publication date: 2025-06-27
Anticipated expiration: 2041-05-12
Also published as: JP2022175103A

Description

The present invention relates to an object detection device that detects objects based on image information.

Object detection based on deep learning (machine learning with multi-layer neural networks), i.e., AI-based object detection, has been actively researched. For example, many end-to-end methods have been proposed, such as SSD (Single Shot MultiBox Detector) and YOLO (You Only Look Once). These methods detect the position and the category of an object at the same time from image information representing an image captured by an imaging means such as a CCD camera. They are based on multi-task learning, in which a multi-layer neural network learns to locate objects and to classify their categories at the same time.
An object detection technique using SSD is disclosed in, for example, Non-Patent Document 1, and an object detection technique using YOLO is disclosed in, for example, Non-Patent Document 2.

In recent years, the performance of imaging means has improved, and the number of pixels (resolution) of the image information representing a captured image tends to increase. For example, many surveillance cameras currently in use support 2K (about 2 million pixels) or less, but cameras supporting 4K (about 8 million pixels) or 8K (about 33 million pixels) are expected to become widespread. If an imaging means with a large number of pixels (high resolution) can be used, high-definition image information can be obtained, and object detection accuracy improves.
On the other hand, current object detection devices are designed to process image information with a resolution below 2K (e.g., several hundred by several hundred pixels). For this reason, when such a device processes 4K, 8K, or higher-resolution image information, more objects are likely to be missed, and detection accuracy may decrease. Reconfiguring current object detection devices so that they can process high-resolution image information requires a great deal of effort and cost.
The present inventor therefore developed, and filed a patent application for, a technique that improves object detection accuracy while still using current object detection devices. The technique divides a captured image into a plurality of divided images and processes the divided image information representing each divided image.

Non-Patent Document 1: “SSD: Single Shot MultiBox Detector”, Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu and Alexander C. Berg (2015), https://arxiv.org/pdf/1512.02325.pdf
Non-Patent Document 2: “You Only Look Once: Unified, Real-Time Object Detection”, Joseph Redmon, Santosh Divvala, Ross Girshick and Ali Farhadi (2016), https://pjreddie.com/media/files/papers/yolo.pdf

Here, consider using deep learning to detect objects present in a wide monitoring area from an image captured by an imaging means placed far away. In this case, the image of an object in the captured image may be very small. For example, when detecting whether a person is present in a monitoring area such as a riverbed downstream of a dam, the image of a person in the captured image of that area is very small.
When the image of an object in the captured image is this small, the object cannot be detected even with the technique described above, which divides the captured image and processes the divided image information representing each divided image.
The present inventor studied various techniques for detecting objects whose images in the captured image are small. As a result, the inventor found that such an object can still be detected by focusing on the size of the object's image and on its movement distance (movement speed).
The present invention has been devised in view of these points. Its object is to provide a technique that can improve the detection accuracy of objects included in a captured image.

The object detection device of the present invention includes an imaging means, a first detection means, and a second detection means.
The imaging means sequentially outputs image information indicating captured images of the monitoring area. As the imaging means, a known imaging means such as a CCD camera can be used.
The first detection means detects a moving object (for example, a person to be detected) present in the monitoring area based on the image information output from the imaging means.
The first detection means includes a difference image creation means, a moving object detection means, and a first object detection means.
The difference image creation means sequentially creates difference image information representing the difference image between two captured images whose image information was output from the imaging means at different times. The difference image shows the image of a moving object, obtained by removing the stationary background from the two captured images. Various known methods can be used to create the difference image information. The interval between the two pieces of image information is set, for example, to an integer multiple (including 1) of the interval at which the imaging means outputs image information.
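The source leaves the differencing method open ("various known methods"). As one illustration, here is a minimal sketch of plain absolute frame differencing, assuming grayscale frames stored as nested lists. The `threshold` value and the helper names `difference_mask` and `frame_pairs` are hypothetical, not taken from the patent.

```python
from typing import List, Sequence, Tuple

Frame = List[List[int]]  # grayscale frame as rows of 0-255 pixel values


def difference_mask(prev: Frame, curr: Frame, threshold: int = 25) -> List[List[int]]:
    """Binary difference image: 1 where |curr - prev| > threshold, else 0.

    The static background cancels out, so only pixels that changed between
    the two frames (e.g. a moving person) remain. `threshold` is a
    hypothetical noise margin, not a value from the patent.
    """
    if len(prev) != len(curr) or any(len(a) != len(b) for a, b in zip(prev, curr)):
        raise ValueError("frames must have the same shape")
    return [
        [1 if abs(c - p) > threshold else 0 for p, c in zip(row_p, row_c)]
        for row_p, row_c in zip(prev, curr)
    ]


def frame_pairs(frames: Sequence[Frame], step: int = 1) -> List[Tuple[Frame, Frame]]:
    """Pair each frame with the one `step` frames later.

    `step` plays the role of the integer multiple (>= 1) of the camera's
    output interval mentioned in the text.
    """
    if step < 1:
        raise ValueError("step must be >= 1")
    return [(frames[i], frames[i + step]) for i in range(len(frames) - step)]
```

For example, `difference_mask([[10, 10, 10], [10, 10, 10]], [[10, 200, 10], [10, 10, 12]])` marks only the pixel that jumped from 10 to 200. The small change from 10 to 12 stays below the threshold and is suppressed as noise.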
The moving object detection means detects the moving speed and the moving trajectory of the moving object based on the difference image information. The moving speed and the moving trajectory of the moving object can be detected by using various known methods.
The first object detection means detects the moving object as the object to be detected when the movement speed and movement trajectory of the moving object, as detected by the moving object detection means, satisfy the set conditions defined for the object to be detected.
The set conditions are conditions specific to the object to be detected. For example, when the object to be detected is a person, the conditions that "the average moving speed in the first set period is within a range between a lower limit (e.g., a slow walking speed) and an upper limit (e.g., a fast running speed)" and "the moving trajectory in the second set period forms a continuous trajectory of a predetermined shape" are set. The first set period and the second set period are set to periods in which the movement specific to the person to be detected can be detected.
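The following is a minimal sketch of how the moving object detection means and the first object detection means could fit together. It reduces each difference mask to a centroid, collects centroids into a track, and tests two person-like set conditions. Several parts are assumptions. The patent does not define the "predetermined shape" of the trajectory, so the sketch substitutes a simple continuity test. Speeds are in image units (pixels per second), and mapping them to real walking or running speeds would require camera calibration, which the source does not cover. All names and threshold values are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def mask_centroid(mask: Sequence[Sequence[int]]) -> Optional[Point]:
    """Centroid (x, y) of the nonzero pixels of a binary difference mask, or None if empty."""
    pts = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not pts:
        return None
    return (sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts))


def matches_target(track: List[Optional[Point]], dt: float,
                   v_min: float, v_max: float, max_step: float) -> bool:
    """Check hypothetical person-like set conditions on a track sampled every dt seconds.

    Condition 1: average speed (path length / elapsed time) lies in [v_min, v_max].
    Condition 2: the track is continuous, i.e. no missing samples and no jump
    longer than max_step between consecutive samples (a stand-in for the
    patent's unspecified "predetermined shape").
    """
    if len(track) < 2 or any(p is None for p in track):
        return False
    steps = [math.dist(a, b) for a, b in zip(track, track[1:])]
    if max(steps) > max_step:
        return False
    avg_speed = sum(steps) / (dt * len(steps))
    return v_min <= avg_speed <= v_max
```

In this sketch, a track that moves one pixel per second in a straight line passes with `v_min=0.5, v_max=2.0, max_step=1.5`. A track with a gap (`None`) or a sudden five-pixel jump fails, because neither behaves like a continuously walking person.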
The second detection means detects an object (for example, a person) that is a detection target and is included in the captured image represented by the image information output from the imaging means.
The second detection means includes a second object detection means, an image division means, and a detection result synthesis means.
The image division means divides the captured image indicated by the image information output from the imaging means into a plurality of divided images. Any appropriate method can be used for dividing the captured image into a plurality of divided images (the number of divided images, the size of the divided images, the number of divisions, etc.). For example, a method of dividing at equal intervals in the vertical and horizontal directions, or a method of dividing at equal intervals in the vertical and horizontal directions by the same number of divisions can be used.
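The equal-interval division described above can be sketched as follows. The sketch assumes a row-major image and integer pixel boundaries, and the function names are hypothetical. Boundaries are computed with floor division, so the tiles cover the whole image exactly even when its width or height is not divisible by the number of divisions.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (x0, y0, x1, y1); x1 and y1 are exclusive


def grid_tiles(width: int, height: int, cols: int, rows: int) -> List[Rect]:
    """Split a width x height image into cols x rows tiles at (near-)equal intervals.

    Tiles are listed row by row, left to right, so for a 2 x 2 split the order
    is top-left, top-right, bottom-left, bottom-right.
    """
    xs = [i * width // cols for i in range(cols + 1)]
    ys = [j * height // rows for j in range(rows + 1)]
    return [
        (xs[i], ys[j], xs[i + 1], ys[j + 1])
        for j in range(rows)
        for i in range(cols)
    ]


def crop(image: List[list], rect: Rect) -> List[list]:
    """Cut the tile `rect` out of a row-major image (list of pixel rows)."""
    x0, y0, x1, y1 = rect
    return [row[x0:x1] for row in image[y0:y1]]
```

The tile rectangles double as the stored tile positions. Their top-left corners are exactly the offsets needed later to map detections back into the captured image.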
The second object detection means detects an object included in each divided image based on the divided image information. As the second object detection means, a known image processing means such as SSD or YOLO that simultaneously detects the position and category of an object based on image information is used.
The detection result synthesis means synthesizes the object detection results for each divided image by the second object detection means and outputs the result as an object detection result for the captured image (an object detection result for the captured image based on the object detection results for each divided image). As a method for synthesizing the object detection results for each divided image, for example, a method of converting position information of an object in a divided image into position information of an object in the captured image is used.
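Converting positions from divided-image coordinates to captured-image coordinates amounts to translating each box by its tile's top-left corner. The following is a minimal sketch, assuming that the detector already reports boxes in tile pixel coordinates. If a detector such as SSD works on a resized tile, its boxes would first have to be scaled back to tile pixels. The `Detection` type and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Detection:
    label: str
    score: float
    box: Tuple[float, float, float, float]  # (x0, y0, x1, y1) in pixels


def to_image_coords(det: Detection, tile_origin: Tuple[int, int]) -> Detection:
    """Shift a tile-local box by the tile's top-left corner to get captured-image coordinates."""
    ox, oy = tile_origin
    x0, y0, x1, y1 = det.box
    return Detection(det.label, det.score, (x0 + ox, y0 + oy, x1 + ox, y1 + oy))


def merge_tile_results(per_tile: Dict[Tuple[int, int], List[Detection]]) -> List[Detection]:
    """Combine the detections of all tiles (keyed by tile origin) into one list in image coordinates."""
    merged: List[Detection] = []
    for origin, dets in per_tile.items():
        merged.extend(to_image_coords(d, origin) for d in dets)
    return merged
```

For instance, a person found at `(10, 20, 30, 60)` inside the bottom-right tile of a 1920 × 1080 frame split 2 × 2 (origin `(960, 540)`) ends up at `(970, 560, 990, 600)` in the captured image.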
In the present invention, even if the size of an object included in a captured image is small, the object can be detected. This makes it possible to improve the detection accuracy of an object included in a captured image. In addition, since object detection can be performed by the first detection means and the second detection means, the object detection accuracy can be further improved.
In another embodiment of the present invention, the image division means divides the captured image represented by the image information output from the imaging means into at least a first number of first divided images and a second number of second divided images. The first number and the second number are set so that the boundary portions of the first divided images and the boundary portions of the second divided images do not overlap in parallel, that is, no boundary of one set lies on top of a boundary of the other.
In this embodiment, the boundary portion of the first divided image and the boundary portion of the second divided image are allowed to intersect. As a result, for example, an object existing across the boundary portion of one divided image that cannot be detected by object detection for one divided image can be detected by object detection for the other divided image. Appropriate numbers can be set as the first number and the second number. The types of divided images are not limited to two types, a first number of first divided images and a second number of second divided images.
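The following sketch checks whether two division counts satisfy this condition. It assumes that both splits are equal-interval and applied along the same axis, and the function names are hypothetical. The interior division lines of an n-way split sit at relative positions i/n. Two splits share a line exactly when gcd(n1, n2) > 1, so, for example, 2 and 3 divisions are compatible, while 2 and 4 divisions are not.

```python
from fractions import Fraction
from math import gcd
from typing import Set


def interior_lines(n: int) -> Set[Fraction]:
    """Relative positions (strictly between 0 and 1) of the n-1 division lines of an equal n-way split."""
    return {Fraction(i, n) for i in range(1, n)}


def lines_coincide(n1: int, n2: int) -> bool:
    """True if some division line of an n1-way split lies on a division line of an n2-way split."""
    return bool(interior_lines(n1) & interior_lines(n2))


def lines_coincide_gcd(n1: int, n2: int) -> bool:
    """Closed form of the same test: equal splits share a line iff gcd(n1, n2) > 1."""
    return gcd(n1, n2) > 1
```

Exact fractions avoid floating-point false matches. With a 2 × 2 split and a 3 × 3 split, the lines cross each other (which the embodiment allows) but never lie on top of each other.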
In this embodiment, a decrease in the accuracy of object detection at the boundary portion of one of the first and second divided images can be compensated for by the object detection result for the other divided image.
The manner in which the object detection result by the first detection means and the object detection result by the second detection means are used can be set appropriately.
In another aspect of the present invention, the object detection is first performed by the second detection means, and if the object cannot be detected by the second detection means, the object detection is performed by the first detection means.
In this embodiment, the processing load on the first detection means and the second detection means can be reduced.
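The sequential mode can be sketched as a simple fallback. In this sketch, `second` and `first` are hypothetical callables standing in for the two detection means. Both take the recent frames, because the first detection means needs several frames to build difference images.

```python
from typing import Callable, List, Sequence


def detect_sequential(frames: Sequence, second: Callable[[Sequence], List],
                      first: Callable[[Sequence], List]) -> List:
    """Try the second detection means first; call the first only if nothing was found."""
    results = second(frames)
    if results:
        return list(results)
    return list(first(frames))
```

In this design, the motion-based detector runs only on frames where the tile-based detector returned nothing, which is where the reduction in processing load comes from.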
In another embodiment of the present invention, the object detection by the first detection means and the object detection by the second detection means are executed in parallel.
In this embodiment, an object contained in a captured image can be detected in a short period of time.
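For the parallel mode, the following is a minimal sketch using the standard-library `concurrent.futures.ThreadPoolExecutor`. The detector callables are hypothetical placeholders. In CPython, threads overlap well only when the detectors release the GIL (for example, during GPU inference). `ProcessPoolExecutor` is an alternative for pure-Python workloads.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def detect_parallel(frames: Sequence, second: Callable[[Sequence], List],
                    first: Callable[[Sequence], List]) -> List:
    """Run both detection means concurrently and return their combined results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        fut_second = pool.submit(second, frames)
        fut_first = pool.submit(first, frames)
        return list(fut_second.result()) + list(fut_first.result())
```

The combined list may contain the same object twice, once from each detection means. A score-based merge such as the one the detection result synthesis means performs can remove such duplicates.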

The present invention can improve the detection accuracy of objects included in captured images.

FIG. 1 is a block diagram of an embodiment of the object detection device of the present invention.
FIG. 2 is a diagram showing an overview of an example of the second detection means, which detects objects using deep learning.
FIG. 3 is a diagram showing the detection accuracy of the second detection means when an object is detected based on a divided image and when it is detected based on the captured image.
FIG. 4 is a diagram illustrating a first embodiment of the object detection operation using the second detection means.
FIG. 5 is a diagram illustrating a second embodiment of the object detection operation using the second detection means.
FIG. 6 is a diagram illustrating a third embodiment of the object detection operation using the second detection means.
FIG. 7 is a diagram illustrating an example of the object detection operation using the first detection means.
FIG. 8 is a diagram illustrating an example of the object detection operation using the first detection means.
FIG. 9 is a diagram showing an example of combining the object detection result of the first detection means with the object detection result of the second detection means.

Hereinafter, an embodiment of the object detection device of the present invention will be described with reference to the drawings.
FIG. 1 shows a block diagram of an object detection device 10 according to an embodiment.
An object detection device 10 according to one embodiment includes a processing means 20, an imaging means 50, a storage means 60, an input means 70, an output means 80, and the like. These are connected via wired communication lines, wireless communication lines, or the like.
The imaging means 50 is configured by, for example, a digital camera using a CCD or CMOS sensor. The imaging means 50 outputs image information representing a captured image at set intervals (frame rate). The imaging means 50 is arranged so that the monitoring area is included in the captured image.
The imaging means 50 corresponds to the "imaging means" of the present invention, and the image information output from the imaging means 50 corresponds to the "image information indicating a captured image" of the present invention.
The storage means 60 is configured with a ROM, a RAM, etc., and stores programs for executing the processes of the processing means 20 and various data.
The input means 70 is composed of a keyboard, a touch panel, etc., and is used to input various information.
The output means 80 is composed of a display means such as a liquid crystal display device or an organic EL display device, a printing means, etc., and outputs various information. When a display means that allows information to be input by touching a display portion displayed on a display screen is used as the display means, the input means 70 is composed of a touch sensor.
The imaging means 50, the storage means 60, the input means 70, the output means 80, etc. may be located at a location separate from the processing means 20.

The processing means 20 is composed of a CPU and the like.
The processing means 20 includes a first detection means 30 and a second detection means 40.
The second detection means 40 detects an object included in the captured image shown by the image information by using deep learning or the like based on the image information output from the imaging means 50. Preferably, the second detection means is set to detect an object present in a monitoring area included in the captured image. The second detection means can detect the category and position of an object, and therefore can also detect a specific object (an object to be detected).
The first detection means 30 detects an object included in the captured image when the object's image is too small for the second detection means 40 to detect it. The first detection means 30 determines that an object included in the captured image is the object to be detected when the movement speed and movement trajectory of that object (preferably an object present in the monitoring area included in the captured image) satisfy the set conditions defined for the object to be detected.

First, the second detection means 40 will be described.
The second detection means 40 includes a second object detection means 41, an image division means 42, and a detection result synthesis means 43.

The image division means 42 divides the captured image (referred to as the "original image") represented by the image information output from the imaging means 50 into a plurality of divided images. Here, "image information" also includes image information that was output from the imaging means 50 and stored in the storage means 60. The method by which the image division means 42 divides the captured image will be described later.

The second object detection means 41 detects the objects included in each divided image, and their positions, based on the divided image information representing the divided images produced by the image division means 42. The second object detection means 41 can also detect the objects included in the captured image, and their positions, based on the image information representing the captured image.
As the second object detection means 41, various known object detection means that use deep learning to detect the category and position of an object included in a captured image or a divided image based on image information indicating the captured image or divided image information indicating the divided image can be used. For example, an object detection means that detects the category and position of an object using a method such as SSD or YOLO can be used.
For example, as shown in FIG. 2, SSD is based on a multi-layer CNN (convolutional neural network). It is composed of a layer that estimates candidate regions where objects exist and a layer that identifies the objects within those candidate regions. In the layer that estimates candidate regions, the image information is divided into a plurality of rectangular regions of predetermined sizes (default boxes), and candidate object regions (bounding boxes) are estimated while accounting for the offsets of the objects from these rectangular regions. In the layer that identifies objects within the candidate regions, a separately trained CNN is used to identify the objects.
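The default boxes can be illustrated with the construction given in the SSD paper (Non-Patent Document 1), not in this patent. For m feature maps, the scales are s_k = s_min + (s_max − s_min)(k − 1)/(m − 1), with s_min = 0.2 and s_max = 0.9. Each box with aspect ratio a has width s_k·√a and height s_k/√a. The boxes are centred at ((i + 0.5)/f, (j + 0.5)/f) on an f × f feature map. This minimal sketch omits the paper's extra box for aspect ratio 1 and the learned offset regression.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (cx, cy, w, h), normalized to [0, 1]


def ssd_scales(m: int, s_min: float = 0.2, s_max: float = 0.9) -> List[float]:
    """Scale for each of m feature maps, linearly spaced from s_min to s_max."""
    if m < 2:
        raise ValueError("need at least two feature maps")
    return [s_min + (s_max - s_min) * k / (m - 1) for k in range(m)]


def default_boxes(fmap_size: int, scale: float, aspect_ratios: Sequence[float]) -> List[Box]:
    """Default boxes centred on every cell of a fmap_size x fmap_size feature map."""
    boxes: List[Box] = []
    for j in range(fmap_size):
        for i in range(fmap_size):
            cx, cy = (i + 0.5) / fmap_size, (j + 0.5) / fmap_size
            for a in aspect_ratios:
                boxes.append((cx, cy, scale * math.sqrt(a), scale / math.sqrt(a)))
    return boxes
```

For example, a 3 × 3 feature map with aspect ratios 1, 2 and 1/2 yields 27 default boxes. The network then predicts, for every box, offsets and class scores.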

The detection result synthesis means 43 synthesizes the object detection results obtained by the second object detection means 41 for the individual divided images and outputs them as the object detection result for the captured image.
For example, the object detection results for each divided image are synthesized after converting the positional information of the object in each divided image into positional information in the captured image, and output as the object detection result for the captured image (the "original image") (in this case, the object detection result for the captured image based on the object detection results for each divided image).
The detection result synthesis means 43 can also synthesize the object detection results obtained by the second object detection means 41 for the individual divided images with the object detection result for the captured image ("original image"). It then outputs the combined result as the object detection result for the captured image ("original image"), in this case based on both the per-divided-image results and the result for the captured image. If the results being synthesized include objects at approximately the same position, the detection with the higher score indicating object-likeness (the confidence score used during object detection) is selected, for example. Alternatively, both detections can be output.
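"Approximately the same position" needs a concrete test. The following minimal sketch uses intersection over union (IoU) and a greedy, non-maximum-suppression-style pass that keeps the higher-scoring detection. The 0.5 threshold, the requirement that labels match, and all names are assumptions, not details given in the patent.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in captured-image pixels
Det = Tuple[str, float, Box]             # (label, score, box)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def merge_keep_higher(dets: List[Det], iou_thr: float = 0.5) -> List[Det]:
    """Among same-label detections at about the same position, keep only the higher score."""
    kept: List[Det] = []
    for d in sorted(dets, key=lambda d: d[1], reverse=True):
        if all(k[0] != d[0] or iou(k[2], d[2]) < iou_thr for k in kept):
            kept.append(d)
    return kept
```

Detections are visited from highest to lowest score, so when a tile result and the whole-image result overlap strongly, the first one kept is automatically the higher-scoring one.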
In addition, the detection result synthesis means 43 can output the detection result obtained on the captured image alone as the object detection result of the captured image (in this case, "the object detection result of the captured image based on the object detection result for the captured image").

Here, with reference to FIG. 3, the object detection accuracy is compared between two cases, both using an existing image processing means as the object detection means. In the first case, object detection processing is performed on the captured image. In the second, it is performed on divided images obtained by dividing the captured image.
(M1) shows an image containing two people in the distance.
(M2) shows an image obtained by reducing the image (M1) to a size corresponding to the captured image (X) used in the object detection process. Note that the size of the image (M2) in the captured image (X) changes when the zoom function of the imaging means 50 is used.
(M3) shows the image (M1) reduced to a size corresponding to a divided image of the captured image (X). In FIG. 3, the captured image (X) is divided into four divided images by splitting it into two equal parts both vertically and horizontally.
(N1) indicates an image (image to be processed) of a region in the captured image (X) that corresponds to the image (M2).
(N2) indicates an image (image to be processed) of a region in the divided image that corresponds to the image (M3).
When object detection processing was performed with the existing image processing means on the processing target image (N1) of the captured image (X), the object image (M2) could not be detected. In contrast, when object detection processing was performed on the processing target image (N2) of the divided image, the object image (M3) was detected.

As described above, the second detection means 40 (second object detection means 41) performs object detection processing on the divided images obtained by dividing the captured image. This makes it possible to detect objects whose images in the captured image are too small for the existing image processing means to detect. In other words, object detection accuracy can be improved.
In experiments, with the existing image processing means, the minimum detectable object size in the captured image was approximately 10 × 30 pixels. With the second detection means 40, the minimum detectable object size in a quarter image was approximately 7 × 15 pixels.
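One way to see why dividing helps: a detector with a fixed input size shrinks a smaller tile less, so the same object covers more detector-input pixels. The patent states neither the camera resolution nor the detector input size. This worked example therefore assumes a 1920 × 1080 capture and a 300 × 300 input (the SSD300 configuration from Non-Patent Document 1). It only illustrates the scaling effect and does not reproduce the experimental figures above.

```python
def size_at_input(obj_w: float, obj_h: float, src_w: int, src_h: int,
                  in_w: int = 300, in_h: int = 300) -> tuple:
    """Object size in detector-input pixels after resizing a src_w x src_h image to in_w x in_h."""
    return obj_w * in_w / src_w, obj_h * in_h / src_h


# A 10 x 30 pixel person in the full frame vs. in one quarter of a 2 x 2 split.
full = size_at_input(10, 30, 1920, 1080)    # about (1.56, 8.33) input pixels
quarter = size_at_input(10, 30, 960, 540)   # about (3.13, 16.67) input pixels
```

Halving the tile width and height doubles the object's width and height at the detector input, which quadruples its area.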

Next, the operation of the second detection means 40 will be described.
A first embodiment of the operation of the second detection means 40 will be described with reference to FIG. 4.
In the first embodiment, the image division means 42 divides the captured image (X) along division lines (one horizontal and one vertical) into four quarter images (a) to (d), two equal divisions horizontally × two equal divisions vertically. The position of each quarter image (a) to (d) within the captured image (X) is stored in the storage means 60 (for example, the coordinates of its corners in the coordinate system of the captured image (X)).
In the first embodiment, the captured image (X) is divided into 2² = 4 quarter images (a) to (d), with two equal divisions in each of the vertical and horizontal directions. Consequently, the aspect ratio of each quarter image (a) to (d) is equal (including "substantially equal") to the aspect ratio of the captured image (X). Therefore, when the second object detection means 41 performs object detection processing on each quarter image (a) to (d), rescaling the image introduces no distortion, and object detection performance is not affected.
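The aspect-ratio argument can be checked with exact arithmetic. For a W × H image split into c columns and r rows, one tile has aspect ratio (W/c)/(H/r) = (W/H)·(r/c), which equals the original W/H only when r = c. The following sketch uses a 1920 × 1080 frame as a hypothetical example, and the function name is likewise hypothetical.

```python
from fractions import Fraction


def tile_aspect(width: int, height: int, cols: int, rows: int) -> Fraction:
    """Exact aspect ratio (w/h) of one tile in an equal cols x rows split."""
    return Fraction(width, cols) / Fraction(height, rows)


original = Fraction(1920, 1080)              # 16/9
square_split = tile_aspect(1920, 1080, 2, 2)  # 16/9, same as the original
wide_split = tile_aspect(1920, 1080, 4, 2)    # 8/9, would be stretched when resized
```

Any n × n split (2 × 2, 3 × 3, ...) preserves the aspect ratio. A split such as 4 × 2 would force the detector to resize a tile with a different shape, distorting the objects in it.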

The second object detection means 41 performs object detection processing on each of the quarter images (a) to (d) and outputs an object detection result for each of them.
Moreover, the second object detection means 41 executes an object detection process for the captured image (X) and outputs an object detection result for the captured image (X).
The detection result synthesis means 43 synthesizes the object detection result for each of the four divided images (a) to (d) and the object detection result for the captured image (X), and outputs it as the object detection result for the captured image (X) (the object detection result for the captured image (X) based on the object detection result for each of the divided images (a) to (d) and the object detection result for the captured image (X)). For example, the object detection result for each of the four divided images (a) to (d) is converted into position information in the captured image (X), and then the object detection result for each of the four divided images (a) to (d) is synthesized with the object detection result for the captured image (X).
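The coordinate conversion and combination step can be sketched as below. The text does not specify how the results are combined, so this sketch assumes greedy non-maximum suppression (NMS) to drop duplicates of the same object seen in both the full image and a tile. The function names, the 0.5 IoU threshold, and the detection tuples are hypothetical, and class labels are ignored for brevity.

```python
def to_global(box, origin):
    """Shift a tile-local box (x0, y0, x1, y1) by the tile's top-left corner."""
    ox, oy = origin
    x0, y0, x1, y1 = box
    return (x0 + ox, y0 + oy, x1 + ox, y1 + oy)


def iou(a, b):
    """Intersection-over-union of two (x0, y0, x1, y1) boxes."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def merge_detections(full_dets, tile_dets, iou_thresh=0.5):
    """Combine whole-image and tile detections in full-image coordinates.

    full_dets: [(box, score)] already in full-image coordinates.
    tile_dets: [(tile_origin, box, score)] with box in tile-local coordinates.
    Duplicates are removed with greedy NMS (an assumed combination rule)."""
    candidates = list(full_dets) + [(to_global(b, o), s) for o, b, s in tile_dets]
    candidates.sort(key=lambda d: d[1], reverse=True)  # highest score first
    kept = []
    for box, score in candidates:
        if all(iou(box, k) < iou_thresh for k, _ in kept):
            kept.append((box, score))
    return kept


full = [((1000, 200, 1100, 400), 0.9)]
tiles = [((0, 0), (1002, 202, 1098, 398), 0.7),    # same object, seen in the top-left tile
         ((0, 1080), (100, 100, 120, 140), 0.8)]   # small object found only in the bottom-left tile
print(merge_detections(full, tiles))
# [((1000, 200, 1100, 400), 0.9), ((100, 1180, 120, 1220), 0.8)]
```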

In the first embodiment, object detection processing is performed on the four quarter-part images (a) to (d) obtained by dividing the captured image (X). Each quarter-part image covers only a quarter of the scene, so less detail is lost when it is scaled to the detector's input size. This makes it possible to detect objects that cannot be detected by object detection processing on the captured image (X) as a whole.
This improves object detection accuracy.
However, when the captured image (X) is divided into the four quarter-part images (a) to (d), an object located at a boundary between them, such as an object straddling the boundary between two quarter-part images, may not be detected. For example, as shown in FIG. 5, an object (P) straddling the quarter-part images (a) and (b) may not be detected by object detection processing on either (a) or (b).
In the first embodiment, object detection processing is therefore also performed on the captured image (X), and its result is combined with the object detection results for the quarter-part images (a) to (d). As a result, objects located at the boundaries of the quarter-part images, which cannot be detected by processing the quarter-part images, can be detected by the object detection processing on the captured image (X).

A second embodiment of the operation of the second detection means 40 will now be described with reference to FIG. 5.
In the second embodiment, the image dividing means 42 divides the captured image (X) into four quarter-part images (a) to (d) using the quarter-division lines. It also divides the captured image (X) into nine nine-part images (A) to (I) (three equal parts horizontally × three equal parts vertically) using nine-division lines (two horizontal and two vertical division lines). The position of each quarter-part image (a) to (d) and each nine-part image (A) to (I) within the captured image (X) (for example, the coordinates of the corners of each divided image in the coordinate system of the captured image (X)) is stored in the storage means 60.
In the second embodiment, the captured image (X) is divided into 2² quarter-part images (a) to (d) (two equal parts both vertically and horizontally) and into 3² nine-part images (A) to (I) (three equal parts both vertically and horizontally). Consequently, the aspect ratios of the quarter-part images (a) to (d) and the nine-part images (A) to (I) are equal (or substantially equal) to that of the captured image (X). When the second object detection means 41 performs object detection processing on these images, no distortion is introduced by the change of image scale, so object detection performance is not affected.

The second object detection means 41 performs object detection processing on each of the quarter-part images (a) to (d) and each of the nine-part images (A) to (I), and outputs an object detection result for each of them.
The second object detection means 41 also executes object detection processing on the captured image (X) and outputs the object detection result for the captured image (X).
The detection result synthesis means 43 combines the object detection results for the quarter-part images (a) to (d) and the nine-part images (A) to (I) with the object detection result for the captured image (X). It outputs the combination as the object detection result for the captured image (X). For example, the position information of each object contained in the results for the quarter-part and nine-part images is converted into position information in the captured image (X) before the results are combined. The detection result synthesis means 43 can combine the results in the same way as in the first embodiment.
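The multi-level flow, in which the whole image and every tile of each grid level are passed to the detector, can be sketched as follows. The `detect` callable is a hypothetical placeholder for the deep-learning detector. It receives a crop rectangle and returns detections in crop-local coordinates. The sketch also counts detector calls per frame: 1 + 4 + 9 = 14 for the second embodiment and 1 + 4 + 9 + 16 = 30 for the third.

```python
from typing import Callable

# A detection is (x0, y0, x1, y1, score) in the coordinates of the region it came from.
Detection = tuple[float, float, float, float, float]
Region = tuple[int, int, int, int]


def grid_tiles(width: int, height: int, n: int) -> list[Region]:
    """Corners of the n x n equal tiles, in full-image coordinates."""
    xs = [round(i * width / n) for i in range(n + 1)]
    ys = [round(j * height / n) for j in range(n + 1)]
    return [(xs[i], ys[j], xs[i + 1], ys[j + 1]) for j in range(n) for i in range(n)]


def detect_multilevel(width: int, height: int, levels: tuple[int, ...],
                      detect: Callable[[Region], list[Detection]]):
    """Run `detect` on the whole image and on every tile of each n x n level.

    Returns (detections in full-image coordinates, number of detector calls)."""
    regions = [(0, 0, width, height)]
    for n in levels:
        regions += grid_tiles(width, height, n)
    results = []
    for region in regions:
        rx, ry = region[0], region[1]  # tile origin used to map back to full image
        for x0, y0, x1, y1, score in detect(region):
            results.append((x0 + rx, y0 + ry, x1 + rx, y1 + ry, score))
    return results, len(regions)


_, calls2 = detect_multilevel(3840, 2160, (2, 3), lambda r: [])
_, calls3 = detect_multilevel(3840, 2160, (2, 3, 4), lambda r: [])
print(calls2, calls3)  # 14 30
```

The returned list can still contain the same object several times, once per level that saw it. A de-duplication step such as non-maximum suppression (an assumption; the text leaves the combination method open) would follow.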

In the second embodiment, object detection processing is performed on the four quarter-part images (a) to (d) and the nine nine-part images (A) to (I) obtained by dividing the captured image (X). This makes it possible to detect objects that cannot be detected by object detection processing on the captured image (X) alone.
This improves object detection accuracy.
In addition, the captured image (X) is divided both into 2² parts (an even number per side) and into 3² parts (an odd number per side). The interior boundaries of the quarter-part images lie at 1/2 of the image width and height, whereas those of the nine-part images lie at 1/3 and 2/3. The vertical and horizontal boundaries of the quarter-part images (a) to (d) therefore cross the boundaries of the nine-part images (A) to (I) but never coincide with them.
For this reason, the loss of detection accuracy at the boundaries of the quarter-part images (a) to (d) (for example, failure to detect an object straddling a boundary) can be compensated for by the object detection results for the nine-part images (A) to (I). For example, as shown in FIG. 5, an object (P) straddling the quarter-part images (a) and (b) may not be detected by object detection processing on (a) or (b), but it can be detected by object detection processing on the nine-part image (B). Likewise, the loss of accuracy at the boundaries of the nine-part images (A) to (I) can be compensated for by the results for the quarter-part images (a) to (d), and also by the object detection processing on the captured image (X).
Therefore, object detection accuracy can be further improved.
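Whether an object that straddles one grid's boundary fits wholly inside a tile of another grid can be checked with a short function. This is an illustrative sketch. The name `containing_tile` and the box for object (P) are hypothetical, and it assumes a 3840 × 2160 frame. Row 0, column 1 is the top-middle tile, which would be (B) if the nine-part images are lettered row by row. That lettering is also an assumption.

```python
def containing_tile(box, width, height, n):
    """Return (row, col) of the n x n tile that wholly contains `box`, else None.

    `box` is (x0, y0, x1, y1) in full-image pixel coordinates."""
    x0, y0, x1, y1 = box
    col = min(int(x0 * n // width), n - 1)   # tile column holding the left edge
    row = min(int(y0 * n // height), n - 1)  # tile row holding the top edge
    # The box fits only if its right and bottom edges stay inside that same tile.
    if x1 <= (col + 1) * width / n and y1 <= (row + 1) * height / n:
        return row, col
    return None


P = (1880, 300, 1960, 500)  # hypothetical box straddling the vertical midline x = 1920
for n in (2, 3):
    print(n, containing_tile(P, 3840, 2160, n))
# 2 None
# 3 (0, 1)
```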

A third embodiment of the operation of the second detection means 40 will now be described with reference to FIG. 6.
In the third embodiment, the image dividing means 42 divides the captured image (X) into four quarter-part images (a) to (d) using the quarter-division lines and into nine nine-part images (A) to (I) using the nine-division lines. It also divides the captured image (X) into sixteen sixteen-part images (1) to (16) (four equal parts horizontally × four equal parts vertically) using sixteen-division lines (three horizontal and three vertical division lines). The positions of the quarter-part images (a) to (d), the nine-part images (A) to (I), and the sixteen-part images (1) to (16) within the captured image (X) (for example, the coordinates of the corners of each divided image in the coordinate system of the captured image (X)) are stored in the storage means 60.
In the third embodiment, the captured image (X) is divided into 2² quarter-part images (a) to (d), 3² nine-part images (A) to (I), and 4² sixteen-part images (1) to (16). Each grid consists of equal parts both vertically and horizontally. Consequently, the aspect ratios of the quarter-part images (a) to (d), the nine-part images (A) to (I), and the sixteen-part images (1) to (16) are equal (or substantially equal) to that of the captured image (X). When the second object detection means 41 performs object detection processing on these images, no distortion is introduced by the change of image scale, so object detection performance is not affected.

The second object detection means 41 performs object detection processing on each of the quarter-part images (a) to (d), the nine-part images (A) to (I), and the sixteen-part images (1) to (16), and outputs an object detection result for each of them.
The second object detection means 41 also executes object detection processing on the captured image (X) and outputs an object detection result for the captured image (X).
The detection result synthesis means 43 combines the object detection results for the quarter-part images (a) to (d), the nine-part images (A) to (I), and the sixteen-part images (1) to (16) with the object detection result for the captured image (X). It outputs the combination as the object detection result for the captured image (X). The detection result synthesis means 43 can combine the results in the same way as in the first and second embodiments.

In the third embodiment, object detection processing is performed on the four quarter-part images (a) to (d), the nine nine-part images (A) to (I), and the sixteen sixteen-part images (1) to (16) obtained by dividing the captured image (X). This makes it possible to detect objects that cannot be detected by object detection processing on the captured image (X) alone.
This improves object detection accuracy.
In addition, the captured image (X) is divided into 2² and 4² parts (even numbers per side) and into 3² parts (an odd number per side). As a result, some boundaries of the quarter-part images (a) to (d) coincide with boundaries of the sixteen-part images (1) to (16), namely the lines at 1/2 of the width and height. Neither set of boundaries, however, coincides with the boundaries of the nine-part images (A) to (I), which lie at 1/3 and 2/3.
For this reason, the loss of detection accuracy at the boundaries of the nine-part images (A) to (I) (for example, failure to detect an object straddling a boundary) can be compensated for by the results for the quarter-part images (a) to (d) and the sixteen-part images (1) to (16). Likewise, the loss of accuracy at the boundaries of the quarter-part images (a) to (d) and the sixteen-part images (1) to (16) can be compensated for by the results for the nine-part images (A) to (I), and also by the object detection result for the captured image (X).
Therefore, object detection accuracy can be further improved.
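Which grid levels share interior boundary lines can be checked exactly with rational positions. This is an illustrative sketch with hypothetical function names. Boundary positions are expressed as fractions of the image width, and the same fractions apply to the height.

```python
from fractions import Fraction


def interior_boundaries(n):
    """Positions of the interior division lines of an n x n grid,
    as fractions of the image width (the same fractions apply to the height)."""
    return {Fraction(k, n) for k in range(1, n)}


def shared_boundaries(n, m):
    """Interior division lines that the n x n and m x m grids have in common."""
    return sorted(interior_boundaries(n) & interior_boundaries(m))


for n, m in [(2, 3), (2, 4), (3, 4)]:
    print(n, m, shared_boundaries(n, m))
# 2 3 []
# 2 4 [Fraction(1, 2)]
# 3 4 []
```

More generally, an n × n grid and an m × m grid share an interior line exactly when gcd(n, m) > 1. This is why the 2² and 4² grids share the midlines, while the 3² grid shares no line with either.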

Next, the first detection means 30 will be described.
The first detection means 30 includes a difference image creation means 31, a moving body detection means 32, and a first object detection means 33.

The difference image creation means 31 sequentially creates difference image information representing the difference between two captured images, indicated by two pieces of image information output from the imaging means 50 at different times. The difference image is obtained by removing the image common to the two captured images (the background image); in other words, it shows the images of moving bodies. Various known methods can be used to create the difference image information, for example OpenCV's background subtraction.

An example of a method for creating the difference image information will be described with reference to FIGS. 7 and 8.
FIG. 7 shows a captured image (X[t-1]) represented by the image information output from the imaging means 50 at time [t-1], one timing before time [t]. The captured image (X[t-1]) contains a background image (still image) (M[t-1]) and images of moving bodies (Y1[t-1]) and (Y2[t-1]).
FIG. 8 shows a captured image (X[t]) represented by the image information output from the imaging means 50 at time [t]. The captured image (X[t]) contains a background image (still image) (M[t]) and images of moving bodies (Y1[t]) and (Y2[t]).
The positions of the moving bodies (Y1[t]) and (Y2[t]) in the captured image (X[t]) differ from those of the moving bodies (Y1[t-1]) and (Y2[t-1]) in the captured image (X[t-1]); that is, the moving bodies (Y1) and (Y2) moved between times [t-1] and [t].

The difference image is created, for example, by comparing the state (brightness, etc.) of each pixel in the captured image (X[t]) with the state of the corresponding pixel in the captured image (X[t-1]) and extracting the pixels judged to differ.
Specifically, a difference image is created that contains the images of the moving bodies (Y1[t]) and (Y2[t]) in the captured image (X[t]) shown in FIG. 8, together with the background image at the positions previously occupied by the moving bodies (Y1[t-1]) and (Y2[t-1]).
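The per-pixel comparison described above can be sketched as simple two-frame differencing with NumPy. This is a minimal illustration, not the device's actual method. The brightness threshold of 25 is a hypothetical value, and frames are assumed to be 8-bit grayscale. A production system might instead use an OpenCV background subtractor such as `cv2.createBackgroundSubtractorMOG2`. As the text notes, the resulting mask marks both the new position of each moving body and the background uncovered at its old position.

```python
import numpy as np


def difference_mask(prev: np.ndarray, curr: np.ndarray, threshold: int = 25) -> np.ndarray:
    """Mark pixels whose brightness differs between two grayscale frames.

    `threshold` is a hypothetical value. Frames are cast to int16 first so
    that subtracting uint8 values cannot wrap around."""
    diff = np.abs(curr.astype(np.int16) - prev.astype(np.int16))
    return diff > threshold


# A 2 x 2 bright "moving body" shifts three columns to the right.
prev = np.zeros((10, 10), dtype=np.uint8)
curr = np.zeros((10, 10), dtype=np.uint8)
prev[2:4, 2:4] = 200
curr[2:4, 5:7] = 200
mask = difference_mask(prev, curr)
print(int(mask.sum()))  # 8: four pixels at the new position, four at the old one
```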

The moving body detection means 32 uses the difference image information created by the difference image creation means 31 to detect the positions of the moving bodies (Y1[t]) and (Y2[t]) at time [t] (contained in the captured image (X[t])). It also detects the positions of the moving bodies (Y1[t-1]) and (Y2[t-1]) at time [t-1] (contained in the captured image (X[t-1])).
(Y1s[t]), (Y2s[t]), (Y1s[t-1]), and (Y2s[t-1]) denote the sizes of the moving bodies (Y1[t]), (Y2[t]), (Y1[t-1]), and (Y2[t-1]), respectively. The size of a moving body is determined, for example, by a pixel count: the number of pixels (vertical pixels × horizontal pixels) in the rectangular or square region circumscribing the image of the moving body.
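Measuring size by the circumscribing rectangle can be sketched as follows. The function name is hypothetical. The sketch assumes the mask contains a single moving body. With several bodies, the mask would first be split into connected components, for example with `cv2.connectedComponentsWithStats` or `scipy.ndimage.label`.

```python
import numpy as np


def circumscribed_size(mask: np.ndarray):
    """Return (height, width, pixel_count) of the axis-aligned rectangle
    circumscribing the True pixels of `mask`, or None if the mask is empty."""
    rows = np.flatnonzero(mask.any(axis=1))  # rows containing the moving body
    cols = np.flatnonzero(mask.any(axis=0))  # columns containing the moving body
    if rows.size == 0:
        return None
    h = int(rows[-1] - rows[0] + 1)
    w = int(cols[-1] - cols[0] + 1)
    return h, w, h * w


blob = np.zeros((20, 20), dtype=bool)
blob[5:12, 3:7] = True  # a 7-pixel-tall, 4-pixel-wide moving body
print(circumscribed_size(blob))  # (7, 4, 28)
```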
Then, the distance (movement distance) and direction (movement direction) are detected, from the positions of the moving bodies (Y1[t-1]) and (Y2[t-1]) at time [t-1] to the positions of the moving bodies (Y1[t]) and (Y2[t]) at time [t]. Various known methods can be used for this, for example optical flow.
In FIG. 8, the distance and direction from the position of the moving body (Y1[t-1]) to that of (Y1[t]) are shown by the movement vector [Y1(Wt)], and those from (Y2[t-1]) to (Y2[t]) by the movement vector [Y2(Wt)].
Furthermore, the movement vector of each moving body at each time point is used to detect the average movement speed of each moving body within a first set period and its movement trajectory within a second set period. The first and second set periods are set appropriately for the object to be detected. The movement vector may describe movement on a two-dimensional plane (e.g., the plane spanned by the left-right and up-down directions of the captured image). Preferably, however, it describes movement in three-dimensional space (e.g., the space spanned by the left-right, up-down, and front-back directions of the captured image).
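Movement vectors and the average speed over a set period can be computed from a sequence of positions, as in the sketch below. It assumes each moving body is tracked by its centroid, sampled every `dt` seconds, and it works in two dimensions for brevity. The three-dimensional variant the text prefers would need depth or camera calibration, which is not shown. The trajectory over the second set period is simply the position sequence itself. Function names are hypothetical.

```python
import math


def movement_vectors(positions):
    """Displacement (dx, dy) between consecutive positions of one moving body."""
    return [(x1 - x0, y1 - y0)
            for (x0, y0), (x1, y1) in zip(positions, positions[1:])]


def average_speed(positions, dt):
    """Mean speed over the window spanned by `positions` (sampled every dt s)."""
    vectors = movement_vectors(positions)
    if not vectors:
        return 0.0
    return sum(math.hypot(dx, dy) for dx, dy in vectors) / (len(vectors) * dt)


track = [(0, 0), (3, 4), (6, 8)]    # centroids at t-2, t-1, t
print(movement_vectors(track))      # [(3, 4), (3, 4)]
print(average_speed(track, dt=1.0))  # 5.0
```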
Preferably, the moving body detection means 32 is configured to detect moving bodies within a monitoring area contained in the captured image. The monitoring area is defined, for example, by setting the positions of its boundaries in the captured image.
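One way to restrict detection to a monitoring area is to represent the area as a polygon in image coordinates and test each moving body's reference point with the standard ray-casting algorithm. The polygon representation and the choice of reference point (for example, the centroid) are assumptions, since the text only says the area is defined by setting its boundary positions. The function name is hypothetical.

```python
def inside_area(point, polygon):
    """Ray-casting test: True if `point` (x, y) lies inside `polygon`,
    given as a list of (x, y) vertices in image coordinates."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x0, y0 = polygon[i]
        x1, y1 = polygon[(i + 1) % n]
        if (y0 > y) != (y1 > y):  # this edge spans the horizontal line through y
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:       # the ray to the right crosses this edge
                inside = not inside
    return inside


area = [(100, 100), (500, 100), (500, 400), (100, 400)]  # hypothetical monitoring area
print(inside_area((300, 250), area), inside_area((600, 250), area))  # True False
```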

The first object detection means 33 detects whether the moving bodies (Y1) and (Y2) are objects to be detected (hereinafter "target objects").
Various methods can be used to determine whether the moving bodies (Y1) and (Y2) are target objects. In this embodiment, the determination is based on whether the movement speed and movement trajectory of the moving bodies (Y1) and (Y2) satisfy setting conditions defined for the target object.
The setting conditions on movement speed and movement trajectory used to recognize a target object are specific to that target object. The movement speed and trajectory may be measured on a two-dimensional plane (e.g., the plane spanned by the left-right and up-down directions of the captured image). Preferably, however, they are measured in three-dimensional space (e.g., the space spanned by the left-right, up-down, and front-back directions of the captured image).
For example, when the target object is a person, the following setting conditions are used.
(1) The average movement speed within the first set period lies between a lower limit and an upper limit.
The lower limit is set, for example, to the speed of a person walking slowly, and the upper limit to the speed of a person running fast. The first set period is set long enough to determine a person's average movement speed.
The average movement speed of a person differs from that of other moving bodies such as birds or vehicles.
(2) The movement trajectory within the second set period forms a continuous trajectory of a predetermined shape.
The movement trajectory of a person differs from that of other moving bodies such as birds or vehicles. The second set period is set long enough to identify a person's movement trajectory.
The setting conditions can be determined by collecting data on the movement of people and deriving conditions specific to human movement from that data, or by learning with deep learning.
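Conditions (1) and (2) can be sketched as a rule-based check. All thresholds here are hypothetical placeholders: 0.5 m/s for a slow walk, 7 m/s for a fast run, and a 10 m/s "no teleporting" limit. Positions are assumed to be ground-plane coordinates in metres. Condition (2) is only approximated by a continuity test; matching a trajectory against a predetermined shape is not modelled.

```python
import math


def is_person(positions, dt, v_min=0.5, v_max=7.0, max_jump_speed=10.0):
    """Rule-based check in the spirit of conditions (1) and (2).

    positions: (x, y) ground-plane positions in metres, one every `dt` seconds,
    covering the set period. All thresholds are hypothetical placeholders."""
    steps = [math.hypot(x1 - x0, y1 - y0)
             for (x0, y0), (x1, y1) in zip(positions, positions[1:])]
    if not steps:
        return False
    # Condition (1): average speed between a slow walk and a fast run.
    avg_speed = sum(steps) / (len(steps) * dt)
    if not v_min <= avg_speed <= v_max:
        return False
    # Stand-in for condition (2): the trajectory is continuous, i.e. no single
    # step implies an implausibly fast jump. Shape matching is not modelled.
    return all(step / dt <= max_jump_speed for step in steps)


walker = [(0.0, 0.0), (0.6, 0.0), (1.2, 0.1), (1.8, 0.1)]
print(is_person(walker, dt=0.5))                      # True
print(is_person([(0, 0), (10, 0), (20, 0)], dt=0.5))  # False: 20 m/s, too fast
```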

As described above, detecting the movement of moving bodies and identifying the target object from that movement makes it possible to detect objects whose images in the captured image are too small for the second detection means 40 (second object detection means 41). That is, object detection accuracy can be improved.
In an experiment, the minimum detectable object size in the captured image (the minimum size of (Y1s[t]), (Y2s[t]), (Y1s[t-1]), and (Y2s[t-1]) shown in FIG. 8) was approximately 4 × 7 pixels. This is smaller than the minimum detectable object size in a quarter-part image when using the second detection means 40 (7 × 15 pixels).

FIG. 9 shows an example in which the object detection result of the first detection means 30 is combined with that of the second detection means 40.
FIG. 9 shows a captured image (X), taken by the imaging means 50, of the riverbed downstream of a dam, which is the monitoring area. Moving bodies (1) to (4), which are images of people (the target objects), are present in the monitoring area. In the captured image (X), the images of moving bodies (1) and (2) are large, while those of moving bodies (3) and (4) are small.
Because the images of moving bodies (1) and (2) are large, these bodies can be detected by the deep-learning-based object detection processing (AI detection) of the second detection means 40.
Because the images of moving bodies (3) and (4) are small, the AI detection of the second detection means 40 cannot detect them. They can, however, be detected by the movement-based object detection processing (motion detection) of the first detection means 30.

As described above, object detection processing by the second detection means 40 (second object detection means 41) can detect a wide variety of objects more accurately than that by the first detection means 30 (first object detection means 33). On the other hand, object detection processing by the first detection means 30 (first object detection means 33) can detect objects with smaller image sizes than that by the second detection means 40 (second object detection means 41).
Therefore, combining the object detection processing of the first detection means 30 (first object detection means 33) with that of the second detection means 40 (second object detection means 41) improves object detection accuracy.

In a first embodiment of combining the two detection means, object detection processing is first executed by the second detection means 40 (second object detection means 41). If that processing cannot detect the target object, object detection processing is then executed by the first detection means 30 (first object detection means 33).
The first embodiment reduces the processing load on the first detection means 30 (first object detection means 33) and the second detection means 40 (second object detection means 41).
In the first embodiment, the second detection means 40 may fail to detect an object that the first detection means 30 then detects as the target object. In that case, the device may additionally issue a notification prompting staff to check the object. Alternatively, the captured image may be enlarged using the zoom function of the imaging means 50, and object detection processing by the second detection means 40 (second object detection means 41) may be executed on the image information representing the enlarged image.
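The sequential order of this first combination embodiment can be sketched with placeholder callables. `ai_detect` stands in for the second detection means 40 and `motion_detect` for the first detection means 30. The optional `on_motion_only` hook represents the follow-up action (notifying staff, or zooming in and rerunning AI detection). All of these names are hypothetical.

```python
def detect_sequential(frame, ai_detect, motion_detect, on_motion_only=None):
    """Sketch of the first combination embodiment: AI detection first,
    motion-based detection only if AI detection finds nothing."""
    found = ai_detect(frame)
    if found:
        return "ai", found  # motion-based detection is skipped entirely
    found = motion_detect(frame)
    if not found:
        return "none", []
    if on_motion_only is not None:
        on_motion_only(found)  # e.g. notify staff, or zoom in and rerun ai_detect
    return "motion", found


calls, alerts = [], []

def ai(frame):
    calls.append("ai")
    return []  # e.g. the person is too small for the deep-learning detector

def motion(frame):
    calls.append("motion")
    return [(10, 10, 14, 17)]

print(detect_sequential("frame", ai, motion, alerts.append))
# ('motion', [(10, 10, 14, 17)])
```

Skipping motion detection whenever AI detection succeeds is what reduces the processing load in this arrangement.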

In the second embodiment, object detection by the first detection means 30 (first object detection means 33) and object detection by the second detection means 40 (second object detection means 41) are executed in parallel (simultaneously).
The second embodiment can detect an object in a short time.
In the second embodiment, when the first detection means 30 detects an object, the device may additionally be configured, as in the first embodiment, to issue a notification prompting a staff member to confirm the object. Alternatively, it may enlarge the captured image using the zoom function of the imaging means 50 and have the second detection means 40 execute object detection based on image information representing the enlarged image.
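Here is a minimal sketch of the second embodiment's concurrent execution. It assumes the same kind of hypothetical detector callables, each taking a frame sequence and returning a list of detections. Threads are used on the assumption that the real detectors spend most of their time in native or GPU code that releases Python's GIL. Pure-Python detectors would need a process pool instead.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence, Tuple

Detection = Tuple[str, float]  # hypothetical record: (category, score)
Detector = Callable[[Sequence[object]], List[Detection]]


def detect_parallel(frames: Sequence[object],
                    first_detector: Detector,
                    second_detector: Detector) -> Dict[str, object]:
    """Run both detection means concurrently and collect both results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        first_future = pool.submit(first_detector, frames)
        second_future = pool.submit(second_detector, frames)
        first = first_future.result()    # blocks until the first detector finishes
        second = second_future.result()  # blocks until the second detector finishes
    return {
        "first": first,
        "second": second,
        # Hypothetical flag: the text allows a staff notification or a zoomed
        # re-check by the second detection means whenever the first one fires.
        "follow_up": bool(first),
    }
```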

Furthermore, if the image of an object in the captured image is large, the object can be detected by the second detection means 40 (second object detection means 41). On the other hand, if the image of the object is small, it cannot be detected by the second detection means 40 (second object detection means 41) but can be detected by the first detection means 30 (first object detection means 33).
Therefore, when objects appear small in the captured image, the device can be configured to detect them by executing object detection with the first detection means 30 (first object detection means 33) alone.
That is, the present invention may be configured with only the first detection means 30 (first object detection means 33).

In the above embodiments, the following divided image groups were used. The first group contains four divided images obtained by dividing the captured image into 2² pieces (two equal divisions vertically and horizontally). The second group contains those four divided images together with nine divided images obtained by dividing the captured image into 3² pieces (three equal divisions vertically and horizontally). The third group contains the four divided images (2²), the nine divided images (3²), and sixteen divided images obtained by dividing the captured image into 4² pieces (four equal divisions vertically and horizontally). However, the divided images, and the combinations of divided images, that make up a divided image group are not limited to these.
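The equal-interval n × n division behind these groups can be sketched as below. The function and group names are hypothetical. Tile boxes are given as (x0, y0, x1, y1) pixel coordinates in the captured image. Because the sketch uses integer division, tiles of an image whose size is not a multiple of n can differ in size by at most one pixel.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in captured-image pixels


def grid_tiles(width: int, height: int, n: int) -> List[Box]:
    """Divide a width x height image into n x n tiles at equal intervals (row-major)."""
    xs = [width * i // n for i in range(n + 1)]
    ys = [height * j // n for j in range(n + 1)]
    return [(xs[i], ys[j], xs[i + 1], ys[j + 1])
            for j in range(n) for i in range(n)]


# Hypothetical encoding of the three groups: {2^2}, {2^2, 3^2}, {2^2, 3^2, 4^2}.
GROUPS = [(2,), (2, 3), (2, 3, 4)]


def group_tiles(width: int, height: int,
                group: Tuple[int, ...]) -> Dict[int, List[Box]]:
    """All divided images of one group, keyed by the per-axis division count."""
    return {n: grid_tiles(width, height, n) for n in group}
```

For a 4K (3840 × 2160) frame, each tile of the 3² division is 1280 × 720, and the third group yields 4 + 9 + 16 = 29 divided images.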
In the embodiments, the detection results for the divided images were combined with the detection result for the captured image and output as the detection result for the captured image. Alternatively, only the detection results for the divided images may be combined and output as the detection result for the captured image.
Although a single divided image group was used, the device may instead use multiple divided image groups. In that configuration, object detection is executed on the divided images of one selected group. If the detection result contains no object, a different group is selected and object detection is executed on its divided images. This repetition over different divided image groups can be terminated at any suitable time, for example when the number of groups processed reaches a set value or when a set time has elapsed since object detection began.
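The multi-group variant can be sketched as a loop with both stopping rules. In this sketch, `detect` is a hypothetical per-divided-image detector that returns a list (empty on a miss). `groups` is assumed to yield the divided images of each group in the order they should be tried. Simple concatenation stands in for the synthesis step.

```python
import time
from typing import Callable, Iterable, List, Optional, Sequence


def search_groups(groups: Iterable[Sequence[object]],
                  detect: Callable[[object], List[object]],
                  max_groups: Optional[int] = None,
                  max_seconds: Optional[float] = None) -> List[object]:
    """Try divided image groups one by one until some divided image yields a detection."""
    start = time.monotonic()
    for count, tiles in enumerate(groups, start=1):
        results = [d for tile in tiles for d in detect(tile)]
        if results:
            return results  # an object was found: stop early
        # Stop after a set number of groups or a set elapsed time.
        if max_groups is not None and count >= max_groups:
            break
        if max_seconds is not None and time.monotonic() - start >= max_seconds:
            break
    return []
```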
When the purpose of object detection by the second object detection means 41 is to detect the presence or absence of an object (that is, whether at least one object is present), the device can be configured as follows.
The second object detection means 41 first executes object detection on the captured image. It executes object detection on each divided image only if the detection result for the captured image contains no object. If the detection result for the captured image contains an object, the detection result synthesis means 43 outputs that result as the detection result for the captured image. If it contains no object, the detection result synthesis means 43 combines the detection results for the divided images and outputs them as the detection result for the captured image. If the detection results for the divided images also contain no object, the device may further execute object detection on a different number of divided images, combine those results, and output them as the detection result for the captured image. This repetition with different numbers of divided images can be terminated, for example, at the same times as the repetition over different divided image groups described above.
In this case, the number of times the second object detection means 41 performs the object detection process can be reduced.
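Here is a minimal sketch of the presence-oriented flow. It assumes a hypothetical `detect(image)` that returns a list of detections and a hypothetical `split(image, n)` that returns the n × n divided images. The division counts (2, 3, 4) mirror the embodiments but are only an illustrative default. Concatenation stands in for result synthesis.

```python
from typing import Callable, List, Sequence


def detect_presence(image: object,
                    detect: Callable[[object], List[object]],
                    split: Callable[[object, int], Sequence[object]],
                    division_counts: Sequence[int] = (2, 3, 4)) -> List[object]:
    """Whole image first; fall back to n x n divided images only on a miss."""
    results = detect(image)
    if results:
        return results  # the whole-image result is output as-is
    for n in division_counts:
        # Try the next division count only if the previous one found nothing.
        results = [d for tile in split(image, n) for d in detect(tile)]
        if results:
            return results
    return []
```

When the whole-image pass already finds an object, `detect` runs exactly once. That is where the reduction in detection runs comes from.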

The present invention can also be configured as follows.
"(Aspect 1) An object detection device according to any one of claims 2 to 4, wherein
the image dividing means divides the captured image indicated by the image information output from the imaging means into at least a first number of first divided images and a second number of second divided images; and
the first number and the second number are set such that the boundary portions of the first divided images and the boundary portions of the second divided images do not overlap in parallel with each other."
In this aspect, the boundary portions of the first divided images and the boundary portions of the second divided images are allowed to intersect. As a result, an object straddling a boundary of one set of divided images, which object detection on that set cannot detect, can be detected by object detection on the other set. Any suitable values can be set as the first number and the second number. The divided images are not limited to two types, namely the first number of first divided images and the second number of second divided images.
In this aspect, a decrease in the accuracy of object detection in the boundary portion of one of the first and second divided images can be compensated for by the object detection result for the other divided image.
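One way to check the boundary condition is sketched below. It assumes that each set of divided images comes from an equal-interval division into n parts along each axis. It also reads "do not overlap in parallel" as "no interior cut line of one division coincides with a cut line of the other". Cut positions are then i/m and j/n, which can coincide only when m and n share a common factor. The helper names are hypothetical.

```python
from fractions import Fraction
from math import gcd
from typing import Set


def boundary_positions(n: int) -> Set[Fraction]:
    """Interior cut positions, as fractions of the width or height, for n equal divisions."""
    return {Fraction(i, n) for i in range(1, n)}


def boundaries_coincide(m: int, n: int) -> bool:
    """True if an m-way and an n-way equal division share a cut line.

    i/m == j/n with 0 < i < m and 0 < j < n has a solution exactly when gcd(m, n) > 1.
    """
    return gcd(m, n) > 1
```

Under this reading, 2² with 3² satisfies the condition, while 2² with 4² does not, because both divisions cut along the centre lines.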
Also, "(Aspect 2) An object detection device according to any one of claims 2 to 4 and aspect 1, wherein
the image dividing means divides the captured image indicated by the image information output from the imaging means into at least the square of a first odd number of first divided images and the square of a first even number of second divided images."
Preferably, the captured image is divided at equal intervals with the same number of divisions (odd or even) vertically and horizontally. The divided images are not limited to two types, namely the square of a first odd number of first divided images and the square of a first even number of second divided images.
In this aspect, the first and second divided images can have approximately the same aspect ratio as the captured image. Therefore, even when object detection is executed on the divided images using the image processing means of an ordinary object detection device, rescaling the images introduces no distortion and does not affect detection performance.
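The aspect-ratio argument is a one-line identity. Dividing a W × H captured image into n equal parts along both axes scales width and height by the same factor. As a worked example with an assumed 4K frame:

```latex
\frac{W/n}{H/n} = \frac{W}{H}, \qquad
W \times H = 3840 \times 2160,\; n = 3:\quad
\frac{1280}{720} = \frac{16}{9} = \frac{3840}{2160}
```

By contrast, an unequal split of the same frame into 2 columns and 3 rows would give 1920 × 720 tiles (8:3). Those tiles would need non-uniform rescaling if the detector expects the original 16:9 input.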
Also, "(Aspect 3) An object detection device according to any one of claims 2 to 4 and aspects 1 and 2, wherein
the image dividing means is capable of dividing the captured image indicated by the image information output from the imaging means into a plurality of divided image groups, each including at least one type of divided image and differing in the total number of divided images; and
the detection result synthesis means detects the object to be detected included in each divided image based on divided image information indicating each divided image constituting one divided image group. If any of the detection results for the divided images contains the object to be detected, it combines the detection results for the divided images and outputs them as the object detection result for the captured image indicated by the image information output from the imaging means. If none of them contains the object to be detected, it performs the same processing on a different divided image group."
The repetition of object detection over different divided image groups can be terminated at any suitable time, for example when the number of divided image groups processed reaches a set value or when a set time has elapsed since object detection began.
This aspect is preferably used to detect that at least one object is included in a captured image.
In this aspect, object detection can be terminated as soon as the presence of an object in the captured image is detected, so the processing load on the second detection means can be reduced.
Also, "(Aspect 4) An object detection device according to any one of claims 2 to 4 and aspects 1 and 2, wherein
the second object detection means detects, using deep learning, the object to be detected that is included in the captured image indicated by the image information output from the imaging means; and
the detection result synthesis means combines the object detection results for the divided images with the object detection result for the captured image and outputs the combination as the object detection result for the captured image."
The detection results for the divided images and for the captured image can be combined, for example, by converting the position of each object in a divided image into a position in the captured image. When multiple detection results contain objects with the same category and position, the detection can be chosen, for example, by selecting the one with the higher score indicating object likelihood used in the object detection process.
In this aspect, the object detection accuracy can be further improved.
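The combination described for this aspect can be sketched as follows. The record layout `(category, score, x0, y0, x1, y1)` and the function names are hypothetical. "Same position" is approximated by an intersection-over-union threshold of 0.5, which is an assumption not stated in the source. With that substitution, the selection step amounts to class-wise greedy non-maximum suppression.

```python
from typing import List, Tuple

# Hypothetical detection record: (category, score, x0, y0, x1, y1).
Det = Tuple[str, float, float, float, float, float]


def to_global(det: Det, offset_x: float, offset_y: float) -> Det:
    """Shift a divided-image detection into captured-image coordinates."""
    cat, score, x0, y0, x1, y1 = det
    return (cat, score, x0 + offset_x, y0 + offset_y, x1 + offset_x, y1 + offset_y)


def area(d: Det) -> float:
    return max(0.0, d[4] - d[2]) * max(0.0, d[5] - d[3])


def iou(a: Det, b: Det) -> float:
    """Intersection over union of two boxes."""
    ix = max(0.0, min(a[4], b[4]) - max(a[2], b[2]))
    iy = max(0.0, min(a[5], b[5]) - max(a[3], b[3]))
    inter = ix * iy
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def merge(dets: List[Det], same_pos_iou: float = 0.5) -> List[Det]:
    """Keep only the higher-scoring detection when category and position match."""
    kept: List[Det] = []
    for d in sorted(dets, key=lambda d: d[1], reverse=True):
        # "Same position" is approximated as IoU >= same_pos_iou (an assumption).
        if all(k[0] != d[0] or iou(k, d) < same_pos_iou for k in kept):
            kept.append(d)
    return kept
```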
Also, "(Aspect 5) An object detection device according to any one of claims 2 to 3 and aspects 1 to 4, wherein
the second object detection means detects, using deep learning, the object to be detected that is included in the captured image. When the detection result for the captured image does not include the object to be detected, it detects the object to be detected that is included in each of the divided images; and
the detection result synthesis means outputs the detection result for the captured image as the object detection result for the captured image when that result includes the object to be detected. When it does not, the detection result synthesis means combines the detection results for the divided images and outputs them as the object detection result for the captured image."
This aspect is preferably used to detect the presence of at least one object in a captured image.
This aspect can reduce the processing load on the second detection means.

The present invention is not limited to the configurations described in the embodiments, and various modifications, additions, and deletions are possible.
The differential image generating means, the moving object detecting means, the first object detecting means, the second object detecting means, the image dividing means, and the detection result combining means are not limited to the configurations described in the embodiment.
The first detection means is not limited to the configuration described in the embodiment.
The second detection means is not limited to the configuration described in the embodiment.
Each configuration described in the embodiments can be used alone, or an appropriately selected plurality of configurations can be used in combination.

10 Object detection device
20 Processing means
30 First detection means
31 Difference image creation means
32 Moving object detection means
33 First object detection means
40 Second detection means
41 Second object detection means
42 Image division means
43 Detection result synthesis means
50 Imaging means
60 Storage means
70 Input means
80 Output means

Claims

An object detection device comprising an imaging means, a first detection means, and a second detection means, wherein
the first detection means includes a difference image creation means, a moving object detection means, and a first object detection means;
the imaging means outputs image information indicating a captured image;
the difference image creation means creates difference image information indicating a difference image between the captured images indicated by two pieces of image information output from the imaging means at different times;
the moving object detection means detects a moving speed and a moving trajectory of a moving object based on the difference image information created by the difference image creation means;
the first object detection means detects the moving object as the object to be detected when the moving speed and the moving trajectory detected by the moving object detection means satisfy a set condition set corresponding to the object to be detected;
the second detection means includes a second object detection means, an image division means, and a detection result synthesis means;
the image division means divides the captured image indicated by the image information output from the imaging means into a plurality of divided images and outputs divided image information indicating each divided image;
the second object detection means detects an object to be detected that is included in each divided image indicated by each piece of divided image information output from the image division means; and
the detection result synthesis means synthesizes the object detection results for the divided images by the second object detection means and outputs them as the object detection result for the captured image indicated by the image information output from the imaging means.
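The first detection means of claim 1 can be illustrated with a deliberately small frame-differencing sketch. It assumes grayscale frames stored as nested lists. It also assumes a single moving object per difference image, so that the centroid of changed pixels tracks that object, and a fixed frame interval `dt`. The "set condition" is modelled hypothetically as a speed range and a heading range. None of these parameters come from the patent.

```python
import math
from typing import List, Optional, Sequence, Tuple

Frame = List[List[int]]    # grayscale pixel values, row-major
Point = Tuple[float, float]


def difference_centroid(prev: Frame, curr: Frame,
                        threshold: int = 30) -> Optional[Point]:
    """Centroid of the pixels whose absolute frame difference exceeds threshold."""
    xs: List[int] = []
    ys: List[int] = []
    for y, (row_prev, row_curr) in enumerate(zip(prev, curr)):
        for x, (p, c) in enumerate(zip(row_prev, row_curr)):
            if abs(c - p) > threshold:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None  # no change between the two frames
    return sum(xs) / len(xs), sum(ys) / len(ys)


def speed_and_heading(track: Sequence[Point], dt: float) -> Tuple[float, float]:
    """Mean speed (pixels per second) and overall heading (degrees) of a centroid track."""
    if len(track) < 2:
        raise ValueError("need at least two centroids")
    path = sum(math.dist(a, b) for a, b in zip(track, track[1:]))
    speed = path / ((len(track) - 1) * dt)
    (x0, y0), (x1, y1) = track[0], track[-1]
    heading = math.degrees(math.atan2(y1 - y0, x1 - x0))
    return speed, heading


def matches_condition(track: Sequence[Point], dt: float,
                      speed_range: Tuple[float, float],
                      heading_range: Tuple[float, float]) -> bool:
    """Hypothetical set condition: speed and heading both inside the given ranges."""
    speed, heading = speed_and_heading(track, dt)
    return (speed_range[0] <= speed <= speed_range[1]
            and heading_range[0] <= heading <= heading_range[1])
```

The heading check does not handle wrap-around at ±180°. A real system would also need blob segmentation to track several moving objects at once.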

The object detection device according to claim 1, wherein
the image division means divides the captured image indicated by the image information output from the imaging means into at least a first number of first divided images and a second number of second divided images; and
the first number and the second number are set such that the boundary portions of the first divided images and the boundary portions of the second divided images do not overlap in parallel with each other.

The object detection device according to claim 1 or 2,
An object detection device, characterized in that the first detection means is configured to operate when the object to be detected is not detected by the second detection means.

The object detection device according to claim 1 or 2,
An object detection device, characterized in that the first detection means and the second detection means are configured to operate in parallel.