JP7549841B2

JP7549841B2 - Image processing device, image recognition device, image processing program, and image recognition program

Info

Publication number: JP7549841B2
Application number: JP2021054053A
Authority: JP
Inventors: 英夫山田; 雅聡柴田; 修一榎田; 崚吾武本
Original assignee: Aisin Seiki Co Ltd; Kyushu Institute of Technology NUC; Aisin Corp
Current assignee: Kyushu Institute of Technology NUC; Aisin Corp
Priority date: 2021-03-26
Filing date: 2021-03-26
Publication date: 2024-09-12
Anticipated expiration: 2041-03-26
Also published as: JP2022151129A

Description

The present invention relates to an image processing device, an image recognition device, an image processing program, and an image recognition program, and relates, for example, to devices and programs that perform image recognition of a learned target.

As demand for automated driving technology for automobiles grows, research into detecting pedestrians and vehicles by image recognition is being actively conducted.
One such technique uses the GMM-MRCoHOG feature described in Patent Document 1. This technique maps the frequency distribution of co-occurrences of luminance gradient directions between different resolutions of the same image into a feature space as continuous values. It offers far more robust discrimination than image recognition techniques using earlier features such as the Histograms of Oriented Gradients (HOG) feature, the Co-occurrence Histograms of Oriented Gradients (CoHOG) feature, and the Multiple Resolution Co-occurrence Histograms of Oriented Gradients (MRCoHOG) feature.

More specifically, the HOG, CoHOG, and MRCoHOG features all represent a frequency distribution as a histogram by voting the luminance gradient direction of each pixel into bins corresponding to quantized directions (generally eight directions). The HOG feature votes the luminance gradient direction of each pixel, the CoHOG feature votes the co-occurrence of the luminance gradient directions of two pixels, and the MRCoHOG feature votes the co-occurrence of the luminance gradient directions of pixels across different resolutions.
In contrast, the GMM-MRCoHOG feature uses a Gaussian mixture model (GMM) to place the luminance gradient directions in a state space as continuous values, and thereby expresses the frequency distribution of co-occurrences as a multimodal probability density function. Whereas the first three features form the state space from fixed bins, the GMM-MRCoHOG feature can form the state space autonomously.

In image recognition using the GMM-MRCoHOG feature, however, the image to be recognized is divided into a plurality of blocks and a separate basis function optimized for each block is used. This requires a large amount of computational resources, such as memory capacity and CPU processing power.
Saving computational resources is extremely important, particularly when the technique is implemented on a device with limited resources such as a field-programmable gate array (FPGA).

Patent Document 1: JP 2018-124963 A

An object of the present invention is to save computational resources in image recognition that uses luminance gradient directions.

(1) The invention described in claim 1 provides an image processing device comprising: an image acquisition means for acquiring an image for image recognition learning; a division means for dividing the acquired image into a plurality of blocks; a frequency distribution acquisition means for acquiring a frequency distribution of co-occurrences of luminance gradient directions for each of the divided blocks; a unification means for integrating the acquired per-block frequency distributions into a single frequency distribution; and a basis function generation means for generating a basis function serving as a reference for image recognition based on the unified frequency distribution.
(2) The invention described in claim 2 provides the image processing device according to claim 1, wherein the unification means performs the integration by superimposing the frequency distributions of the plurality of blocks.
(3) The invention described in claim 3 provides the image processing device according to claim 2, wherein the unification means generates samples based on the acquired per-block frequency distributions and adds the generated samples together across the plurality of blocks, thereby superimposing the frequency distributions of the plurality of blocks.
(4) The invention described in claim 4 provides the image processing device according to claim 1, claim 2, or claim 3, wherein the frequency distribution acquisition means acquires a frequency distribution of co-occurrences of luminance gradient directions between different resolutions of the same image.
(5) The invention described in claim 5 provides the image processing device according to any one of claims 1 to 4, wherein the image acquisition means acquires a plurality of images, and the unification means unifies the per-block frequency distributions of the plurality of images into a single frequency distribution.
(6) The invention described in claim 6 provides the image processing device according to any one of claims 1 to 5, wherein the image acquisition means acquires a recognition target image in which a recognition target appears and a non-recognition target image in which the recognition target does not appear, and the frequency distribution acquisition means acquires the frequency distribution for a block based on the difference between the frequency distributions of luminance gradient directions in the corresponding blocks of the recognition target image and the non-recognition target image.
(7) The invention described in claim 7 provides the image processing device according to any one of claims 1 to 6, wherein the basis function is a probability density function based on a Gaussian mixture model, and the device comprises a determination means for determining an appropriate number of mixtures from the trade-off between the likelihood and the number of mixtures.
(8) The invention described in claim 8 provides the image processing device according to any one of claims 1 to 7, wherein weights for integrating the frequency distributions are set for the plurality of blocks, and the unification means integrates the per-block frequency distributions in accordance with the weights.
(9) The invention described in claim 9 provides an image recognition device comprising: a basis function acquisition means for acquiring the basis function according to any one of claims 1 to 8; an image acquisition means for acquiring an image subject to image recognition; a division means for dividing the acquired image into blocks; a feature acquisition means for applying the acquired basis function to each of the divided blocks and acquiring a feature with respect to the basis function; and a determination means for determining, using the features acquired from the blocks, whether or not a predetermined image recognition target appears in the acquired image.
(10) The invention described in claim 10 provides an image processing program that causes a computer to realize: an image acquisition function for acquiring an image for image recognition learning; a division function for dividing the acquired image into a plurality of blocks; a frequency distribution acquisition function for acquiring a frequency distribution of co-occurrences of luminance gradient directions for each of the divided blocks; a unification function for integrating the acquired per-block frequency distributions into a single frequency distribution; and a basis function generation function for generating a basis function serving as a reference for image recognition based on the unified frequency distribution.
(11) The invention described in claim 11 provides an image recognition program that causes a computer to realize: a basis function acquisition function for acquiring the basis function according to any one of claims 1 to 8; an image acquisition function for acquiring an image subject to image recognition; a division function for dividing the acquired image into blocks; a feature acquisition function for applying the acquired basis function to each of the divided blocks and acquiring a feature with respect to the basis function; and a determination function for determining, using the features acquired from the blocks, whether or not a predetermined image recognition target appears in the acquired image.

Using a basis function that is unified across the plurality of blocks makes it possible to save computational resources.

FIG. 1 is a diagram showing an example of the hardware configuration of the image processing device.
FIG. 2 is a diagram for explaining the process of generating the basis function.
FIG. 3 is a diagram for explaining the process of generating the basis function.
FIG. 4 is a diagram for explaining the process of generating the basis function.
FIG. 5 is a diagram for explaining the mapping onto the feature space.
FIG. 6 is a diagram for explaining the Akaike information criterion.
FIG. 7 is a diagram showing various mathematical expressions.
FIG. 8 is a flowchart for explaining the basis function generation process.
FIG. 9 is a diagram for explaining the image recognition method.
FIG. 10 is a graph showing the results of image recognition.

(1) Overview of the embodiment
By adopting a GMM-MRCoHOG feature whose state space is unified over all blocks, the basis function is unified across the blocks. This greatly reduces the computational resources used, with a view to hardware implementation.
Specifically, for each block, the frequency distributions of the luminance gradient directions of the positive images and of the negative images used for learning are each approximated by a probability density distribution using kernel density estimation.
Next, focusing on the parts that are characteristic of the positive images or the negative images, a cumulative distribution function is calculated using a measure based on the JS divergence. A fixed number of samples is then generated in a common feature space from the cumulative distribution functions of all blocks by the inverse function method, and the samples are approximated by a Gaussian mixture distribution using the EM algorithm. During the approximation, an appropriate number of mixtures is determined automatically using the Akaike information criterion.
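The automatic choice of the number of mixtures can be sketched as follows. This is a minimal illustration, not the procedure of the embodiment: it assumes full-covariance two-dimensional Gaussians and assumes that the maximized log-likelihood for each candidate number of mixtures has already been obtained from an EM fit. The function names and the log-likelihood values are hypothetical.

```python
def gmm_param_count(num_components, dim=2):
    """Free parameters of a full-covariance GMM: means, covariances, weights."""
    mean_params = dim
    cov_params = dim * (dim + 1) // 2
    return num_components * (mean_params + cov_params) + (num_components - 1)


def aic(log_likelihood, num_params):
    """Akaike information criterion; a lower value is better."""
    return 2.0 * num_params - 2.0 * log_likelihood


def select_mixture_number(log_likelihoods):
    """Pick the mixture number K with the smallest AIC.

    log_likelihoods maps K to the maximized log-likelihood of an EM fit.
    """
    scores = {k: aic(ll, gmm_param_count(k)) for k, ll in log_likelihoods.items()}
    best_k = min(scores, key=scores.get)
    return best_k, scores


# Hypothetical log-likelihoods: the improvement flattens out after K = 3.
example = {1: -5200.0, 2: -4900.0, 3: -4750.0, 4: -4748.0, 5: -4747.0}
best_k, scores = select_mixture_number(example)
```

The penalty term grows with the number of mixtures, so a larger model is chosen only when it raises the likelihood enough to pay for its extra parameters.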

(2) Details of the embodiment
FIG. 1 is a diagram showing an example of the hardware configuration of the image processing device 8.
The image processing device 8 is configured by connecting a CPU 81, a ROM 82, a RAM 83, a storage device 84, a storage medium drive device 85, an input unit 86, an output unit 87, and the like via a bus line.
The CPU 81 is a central processing unit that operates according to an image processing program stored in the storage device 84 and performs processing to generate, from learning images, the basis function used for image recognition.

The ROM 82 is a read-only memory and stores basic programs and parameters for operating the CPU 81.
The RAM 83 is a readable and writable memory and provides working memory when the CPU 81 performs image processing.

The storage device 84 is configured using a large-capacity storage medium such as a hard disk, and stores the image processing program, learning images (learning image data), and the like.
The image processing program is a program that causes the CPU 81 to perform the image processing functions.

The storage medium drive device 85 is a device that drives an external storage medium such as a semiconductor storage device or a hard disk.
The CPU 81 can read learning image data from the storage medium.
The input unit 86 includes input devices such as a keyboard and a mouse that receive input from an operator; it reads in various programs and data and accepts operations from the operator.
The output unit 87 includes output devices such as a display and a printer that present various information to the operator, and outputs the image processing operation screen and the image processing results.

In addition, the image processing device 8 includes a communication control unit for connecting to a communication network, an interface for connecting to external devices, and the like, and can also download learning image data from an external server.

By executing the image processing program, the image processing device 8 generates the basis function used for image recognition based on the GMM-MRCoHOG feature, in accordance with the steps shown in FIGS. 2 to 4. This is described below.
Image recognition using the GMM-MRCoHOG feature is a technology developed by the present inventors and offers extremely high recognition accuracy.

In this embodiment, as shown in FIGS. 2(a) and 2(d), the basis function is created using the JS (Jensen-Shannon) divergence between positive images 10, in which the image recognition target (here, a pedestrian) appears in various poses, and negative images 20, which show various background scenes without pedestrians.
The present inventors have found that using the JS divergence enables more robust image recognition.

The figure shows one positive image 10 and one negative image 20, but the image processing device 8 creates the basis function by learning from about 20,000 positive images 10 and negative images 20.
In this way, the image processing device 8 includes an image acquisition means for acquiring a plurality of images for image recognition learning, consisting of recognition target images in which the recognition target appears (positive images 10) and non-recognition target images in which the recognition target does not appear (negative images 20).

First, the image processing device 8 divides the positive image 10 into a plurality of identical square blocks 11a, 11b, 11c, .... Here, as an example, the image is divided to suit the shape of a pedestrian into three blocks horizontally and six blocks vertically, for a total of 18 blocks 11.
In this way, the image processing device 8 includes a division means for dividing an image into a plurality of blocks.
The image processing device 8 then maps the frequency distribution of co-occurrences of the luminance gradient directions of the pixels in each block 11 into the per-block feature spaces 13a, 13b, 13c, ... shown in FIG. 2(b).
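The division into a 3 by 6 grid of identical square blocks can be sketched as follows. This is only an illustration: the image is assumed to be a plain list of pixel rows, the 48 by 96 pixel size is a hypothetical choice that yields 16 by 16 blocks, and `split_into_blocks` is a name introduced here.

```python
def split_into_blocks(image, blocks_x=3, blocks_y=6):
    """Split a 2-D image (list of rows) into blocks_x * blocks_y equal blocks.

    Returns the blocks in row-major order; each block is a list of rows.
    """
    height = len(image)
    width = len(image[0])
    if height % blocks_y or width % blocks_x:
        raise ValueError("image size must be divisible by the block grid")
    block_h = height // blocks_y
    block_w = width // blocks_x
    blocks = []
    for by in range(blocks_y):
        for bx in range(blocks_x):
            rows = image[by * block_h:(by + 1) * block_h]
            blocks.append([row[bx * block_w:(bx + 1) * block_w] for row in rows])
    return blocks


# Hypothetical 48 x 96 pixel image (width x height) with synthetic luminance.
image = [[x + y for x in range(48)] for y in range(96)]
blocks = split_into_blocks(image)
```

With these assumed dimensions, `blocks[0]` corresponds to block 11a at the top left and the list holds 18 blocks in total.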

The luminance gradient direction is, for example, the direction from low luminance to high luminance at the position of the pixel in question. It is abbreviated below as the gradient direction.
The space into which the gradient directions are mapped, and the spaces derived from it (such as the space obtained by the later sampling), are spaces in which the features of the image are extracted, and are therefore called feature spaces.
When the blocks 11a, 11b, 11c, ... or the feature spaces 13a, 13b, 13c, ... need not be distinguished, they are written simply as the blocks 11 or the feature spaces 13, and the same applies to other similar components.

FIG. 5 is a diagram for explaining the mapping of gradient directions onto the feature space 13.
As shown in FIG. 5(a), the image processing device 8 converts the resolution of the positive image 10 to generate from it a high-resolution image 15, a medium-resolution image 16, and a low-resolution image 17, each with a different image size.
If the resolution of the positive image 10 is already suitable, the positive image 10 is used as the high-resolution image 15 without conversion. The image processing device 8 performs the following processing on the images at each of these resolutions, block by block.

First, the image processing device 8 calculates the gradient direction for each pixel of the high-resolution image 15, the medium-resolution image 16, and the low-resolution image 17. The angle of the gradient direction is a continuous value from 0° to 360°. Quantized values, such as 36 directions, can also be used.
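A per-pixel gradient direction of this kind can be computed as in the following sketch. The text does not specify the differencing scheme, so simple neighbour differences with clamping at the image border are assumed here; the function names are hypothetical, and the y axis points downward as in image coordinates.

```python
import math


def gradient_directions(image):
    """Return the gradient direction of every pixel in degrees, from 0 to 360.

    Differences are taken between the neighbours on either side of a pixel,
    with neighbour indices clamped at the image border. The direction points
    from low luminance toward high luminance.
    """
    height, width = len(image), len(image[0])
    directions = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            gx = image[y][min(x + 1, width - 1)] - image[y][max(x - 1, 0)]
            gy = image[min(y + 1, height - 1)][x] - image[max(y - 1, 0)][x]
            directions[y][x] = math.degrees(math.atan2(gy, gx)) % 360.0
    return directions


def quantize(direction_deg, bins=36):
    """Optional quantization, e.g. into 36 directions of 10 degrees each."""
    return int(direction_deg // (360.0 / bins)) % bins


# Luminance increasing to the right, and luminance increasing downward.
right = gradient_directions([[0, 1, 2], [0, 1, 2], [0, 1, 2]])
down = gradient_directions([[0, 0, 0], [1, 1, 1], [2, 2, 2]])
```

Keeping the continuous angle and quantizing only on request mirrors the text, which treats the continuous value as the default and quantization as an option.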

After calculating the gradient directions, the image processing device 8 obtains, in block 11a, the co-occurrences of the gradient directions of a reference pixel (hereinafter, the pixel of interest) and of pixels located away from it (hereinafter, offset pixels) as follows.

First, as shown in FIG. 5(b), the image processing device 8 sets a pixel of interest 5 in the high-resolution image 15 and looks at the offset pixels 1a to 1d, which lie at offset distance 1 from the pixel of interest 5 in the high-resolution image 15 (that is, they are adjacent to it at high resolution).
A distance of n pixels is called offset distance n.

The image processing device 8 then obtains the co-occurrence of gradient directions (the combination of gradient directions) between the pixel of interest 5 and each of the offset pixels 1a to 1d, and plots the corresponding points as data points 51, 51, ... in the feature space 13a for block 11a shown in FIG. 5(c).

For example, when plotting the co-occurrence of the pixel of interest 5 and the offset pixel 1a in FIG. 5(b), if the gradient direction of the pixel of interest 5 is 26° and that of the offset pixel 1a is 135°, the image processing device 8 plots a data point 51 at the position in the feature space 13a where the horizontal axis is 26° and the vertical axis is 135°.
Similarly, the image processing device 8 obtains the co-occurrences between the pixel of interest 5 and the offset pixels 1b to 1d and plots them in the feature space 13a. Co-occurrences with the pixels above and to the left of the pixel of interest 5 are not obtained because the pixel of interest 5 is moved successively to the right as co-occurrences are obtained, so those pairs have already been obtained and plotted at an earlier stage.
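Collecting co-occurrence points at offset distance 1 can be sketched as below. The exact positions of offset pixels 1a to 1d appear only in FIG. 5(b), so this sketch assumes they are the right, lower-left, lower, and lower-right neighbours, which is consistent with the remark that pixels above and to the left are skipped. It handles a single resolution only, and `OFFSETS` and `cooccurrence_points` are names introduced here.

```python
# Assumed (dx, dy) offsets for offset pixels 1a to 1d:
# right, lower-left, below, lower-right.
OFFSETS = [(1, 0), (-1, 1), (0, 1), (1, 1)]


def cooccurrence_points(directions, offsets=OFFSETS):
    """Collect (direction at pixel of interest, direction at offset pixel) pairs.

    directions is a 2-D list of gradient directions in degrees. Offset pixels
    that fall outside the array are simply skipped in this sketch.
    """
    height, width = len(directions), len(directions[0])
    points = []
    for y in range(height):          # move the pixel of interest row by row
        for x in range(width):       # and left to right within each row
            for dx, dy in offsets:
                ox, oy = x + dx, y + dy
                if 0 <= ox < width and 0 <= oy < height:
                    points.append((directions[y][x], directions[oy][ox]))
    return points


# Example from the text, assuming offset pixel 1a is the right-hand neighbour:
# pixel of interest at 26 degrees, offset pixel at 135 degrees.
points = cooccurrence_points([[26.0, 135.0]])
```

Because only "forward" neighbours are visited, each neighbouring pair of pixels contributes exactly one data point as the pixel of interest sweeps across the array.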

Next, for the offset pixels 2a to 2d of the medium-resolution image 16, located at offset distance 2, the image processing device 8 similarly obtains the co-occurrences of gradient directions with the pixel of interest 5 and plots them in the feature space 13a. Furthermore, for the offset pixels 3a to 3d of the low-resolution image 17, located at offset distance 3, it similarly obtains the co-occurrences of gradient directions with the pixel of interest 5 and plots them in the feature space 13a.

Having plotted the co-occurrences of gradient directions with the offset pixels at offset distances 1 to 3, from high resolution to low resolution, the image processing device 8 performs the same processing while moving the pixel of interest 5 successively within block 11a (the pixel of interest 5 is also moved onto the medium-resolution image 16 and the low-resolution image 17), and thereby completes the feature space 13a for block 11a.

The pixel of interest 5 is moved only within block 11a, but offset pixels are selected even when they lie outside block 11a. Pixels at the edge of block 11a for which no adjacent offset pixel exists are handled by a suitable method.
In the same manner, the image processing device 8 plots data points in the feature spaces 13b, 13c, ... for the blocks 11b, 11c, .... This yields, for each block 11, a feature space 13 in which the frequency distribution of co-occurrences of gradient directions is represented by the density of data points.

In this way, the image processing device 8 includes a frequency distribution acquisition means for acquiring, for each divided block, the frequency distribution of co-occurrences of luminance gradient directions between different resolutions of the same image, and maps the co-occurrence distribution of gradient directions between images of different resolutions into the feature space 13 as continuous values.
Returning to FIG. 2, the image processing device 8 also creates, for the negative image 20, feature spaces 23a, 23b, 23c, ... corresponding to the blocks 11a, 11b, 11c, ..., as shown in FIG. 2(e).

Next, as shown in FIG. 2(c), the image processing device 8 generates, for each block 11, a probability density function fp(x) from the positive data plotted in the feature space 13 (the data points in the feature space 13). In the figure, the level of the density is represented schematically by contour lines.
Similarly, as shown in FIG. 2(f), the image processing device 8 generates a probability density function fn(x) from the negative data plotted in the feature space 23 (the data points in the feature space 23).

fp(x) and fn(x) are expressed by equations (1) and (2) in FIG. 7(a), respectively, using the Gaussian function of equation (3) as the kernel density function.
n is the number of data points. Xi(p) and Xi(n) are the positive data and the negative data, respectively, and each is a two-dimensional vector. x is a point in the feature space and is also a two-dimensional vector.

Although the figure uses superscripts and subscripts, they are written here as ordinary characters to avoid character-code conversion errors. The same applies to the other equations. Likewise, vector quantities would normally be shown in bold, but they are also written as ordinary characters to prevent conversion errors.
h is the bandwidth, a parameter that determines how widely each kernel spreads and therefore how smooth the estimated distribution is. It is set to a suitable value.

fp(x) and fn(x) represent the occurrence probabilities of gradient direction co-occurrences in the positive images 10 and the negative images 20, respectively.
In this way, the image processing device 8 votes the gradient direction co-occurrences of the positive data and the negative data into the feature space as continuous values, and approximates the voted data points by probability density functions using kernel density estimation.
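The kernel density estimation step can be sketched as follows. Equations (1) to (3) are given only in FIG. 7, so this sketch assumes the usual form, a sum of isotropic two-dimensional Gaussian kernels of bandwidth h divided by n·h². It ignores the wrap-around of angles at 360°, and the data values are hypothetical.

```python
import math


def gaussian_kernel(u):
    """Isotropic two-dimensional standard Gaussian kernel."""
    return math.exp(-0.5 * (u[0] ** 2 + u[1] ** 2)) / (2.0 * math.pi)


def kde(x, data, h):
    """Kernel density estimate at point x from 2-D data points, bandwidth h."""
    n = len(data)
    total = 0.0
    for xi in data:
        u = ((x[0] - xi[0]) / h, (x[1] - xi[1]) / h)
        total += gaussian_kernel(u)
    return total / (n * h ** 2)


# Hypothetical positive data: co-occurrence points in degrees.
positive_data = [(26.0, 135.0), (30.0, 140.0), (200.0, 45.0)]
fp_at_point = kde((28.0, 137.0), positive_data, h=10.0)
```

One kernel is stored per data point, which is why the text later notes that the number of parameters of this representation grows with the amount of data.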

Next, as shown in FIG. 3(a), the image processing device 8 pairs the feature spaces whose block positions correspond on the positive image 10 and the negative image 20 (such as the feature space 13a and the feature space 23a), and generates a JS divergence 33 for each pair from the respective fp(x) and fn(x), as shown in FIG. 3(b).
In this way, the image processing device 8 generates the JS divergence 33a from the pair of feature spaces 13a and 23a, the JS divergence 33b from the pair of feature spaces 13b and 23b, and so on, generating a JS divergence 33 for each block 11.

The JS divergence is expressed as J(fp(x):fn(x)) in equation (5) of FIG. 7(b).
J(fp(x):fn(x)) is defined using equation (6) so that the KL (Kullback-Leibler) divergence of equation (4) becomes symmetric.
The JS divergence provides a measure of the distance between two probability distributions, and using it makes it possible to calculate the similarity between fp(x) and fn(x) at each x.
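A discretised version of this measure can be sketched as follows. Equations (4) to (6) appear only in FIG. 7, so the sketch assumes the textbook Jensen-Shannon form, in which each distribution is compared with the midpoint distribution m = (p + q) / 2; the patent's own definition may differ in detail. The function names and example values are hypothetical.

```python
import math


def kl_terms(p, q):
    """Pointwise Kullback-Leibler terms p_i * log(p_i / q_i); 0 where p_i is 0."""
    return [pi * math.log(pi / qi) if pi > 0.0 else 0.0 for pi, qi in zip(p, q)]


def js_terms(p, q):
    """Pointwise Jensen-Shannon terms for two discretised distributions."""
    m = [0.5 * (pi + qi) for pi, qi in zip(p, q)]
    return [0.5 * a + 0.5 * b for a, b in zip(kl_terms(p, m), kl_terms(q, m))]


def js_divergence(p, q):
    """Total Jensen-Shannon divergence (natural logarithm)."""
    return sum(js_terms(p, q))


# Hypothetical positive and negative densities evaluated on three grid cells.
fp = [0.7, 0.2, 0.1]
fn = [0.1, 0.2, 0.7]
per_cell = js_terms(fp, fn)
```

In this form the per-cell terms are never negative and are large only where the two densities disagree, so they indicate which regions of the feature space separate the positive data from the negative data.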

Ｊ（ｆｐ（ｘ）：ｆｎ（ｘ））は、ｆｐ（ｘ）とｆｎ（ｘ）の形状が異なるほど（類似していないほど）絶対値が大きくなる。このため、ＪＳ情報量によりｆｐ（ｘ）とｆｎ（ｘ）の何れか一方に特徴的な部分を表現することができる。
このような何れか一方に生起確率が偏っている、ｆｐ（ｘ）とｆｎ（ｘ）の差異の箇所が情報として有用であり（偏っていない箇所は、歩行者であるか背景であるか判断が困難）、画像処理装置８は、当該差異の大きい領域の情報をＪＳ情報量により抽出する。
このように、画像処理装置８が備える頻度分布取得手段は、認識対象画像と非認識対象画像の対応するブロックにおける輝度勾配方向の頻度分布の差異に基づいて、当該ブロックにおける頻度分布を取得する。 The absolute value of J(fp(x):fn(x)) increases as the shapes of fp(x) and fn(x) differ (are less similar). Therefore, the JS information amount can express the characteristic parts of either fp(x) or fn(x).
Areas where there is a difference between fp(x) and fn(x), where the occurrence probability is biased towards one side, are useful information (areas where there is no bias are difficult to determine whether they are pedestrians or background), and the image processing device 8 extracts information on areas where the difference is large using the JS information amount.
In this way, the frequency distribution acquisition means included in the image processing device 8 acquires the frequency distribution in a block based on the difference in the frequency distribution of the luminance gradient direction in the corresponding block between the recognition target image and the non-recognition target image.
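
As an illustrative sketch of the comparison described above, the following Python evaluates a symmetric divergence between two discretized densities. Equations (4) to (6) are not reproduced in this section, so the code assumes the standard Jensen-Shannon form (the average of two KL terms taken against the mixture of the two densities); the function names, the grid, and the one-dimensional example densities are hypothetical.

```python
import numpy as np

def kl_integrand(p, q, eps=1e-12):
    """Pointwise integrand p(x) * log(p(x) / q(x)) of the KL information amount."""
    return p * np.log((p + eps) / (q + eps))

def js_integrand(fp, fn):
    """Pointwise integrand of the Jensen-Shannon form (an assumption of this sketch)."""
    m = 0.5 * (fp + fn)  # mixture of the two densities
    return 0.5 * kl_integrand(fp, m) + 0.5 * kl_integrand(fn, m)

def gaussian(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# Hypothetical one-dimensional densities standing in for fp(x) and fn(x).
x = np.linspace(-8.0, 8.0, 1601)
dx = x[1] - x[0]
fp = gaussian(x, -1.0, 1.0)
fn = gaussian(x, 1.5, 1.0)

j_of_x = js_integrand(fp, fn)           # large where one density dominates the other
js_total = float(np.sum(j_of_x) * dx)   # scalar divergence, bounded by log(2)
js_same = float(np.sum(js_integrand(fp, fp)) * dx)  # identical inputs give zero
```

In this sketch `j_of_x` is close to zero where the two densities cross and grows where the occurrence probability is biased toward one side, which is the property the text relies on.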

次に、画像処理装置８は、図３（ｃ）に示したように、ブロック１１ごとのＪ（ｆｐ（ｘ）：ｆｎ（ｘ））に対して、それぞれの累積分布関数を用いた逆関数法によりサンプリングして、ブロック１１ごとにサンプル（特徴空間上の点）を発生させる。
このようにして、ＪＳ情報量３３ａ、３３ｂ、・・・から、ブロック１１ａ、１１ｂ、・・・ごとにサンプルを発生させた特徴空間３５ａ、３５ｂ、・・・を生成する。
このように、画像処理装置８は、ブロックごとの頻度分布に基づいてサンプルを発生させる。 Next, as shown in FIG. 3( c), the image processing device 8 samples J(fp(x):fn(x)) for each block 11 using an inverse function method that uses the respective cumulative distribution functions, thereby generating samples (points in the feature space) for each block 11.
In this manner, the JS information amounts 33a, 33b, ... are used to generate the feature spaces 35a, 35b, ..., in which samples have been generated for each of the blocks 11a, 11b, and so on.
In this way, the image processing device 8 generates samples based on the frequency distribution for each block.
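
The inverse function method mentioned above can be sketched as follows, assuming the per-block J(fp(x):fn(x)) has already been evaluated on a finite grid of feature-space points. The function name, the grid, and the example density are hypothetical, and a one-dimensional grid is used only for brevity.

```python
import numpy as np

def inverse_transform_sample(points, density, n_samples, rng):
    """Draw rows of `points` with probability proportional to `density`."""
    weights = np.clip(np.asarray(density, dtype=float), 0.0, None)
    cdf = np.cumsum(weights)
    cdf /= cdf[-1]                       # normalized cumulative distribution
    u = rng.random(n_samples)            # uniform draws in [0, 1)
    # First grid index whose cumulative value exceeds u (inverse of the CDF).
    idx = np.searchsorted(cdf, u, side="right")
    idx = np.minimum(idx, len(cdf) - 1)  # guard against rounding at the upper end
    return points[idx]

# Hypothetical example: a density concentrated around 2.0 on a 1-D grid.
rng = np.random.default_rng(0)
grid = np.linspace(0.0, 2.0 * np.pi, 629)[:, None]
density = np.exp(-0.5 * ((grid[:, 0] - 2.0) / 0.1) ** 2)
samples = inverse_transform_sample(grid, density, 1000, rng)
```

Because the draws follow the cumulative distribution, most samples land where the density is high, and grid points with zero weight are never selected.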

Ｊ（ｆｐ（ｘ）：ｆｎ（ｘ））は、ポジティブデータとネガティブデータの何れか一方に偏っているため、逆関数法を用いたサンプリングにより生起確率が偏っている箇所に集中して多数のサンプルを生成させることができる。
カーネル密度推定を用いてＪ（ｆｐ（ｘ）：ｆｎ（ｘ））を求めると、基底となる式（３）のガウス分布の数がデータ数に依存するため、パラメータ数が非常に多くなっているが、これをサンプリングによって削減することができる。 Since J(fp(x):fn(x)) is biased toward either positive or negative data, sampling using the inverse function method can generate a large number of samples that are concentrated in areas where the occurrence probability is biased.
When J(fp(x):fn(x)) is calculated using kernel density estimation, the number of Gaussian distributions in the underlying equation (3) depends on the number of data, resulting in a very large number of parameters. However, this can be reduced by sampling.

なお、上の説明では、単一のポジティブ画像１０から特徴空間１３をブロックごとに生成しているが、画像処理装置８は、多数の学習用のポジティブ画像１０から取得した特徴空間１３をブロックごとに重畳して、ブロックごとの特徴空間１３を作成する。
例えば、１枚目のポジティブ画像１０から作成した特徴空間１３ａ１、２枚目のポジティブ画像１０から作成した特徴空間１３ａ２、・・・・を足し合わせて特徴空間１３ａを作成し、同様に特徴空間１３ｂ１、１３ｂ２・・を足し合わせて特徴空間１３ｂを作成する。ネガティブ画像２０についても同様である。 In the above explanation, the feature space 13 is generated for each block from a single positive image 10, but the image processing device 8 creates the feature space 13 for each block by superimposing the feature spaces 13 obtained from a large number of learning positive images 10 for each block.
For example, the feature space 13a1 created from the first positive image 10 and the feature space 13a2 created from the second positive image 10 are added together to create the feature space 13a, and similarly, the feature spaces 13b1, 13b2, ... are added together to create the feature space 13b. The same is true for the negative image 20.

画像処理装置８は、図４（ａ）に示したように、ブロック１１ごとのサンプリングデータによる特徴空間３５を生成した後、図４（ｂ）に示したように、これら特徴空間３５ａ、３５ｂ、・・・のサンプルを全て足し合わせることによって統合し、これによって勾配方向の頻度分布がサンプルの粗密によって表された、統一した特徴空間３６を生成する。
このように、画像処理装置８は、ブロックごとの頻度分布を重畳することにより統合して１の頻度分布に統一する統一手段を備えており、当該統一手段は、発生させたサンプルを複数のブロックに渡って足し合わせることにより、複数のブロックにおける頻度分布を重畳している。
更に、画像処理装置８は、多数の学習画像について、頻度分布を１つに統合するため、当該統一手段は、複数の画像のブロックごとの頻度分布を１の頻度分布に統一している。 As shown in FIG. 4(a), the image processing device 8 generates a feature space 35 based on sampling data for each block 11, and then integrates these feature spaces 35a, 35b, ... by adding together all the samples as shown in FIG. 4(b), thereby generating a unified feature space 36 in which the frequency distribution in the gradient direction is represented by the density of the samples.
In this way, the image processing device 8 is equipped with a unification means that integrates and unifies the frequency distributions for each block into a single frequency distribution by superimposing the frequency distributions for each block, and the unification means superimposes the frequency distributions in multiple blocks by adding up the generated samples across multiple blocks.
Furthermore, since the image processing device 8 integrates the frequency distributions for a large number of learning images into one, the unifying means unifies the frequency distributions for each block of a plurality of images into one frequency distribution.

変形例として、ブロック１１に重み付けを設定しておき、当該ブロック１１に対応する特徴空間３５のサンプルを当該重み付けに従って加算するように構成することもできる。
例えば、重みの小さいブロック１１については、サンプル１つにつき１つ加算し、重みの大きいブロック１１については、サンプル１つにつき３つ加算するなどする。
これにより、重要度の低いブロック１１（歩行者の写りにくい４隅のブロックなど）の重み付けを小さく設定し、重要度の高いブロック１１の重み付けを高く設定することができる。
当該変形例では、複数のブロックに、頻度分布を統合する際の重み付けが設定されており、画像処理装置８が備える統一手段は、当該重み付けに従って、複数のブロックごとの頻度分布を統合する。 As a modified example, a weighting may be set for each block 11, and the samples in the feature space 35 corresponding to that block 11 may be added according to that weighting.
For example, for blocks 11 with low weights, one addition is made per sample, for blocks 11 with high weights, three additions are made per sample, and so on.
This allows the weighting of blocks 11 with low importance (such as blocks at the four corners where pedestrians are less likely to be captured) to be set low, and the weighting of blocks 11 with high importance to be set high.
In this modified example, weighting is set for a plurality of blocks when integrating frequency distributions, and the unifying means included in the image processing device 8 integrates the frequency distributions for each of the plurality of blocks in accordance with the weighting.
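
The unification step, including the weighted variant, can be sketched as below. This is a minimal illustration that assumes each block's samples are stored as an array of feature-space points and that the weights are small positive integers, as in the "one addition" and "three additions" example; `unify_samples` and the sample values are hypothetical.

```python
import numpy as np

def unify_samples(block_samples, block_weights=None):
    """Merge per-block samples into one array, repeating each sample by its block weight."""
    if block_weights is None:
        block_weights = [1] * len(block_samples)
    parts = [
        np.repeat(samples, weight, axis=0)  # each sample is added `weight` times
        for samples, weight in zip(block_samples, block_weights)
    ]
    return np.concatenate(parts, axis=0)

# Hypothetical samples for two blocks in a 2-D feature space.
block_a = np.array([[0.1, 0.2], [0.3, 0.4]])              # low-importance block
block_b = np.array([[1.0, 1.1], [1.2, 1.3], [1.4, 1.5]])  # high-importance block

unified_plain = unify_samples([block_a, block_b])
unified_weighted = unify_samples([block_a, block_b], block_weights=[1, 3])
```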

画像処理装置８は、このように統一した特徴空間３６を生成すると、図４（ｃ）に示したように、ｃ－ＡＩＣ（後述する）を用いて混合数を決定し、更に、ＥＭアルゴリズム（ＥステップとＭステップを繰り返すことによりＧＭＭの数式を探索する手法）によってＧＭＭによる状態空間を生成し、これを基底関数３７に設定する。
ここで、状態空間とは、ヒストグラムやＧＭＭなどで特徴量の境界や配置が決定した空間を意味する。
このように、画像処理装置８は、統一した頻度分布に基づいて画像認識の基準となる基底関数を生成する基底関数生成手段を備えている。
従来は、特徴空間３５ａ、３５ｂ、・・・ごとにＧＭＭを生成してブロック１１ごとに基底関数を生成していたが、これに対し、本実施形態の画像処理装置８は、特徴空間３６から全ブロック１１に共通の基底関数３７を生成するところが新規な点である。 After generating the unified feature space 36 in this manner, the image processing device 8 determines the number of mixtures using c-AIC (described later) as shown in FIG. 4( c), and further generates a state space by GMM using the EM algorithm (a method of searching for a formula for GMM by repeating the E step and the M step), and sets this as the basis function 37.
Here, the state space refers to a space in which the boundaries and arrangement of features are determined by a histogram, GMM, or the like.
In this manner, the image processing device 8 includes a basis function generating means for generating basis functions that serve as a reference for image recognition based on a unified frequency distribution.
Conventionally, a GMM was generated for each feature space 35a, 35b, ..., and a basis function was generated for each block 11. In contrast, the image processing device 8 of the present embodiment is novel in that it generates a basis function 37 common to all blocks 11 from the feature space 36.

ＧＭＭは、ガウス分布を線形に重ね合わせて任意の分布を近似するモデルであり、式（１１）で表される。ｋは混合数（重ね合わせるガウス分布の数）、Ｎは、平均がμｋで分散共分散がΣｋであるｋ番目のガウス分布の確率密度関数、θは混合数ｋの混合正規分布のパラメータである。
αｊは、重ね合わせるガウス分布の重みを表す混合係数であって、足すと合計が１になる正の実数である。
ＧＭＭは、積分すると１になるように規格化されており、ＧＭＭによって特徴空間３６のサンプルの分布を多峰性の確率密度関数ｐ（ｘ｜θ）で近似することができる。 The GMM is a model that approximates an arbitrary distribution by linearly superimposing Gaussian distributions, and is expressed by Equation (11), where k is the mixture number (the number of Gaussian distributions to be superimposed), N is the probability density function of the k-th Gaussian distribution with mean μk and variance-covariance Σk, and θ is the parameter of the mixed normal distribution with the mixture number k.
αj is a mixing coefficient representing the weight of the Gaussian distributions to be overlapped, and is a positive real number whose sum is 1 when added.
The GMM is normalized so that its integration becomes 1, and the distribution of samples in the feature space 36 can be approximated by a multi-modal probability density function p(x|θ) using the GMM.
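
Equation (11) is not reproduced in this section. For reference, the standard form of a Gaussian mixture that matches the definitions above (mixture number k, mixing coefficients αj, means μj, variance-covariance matrices Σj, sample dimension d) is shown below; the exact notation of equation (11) in the source may differ.

```latex
p(x \mid \theta) = \sum_{j=1}^{k} \alpha_j \, \mathcal{N}(x \mid \mu_j, \Sigma_j),
\qquad \alpha_j > 0, \qquad \sum_{j=1}^{k} \alpha_j = 1,
\qquad \theta = \{\alpha_j, \mu_j, \Sigma_j\}_{j=1}^{k}

\mathcal{N}(x \mid \mu_j, \Sigma_j) =
\frac{1}{(2\pi)^{d/2} \, |\Sigma_j|^{1/2}}
\exp\!\left( -\tfrac{1}{2} (x - \mu_j)^{\top} \Sigma_j^{-1} (x - \mu_j) \right)
```

Since each Gaussian integrates to 1 and the mixing coefficients sum to 1, p(x|θ) also integrates to 1.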

ＧＭＭでは、混合数ｋを指定すると、対象となる分布をｋ個のクラスタにクラスタリングし、その上ガウス分布を配置する。
このように、ＧＭＭによる最適な状態空間を構成するためには混合数の決定が必要であるが、混合数は増やしすぎるとモデルの汎化能力が低下すると共に計算コストが増加するという問題がある。
そこで、画像処理装置８は、ＧＭＭを生成する前に、赤池情報基準（ＡＩＣ）に基づいた尺度によって混合数を自動決定した。 In GMM, when the mixture number k is specified, the target distribution is clustered into k clusters and a Gaussian distribution is placed on top of them.
Thus, in order to construct an optimal state space using a GMM, it is necessary to determine the number of mixtures. However, if the number of mixtures is increased too much, the generalization ability of the model decreases and the calculation cost increases, which is a problem.
Therefore, before generating the GMM, the image processing device 8 automatically determined the number of mixtures using a measure based on the Akaike Information Criterion (AIC).

図６は、赤池情報基準を説明するための図である。
赤池情報基準には、ＡＩＣ（Ａｋａｉｋｅ’s ＩｎｆｏｒｍａｔｉｏｎＣｒｉｔｅｒｉｏｎ）と、これを用いたｃ－ＡＩＣ（ｃｏｒｒｅｃｔｉｏｎｏｆＡＩＣ）がある。
ここで、ＡＩＣは、統計的モデルの良さを評価する基準であり、汎化能力に優れたモデルであるほど小さな値となる。
一方、ｃ－ＡＩＣは、ＡＩＣを少ないサンプルでも適応可能にしたものである。
図６（ａ）に示したように、ＡＩＣは、単調減少するモデルのフィット度合いと単調増加するパラメータ数の和で表される。そして、ＡＩＣ値が最小のモデルが、ペナルティとモデルの複雑さのバランスがとれ、汎化能力に優れたモデルとなる。 FIG. 6 is a diagram for explaining the Akaike information criterion.
The Akaike information criterion includes AIC (Akaike's Information Criterion) and c-AIC (correction of AIC), which is based on AIC.
Here, AIC is a standard for evaluating the quality of a statistical model, and the more excellent the generalization ability of a model, the smaller its value.
On the other hand, c-AIC is a version of AIC that can be applied to a small number of samples.
As shown in Fig. 6(a), AIC is expressed as the sum of the monotonically decreasing degree of fit of the model and the monotonically increasing number of parameters. A model with the smallest AIC value has a good balance between penalty and model complexity and is a model with excellent generalization ability.

本実施形態では、ＡＩＣを図７（ｃ）の式（７）で定義した。
ｎはサンプル数、ｋは混合数、ｐは、ＧＭＭからのサンプルｘｉの生起確率、θｋ（ハットを省略）は、混合数ｋで構成されたＧＭＭのパラメータである。
ｔｋは、式（８）で表される。ここで、ｄはサンプルデータの次元数である。 In this embodiment, the AIC is defined by equation (7) in FIG. 7(c).
n is the number of samples, k is the number of mixtures, p is the occurrence probability of sample xi from the GMM, and θk (hat omitted) is the parameter of the GMM configured with the number of mixtures k.
tk is expressed by equation (8), where d is the number of dimensions of the sample data.

ところで、ＡＩＣは、大規模な標本サイズを前提としており、サンプル数が少ない場合にはパラメータ数を過大に見積もる傾向がある。
そこで、本実施形態では、サンプル数が少ない場合にモデルのシンプルさを高評価する、式（９）で表されたｃ－ＡＩＣに従って混合数を決定した。 However, AIC assumes a large sample size and tends to overestimate the number of parameters when the number of samples is small.
Therefore, in this embodiment, the number of mixtures is determined according to c-AIC expressed by equation (9), which highly evaluates the simplicity of the model when the number of samples is small.

式（９）では、第１項を負の対数尤度によって構成し、モデルが複雑になるほど単調減少すると想定した。
また、第２項は、パラメータ数によるペナルティ項であり、単調増加する。
本実施形態では、いくつかの混合数に対してｃ－ＡＩＣ値を計算して曲線近似し、これによる近似値から混合数を決定した。
曲線による近似値を用いることにより、学習データのばらつきに影響されずに、最もｃ－ＡＩＣが低い混合数を決定することができる。 In equation (9), the first term is constructed by the negative log-likelihood, which is assumed to monotonically decrease as the model becomes more complex.
The second term is a penalty term depending on the number of parameters, and increases monotonically.
In this embodiment, the c-AIC values were calculated for several mixture numbers and curve approximation was performed, and the mixture number was determined from the approximated value obtained.
By using a curve approximation, the number of mixtures with the lowest c-AIC can be determined without being affected by the variability of the training data.
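
The selection procedure above can be sketched as follows. Equations (7) to (9) are not reproduced in this section, so the code assumes the textbook forms: AIC as minus twice the log-likelihood plus twice the parameter count, the usual small-sample correction term, and the parameter count of a GMM with full covariance matrices. The quadratic curve fit and all function names are also assumptions; the source does not state which curve was used.

```python
import numpy as np

def gmm_param_count(k, d):
    """Free parameters of a k-component, d-dimensional GMM with full covariances."""
    return (k - 1) + k * d + k * d * (d + 1) // 2

def c_aic(log_likelihood, k, d, n):
    """Small-sample corrected AIC (assumed textbook form) for n samples."""
    t = gmm_param_count(k, d)
    aic = -2.0 * log_likelihood + 2.0 * t
    return aic + 2.0 * t * (t + 1) / (n - t - 1)

def pick_mixture_number(ks, scores, degree=2):
    """Fit a polynomial to (k, score) pairs and return the integer k minimizing the fit."""
    coeffs = np.polyfit(ks, scores, degree)
    candidates = np.arange(np.min(ks), np.max(ks) + 1)
    fitted = np.polyval(coeffs, candidates)
    return int(candidates[np.argmin(fitted)])

# Hypothetical scores for a few searched mixture numbers (synthetic, not measured data).
ks = np.array([5, 10, 15, 20, 25, 30])
scores = (ks - 16.0) ** 2 + 100.0
best_k = pick_mixture_number(ks, scores)
```

Evaluating the fitted curve at every integer between the smallest and largest searched value lets the minimum fall on a mixture number that was not itself searched.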

このようにして適当な混合数を探索したところ図６（ｂ）のようになった。
このグラフの横軸は混合数を表しており、縦軸は負の対数尤度を示している。負の対数尤度が小さいほど（即ち、尤度が大きくなり）よいモデルであることを示している。
グラフにプロットした探査値は、ｃ－ＡＩＣの計算値であり、推定値は、探査値から求めた近似曲線上の点である。
グラフに示したように、混合数１５程度以上では、負の対数尤度がほぼ一定となっており、１５程度まで混合数を下げることが可能と思われる。
このように、画像処理装置８は、基底関数を混合ガウスモデルによる確率密度関数で生成し、尤度と混合数の兼ね合いから適当な混合数を決定する決定手段を備えている。 When an appropriate number of mixtures was searched for in this way, the result was as shown in FIG. 6(b).
The horizontal axis of this graph represents the number of mixtures, and the vertical axis represents the negative log-likelihood. The smaller the negative log-likelihood (i.e., the larger the likelihood), the better the model.
The searched values plotted on the graph are the calculated values of c-AIC, and the estimated values are the points on the approximation curve obtained from the searched values.
As shown in the graph, when the number of mixtures is about 15 or more, the negative log-likelihood is almost constant, and it seems possible to reduce the number of mixtures to about 15.
In this way, the image processing device 8 generates the basis functions using a probability density function based on a Gaussian mixture model, and includes a determination means for determining an appropriate number of mixtures based on a balance between the likelihood and the number of mixtures.

図８は、画像処理装置８が行う基底関数生成処理を説明するためのフローチャートである。
ＣＰＵ８１は、記憶装置８４からポジティブ画像１０を１枚読み込み、ＲＡＭ８３に入力して記憶する（ステップ１０）。
次に、ＣＰＵ８１は、ポジティブ画像１０をブロック１１に区分し、ブロックごとに勾配方向の共起を特徴空間１３にプロットしてＲＡＭ８３に記憶する（ステップ１５）。
ＣＰＵ８１は、以上のポジティブ画像１０に対するプログラム処理を基底関数生成に必要な枚数だけ行う。
次に、ＣＰＵ８１は、ＲＡＭ８３に記憶した多数の特徴空間１３を、ブロック１１ごとに重畳することにより、ブロック１１ごとのｆｐ（ｘ）を生成してＲＡＭ８３に記憶する（ステップ２５）。 FIG. 8 is a flowchart for explaining the basis function generation process performed by the image processing device 8.
The CPU 81 reads one positive image 10 from the storage device 84 and inputs and stores it in the RAM 83 (step 10).
Next, the CPU 81 divides the positive image 10 into blocks 11, plots the co-occurrence of gradient directions for each block in the feature space 13, and stores the plot in the RAM 83 (step 15).
The CPU 81 performs the above program processing on the positive images 10 as many times as necessary to generate the basis functions.
Next, the CPU 81 generates fp(x) for each block 11 by superimposing the multiple feature spaces 13 stored in the RAM 83 for each block 11, and stores the fp(x) in the RAM 83 (step 25).

次に、ＣＰＵ８１は、記憶装置８４に記憶してあるネガティブ画像２０に対しても、ＲＡＭ８３への入力（ステップ３５）、特徴空間２３へのプロット（ステップ４０）を必要な枚数分だけ行い、そして、ブロック１１ごとのｆｎ（ｘ）を生成してＲＡＭ８３に記憶する（ステップ５０）。 Next, the CPU 81 inputs the necessary number of negative images 20 stored in the storage device 84 into the RAM 83 (step 35) and plots them in the feature space 23 (step 40), and then generates fn(x) for each block 11 and stores them in the RAM 83 (step 50).

次に、ＣＰＵ８１は、ＲＡＭ８３に記憶したｆｐ（ｘ）とｆｎ（ｘ）を用いてブロック１１ごとのＪＳ情報量を生成してＲＡＭ８３に記憶する（ステップ５５）。
次に、ＣＰＵ８１は、ＲＡＭ８３に記憶したＪＳ情報量に基づいてサンプリングを行いブロック１１ごとのサンプルによる特徴空間３５を生成してＲＡＭ８３に記憶する（ステップ６０）。 Next, the CPU 81 generates a JS information amount for each block 11 using fp(x) and fn(x) stored in the RAM 83, and stores the amount in the RAM 83 (step 55).
Next, the CPU 81 performs sampling based on the JS information amount stored in the RAM 83, generates a feature space 35 based on the samples for each block 11, and stores the sampled feature space 35 in the RAM 83 (step 60).

次に、ＣＰＵ８１は、ＲＡＭ８３に記憶したブロック１１ごとの特徴空間３５を足し合わせることにより統合し、これによって統一した特徴空間３６を生成してＲＡＭ８３に記憶する（ステップ６５）。
次に、ＣＰＵ８１は、ＲＡＭ８３に記憶した特徴空間３６に対して、ｃ－ＡＩＣを用いて混合数を決定し、更に、ＥＭアルゴリズムを用いて当該混合数に基づくＧＭＭを生成する（ステップ７０）。
そして、ＣＰＵ８１は、当該ＧＭＭをＲＡＭ８３に記憶して、画像認識に用いる基底関数３７に設定する（ステップ７５）。 Next, the CPU 81 integrates the feature spaces 35 for each block 11 stored in the RAM 83 by adding them together, thereby generating a unified feature space 36 and storing it in the RAM 83 (step 65).
Next, the CPU 81 determines the mixture number for the feature space 36 stored in the RAM 83 using c-AIC, and further generates a GMM based on the mixture number using the EM algorithm (step 70).
Then, the CPU 81 stores the GMM in the RAM 83 and sets it as the basis function 37 to be used for image recognition (step 75).
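
Step 70 fits a GMM with the EM algorithm once the mixture number is fixed. A minimal NumPy sketch of that fitting step is given below. It is not the implementation of the embodiment: the deterministic initialization, the regularization term `reg`, the fixed iteration count, and the synthetic two-cluster data are all assumptions made for illustration.

```python
import numpy as np

def log_gaussian(x, mean, cov):
    """Log density of N(mean, cov) evaluated at each row of x."""
    d = x.shape[1]
    chol = np.linalg.cholesky(cov)
    z = np.linalg.solve(chol, (x - mean).T)  # whitened residuals, shape (d, n)
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det + np.sum(z ** 2, axis=0))

def fit_gmm_em(x, k, n_iter=50, reg=1e-6):
    """Fit a k-component GMM to the rows of x; returns (alphas, means, covs)."""
    n, d = x.shape
    # Simple deterministic initialization (an assumption of this sketch):
    # initial means are spread along the first coordinate, covariances are isotropic.
    order = np.argsort(x[:, 0])
    means = x[order[np.linspace(0, n - 1, k).astype(int)]].astype(float)
    covs = np.stack([np.eye(d) * (x.var(axis=0).mean() + reg)] * k)
    alphas = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E step: posterior probability of each component for each sample.
        log_p = np.stack(
            [np.log(alphas[j]) + log_gaussian(x, means[j], covs[j]) for j in range(k)],
            axis=1,
        )
        log_p -= np.logaddexp.reduce(log_p, axis=1, keepdims=True)
        resp = np.exp(log_p)
        # M step: re-estimate mixing coefficients, means, and covariances.
        nk = np.maximum(resp.sum(axis=0), 1e-12)
        alphas = nk / nk.sum()
        means = (resp.T @ x) / nk[:, None]
        for j in range(k):
            diff = x - means[j]
            covs[j] = (resp[:, j, None] * diff).T @ diff / nk[j] + reg * np.eye(d)
    return alphas, means, covs

# Synthetic stand-in for the unified feature space 36: two well-separated clusters.
rng = np.random.default_rng(0)
cluster_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(200, 2))
cluster_b = rng.normal(loc=[5.0, 5.0], scale=0.5, size=(200, 2))
samples = np.vstack([cluster_a, cluster_b])
alphas, means, covs = fit_gmm_em(samples, k=2)
```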

次に、基底関数３７を用いた画像の特徴の抽出方法について説明する。
図示しないが、画像認識装置９は、画像処理装置８と同様のハードウェア構成を有しており、画像認識プログラム、認識対象の画像、及び画像処理装置８が生成した基底関数３７などを記憶した記憶装置９４、画像認識プログラムに従って画像認識するＣＰＵ９１、及び、これにワーキングメモリを提供するＲＡＭ９３などを備えている。 Next, a method for extracting image features using the basis functions 37 will be described.
Although not shown in the figure, the image recognition device 9 has a hardware configuration similar to that of the image processing device 8, and is equipped with a memory device 94 that stores an image recognition program, an image to be recognized, and basis functions 37 generated by the image processing device 8, a CPU 91 that performs image recognition in accordance with the image recognition program, and a RAM 93 that provides working memory for the CPU 91.

画像認識装置９は、次のように、画像の基底関数３７に対する負担率を当該画像の特徴量として算出する。
負担率λｊは、図７（ｄ）の式（１０）で表され、ｚは潜在パラメータ（ｊ番目の成分が１で他が０となるｋ次元のベクトル量）である。
負担率λｊは、データ点の分布ｘがｊ番目のガウス分布から生成される確率を表している。
各ｚについて計算するとλｊによるｋ次元のベクトルが得られるが、画像認識装置９は、これを特徴量とする。データ点ｘがポジティブ画像１０とネガティブ画像２０の何れにも類似していない場合は０ベクトルに近づく。
このような原理に基づき、画像認識装置９は、次のようにして画像から特徴量を抽出する。 The image recognition device 9 calculates the burden rate of the image with respect to the basis function 37 as a feature amount of the image, as follows.
The burden rate λj is expressed by equation (10) in FIG. 7(d), where z is a latent parameter (a k-dimensional vector quantity in which the j-th component is 1 and the others are 0).
The burden rate λj represents the probability that the distribution of data points x is generated from the j-th Gaussian distribution.
When the calculation is performed for each z, a k-dimensional vector by λj is obtained, which is used as a feature by the image recognition device 9. If the data point x is not similar to either the positive image 10 or the negative image 20, it approaches a 0 vector.
Based on this principle, the image recognition device 9 extracts features from an image in the following manner.
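
The burden rate described above can be sketched as the usual posterior probability of each mixture component (often called the responsibility in GMM literature). Equation (10) is not reproduced in this section, so the normalization used here is an assumption: in this form each row sums to 1, whereas the text notes that the vector approaches zero for dissimilar data, so the source formula or its aggregation may differ. The function name and the example parameters are hypothetical.

```python
import numpy as np

def burden_rates(x, alphas, means, covs):
    """Return an (n, k) array; row i holds lambda_j for data point x[i]."""
    n, d = x.shape
    log_p = np.empty((n, len(alphas)))
    for j, (alpha, mu, cov) in enumerate(zip(alphas, means, covs)):
        diff = x - mu
        maha = np.einsum("ni,ij,nj->n", diff, np.linalg.inv(cov), diff)
        _, log_det = np.linalg.slogdet(cov)
        log_p[:, j] = np.log(alpha) - 0.5 * (d * np.log(2.0 * np.pi) + log_det + maha)
    log_p -= log_p.max(axis=1, keepdims=True)  # stabilize before exponentiating
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)

# Hypothetical two-component basis function in a 2-D feature space.
alphas = np.array([0.5, 0.5])
means = np.array([[0.0, 0.0], [4.0, 4.0]])
covs = np.stack([np.eye(2), np.eye(2)])
points = np.array([[0.0, 0.0], [4.0, 4.0], [2.0, 2.0]])
rates = burden_rates(points, alphas, means, covs)
```

A point lying on one component's mean is attributed almost entirely to that component, while the midpoint between the two means is shared equally.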

図９は、画像認識方法を説明するための図である。
以下の処理は、ＣＰＵ９１が画像認識プログラムに従って行うものである。
図９（ａ）に示したように、画像認識装置９は、画像認識対象である画像４０をＲＡＭ９３に読み込み、その上に識別フィルタ４１ａで矩形領域を設定する。
このように、画像認識装置９は、画像認識に係る画像を取得する画像取得手段を備えている。
そして、画像認識装置９は、識別フィルタ４１ａによって抽出した画像を、例えば、ポジティブ画像１０やネガティブ画像２０と同じ３×６個のブロック１１ａ、１１ｂ、・・・に区分する。
このように、画像認識装置９は、取得した画像をブロックに区分する区分手段を備えている。 FIG. 9 is a diagram for explaining the image recognition method.
The following process is performed by the CPU 91 in accordance with the image recognition program.
As shown in FIG. 9(a), the image recognition device 9 reads an image 40, which is the object of image recognition, into the RAM 93, and sets a rectangular area on it using a discrimination filter 41a.
In this manner, the image recognition device 9 includes an image acquisition means for acquiring an image related to image recognition.
Then, the image recognition device 9 divides the image extracted by the discrimination filter 41a into, for example, 3×6 blocks 11a, 11b, . . . like the positive image 10 and the negative image 20.
Thus, the image recognition device 9 includes a division means for dividing the acquired image into blocks.
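
The division into blocks can be sketched as follows. The sketch assumes that "3×6" means 3 columns by 6 rows (a tall window suited to pedestrians) and that the window size is divisible by the block grid; it also computes a per-pixel gradient direction with `np.gradient` as a simple stand-in for the luminance gradient computation, which is described elsewhere in the source. The function names and image size are hypothetical.

```python
import numpy as np

def split_into_blocks(window, rows=6, cols=3):
    """Split a 2-D window into a rows x cols grid of equally sized blocks."""
    block_h = window.shape[0] // rows
    block_w = window.shape[1] // cols
    return [
        [window[r * block_h:(r + 1) * block_h, c * block_w:(c + 1) * block_w]
         for c in range(cols)]
        for r in range(rows)
    ]

def gradient_directions(block):
    """Per-pixel luminance gradient direction in radians, in (-pi, pi]."""
    gy, gx = np.gradient(block.astype(float))  # derivatives along rows, then columns
    return np.arctan2(gy, gx)

# Hypothetical 96 x 48 window whose luminance increases from left to right.
window = np.tile(np.arange(48), (96, 1))
blocks = split_into_blocks(window)
angles = gradient_directions(blocks[0][0])
```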

次いで、画像認識装置９は、ブロック１１ごとに高中低の解像度に渡って勾配方向の共起を特徴空間１３にプロットする。
そして、画像認識装置９は、記憶装置９４から基底関数３７を読み出して、その基底関数３７に対する各ブロック１１の負担率を図７（ｄ）の式（１０）によって計算する。
このように、画像認識装置９は、画像処理装置８が生成した基底関数を取得する基底関数取得手段と、区分した各ブロックに対して当該基底関数を適用し、当該基底関数に対する特徴量を取得する特徴量取得手段を備えている。 The image recognition device 9 then plots the co-occurrence of gradient directions across high, medium and low resolutions for each block 11 in a feature space 13 .
Then, the image recognition device 9 reads out the basis function 37 from the storage device 94, and calculates the burden rate of each block 11 with respect to the basis function 37 by using equation (10) in FIG. 7(d).
In this way, the image recognition device 9 is equipped with a basis function acquisition means for acquiring the basis function generated by the image processing device 8, and a feature acquisition means for applying the basis function to each divided block and acquiring features for the basis function.

画像認識装置９は、このようにして算出した負担率による特徴量を用いて識別フィルタ４１内の画像に歩行者が写っているか否かを判断し、判断結果をＲＡＭ９３に記憶する。
これは各種の方法が考えられ、例えば、ブロック１１ごとに判定してそれを総合判定してもよいし、あるいは、各ブロック１１の負担率を統合して全体として判定してもよい。 The image recognition device 9 uses the feature amount based on the burden rate calculated in this manner to determine whether or not a pedestrian is captured in the image in the discrimination filter 41, and stores the determination result in the RAM 93.
This can be achieved in various ways. For example, a judgment may be made for each block 11 and then a comprehensive judgment may be made, or the burden rates of each block 11 may be integrated and judged as a whole.

判定は、例えば、ＳＶＭ（サポートベクターマシン）やＡｄａＢｏｏｓｔなどの識別器に正規化した特徴量を入力して行うことができる。
画像認識装置９は、このようにして識別フィルタ４１ａ内の画像を判定すると、識別フィルタ４１を１ブロックずつシフトしながら画像４０を走査し、同様の判定を行っていく。
このように、画像認識装置９は、各ブロックから取得した特徴量を用いて画像に所定の画像認識対象が写っているか否かを判定する判定手段を備えている。 The determination can be performed, for example, by inputting normalized features into a classifier such as an SVM (support vector machine) or AdaBoost.
After the image recognition device 9 has determined the image in the discrimination filter 41a in this manner, it shifts the discrimination filter 41 by one block while scanning the image 40, and performs the same determination.
In this manner, the image recognition device 9 includes a determination means for determining whether or not a predetermined image recognition target is included in an image using the feature amount acquired from each block.

画像認識装置９は、全ブロック１１で統一した最適なＧＭＭによる状態空間を作成し、各ブロック１１に同一の基底関数３７を適用するため、各ブロック１１同士の状態空間に互換性がある。
これにより、識別フィルタ４１を移動させても一度計算したブロック１１は特徴量の引き継ぎが可能となり、画像中の識別フィルタ４１をスライドさせても、その都度特徴量を計算し直す必要が無くなる。 The image recognition device 9 creates a state space by an optimal GMM that is unified for all blocks 11, and applies the same basis function 37 to each block 11, so that the state spaces of the blocks 11 are compatible with each other.
This makes it possible for the block 11 to inherit the feature values once calculated even if the discrimination filter 41 is moved, and eliminates the need to recalculate the feature values each time the discrimination filter 41 in the image is slid.

例えば、図９（ａ）の識別フィルタ４１ａと識別フィルタ４１ｂでは、ブロック１１Ａが共通である。
従来は、識別フィルタ４１ごとのブロック１１ごとに基底関数を設定していたため、図９（ｂ）上図に示したように、同じブロック１１Ａであるにもかかわらず、特徴量を再度計算していた。 For example, the discrimination filter 41a and the discrimination filter 41b in FIG. 9(a) share the block 11A.
Conventionally, a basis function was set for each block 11 of each discrimination filter 41, so the feature amount was calculated again even for the same block 11A, as shown in the upper diagram of FIG. 9(b).

これに対し、画像認識装置９は、同じ基底関数３７を使用するため、図９（ｂ）下図に示したように、識別フィルタ４１ａ、４１ｂで、ブロック１１Ａの特徴量が同じ値になるため、先に計算した特徴量を引き継ぐことができる。これにより計算リソースを大幅に節約することができる。 In contrast, the image recognition device 9 uses the same basis function 37, so as shown in the lower diagram of FIG. 9(b), the feature value of block 11A becomes the same in discrimination filters 41a and 41b, and the feature value calculated earlier can be inherited. This allows for a significant saving in calculation resources.

このように、従来手法では、各ブロックで使用する基底関数が異なるため、隣接した矩形領域の特徴量を計算する際、重複した領域があるにも関わらず、全ての領域で特徴量の計算を再度行う必要があり、計算コストが高くなっていたが、共通の基底関数３７を採用することにより、これらの問題を解決することができる。 As described above, in conventional methods, because different basis functions are used for each block, when calculating the features of adjacent rectangular regions, the feature calculations must be performed again for all regions, even though there are overlapping regions, resulting in high calculation costs. However, by adopting a common basis function 37, these problems can be solved.
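
The reuse of block features across overlapping discrimination filters can be sketched as below. Because a single basis function applies to every block, per-block features can be computed once for the whole image and each filter position then only slices the cached grid. `block_feature` is a hypothetical stand-in for the burden-rate feature, and the block size and window shape are assumptions.

```python
import numpy as np

def block_feature(block):
    """Hypothetical stand-in for the per-block feature obtained from the common basis function."""
    return np.array([block.mean(), block.std()])

def block_feature_grid(image, block_size):
    """Compute the feature of every block once and cache the results in a grid."""
    rows = image.shape[0] // block_size
    cols = image.shape[1] // block_size
    grid = np.empty((rows, cols, 2))
    for r in range(rows):
        for c in range(cols):
            block = image[r * block_size:(r + 1) * block_size,
                          c * block_size:(c + 1) * block_size]
            grid[r, c] = block_feature(block)
    return grid

def window_features(grid, top, left, win_rows=6, win_cols=3):
    """Feature vector of a filter position, read from the cache without recomputation."""
    return grid[top:top + win_rows, left:left + win_cols].reshape(-1)

# Hypothetical 128 x 64 image with 16-pixel blocks: an 8 x 4 grid of cached features.
rng = np.random.default_rng(0)
image = rng.random((128, 64))
grid = block_feature_grid(image, 16)
features_a = window_features(grid, top=0, left=0)  # first filter position
features_b = window_features(grid, top=0, left=1)  # shifted right by one block
```

The two positions share two of their three block columns, and those shared entries are read from the same cached values rather than computed again.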

更に、ＧＭＭを用いない従来方式では、図９（ｃ）上図のように、ブロック１１ごとに設定した２次元ヒストグラムに、例えば、８方向に量子化した勾配方向のペアを投票していた。ヒストグラムのビンは、認識対象にかかわらず設定したため、画像の特徴が現れないビンにも投票していた。
これに対し、ＧＭＭによって状態空間を生成する方式では、図９（ｃ）下図のように、画像の特徴が現れる領域に対して自律的に確率密度の高い領域が形成されるため、領域４５のように、特徴の現れない領域に対する処理を行わずに済む。これにより、計算コストを低減することができる。 Furthermore, in the conventional method that does not use GMM, as shown in the upper diagram of Fig. 9(c), for example, pairs of gradient directions quantized into eight directions are voted for in a two-dimensional histogram set for each block 11. Since the bins of the histogram are set regardless of the recognition target, voting is also performed for bins in which image features do not appear.
In contrast, in the method of generating a state space by GMM, as shown in the lower diagram of Fig. 9(c), a region with high probability density is formed autonomously in the region where image features appear, so there is no need to process regions where features do not appear, such as region 45. This makes it possible to reduce calculation costs.

更に、従来は、ブロック１１ごとに基底関数を設定していたため、例えば、基底関数を３×６のブロック１１で生成した場合は、識別フィルタも３×６にする必要があった。
これに対し、本実施形態の方式では、基底関数３７が共通なため、識別フィルタを、例えば、３×５にするなど、ブロック単位で変形して設計することも可能な場合がある。これにより、識別フィルタ作成側のシステムと識別フィルタ使用側のシステムの結合を従来よりも疎とすることができる。 Furthermore, conventionally, a basis function is set for each block 11. For example, if a basis function is generated for a 3×6 block 11, the discrimination filter must also be 3×6.
In contrast to this, in the method of this embodiment, since the basis function 37 is common, it may be possible to design the discrimination filter by modifying it in block units, for example to 3 × 5. This allows the coupling between the discrimination filter generating system and the discrimination filter using system to be looser than in the past.

図１０は、画像処理装置８が生成した基底関数３７を用いて画像認識を行った結果を表したグラフである。
図１０（ａ）、（ｂ）、（ｃ）は、ｃ－ＡＩＣ値に基づき、それぞれ混合数ｋ＝４５、３２、１５とした場合のＲＯＣ（ＲｅｃｅｉｖｅｒＯｐｅｒａｔｉｎｇＣｈａｒａｃｔｅｒｉｓｔｉｃ）曲線であって、横軸は誤検出率、縦軸は正検出率を示している。 FIG. 10 is a graph showing the results of image recognition using the basis functions 37 generated by the image processing device 8.
FIGS. 10(a), (b), and (c) are ROC (Receiver Operating Characteristic) curves for mixture numbers k = 45, 32, and 15, respectively, chosen based on the c-AIC value, where the horizontal axis indicates the false positive rate and the vertical axis indicates the true positive rate.

太線は基底関数を基底関数３７に統一した画像認識装置９によるＧＭＭ－ＭＲＣｏＨＯＧ特徴量の場合、破線はブロック１１ごとに異なる基底関数を用いた従来のＧＭＭ－ＭＲＣｏＨＯＧ特徴量を用いた場合、細線はヒストグラムを用いたＭＲＣｏＨＯＧ特徴量の場合を示しており、曲線が左上の隅に寄るほどよい精度であることを示している。 The thick line shows the GMM-MRCoHOG features obtained by the image recognition device 9 with the basis function unified to basis function 37, the dashed line shows the conventional GMM-MRCoHOG features using different basis functions for each block 11, and the thin line shows the MRCoHOG features using a histogram. The closer the curve is to the upper left corner, the better the accuracy.

グラフに示したように、ｋ＝４５では、基底関数を統一したＧＭＭ－ＭＲＣｏＨＯＧ特徴量を用いた場合は、従来のＧＭＭ－ＭＲＣｏＨＯＧ特徴量を用いた場合に比べて認識精度が若干劣るが、従来のＭＲＣｏＨＯＧ特徴量を用いた場合に比べて高い認識精度を誇っており、十分に実用に耐えることができる。
ｋ＝３２、１５では、基底関数を統一したＧＭＭ－ＭＲＣｏＨＯＧ特徴量を用いた場合は、ｋ＝４５の場合よりも若干認識精度が劣るが、従来のＭＲＣｏＨＯＧ特徴量を用いた場合に比べて高い認識精度を誇っており、十分に実用に耐えることができる。 As shown in the graph, when k=45, the recognition accuracy is slightly lower when the GMM-MRCoHOG features with unified basis functions are used than when the conventional GMM-MRCoHOG features are used, but it is still higher in recognition accuracy than when the conventional MRCoHOG features are used, and is fully practical.
When k=32 and 15, the recognition accuracy is slightly lower when the GMM-MRCoHOG features with a unified basis function are used than when k=45. However, the recognition accuracy is higher than when the conventional MRCoHOG features are used, and the recognition accuracy is sufficient for practical use.

以上、本実施形態について説明したが、各種の変形が可能である。
例えば、本実施形態では、ポジティブ画像とネガティブ画像を用いたが、基底関数の作成は、ポジティブ画像だけで行うことも可能である。
また、本実施形態では、確率分布間の計量にＪＳ情報量を用いたが、他の計量を用いることも可能である。 Although the present embodiment has been described above, various modifications are possible.
For example, in this embodiment, positive and negative images are used, but the basis functions can also be created using only positive images.
In addition, in this embodiment, the JS divergence is used as a metric between probability distributions, but other metrics can also be used.

以上に説明したように、本実施形態によれば、各ブロックで使用する基底関数を一つに統一することでメモリの使用量を大幅に低減することができる。
また、基底関数が各ブロックで共通なため、隣接した矩形領域の特徴量を計算する際においても、既に計算した特徴量を活用でき、計算コストの削減が可能となる。
また、基底関数を統一することで精度低下が懸念されるが、赤池情報量規準に基づく尺度を用いて混合数を自動決定することにより、計算リソースの使用を抑制したまま精度を保つことができる。より精度を上げたい場合は、混合数を増加させればよい。
これにより、ＦＰＧＡや小型コンピュータ、あるいは、ＧＰＧＰＵ（Ｇｅｎｅｒａｌ－ｐｕｒｐｏｓｅｃｏｍｐｕｔｉｎｇｏｎｇｒａｐｈｉｃｓｐｒｏｃｅｓｓｉｎｇｕｎｉｔｓ）などの高機能な演算処理を有しない機器に高い識別能力を維持したまま搭載することができる。 As described above, according to this embodiment, the amount of memory used can be significantly reduced by unifying the basis functions used in each block into one.
In addition, since the basis functions are common to each block, when calculating the features of adjacent rectangular regions, the features that have already been calculated can be used, making it possible to reduce calculation costs.
Although there is concern that the accuracy may decrease due to the unification of basis functions, accuracy can be maintained while suppressing the use of computational resources by automatically determining the number of mixtures using a measure based on the Akaike Information Criterion. To improve accuracy further, the number of mixtures can be increased.
This allows the technique to be installed on FPGAs, small computers, and other devices that lack high-performance arithmetic processing such as GPGPU (general-purpose computing on graphics processing units), while maintaining high discrimination capability.

５注目画素
８画像処理装置
１０ポジティブ画像
１１ブロック
１３、２３、３５、３６特徴空間
１５高解像度画像
１６中解像度画像
１７低解像度画像
２０ネガティブ画像
３３ＪＳ情報量
３７基底関数
４０画像
４１識別フィルタ
４５領域
８１ＣＰＵ
８２ＲＯＭ
８３ＲＡＭ
８４記憶装置
８５記憶媒体駆動装置
８６入力部
８７出力部
5 Pixel of interest
8 Image processing device
10 Positive image
11 Block
13, 23, 35, 36 Feature space
15 High resolution image
16 Medium resolution image
17 Low resolution image
20 Negative image
33 JS information amount
37 Basis function
40 Image
41 Discrimination filter
45 Region
81 CPU
82 ROM
83 RAM
84 Storage device
85 Storage medium drive device
86 Input unit
87 Output unit

Claims

An image acquisition means for acquiring images for image recognition training;
A partitioning means for partitioning the acquired image into a plurality of blocks;
a frequency distribution acquiring means for acquiring a frequency distribution of co-occurrence of luminance gradient directions for each of the divided blocks;
a unifying means for unifying the acquired frequency distributions for each block into a single frequency distribution;
a basis function generating means for generating a basis function serving as a criterion for image recognition based on the unified frequency distribution;
1. An image processing device comprising:

the unifying means performs the unification by superimposing the frequency distributions in the plurality of blocks.
2. The image processing device according to claim 1,

the unifying means generates a sample based on the acquired frequency distribution for each block, and adds up the generated samples across the plurality of blocks, thereby superimposing the frequency distributions in the plurality of blocks.
3. The image processing device according to claim 2.

the frequency distribution acquisition means acquires a frequency distribution of co-occurrence of brightness gradient directions between different resolutions of the same image;
4. The image processing device according to claim 1, 2 or 3.

The image acquisition means acquires a plurality of images;
the unifying means unifies frequency distributions for each block of the plurality of images into a single frequency distribution.
5. The image processing device according to claim 1.

The image acquisition means acquires a recognition target image including a recognition target and a non-recognition target image including no recognition target,
the frequency distribution acquisition means acquires a frequency distribution in a corresponding block based on a difference between a frequency distribution of a luminance gradient direction in the corresponding block of the recognition target image and the non-recognition target image,
6. An image processing device according to any one of claims 1 to 5.

The basis function is a probability density function based on a Gaussian mixture model, and a determination means is provided for determining an appropriate number of mixtures based on a balance between likelihood and number of mixtures.
7. The image processing device according to claim 1.

a weighting factor is set for each of the plurality of blocks when integrating the frequency distributions, and the unifying means integrates the frequency distributions of the plurality of blocks in accordance with the weighting factors.
8. The image processing device according to claim 1.
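
The weighted integration in this claim can be illustrated with a weighted bin-wise sum. This is a sketch under assumptions: the histograms are dictionaries keyed by direction pairs, the weights are supplied by the caller, and the name `integrate_weighted` is hypothetical. How the weighting factors are chosen is not covered here.

```python
def integrate_weighted(block_hists, weights):
    """Unify block histograms into one, scaling each block by its weight."""
    if len(block_hists) != len(weights):
        raise ValueError("one weighting factor is required per block")
    unified = {}
    for hist, w in zip(block_hists, weights):
        for b, count in hist.items():
            unified[b] = unified.get(b, 0.0) + w * count
    return unified

# Made-up histograms; the second block counts half as much as the first.
hists = [{(0, 1): 3, (2, 2): 1}, {(0, 1): 2, (7, 4): 4}]
unified = integrate_weighted(hists, [1.0, 0.5])
print(sorted(unified.items()))
```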

A basis function acquisition means for acquiring the basis function according to any one of claims 1 to 8;
An image acquisition means for acquiring an image related to image recognition;
A partitioning means for partitioning the acquired image into blocks;
a feature acquisition means for applying the acquired basis function to each of the divided blocks and acquiring a feature amount with respect to the basis function;
a determining means for determining whether or not a predetermined image recognition target is included in the acquired image by using the feature amount acquired from each block;
An image recognition device comprising:
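
The recognition flow of this claim can be sketched as below, assuming the basis function is a Gaussian mixture shared by all blocks. Several details are assumptions made for illustration: the mixture is two-dimensional with diagonal covariance, the feature amount of a block is the sum of each component's responsibilities over the block's co-occurrence points, and the determination is a hypothetical linear score with a threshold at zero. All names and numbers are invented for this example.

```python
import math

def gaussian_pdf(point, mean, var):
    """Density of an axis-aligned 2-D Gaussian (diagonal covariance assumed)."""
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    norm = 2.0 * math.pi * math.sqrt(var[0] * var[1])
    return math.exp(-0.5 * (dx * dx / var[0] + dy * dy / var[1])) / norm

def block_feature(points, gmm):
    """Sum, per mixture component, the responsibility of each point."""
    feature = [0.0] * len(gmm)
    for p in points:
        dens = [w * gaussian_pdf(p, mean, var) for w, mean, var in gmm]
        total = sum(dens)
        if total > 0.0:
            for k, d in enumerate(dens):
                feature[k] += d / total
    return feature

def recognize(blocks, gmm, weights, bias):
    """Apply the same basis function to every block, concatenate the block
    features, and decide with a linear score (hypothetical classifier)."""
    feature = []
    for points in blocks:
        feature.extend(block_feature(points, gmm))
    score = sum(w * f for w, f in zip(weights, feature)) + bias
    return score > 0.0, feature

# Made-up basis function: (weight, mean, variance) per component.
gmm = [(0.5, (0.0, 1.0), (0.25, 0.25)), (0.5, (6.0, 4.0), (0.25, 0.25))]
# Made-up co-occurrence points for two blocks.
blocks = [[(0.0, 1.0), (0.0, 1.0), (6.0, 4.0)], [(6.0, 4.0)]]
detected, feature = recognize(blocks, gmm, weights=[1.0, 0.0, 1.0, 0.0], bias=-1.0)
print(detected, [round(f, 3) for f in feature])
```

Because one basis function is shared by all blocks, the feature length is the number of blocks times the number of mixture components.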

An image acquisition function to acquire images for image recognition training;
A segmentation function for segmenting the acquired image into a plurality of blocks;
a frequency distribution acquisition function for acquiring a frequency distribution of co-occurrence of luminance gradient directions for each of the divided blocks;
a unification function for unifying the acquired frequency distributions of the blocks by integrating them into a single frequency distribution;
a basis function generating function for generating a basis function serving as a criterion for image recognition based on the unified frequency distribution;
An image processing program that causes a computer to realize the above functions.
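
The first three functions of this program (segmentation into blocks and acquisition of a per-block co-occurrence distribution) can be sketched as follows. This is a simplified, single-resolution illustration: the image is a nested list of luminance values, gradients use central differences, directions are quantized into 8 bins, and co-occurrence is counted between a pixel and its right-hand neighbour. These choices and the function names are assumptions of the example; co-occurrence between different resolutions is not shown.

```python
import math

def gradient_directions(image, n_bins=8):
    """Quantize the luminance gradient direction of each interior pixel."""
    h, w = len(image), len(image[0])
    dirs = [[None] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = image[y][x + 1] - image[y][x - 1]
            gy = image[y + 1][x] - image[y - 1][x]
            if gx == 0 and gy == 0:
                continue  # flat pixel: no direction to vote for
            angle = math.atan2(gy, gx) % (2.0 * math.pi)
            dirs[y][x] = int(angle / (2.0 * math.pi) * n_bins + 0.5) % n_bins
    return dirs

def block_cooccurrence(dirs, block_size, offset=(1, 0)):
    """Per-block histogram of direction pairs (pixel, pixel shifted by offset)."""
    h, w = len(dirs), len(dirs[0])
    dx, dy = offset
    hists = {}
    for y in range(h):
        for x in range(w):
            x2, y2 = x + dx, y + dy
            if not (0 <= x2 < w and 0 <= y2 < h):
                continue
            a, b = dirs[y][x], dirs[y2][x2]
            if a is None or b is None:
                continue
            # The vote goes to the block containing the first pixel of the pair.
            hist = hists.setdefault((y // block_size, x // block_size), {})
            hist[(a, b)] = hist.get((a, b), 0) + 1
    return hists

# A 6x6 horizontal luminance ramp: every gradient points in direction bin 0.
image = [[10 * x for x in range(6)] for _ in range(6)]
hists = block_cooccurrence(gradient_directions(image), block_size=3)
print(hists)
```

The per-block histograms produced this way are the input to the unification step, after which a basis function is fitted to the single unified distribution.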

A basis function acquisition function for acquiring the basis function according to any one of claims 1 to 8;
An image acquisition function for acquiring an image related to image recognition;
A segmentation function for segmenting the acquired image into blocks;
a feature acquisition function that applies the acquired basis function to each of the divided blocks and acquires a feature amount with respect to the basis function;
a determination function for determining whether or not a predetermined image recognition target is included in the acquired image by using the feature amount acquired from each block;
An image recognition program that causes a computer to realize the above functions.