JP4637564B2

JP4637564B2 - Status detection device, status detection method, program, and recording medium

Info

Publication number: JP4637564B2
Application number: JP2004372326A
Authority: JP
Inventors: 雅二郎岩崎
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2004-12-22
Filing date: 2004-12-22
Publication date: 2011-02-23
Anticipated expiration: 2024-12-22
Also published as: JP2006178790A

Description

The present invention relates to a state detection device, a state detection method, a program, and a recording medium, and is well suited to, for example, a state monitoring device that uses a security camera or the like.

In recent years, as facilities have grown larger and labor costs have risen, there has been a steadily growing need for monitoring devices that watch for people, animals, or other objects approaching restricted areas in buildings or rooms and issue a warning when an abnormal state is detected.
Security cameras are a conventionally known image monitoring device of this kind. A security camera system consists of a video camera aimed at the monitored area, together with a TV monitor and a video tape recorder (VTR) connected to it. Images of the monitored area captured by the video camera are shown on the TV monitor while being recorded on the VTR, and a human observer watches the TV monitor and judges whether an abnormality has occurred.
Because watching the screen and judging abnormalities depend entirely on people, monitoring accuracy varies greatly with the observer's fatigue and stress and with individual differences in ability. In addition, although a human observer detects large image changes spanning the whole screen with high accuracy, changes in small parts of the image tend to be overlooked.
For this reason, monitoring devices have been proposed that take as input a moving image that is easy for humans to interpret, for example from a TV camera, compare each pixel of the image at a given time with a reference image corresponding to the background or with a past image, and judge the state to be abnormal when the change is large (see, for example, Patent Document 1).
As examples of monitoring devices that cover an area too wide for a single TV camera, there are an image monitoring system that combines image monitoring devices performing change-region extraction with a distributed network (see, for example, Patent Document 2) and a distributed image recognition system (see, for example, Patent Document 3).
Patent Document 1: Japanese Unexamined Patent Application Publication No. S62-147890
Patent Document 2: Japanese Unexamined Patent Application Publication No. H04-261289
Patent Document 3: Japanese Unexamined Patent Application Publication No. H02-82375

However, although conventional monitoring devices are considerably improved over earlier ones, the equipment is still costly and the amount of computation required for detection is not small. As a result, when an emergency occurs, for example, the abnormality may not be confirmed in real time and an immediate response may not be possible.
The present invention has been made in view of these circumstances. Its object is to provide a state detection device and a state detection method that enable highly reliable monitoring with simple equipment, immediate response to emergencies, and detection at remote locations, as well as a program for executing the functions of the state detection device and a recording medium on which that program is recorded.

To solve the above problems, the invention according to claim 1 comprises: learning feature extraction means that extracts an interval video feature for each time interval from a series of still images of a normal state captured by an imaging device, and accumulates those interval video features in a learning feature database as features of the normal state; feature extraction means that extracts an interval video feature for a time interval from a series of still images input from the imaging device; and detection means that automatically determines whether the input images represent the normal state by comparing the interval video feature extracted by the feature extraction means with the normal-state features accumulated in the learning feature database. When accumulating normal-state interval video features, the learning feature extraction means compares the interval video feature of the one or more time intervals currently being processed with the interval video feature of every time interval already extracted to calculate a time interval dissimilarity for each, obtains the maximum time interval dissimilarity, which is the largest of the calculated time interval dissimilarities, and ends the learning process when the difference between this maximum time interval dissimilarity and the maximum time interval dissimilarity obtained for the time intervals excluding the one currently being processed is within a predetermined value.
The invention according to claim 2 is the state detection device according to claim 1, wherein the time interval dissimilarity is obtained by taking the differences between the features of the still images contained in one interval video feature and the features of the still images contained in the other interval video feature, and summing them over all still images.
The invention according to claim 3 is the state detection device according to claim 1, wherein the intervals from which the interval video features are extracted overlap one another.

The invention according to claim 4 is the state detection device according to claim 1, wherein the feature of a single still image is the average color or the brightness extracted from each of the blocks into which the still image is divided.
The invention according to claim 5 is the state detection device according to claim 1, wherein the interval video feature consists of all the features extracted from all the still images in a predetermined time interval, arranged in time series.
The invention according to claim 6 is the state detection device according to claim 1, wherein video data captured by an imaging device located on a client side is transmitted to a server side connected via a network, and the server side executes the learning process or the detection process on the video data received from the imaging device.
The invention according to claim 7 is a state detection method in which an interval video feature is extracted for each time interval from a series of still images of a normal state captured by an imaging device, the extracted interval video features are accumulated in a learning feature database as features of the normal state, and an interval video feature extracted in the same way from a series of still images subsequently input from the imaging device is compared with the normal-state features accumulated in the learning feature database to automatically determine whether the input images represent the normal state. When accumulating normal-state interval video features, the method compares the interval video feature of the one or more time intervals currently being processed with the interval video feature of every time interval already extracted to calculate a time interval dissimilarity for each, obtains the maximum time interval dissimilarity, which is the largest of the calculated time interval dissimilarities, and ends the learning process when the difference between this maximum time interval dissimilarity and the maximum time interval dissimilarity obtained for the time intervals excluding the one currently being processed is within a predetermined value.
The invention according to claim 8 is a program for causing a computer to execute the functions of the state detection device according to any one of claims 1 to 6.

According to the present invention, image changes in the monitored area are detected, so highly reliable monitoring can be performed with simple equipment.
In addition, simplifying the detection computation makes an immediate response to emergencies possible.
Furthermore, because images of a remote location can be acquired on instruction from a computer connected to a distributed network, the state of the remote location can also be monitored.

Embodiments of the present invention are described in detail below with reference to the drawings.
The present invention consists of a learning processing unit and a detection processing unit. The learning processing unit extracts image features from the moving image input from the imaging device at fixed time intervals and registers them in a learning feature DB. The distribution of features in the image feature space obtained by the time the learning process ends is defined as the region representing the normal state (the normal feature region).
The detection processing unit extracts image features in the same way as the learning processing unit and determines whether the extracted features fall within the normal feature region. If they fall outside it, the state is judged abnormal and some processing specified in advance is executed.
<Embodiment 1>
(1) Learning Processing Unit
FIG. 1 is a block diagram showing the functional configuration of the learning processing unit of the state detection device according to this embodiment. The learning processing unit 10 consists of learning control means 11, learning video registration means 12, a learning video DB (database) 13, learning feature extraction means 14, a learning feature DB (database) 15, and learning end determination means 16.
The learning control means 11 controls the activation of the learning video registration means 12, the learning feature extraction means 14, and the learning end determination means 16 according to whether learning has finished. Reading of video and extraction of features continue until the learning end determination means 16 determines that learning has finished.

The learning video registration means 12 assumes that the input is a moving image of the normal state. It receives the moving image from an imaging device (for example, a camera or video camera), extracts the still images (frames) that make up the moving image, and stores each frame image in the learning video DB 13 in association with a frame ID. The frame ID is the sequence number of the image registered in the learning video DB 13 (starting at 1 and increasing by 1).
Although the input is described as a moving image, a moving image can be defined as a set of still images arranged in time series, so a predetermined number of still images captured at, for example, one-minute intervals may also be used as input data.
The number of frame images read by the learning video registration means 12 is determined as follows: the length of time over which video is read is specified in advance and treated as one time interval, and the number of frames read is the number of frame images in that time interval (n in Embodiment 1).
As shown in FIG. 2, consecutive time intervals may overlap by a fixed amount of time. Let the number of frame images in the overlap also be fixed (a in Embodiment 1; a = 0 is allowed). Then, after the frame images of one time interval have been read, the first a of the next n frame images are copies of the last a frame images of the previous interval, and (n − a) frame images are newly read. Frames that are copied in this way are also given their own sequence numbers as frame IDs.
Overlapping the intervals in this way improves the accuracy of detecting changes in video that straddle an interval boundary.
The learning feature extraction means 14 extracts image features, using a known method, from each frame image of the most recent time interval registered in the learning video DB 13, and stores the extracted features in the learning feature DB 15 in association with the frame IDs.
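The interval scheme described above (n frames per time interval, with the first a frames of each interval copied from the end of the previous one) can be sketched as follows. This is only an illustration under stated assumptions: the function name `split_into_intervals` and the choice to drop a trailing partial interval are hypothetical and do not come from the patent text.

```python
from typing import Iterator, List, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")


def split_into_intervals(
    frames: Sequence[Frame], n: int, a: int = 0
) -> Iterator[List[Tuple[int, Frame]]]:
    """Yield time intervals of n frames; consecutive intervals share a frames.

    Each yielded item is a list of (frame_id, frame) pairs. Frame IDs start
    at 1 and increase by 1, and copied (overlapping) frames receive new IDs.
    """
    if n <= 0 or not 0 <= a < n:
        raise ValueError("require n > 0 and 0 <= a < n")
    next_id = 1
    start = 0
    # Assumption: a trailing run of fewer than n frames is not emitted.
    while start + n <= len(frames):
        interval = []
        for frame in frames[start:start + n]:
            interval.append((next_id, frame))
            next_id += 1
        yield interval
        start += n - a  # only (n - a) frames are newly read per interval
```

With seven frames, n = 3, and a = 1, this yields three intervals, and the frame shared by two intervals appears twice with different frame IDs.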

In Embodiment 1, the image data of one frame is divided into a mesh, and for each block the average color of each RGB component is extracted (see FIG. 3). For example, when the image is divided into a 5 × 5 mesh as in FIG. 3, each block has three RGB components, so the feature of the image is vector data of 5 × 5 × 3 = 75 values.
The feature may instead be expressed by brightness alone without using color, or a known motion vector may be used as the feature.
Dividing the image into blocks in this way means that the feature includes positional information about where in the image each color appears.
Furthermore, because this feature is extracted for every frame, the movement of a particular color over time is captured in the features. For example, in FIG. 3, if the blocks representing the car (the six blocks a to f enclosed by the dotted line) have moved to the right in the next frame, it can be seen that the car has moved to the right.
Using the average color and brightness of each block as the feature in this way makes it possible to improve detection accuracy.
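As a rough illustration of the block-wise average color feature, the sketch below computes the mean R, G, and B value of each block of a mesh-divided image. The image representation (a list of rows of `(r, g, b)` tuples), the function name `block_mean_rgb`, and the block-major ordering of the output vector are assumptions made for this example; the patent does not specify them.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]


def block_mean_rgb(image: Sequence[Sequence[Pixel]], grid: int = 5) -> List[float]:
    """Return grid * grid * 3 mean color values, block by block, row-major."""
    height, width = len(image), len(image[0])
    if height < grid or width < grid:
        raise ValueError("image must be at least grid x grid pixels")
    feature: List[float] = []
    for by in range(grid):
        y0, y1 = by * height // grid, (by + 1) * height // grid
        for bx in range(grid):
            x0, x1 = bx * width // grid, (bx + 1) * width // grid
            pixels = [image[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            for channel in range(3):  # R, G, B in that order
                feature.append(sum(p[channel] for p in pixels) / len(pixels))
    return feature
```

For the default 5 × 5 mesh the result has 75 entries, matching the vector size given in the text.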
The learning end determination means 16 determines whether learning has finished. Learning could simply be ended once frames up to a time specified in advance have been processed, but in Embodiment 1 the device automatically determines whether learning has finished (converged).
For this purpose, the feature of image A is written A = (a₁, a₂, a₃, …, a₇₅) and the feature of image B is written B = (b₁, b₂, b₃, …, b₇₅). The image feature of a given frame is written L_i, and the features of the n frames belonging to a time interval are written as the interval video feature L = (L₁, L₂, L₃, …, L_n).
The similarity (dissimilarity) Df(A, B) between image A and image B is expressed as the sum of the absolute values of the differences between the image features, as in equation (1) below, although other commonly known formulas may be used instead.

$$Df(A, B) = \sum_{i=1}^{75} \lvert a_i - b_i \rvert \qquad (1)$$

The similarity (dissimilarity) Dm(L, M) between the interval video features L = (L₁, L₂, L₃, …, L_n) and M = (M₁, M₂, M₃, …, M_n) of two time intervals is then expressed by equation (2) below.

$$Dm(L, M) = \sum_{i=1}^{n} Df(L_i, M_i) \qquad (2)$$
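The two dissimilarity measures can be sketched in a few lines. Df follows the description of equation (1) as a sum of absolute differences. For Dm, the sketch assumes that equation (2) sums Df over the frames of the two intervals in order, which is how claim 2 describes the time interval dissimilarity. The function names are hypothetical.

```python
from typing import Sequence


def frame_dissimilarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Df(A, B): sum of absolute differences between two frame features."""
    if len(a) != len(b):
        raise ValueError("frame features must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


def interval_dissimilarity(
    l: Sequence[Sequence[float]], m: Sequence[Sequence[float]]
) -> float:
    """Dm(L, M): sum of Df over the paired frames of two time intervals."""
    if len(l) != len(m):
        raise ValueError("intervals must contain the same number of frames")
    return sum(frame_dissimilarity(l_i, m_i) for l_i, m_i in zip(l, m))
```

Because both measures are plain sums of absolute differences, they need only additions and subtractions, which fits the stated aim of keeping the detection computation simple.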
The dissimilarity between the interval video feature of the time interval currently being processed and the interval video feature of every past time interval is calculated, and the largest of these dissimilarities is taken as the time interval dissimilarity of the interval being processed. Plotting the time interval dissimilarity against the time interval gives a graph like the solid line A in FIG. 4: the time interval dissimilarity increases at first but gradually stops increasing.
Similarly, if the largest time interval dissimilarity up to and including the interval being processed is taken as the maximum dissimilarity of that interval, the result is like the dotted line B in FIG. 4, which also increases at first and then gradually levels off.
On this basis, learning is assumed to have finished once the change in the maximum dissimilarity becomes small.
The learning end determination means 16 calculates the maximum dissimilarity by the following procedure, performs the convergence determination, and returns the result to the learning control means 11.
(1) Initialize the maximum dissimilarity Dmax (= 0) and the interval number j (= 1), which indexes the past time intervals.
(2) Take the interval video feature L of the time interval currently being processed (the k-th) and the interval video feature M^j of the j-th time interval.
(3) Calculate the time interval dissimilarity Dm(L, M^j) using equation (2).
(4) Compare Dmax with Dm(L, M^j) and set Dmax to the larger of the two.
(5) Add 1 to j; if j is k or less, repeat (2) through (5).
(6) Once j exceeds k, take Dmax as the maximum dissimilarity for the time interval k currently being processed.
(7) If the absolute value of the difference between this Dmax and the maximum dissimilarity obtained before time interval k is within a predetermined value, learning is regarded as finished, and this maximum dissimilarity is recorded in the learning feature DB 15.
(8) If the absolute value of the difference in (7) exceeds the predetermined value, learning is regarded as not yet finished.
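The procedure in steps (1) to (8) can be sketched as follows. This is an illustrative reading, not the patented implementation: the function names, the `tolerance` parameter standing in for the "predetermined value", and the use of `None` for "no earlier maximum yet" are assumptions. The sketch compares the current interval only with earlier intervals; comparing it with itself, as the loop up to j = k would, contributes a dissimilarity of 0 and does not change the maximum.

```python
from typing import Optional, Sequence

Interval = Sequence[Sequence[float]]  # n frame features per time interval


def interval_dissimilarity(l: Interval, m: Interval) -> float:
    """Dm(L, M), assumed to be the frame-wise sum of absolute differences."""
    return sum(abs(x - y) for lf, mf in zip(l, m) for x, y in zip(lf, mf))


def max_dissimilarity(current: Interval, past: Sequence[Interval]) -> float:
    """Steps (1) to (6): largest Dm between the current and each past interval."""
    d_max = 0.0  # step (1)
    for m_j in past:  # steps (2) to (5)
        d_max = max(d_max, interval_dissimilarity(current, m_j))
    return d_max  # step (6)


def has_converged(
    d_max: float, previous_d_max: Optional[float], tolerance: float
) -> bool:
    """Steps (7) and (8): learning ends when the change is within tolerance."""
    if previous_d_max is None:  # nothing to compare with yet
        return False
    return abs(d_max - previous_d_max) <= tolerance
```

Each new interval costs one Dm evaluation per stored interval, so the work per convergence check grows linearly with the number of intervals learned so far.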

Next, the processing procedure of the learning processing unit 10 is described using the flowchart in FIG. 5.
First, the video of one time interval is read and stored in the learning video DB 13 in association with frame IDs (step S11), and the features of the video stored in the learning video DB 13 are extracted and stored in the learning feature DB 15 in association with the frame IDs (step S12).
The video of the next time interval is then read and stored in the learning video DB 13 in association with frame IDs (step S13), and the features of the video stored in the learning video DB 13 are extracted and stored in the learning feature DB 15 in association with the frame IDs (step S14).
Next, the time interval dissimilarity between the interval video feature of the interval just read and that of every earlier interval is calculated using equation (2), and the maximum time interval dissimilarity is obtained (step S15).
This maximum time interval dissimilarity is then compared with the maximum time interval dissimilarity obtained before the interval just read. If it has not converged (NO in step S16), the process returns to step S13 to read the video of the next time interval.
If it has converged (YES in step S16), the maximum time interval dissimilarity of the interval just read is stored in the learning feature DB 15 and the process ends.
The features accumulated in the learning feature DB 15 in this way form the set of features representing the normal state.
The convergence determination above compares only with the maximum dissimilarity of the adjacent time interval, but convergence of learning may instead be determined by checking whether the maximum dissimilarity has failed to increase over a certain span of intervals.
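The alternative criterion mentioned last, checking that the maximum dissimilarity has not increased over a span of intervals, might look like the sketch below. The text gives no details, so everything here is an assumption: `history` is taken to be the list of per-interval maximum dissimilarities in processing order, and `window` is a hypothetical parameter for the length of the span.

```python
from typing import Sequence


def converged_over_window(history: Sequence[float], window: int) -> bool:
    """True if no value in the last `window` intervals exceeds the earlier maximum."""
    if window <= 0:
        raise ValueError("window must be positive")
    if len(history) <= window:
        return False  # not enough earlier intervals to compare against
    return max(history[-window:]) <= max(history[:-window])
```

Compared with checking only the adjacent interval, a windowed check is less likely to stop learning early when two consecutive intervals happen to be similar.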

(2) Detection Processing Unit
The detection processing unit takes as input video in which abnormal states are to be detected, compares its image features with the learning feature DB 15 described above, and automatically detects abnormal states.
FIG. 6 is a block diagram showing the functional configuration of the detection processing unit of the state detection device according to this embodiment. The detection processing unit 20 consists of detection control means 21, video registration means 22, a video DB (database) 23, feature extraction means 24, a feature DB (database) 25, detection means 26, and the learning feature DB (database) 15.
The detection control means 21 controls the activation of the video registration means 22, the feature extraction means 24, and the detection means 26, and outputs the determination result (abnormal or normal) of the detection means 26. When the result is abnormal, it also sends the video judged abnormal to the requester in response to a request to display it.
The video registration means 22, like the learning video registration means 12 of the learning processing unit 10, stores the video it reads in the video DB 23 frame by frame in association with frame IDs. As in the learning processing unit 10, the number of frames read is the number of frames in one time interval (n).
The feature extraction means 24, like the learning feature extraction means 14 of the learning processing unit 10, extracts the feature of each frame stored in the video DB 23 and stores it in the feature DB 25 in association with the frame ID.
The detection means 26 calculates, using equation (2), the dissimilarity between the interval video feature of the one time interval stored in the feature DB 25 and the interval video feature of every time interval stored in the learning feature DB 15, and takes the maximum of these values as the interval dissimilarity.
If this interval dissimilarity is within a threshold specified in advance, the detection means 26 judges the state to be the learned normal state; if it exceeds the threshold, it judges the state to be not normal (abnormal). The maximum time interval dissimilarity stored in the learning feature DB 15 at the learning end determination may be used as this threshold.

Next, the processing procedure of the detection processing unit 20 is described using the flowchart in FIG. 7.
First, video is read for the number of frames in one time interval (n) and stored in the video DB 23 frame by frame in association with frame IDs (step S21). Features are extracted from these frame images and stored in the feature DB 25 in association with the frame IDs (step S22).
Next, the dissimilarity between the interval video feature of the one time interval stored in the feature DB 25 and the interval video feature of every time interval stored in the learning feature DB 15 is calculated using equation (2), and the maximum of these values is taken as the interval dissimilarity (step S23). If this interval dissimilarity is within the threshold specified in advance (NO in step S24), the state is judged to be the learned normal state (step S26); if it exceeds the threshold (YES in step S24), the state is judged to be not normal (abnormal) (step S25).
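Steps S23 to S26 can be sketched as a single function. Following the text, the interval dissimilarity is taken as the maximum of Dm over all learned intervals and compared with a threshold. The function names, the in-memory list standing in for the learning feature DB 15, and the assumed form of Dm (a frame-wise sum of absolute differences) are illustrative assumptions.

```python
from typing import Sequence

Interval = Sequence[Sequence[float]]  # n frame features per time interval


def interval_dissimilarity(l: Interval, m: Interval) -> float:
    """Dm(L, M), assumed to be the frame-wise sum of absolute differences."""
    return sum(abs(x - y) for lf, mf in zip(l, m) for x, y in zip(lf, mf))


def is_abnormal(
    current: Interval, learned: Sequence[Interval], threshold: float
) -> bool:
    """Return True when the interval dissimilarity exceeds the threshold."""
    if not learned:
        raise ValueError("no learned intervals to compare against")
    # Step S23: maximum dissimilarity against every learned interval.
    interval_diss = max(interval_dissimilarity(current, m) for m in learned)
    # Step S24: within the threshold means normal, above it means abnormal.
    return interval_diss > threshold
```

If the threshold is set to the maximum time interval dissimilarity recorded at the end of learning, an input interval is flagged only when it differs from some learned interval by more than any pair compared in the final learning step did.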

＜実施形態２＞
本発明の状態検知装置は、状態監視システムとして利用することもできる。この状態監視システムとしては、例えば、商店への防犯カメラとしての設置、また、オフィスへ設置して盗難の検知や火の色による火災の検知などに利用することができる。また、同植物の観察などにも利用できる。
このような状態監視システムとして本発明の状態検知装置を利用する場合には、撮影装置と異常を知らせるための異常処理装置が必要になってくる。
異常状態の時に処理する内容を異常時処理として予め設定しておき、状態検知装置で異常状態だと判断された場合に、その異常時処理を異常処理装置で実行する。
例えば、異常処理装置では、異常状態と検知された区間の動画に異常状態を示すフラグをセットし、異常状態であることをメールなどでユーザに知らせる。
知らせを受けたユーザは、異常状態を示すフラグを基に異常時の動画を動画ＤＢ２３から瞬時に閲覧することができる。
もちろん、異常処理装置では、異常状態をユーザの表示手段へ表示したり、アラーム等を鳴らしたり、消化装置やシャッターを下ろすような処理を行うようにしてもよい。
＜実施形態３＞
さらに、実施形態１では単一のコンピュータに実装されているとしたが、撮影装置（例えば、ネットワークカメラ等）とその撮影装置からの動画を送出するコンピュータをクライアントとし、サーバ側のコンピュータで検知処理を行うようにしてもよい。実施形態２のような場合には、異常処理装置も異常の内容に応じて設置場所が決定される。
このように構成すると、クライアント／サーバの形態を利用することによって、リモート上の撮影装置（カメラ等）からのデータを処理することが可能となる。 <Embodiment 2>
The state detection device of the present invention can also be used as a state monitoring system. Such a system can be installed, for example, as a security camera in a store, or in an office to detect theft or to detect a fire from the color of the flames. It can also be used to observe animals and plants.
When the state detection device of the present invention is used as such a state monitoring system, an imaging device and an abnormality processing device for reporting abnormalities are required.
The processing to be performed in an abnormal state is set in advance as abnormal-time processing, and when the state detection device determines that the state is abnormal, the abnormality processing device executes that abnormal-time processing.
For example, the abnormality processing device sets a flag indicating an abnormal state on the moving image of the interval detected as abnormal and notifies the user of the abnormal state by e-mail or the like.
On receiving the notification, the user can immediately view the moving image of the abnormal interval from the moving image DB 23 based on the flag indicating the abnormal state.
Of course, the abnormality processing device may also display the abnormal state on the user's display means, sound an alarm or the like, activate a fire-extinguishing device, or lower a shutter.
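The abnormal-time processing described for this embodiment can be sketched as a small dispatcher that flags the abnormal interval and then runs whatever actions were registered in advance. This is only an illustration under assumptions: the class name, the interval IDs, and the in-memory flag store are hypothetical, and the notification action merely records a message instead of sending e-mail.

```python
from typing import Callable, Dict, List

# A handler receives the ID of the interval that was judged abnormal.
Handler = Callable[[str], None]


class AbnormalityProcessor:
    """Hypothetical stand-in for the abnormality processing device."""

    def __init__(self) -> None:
        self._handlers: List[Handler] = []
        self._flags: Dict[str, bool] = {}

    def register(self, handler: Handler) -> None:
        """Set an abnormal-time action in advance."""
        self._handlers.append(handler)

    def on_detection(self, interval_id: str, abnormal: bool) -> None:
        """Flag the interval and run every registered action if abnormal."""
        if not abnormal:
            return
        self._flags[interval_id] = True
        for handler in self._handlers:
            handler(interval_id)

    def flagged_intervals(self) -> List[str]:
        """Interval IDs a user could look up in the moving image DB."""
        return sorted(self._flags)


# Usage: the notification is collected in a list for demonstration.
notifications: List[str] = []
processor = AbnormalityProcessor()
processor.register(lambda i: notifications.append(f"abnormal state in interval {i}"))
processor.on_detection("t1", abnormal=False)
processor.on_detection("t2", abnormal=True)
print(processor.flagged_intervals(), notifications)
```

Other actions mentioned in the text, such as sounding an alarm or lowering a shutter, would be registered as further handlers in the same way.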
<Embodiment 3>
Furthermore, although Embodiment 1 is described as being implemented on a single computer, an imaging device (for example, a network camera) and a computer that transmits the moving images from that imaging device may serve as the client, with the detection processing performed on a server-side computer. In a case such as Embodiment 2, the installation location of the abnormality processing device is also determined according to the nature of the abnormality.
With this configuration, the client/server arrangement makes it possible to process data from a remote imaging device (a camera or the like).

<Embodiment 4>
Furthermore, the present invention is not limited to the embodiments described above. It goes without saying that the object of the present invention is also achieved by implementing each function of the above embodiments as a program, writing the programs in advance to a recording medium such as a CD-ROM, mounting the CD-ROM or the like in a medium drive device such as a CD-ROM drive installed in a computer, and installing and executing the programs. In this case, the program read from the recording medium itself realizes the above embodiments, and the program and the recording medium on which it is recorded also constitute the present invention.
The recording medium may be any of a semiconductor medium (for example, a ROM or a nonvolatile memory card), an optical medium (for example, a DVD, MO, MD, or CD-R), or a magnetic medium (for example, a magnetic tape or a flexible disk). Alternatively, a program stored in a storage device may be supplied directly from a server computer via a communication network such as the Internet. In this case, the storage device of the server computer is also included in the recording medium of the present invention.
The present invention covers not only the case where the above embodiments are realized by executing the loaded program, but also the case where an operating system or the like performs part or all of the actual processing based on the instructions of the program and the above embodiments are realized by that processing.
Accordingly, by distributing a program that executes the functions of the above embodiments, or a recording medium on which the program is recorded, installing the program in an internal or external storage device of a computer, and executing the installed program, the functions of the above embodiments are realized, so that cost, portability, and versatility can be improved.

A block diagram showing the functional configuration of the learning processing unit of the state detection device according to the present embodiment.
A diagram for explaining time intervals.
A diagram explaining the mesh division of a frame image and the feature amounts of its blocks.
A graph showing the relationship between time intervals, the section dissimilarity, and the maximum dissimilarity.
A flowchart showing the processing procedure of the learning processing unit of the state detection device according to the present embodiment.
A block diagram showing the functional configuration of the detection processing unit of the state detection device according to the present embodiment.
A flowchart showing the processing procedure of the detection processing unit of the state detection device according to the present embodiment.

Explanation of symbols

10 learning processing unit, 11 learning control means, 12 learning moving image registration means, 13 learning moving image DB, 14 learning feature amount extraction means, 15 learning feature amount DB, 16 learning end determination means, 20 detection processing unit, 21 detection control means, 22 moving image registration means, 23 moving image DB, 24 feature amount extraction means, 25 feature amount DB, 26 detection means

Claims

A state detection device comprising: learning feature amount extraction means for extracting a section moving image feature amount for each time interval from a series of still images obtained by photographing a normal state with an imaging device, and accumulating the section moving image feature amounts in a learning feature amount database as feature amounts of the normal state; feature amount extraction means for extracting a section moving image feature amount for a time interval from a series of still images input from the imaging device; and detection means for automatically determining whether the input image is in the normal state by comparing the section moving image feature amount extracted by the feature amount extraction means with the feature amounts of the normal state accumulated in the learning feature amount database,
wherein, when accumulating the section moving image feature amounts of the normal state, the learning feature amount extraction means compares the section moving image feature amount of the one or more time intervals currently being processed with each of the section moving image feature amounts of all time intervals already extracted to calculate time interval dissimilarities, obtains a maximum time interval dissimilarity that is the maximum of the calculated time interval dissimilarities, and terminates the learning process when the difference between this maximum time interval dissimilarity and the maximum time interval dissimilarity obtained for the time intervals excluding the one currently being processed is within a predetermined value.

The state detection device according to claim 1, wherein the time interval dissimilarity is obtained by taking the difference between the feature amount of each still image included in one section moving image feature amount and the feature amount of the corresponding still image included in the other section moving image feature amount, and summing the differences over all still images.

The state detection device according to claim 1, wherein the time intervals from which the section moving image feature amounts are extracted overlap one another.

The state detection device according to claim 1, wherein the feature amount of a single still image is the average color or brightness extracted from each block obtained by dividing the still image into blocks.

The state detection device according to claim 1, wherein the section moving image feature amount is the time series of all feature amounts extracted from all still images in a predetermined time interval.

The state detection device according to claim 1, wherein moving image data captured by an imaging device arranged on a client side is transmitted to a server connected via a network, and the server executes the learning process or the detection process on the moving image data received from the imaging device.

A state detection method in which a section moving image feature amount is extracted for each time interval from a series of still images obtained by photographing a normal state with an imaging device, the extracted section moving image feature amounts are accumulated in a learning feature amount database as feature amounts of the normal state, and whether an input image is in the normal state is automatically determined by comparing the section moving image feature amount of each time interval, extracted in the same manner from a series of still images input from the imaging device, with the feature amounts of the normal state accumulated in the learning feature amount database,
wherein, when the section moving image feature amounts of the normal state are accumulated, the section moving image feature amount of the one or more time intervals currently being processed is compared with each of the section moving image feature amounts of all time intervals already extracted to calculate time interval dissimilarities, a maximum time interval dissimilarity that is the maximum of the calculated time interval dissimilarities is obtained, and the learning process is terminated when the difference between this maximum time interval dissimilarity and the maximum time interval dissimilarity obtained for the time intervals excluding the one currently being processed is within a predetermined value.

A program for causing a computer to perform the functions of the state detection device according to any one of claims 1 to 6.