JP4064604B2 - Image processing method and apparatus

Info

Publication number: JP4064604B2
Application number: JP2000200302A
Authority: JP
Inventors: 貴章澤田
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2000-06-30
Filing date: 2000-06-30
Publication date: 2008-03-19
Anticipated expiration: 2020-06-30
Also published as: JP2002027362A

Description

【０００１】
【発明の属する技術分野】
本発明は、フレーム間予測を用いた動画像圧縮方式により符号化された動画像データを固定レート或いは可変レートにより送出する画像処理方法及び装置に係り、特に早送りや逆早送り（巻き戻し）再生などのトリックプレイを実現するのに好適な画像処理方法及び装置に関する。
【０００２】
【従来の技術】
現在、フレーム間予測を用いた動画像圧縮方式により符号化を行う技術の主流となっているものにＭＰＥＧ（Moving Picture Experts Group）国際標準規格がある。通常、ＭＰＥＧによる動画像処理システムでは、アナログＶＴＲテープ等に記録されている動画像、またはビデオカメラ等を使用したリアルタイムなアナログの動画像を、専用のエンコーダにより符号化（エンコード）し、動画像ファイルとして外部記憶装置へ格納する。そして、格納した動画像ファイルは専用のデコーダにより復号化を行い、再生する。ファイルからデコーダへのデータ送出は、固定レートまたは可変レートの２通りが存在する。従来、このＭＰＥＧのシステムにおいて、アナログＶＴＲのような早送り／逆早送り再生（以下トリックプレイと称する）を実現するために、トリックプレイ専用の動画像データを作成するという方法がある。
【０００３】
ＭＰＥＧで圧縮された動画像において、我々は、トリックプレイが必ずしも高画質である必要がないという観点から、倍速率と再生時のビットレート（再生レート）を調整できるような、トリックプレイ用動画像データの作成方式を提案してきた（特願平９−１４７８４６号）。この方式は、ある特定の倍速率と再生レートを持つトリックプレイ用動画像データを容易に作成することができるという優れた特徴を有している。しかしながら、この方式には、次のような事象が発生した場合に、トリックプレイ用動画像データの作成が不可能となる問題が存在する。
【０００４】
上記方式では、通常再生用の動画像データから取り出したフレーム内符号化画像（Ｉピクチャ）情報を、指定の再生レートを満たすように計算されたサイズ値以下となるまで削減している。この際、可能な限り画像情報を削減しても目標のサイズ値以下とならない場合がある。ところが、上記方式では、このような事象の発生については考慮されていない。このため、画像情報が目標のサイズ値以下とならないような事象が発生した場合、指定の再生レートでのトリックプレイ用動画像データの作成が不可能となる。
【０００５】
【発明が解決しようとする課題】
上述したように、フレーム間予測を用いた動画像圧縮方式により符号化を行う技術において、現在提案されているトリックプレイ用の動画像データ作成方式では、フレーム内符号化画像データを可能な限り削減しても、目標とするサイズ値以下にならない場合には、動画像データの作成が不可能になるという問題があった。
【０００６】
本発明は上記事情を考慮してなされたものでその目的は、フレーム間予測を用いた動画像圧縮方式により符号化された通常再生の動画像データを用いてトリックプレイ用動画像データを作成するのに、トリックプレイ再生時の倍速率や再生レートを柔軟に設定可能で、且つ作成不可能になるケースを減らすことができる画像処理方法及び装置を提供することにある。
【０００７】
【課題を解決するための手段】
本発明は、フレーム間予測を用いて動画像圧縮を行うことにより符号化された動画像データを固定レート或いは可変レートにより送出する画像処理方法において、上記動画像データの先頭または末尾のうち再生方向で決まる側から順にフレーム内符号化画像データを抽出するステップと、この抽出したフレーム内符号化画像データが指定された再生時のビットレートを満足する目標サイズ以下になるように当該フレーム内符号化画像データから所定の情報を削減するステップと、上記情報を削減したフレーム内符号化画像データのうち目標サイズ以下になったもののみを抽出するステップと、上記抽出した目標サイズ以下のフレーム内符号化画像データが指定された再生時のビットレートを満足するように、パディング符号を当該フレーム内符号化画像データに挿入するステップと、このパディング符号が挿入された動画像データのヘッダに対し、指定された再生時のビットレートを反映したビットレートを当該動画像データのヘッダに対し設定するステップと、上記パディング符号が挿入された動画像データを再生する際に再生のスタート及びランダムアクセスが適切に行われるようにするためのバッファ制御情報を設定するステップとを備えたことを特徴とする。ここで、フレーム内符号化画像データから削減する所定の情報として、ＤＣＴ（Ｄiscrete Ｃosine Ｔransform）係数、好ましくはＤＣＴ係数のうちのＡＣ成分、特に画質を左右する高周波成分（ＡＣ成分）の係数を対象とするとよい。
【０００８】
このように本発明の特徴は、動画像データ（一般には通常再生用動画像データ）より抽出したフレーム内符号化画像データ（Ｉピクチャ）のうち、所定の情報（例えばＤＣＴ係数）を削減しても目標とするサイズ（目標サイズ）以下にならないＩピクチャはトリックプレイでは表示せず間引いてしまうという手法を適用した点にある。これにより、トリックプレイ時に表示されるピクチャ数が減る場合には倍速率に若干揺らぎが生じるものの、より低い再生レートのままでトリックプレイが実現できる。しかも、倍速率の若干の揺らぎは、高速のトリックプレイでは視覚的にあまり気になるものではなく、むしろ、ネットワーク上のストリームとして動画像データを流すことを考えた場合に低い再生レートでトリックプレイ用動画像データを送出できるというメリットが大きい。
【０００９】
ここで、Ｉピクチャのみを抽出したことにより、ピクチャ数は減少し、ＧＯＰ（Group of Pictures）内フレーム数も変化する。そこで、動き予測に関するデータを持たないフレーム間順方向予測画像データ（Ｐピクチャ）をＩピクチャ間に挿入するとよい。このＩピクチャ間に挿入するＰピクチャ（特殊Ｐピクチャ）の数は、好ましくは、指定倍速率をＮ、元の動画像のＧＯＰ内フレーム数をＭ（Ｎ，Ｍは整数、Ｎ≦Ｍ）とすると、（Ｍ／Ｎ）−１にするとよい。
【００１０】
また本発明は、情報削減によっても目標サイズ以下にならなかったＩピクチャを間引くだけでなく、その代わりに、動画像データ内の当該Ｉピクチャの近傍にあるＰピクチャの中から、目標サイズ以下になるものを抽出して、当該Ｉピクチャに代えて動画像データに挿入するようにし、これに伴い目標サイズ以下のＩピクチャだけでなく、このＰピクチャについても、パディング符号を挿入するようにしたことを特徴とする。ここで、該当するＰピクチャが存在しない場合、特殊Ｐピクチャを用いるとよい。
【００１１】
これにより、指定された再生時のビットレートや倍速率での条件を満たせないようなＩピクチャがある場合に、それを間引くだけでなく、その代用として間引いたＩピクチャと画像的に近い可能性のあるＰピクチャを挿入することで、倍速率の揺れを減らし、より視覚的な違和感（飛び飛び感）を抑えることができる。また低い再生レートのトリックプレイ用動画像データが作成できる。
【００１２】
また本発明は、情報削減によっても目標サイズ以下にならなかったＩピクチャを間引かずに、その代わりに、動画像データ内の当該Ｉピクチャの近傍にあるＰピクチャの中から、当該Ｉピクチャのサイズが目標サイズを超えた分を補うサイズのものを抽出して、当該Ｉピクチャの直後に挿入するようにし、これに伴い目標サイズ以下のＩピクチャだけでなく、このＰピクチャについても、パディング符号を挿入するようにしたことを特徴とする。ここで、上記目標サイズを超えた分を補うサイズの上限を、目標サイズの２倍のサイズから目標サイズ以下とならなかったＩピクチャのサイズを減じた値とするとよい。また、該当するＰピクチャが存在しない場合、特殊Ｐピクチャを用いるとよい。
【００１３】
これにより、指定された再生時のビットレートや倍速率での条件を満たせないようなＩピクチャがある場合に、それを間引く代わりに、当該Ｉピクチャがオーバーしたサイズを補う（補償できる）Ｐピクチャを当該Ｉピクチャの直後に挿入することで、倍速率は若干揺れる場合があるものの、視覚的違和感（飛び飛びの感覚）は解消され、より滑らかで、低再生レートのトリックプレイ用動画像データが作成できる。
【００１４】
【発明の実施の形態】
以下、本発明の実施の形態につき図面を参照して説明する。
【００１５】
［第１の実施形態］
まず、本発明の第１の実施形態に係る画像処理装置で適用される画像データ、即ちフレーム間予測を用いて動画像圧縮をすることにより符号化された動画像データを得る画像圧縮方式の原理を、ＭＰＥＧ２による画像圧縮を例に、図２を参照して説明する。
【００１６】
ＭＰＥＧ２による画像圧縮では、画像データに対しＤＣＴ（Discrete Cosine Transform、離散コサイン変換）を施し、量子化を行う。即ち、図２に示すように、ＭＰＥＧ２に準拠したエンコーダ（符号器）２００への入力画像（原画像）２１０は、まず８×８画素のブロックに分割される。このブロック単位にエンコーダ２００内のＤＣＴ回路２０１によりＤＣＴ演算を行い、得られたＤＣＴ係数をＤＣ成分（直流成分）及びＡＣ成分（交流成分）で独立して量子化回路２０２により量子化する。量子化に用いる量子化テーブル２２０は輝度信号用量子化テーブルと色差信号用量子化テーブルとで構成される。量子化したＤＣＴ係数のうち、ＤＣ係数は、直前のブロックのＤＣ係数を予測値とした差分値をエントロピー符号化回路２０３により符号化する。残りのＡＣ成分は、ブロック内でジグザグスキャンによって並び替えた後、回路２０３により符号化する。上記ＤＣＴにより変換前にランダムに分布していた画素値（例えば輝度）が、ＤＣＴ変換後は低周波項に大きな値が集中する。したがって、高周波項を落とす（取り除く）操作をすれば画像データを圧縮することができる。以上が、ＭＰＥＧ２による画像圧縮の原理である。
【００１７】
さて、図２のエンコーダ２００を用いてＭＰＥＧ２による画像圧縮を行うことで得られる符号化された動画像データ、つまりフレーム間予測を用いて動画像圧縮をすることにより符号化された動画像データを再生する形態の１つに、従来の技術の欄でも述べたようにトリックプレイ（早送り／逆早送り再生）がある。
【００１８】
本実施形態に係る画像処理装置は、このＭＰＥＧ２の動画像再生において、トリックプレイ時の画質をどの程度重視すべきかという点に着目してなされている。
【００１９】
高速なトリックプレイは、主に、映像の特定のシーンを検索し、必要な部分へ素早くジャンプしたい場合や映像の内容を早く理解したい場合等で利用することが多い。このようなときトリックプレイの映像は必ずしも通常再生時と同レベルの画質でなく、少々画質を落としても特に支障はないと考えられる。このような観点から、１つの方策として、特願平９−１４７８４６号にて倍速率と再生時のビットレート（再生レート）を調整できるような、トリックプレイ用動画像データの作成方式を提案した。
【００２０】
この方式では、ＭＰＥＧの動画像データよりフレーム内符号化画像（Ｉピクチャ）のみを抽出し、抽出したＩピクチャのマクロブロック内のＤＣＴ係数のうち、画質を左右する高周波成分の係数を削減し、各Ｉピクチャのサイズを小さくすることで、再生レートを低く抑えることを可能とした。
【００２１】
また、ＩピクチャとＩピクチャの間に（動き予測に関するデータを持たない）特別なフレーム間順方向予測画像（Ｐピクチャ）を挿入し、その挿入数で倍速率を可変にしたり、Ｉピクチャの削減するＤＣＴ係数の数を調節することで、指定の再生レートとなるようなトリックプレイ用動画像を作成することを可能とした。ここで、特殊Ｐピクチャの挿入数の計算式は、指定倍速率がＮ、元の動画像のＧＯＰ（Group of Pictures）内フレーム数がＭ（Ｎ，Ｍは整数、Ｎ≦Ｍ）であるものとすると、次式（１）
Ｐピクチャ挿入数＝（Ｍ／Ｎ）−１ …（１）
により表される。
【００２２】
しかしながら、この方法では、指定の再生レート（ビットレート）Ｒの値によっては、Ｉピクチャ内のＤＣＴ係数を可能な限り削減しても、その再生レートＲとなる条件を満たすようなピクチャサイズ（Ｉピクチャサイズ）ＴＳにまで小さくできない場合が生じる。このＩピクチャサイズＴＳ、即ち指定の再生レートＲを実現するための条件となるＩピクチャサイズ（目標ピクチャサイズ）ＴＳの計算式は、次式（２）
ＩピクチャのサイズＴＳ
＝Ｒ×(Ｐ_num＋１)／Ｆrame−（Ｐ_sz×Ｐ_num＋Ｈdr_sz）…（２）
Ｒ：指定再生レート（ｂｐｓ）
Ｐ_sz ：Ｐピクチャのサイズ（ビット）
Ｈdr_sz ：ヘッダのサイズ（ビット）（シーケンスヘッダやＧＯＰヘッダがあるときのみ）
Ｐ_num ：挿入するＰピクチャ数（倍速率を決める）
Ｆrame ：１秒あたりの表示ピクチャ数
で表される。
【００２３】
もし、上記式（２）で示されるＩピクチャサイズＴＳにまで小さくできない場合には、指定の再生レートＲでのトリックプレイ用動画像データの作成は事実上不可能となる。元の動画像データにもよるが、通常、倍速率を上げ、再生レートを低くしようとすると、この問題が起こる可能性が高くなる。
【００２４】
そこで本実施形態では、上記問題に関して、次に説明する画像処理装置を用いて解決するようにしている。
【００２５】
図１は、本発明の第１の実施形態に係る画像処理装置の構成を示すブロック図である。
図１の画像処理装置１００は、抽出部１０１、フレーム内符号化情報削除部１０２、フレーム再構成部１０３、及び動画像属性情報設定部１０４から構成される。
【００２６】
抽出部１０１は、図２に示したようなＭＰＥＧ２準拠のエンコーダ（２００）によりエンコードされた通常再生用動画像データ１１０を入力し、当該動画像データ１１０からフレーム内符号化画像（Ｉピクチャ）を抽出する。ここで、通常再生用動画像データのエンコード形式は固定レート（ＣＢＲ）、可変レート（ＶＢＲ）のどちらでもよい。
フレーム内符号化情報削除部１０２は、抽出部１０１により抽出された各Ｉピクチャのサイズが再生レートＲとなる条件を満足するようなサイズになるように当該Ｉピクチャから情報を削減する。
【００２７】
フレーム再構成部１０３は、フレーム内符号化情報削除部１０２による情報削減の結果、再生レートＲとなる条件を満足するようなサイズ以下となったＩピクチャに対し、当該条件を満足するサイズになるようにパディング符号を挿入するパディング部１０３ａと、Ｉピクチャ間に動き予測に関するデータを持たない特別なフレーム間順方向予測画像（Ｐピクチャ）を挿入するフレーム間順方向予測画像挿入部１０３ｂから構成されている。
【００２８】
動画像属性情報設定部１０４は、フレーム再構成部１０３によりパディング符号が挿入されたＩピクチャを含む動画像データをデコーダにより再生する際に、再生のスタート及びランダムアクセスが適切に行われるようにするための動画像属性情報を設定することで、トリックプレイ用（早送りまたは逆早送り再生用）動画像データ１２０を作成する。
【００２９】
動画像属性情報設定部１０４は、再生レートを算出して動画像データのヘッダ（シーケンスヘッダ）に設定する再生レート設定部１０４ａと、各ピクチャ毎にバッファ制御情報としての後述するvbv-delay（vbv:video buffering verifier）の値を計算して対応するピクチャ層のヘッダに設定するバッファ制御情報設定部１０４ｂとから構成されている。
【００３０】
次に、以上のように構成された画像処理装置の動作を説明する。なお、トリックプレイ用動画像の再生レート及び倍速率は外部からパラメータとして指定される（与えられる）ものとする。
【００３１】
まず、画像処理装置１００内の抽出部１０１は、通常再生用動画像データ１１０からフレーム内符号化画像（Ｉピクチャ）を抽出する。このＩピクチャの抽出方向は、早送り再生用または逆早送り再生用のいずれのトリックプレイ用動画像データ１２０を作成するかで異なる。抽出部１０１は、早送り再生用ならば動画像データの先頭側から抽出を開始し、逆早送り再生用ならば動画像データの末尾側から抽出を開始する。
【００３２】
フレーム内符号化情報削除部１０２は、抽出部１０１により抽出された各Ｉピクチャを対象に当該Ｉピクチャの情報を削減することで、当該Ｉピクチャのサイズを小さくする。Ｉピクチャの情報の削減は、ＭＰＥＧ２の例では、前述したＤＣＴ演算を行った際に得られるＤＣＴ係数を削減することにより実現される。
【００３３】
ここで、フレーム内符号化情報削除部１０２によるＤＣＴ係数削減方法について説明する。
ＭＰＥＧ２では、ＤＣＴ係数はマクロブロック毎に存在する。そして、この係数は、ＤＣ成分とＡＣ成分とに分けられ、それぞれ独立して量子化される。削除対象とするのはＡＣ成分である。ここではＡＣ成分のうち低周波に相当する係数の一部を残し、高周波に相当する係数は削除する。削除のアルゴリズムについては、各種考えられる。例えば、ピクチャを構成する全マクロブロックから均等に削除するとか、或いはマクロブロック毎に削除する数を変えるなどが適用可能である。また、ヒューリスティックな方法を取ることも可能である。いずれにしても、ＡＣ成分のうち高周波成分を削除する方法であれば、どのようなアルゴリズムを用いても構わない。
【００３４】
さて、フレーム内符号化情報削除部１０２は、上記したＩピクチャの情報の削減処理を、前記式（２）により求められるＩピクチャサイズＴＳ以内になるまで実施する。もし情報削減に失敗した場合には、その旨をフレーム再構成部１０３に通知する。
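The reduction loop described above (keep dropping high-frequency AC coefficients until the I picture fits within TS, and report failure otherwise) can be sketched as follows. This is only an illustration under stated assumptions: the name `reduce_to_target`, the representation of each block as a list of per-coefficient bit costs in zigzag order, and the policy of shortening the longest blocks first are hypothetical choices, and real bit costs depend on the variable-length coding.

```python
def reduce_to_target(blocks, fixed_bits, target_bits, min_keep=0):
    """Drop trailing (highest-frequency) AC coefficients until the picture fits.

    blocks      -- list of lists; each inner list holds the assumed bit cost of
                   one block's AC coefficients in zigzag order
    fixed_bits  -- bits that cannot be removed (headers, DC coefficients)
    target_bits -- target I picture size TS from formula (2)
    Returns (kept_blocks, size_in_bits, ok); ok is False when reduction fails.
    """
    kept = [list(b) for b in blocks]

    def size():
        return fixed_bits + sum(sum(b) for b in kept)

    while size() > target_bits:
        longest = max((len(b) for b in kept), default=0)
        if longest <= min_keep:
            return kept, size(), False      # nothing left to remove
        for b in kept:
            if len(b) == longest:
                b.pop()                     # remove the highest-frequency term
    return kept, size(), True


blocks = [[10, 8, 6, 4], [10, 8, 6], [12, 5]]
kept, bits, ok = reduce_to_target(blocks, fixed_bits=20, target_bits=70)
_, failed_bits, failed_ok = reduce_to_target(blocks, fixed_bits=20, target_bits=10)
```

When `ok` is False, the caller corresponds to the frame reconstruction unit being notified that the I picture could not be reduced.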
【００３５】
これによりフレーム再構成部１０３内のパディング部１０３ａは、フレーム内符号化情報削除部１０２により削減に成功したＩピクチャのみ、即ちＩピクチャサイズＴＳ以内になるまで情報が削減されたＩピクチャのみを対象に、次式（３）
パディングするサイズ＝ＴＳ−Ｉ’…（３）
ＴＳ：式（２）で求めたＩピクチャサイズ
Ｉ’：削減後のＩピクチャサイズ
に従って、パディングを施す。即ち式（３）で算出されるサイズ分のパディング符号を挿入する。
【００３６】
このように、ＩピクチャサイズＴＳ以内になるまで情報が削減されたＩピクチャに対してパディング符号を挿入する理由は次の通りである。まず、Ｉピクチャのサイズはもともと固定ではなく、しかもサイズＴＳ以内になるまで情報が削減されているので、そのままでは、動画像データを指定のビットレート（再生レート）で再生する際にデコード処理量にばらつきが生じ、デコーダ（画像出力）側のバッファのオーバーフロー及びアンダーフローが起こりやすい状態となる。そこで、指定のビットレートで再生した場合に正常なデコード動作が行われるように、Ｉピクチャのサイズを前記式（２）で算出されるサイズＴＳに合わせ、パケットデータ長を一定にするために、パディング符号を挿入する。この理由は、前記特願平９−１４７８４６号の願書に最初に添付された明細書の詳細な説明の欄の段落００２６〜００２９に記載されている理由と同様である。
【００３７】
さて、パディング部１０３ａによるＩピクチャに対するパディング符号の挿入（パディング処理）は、次のように行われる。
まず、前記式（３）で算出されるサイズ（ＴＳ−Ｉ’）のパディング符号は、一種のダミーデータ（例えば「０」データ）である。このパディング符号（ダミーデータ）がサイズＩ’（Ｉ’＜ＴＳ）の（情報削減後の）Ｉピクチャに挿入される（埋め込まれる）。図３はＭＰＥＧ２画像データの階層構成を示す。同図に示すようにＭＰＥＧ２画像データはシーケンス層、ＧＯＰ層、ピクチャ層、スライス層、マクロブロック層、及びブロック層から構成される。ＭＰＥＧ２のＥＳ(Elementary Stream)の仕様では、スライス層の始まりを示す開始コードの前に任意の数の０の挿入が許されている。そこで、Ｉピクチャのこの部分に、前記式（３）で算出されるサイズ（ＴＳ−Ｉ’）のパディング符号、即ち指定されたビットレートに適合したパディング符号ＰＡＤを埋め込む。このパディング符号ＰＡＤが埋め込まれた動画像データのシーケンスを図４に示す。
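The padding step (formula (3), with zeros placed in front of the slice start code) might look like the sketch below. It is an assumption-laden illustration rather than the patented implementation: sizes are handled in whole bytes, the helper names are hypothetical, and the picture is assumed to be a byte string that begins at its picture start code. The only format fact relied on is that MPEG-2 slice start codes are `00 00 01 xx` with `xx` in the range 0x01 to 0xAF.

```python
START_PREFIX = b"\x00\x00\x01"


def find_first_slice(data: bytes) -> int:
    """Return the offset of the first slice start code, or -1 if none is found."""
    i = data.find(START_PREFIX)
    while i != -1 and i + 3 < len(data):
        if 0x01 <= data[i + 3] <= 0xAF:     # slice_start_code value range
            return i
        i = data.find(START_PREFIX, i + 1)
    return -1


def pad_picture(picture: bytes, target_bytes: int) -> bytes:
    """Insert (TS - I') zero bytes before the first slice start code."""
    pad = target_bytes - len(picture)       # formula (3), in bytes
    if pad < 0:
        raise ValueError("picture is larger than the target size")
    pos = find_first_slice(picture)
    if pos < 0:
        raise ValueError("no slice start code found")
    return picture[:pos] + b"\x00" * pad + picture[pos:]


# Toy picture: picture start code, two header bytes, one slice with two bytes.
picture = b"\x00\x00\x01\x00" + b"\xAA\xBB" + b"\x00\x00\x01\x01" + b"\x11\x22"
padded = pad_picture(picture, target_bytes=16)
```

Because the zeros sit in front of a start code, a decoder that scans for the `00 00 01` prefix still finds the slice at the end of the zero run.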
【００３８】
さて、パディング部１０３ａによるパディングが施された動画像データは、早送りまたは逆早送りに係わるトリックプレイ用の動画像データのフレーム構造を有する。この動画像データは、通常再生用動画像データ１１０から抽出されたＩピクチャのうち、フレーム内符号化情報削除部１０２によってサイズがＴＳ以下まで削減されたＩピクチャのみを対象にパディング処理を施し、サイズを一定に揃えることによりデコード処理量が一定化することを狙ったものである。
【００３９】
しかし、これだけでは必ずしも十分でない。即ち、通常再生用動画像データ１１０からＩピクチャのみを抽出したことにより、ピクチャ数は減少し、ＧＯＰ（ＩピクチャとＩピクチャとの間）も変化しているためである。そこでフレーム再構成部１０３内のフレーム間順方向予測画像挿入部１０３ｂは、動き予測に関するデータを持たないＰピクチャ（特殊Ｐピクチャ）をＩピクチャ間に挿入する。この特殊Ｐピクチャは動き予測に関するデータを何ら有さないものなので、Ｐピクチャの再生画像としては例えば直前のＩピクチャと同じものが用いられる。挿入する特殊Ｐピクチャの数は、指定の倍速率Ｎに応じて決定される。挿入する特殊Ｐピクチャの数を変えることで、Ｉピクチャの処理される時間間隔が変わり、倍速率を変えることができる。
【００４０】
ここで、倍速率（トリックプレイの倍率）Ｎと挿入するＰピクチャの数の関係について以下に述べる。
上述したように、ＭＰＥＧ２では、Ｉピクチャと次のＩピクチャとの間のピクチャ数、つまりＧＯＰのサイズが固定であれば、そのサイズの比率がそのままトリックプレイの倍率となる。このため、ストリームＧＯＰの大きさ（ＧＯＰ内フレーム数）をＭとすると、倍速率Ｎ、即ちＮ倍速の早送り動画像データを作成するために挿入するＰピクチャ数は、前記式（１）により（Ｍ／Ｎ）−１で求められる。
【００４１】
フレーム間順方向予測画像挿入部１０３ｂは、Ｉピクチャ間に、前記式（１）で算出される数の特殊Ｐピクチャを挿入すると、その挿入後の動画像データを動画像属性情報設定部１０４に渡す。
【００４２】
動画像属性情報設定部１０４は、Ｐピクチャ挿入後の動画像データに対し、デコーダ側のバッファ内のデータ量を安定させるために、再生レート（ｂｐｓ）を次式（４）
再生レート＝(Ｉ_sz＋Ｐ_sz×Ｐ_num＋Ｈdr_sz)÷｛(Ｐ_num＋１)／Ｆrame｝ …（４）
Ｉ_sz：Ｉピクチャのサイズ（ビット）
Ｐ_sz：Ｐピクチャのサイズ（ビット）
Ｐ_num ：挿入するＰピクチャ数
Ｈdr_sz ：ヘッダのサイズ（ビット）
Ｆrame ：１秒あたりの表示ピクチャ数
により計算する。この式（４）に従って算出される再生レートは、指定された再生レートを反映したものであるが、必ずしも当該指定再生レートに一致するとは限らず、Ｐピクチャ挿入を含むサイズ調整がなされた後の動画像データに適したものとなっている。
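Formula (4) divides the bits of one I picture plus its inserted P pictures and headers by the time those pictures occupy on screen, (P_num + 1) / Frame seconds. A minimal sketch, with a hypothetical function name and made-up sample sizes:

```python
def playback_rate_bps(i_bits, p_bits, p_num, header_bits, frame_rate):
    """Playback rate per formula (4): bits per group / seconds per group."""
    group_bits = i_bits + p_bits * p_num + header_bits
    # Dividing by (p_num + 1) / frame_rate is the same as multiplying by
    # frame_rate / (p_num + 1).
    return group_bits * frame_rate / (p_num + 1)


# Hypothetical numbers: a 98,000-bit padded I picture followed by one
# 2,000-bit special P picture at 30 pictures per second.
rate = playback_rate_bps(i_bits=98_000, p_bits=2_000, p_num=1,
                         header_bits=0, frame_rate=30)
```

With these sample values the result equals the rate R that formula (2) would have started from, which is a quick consistency check between the two formulas.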
【００４３】
そこで動画像属性情報設定部１０４は、式（４）に従って算出した再生レートを指定の再生レートに代えて用い、当該算出した再生レートを動画像データのシーケンスヘッダＳＨ（図４参照）に再設定する。これによりシーケンスヘッダＳＨ中の再生レート（通常再生用の再生レート）が、上記算出した再生レートに更新される。このように再生レートは、フレーム再構成後に正常にデコード（復号）処理ができるように再設定を要する情報である。
【００４４】
また動画像属性情報設定部１０４は、Ｐピクチャ挿入後の動画像データの各ピクチャ毎にvbv-delay（vbv:video buffering verifier）の値を再計算し、対応するピクチャ層のヘッダに設定する。このようにvbv-delayも再生レートと同様に、フレーム再構成後に正常にデコード処理ができるように再設定を要する情報である。このvbv-delayは、ランダムアクセス時にデコーダ側のバッファ（デコーダバッファ）内のデータ量を調節するために必要とされる一種のバッファ制御情報である。ＭＰＥＧの規格によれば、このvbv_delayは以下の如くに規定されている。
【００４５】
vbv_delay：１６ビットの符号なし整数。ビットレートが固定の動作の場合、vbv_delayを用いてピクチャのデコードの開始時にデコーダバッファがオーバーフロー或いはアンダーフローを起こさないようにデコーダバッファの初期占有率を設定する。vbv_delayは初期の空の状態から目標とするビットレートＲで現在のピクチャがデコーダバッファから除去される直前の正しいレベルまで当該デコーダバッファ（ＶＢＶバッファ）を満たすのに必要な時間を計測する。
【００４６】
vbv_delayの値はＶＢＶバッファがピクチャ開始コードの最終バイトを受け取った後待たなければならない９０ＫＨｚシステムクロックの期間の数である。ＧＯＰ内のｎ番目（ｎ＞０）のピクチャ（ピクチャｎ）のvbv_delayの値をvbv_delay_nとすると、当該vbv_delay_nは次式（５）
vbv_delay_n＝９００００＊Ｂ_n ^*／Ｒ …（５）
により算出される。
【００４７】
なお、Ｂ_n ^*はピクチャｎをバッファから除去する直前であってＧＯＰ層データと、シーケンスヘッダＳＨのデータと、ピクチャｎのデータエレメントの直前のpicture_start_code（ピクチャ層の開始コード）を除去した後のビットで測定されたＶＢＶ占有率である。
【００４８】
Ｒは１秒間当たりのビット数で表わされるビットレート（再生レート）である。シーケンスヘッダＳＨ内のbit_rateフィールドより符号化された丸め値よりもむしろ完全に正確なビットレートがＶＢＶモデルのエンコーダにより使用される。固定でないビットレート動作の場合、vbv_delayは１６進でＦＦＦＦの値を有する。
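Formula (5) converts a buffer occupancy in bits into a count of 90 kHz clock periods. The sketch below assumes integer truncation and a hypothetical function name; the text above only fixes the 16-bit width, the 90000 × B / R relation, and the value FFFF (hex) for non-constant bit rate operation.

```python
def vbv_delay_ticks(occupancy_bits: int, rate_bps: int,
                    constant_rate: bool = True) -> int:
    """vbv_delay per formula (5), in periods of the 90 kHz clock."""
    if not constant_rate:
        return 0xFFFF                       # value used for variable-rate streams
    if rate_bps <= 0:
        raise ValueError("rate must be positive")
    ticks = (90_000 * occupancy_bits) // rate_bps   # truncation is an assumption
    if ticks >= 0xFFFF:
        raise ValueError("vbv_delay does not fit in the 16-bit field")
    return ticks


# Hypothetical case: 500,000 bits must be buffered at 1.5 Mbps.
cbr_delay = vbv_delay_ticks(500_000, 1_500_000)
vbr_delay = vbv_delay_ticks(500_000, 1_500_000, constant_rate=False)
```

A result of 30000 periods at 90 kHz corresponds to a wait of one third of a second before decoding starts.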
【００４９】
例えばある動画像において途中から再生しようとした場合を考える。最初は、出力側のバッファには何のデータも入っていない状態であるので、そのままでは再生できない。したがって、適度な量のデータがバッファ内に溜まるまでデコーダはデコード処理を待たなければならない。その待ち時間に直接関係する値がvbv_delayである。vbv_delayの値の計算は、ピクチャの流れ及び再生レートに依存しているので、Ｉピクチャのみを抽出したトリックプレイ動画像データの場合、以前の値のままでは、ランダムアクセス時に正常に動作しない（バッファのアンダーフローあるいはオーバーフローが起きる）。したがって、ピクチャ毎にvbv_delayを再計算し、新たに設定を行う。これにより、トリックプレイ動画像データをランダムアクセスする際にバッファ内のデータ量を正しく調節でき、正常動作が可能となる。
【００５０】
上記のようにしてトリックプレイ用（早送りまたは逆早送り再生用）動画像データ１２０が作成される。このトリックプレイ用動画像データ１２０を、デコーダシステムによりデコードし再生することにより、例えばテレビジョン受像機に早送りまたは逆早送り再生された動画像が表示される。このとき、上記式（４）により求められた再生レートによる再生が行われる結果、同一のデータ量の各フレーム（Ｉピクチャ及びＰピクチャ）が同一の再生レートにより再生され、バッファ内のデータ量を安定化することができる。
【００５１】
以上に述べたように第１の実施形態の特徴は、通常再生用動画像データより抽出したＩピクチャのうち、ＤＣＴ係数を削減しても目標とするサイズ以下にならないＩピクチャはトリックプレイでは表示せず間引いてしまうという手法を適用した点にある。これにより、トリックプレイ時に表示されるピクチャ数が減る場合があり、その際は倍速率に若干揺らぎが生じる。しかし、一方において、より低い再生レートのままでトリックプレイが実現できるようになるという利点がある。しかも、倍速率の若干の揺らぎは、高速のトリックプレイでは視覚的にあまり気になるものではなく、むしろ、ネットワーク上のストリームとして動画像データを流すことを考えた場合に低い再生レートでトリックプレイ用動画像データを送出できるというメリットが大きい。
【００５２】
次に第１の実施形態の具体例について図５を参照して説明する。
今、倍速率３倍（Ｎ＝３）で、再生レートＲが指定され、目標ＩピクチャサイズをＴＳとし、抽出後のＩピクチャをＩi、ＤＣＴ係数削減後のＩピクチャをＩi’、挿入する特殊ＰピクチャをＰｓとする。また、元のシーケンスがＧＯＰサイズ「６」で構成されるものとする。
【００５３】
通常再生用動画像データ１１０のシーケンスが図５（ａ）に示すように
Ｉ1ＢＢＰＢＢＩ2ＢＢＰＢＢＩ3ＢＢＰＢＢＩ4ＢＢＰＢＢＩ5…
の場合、Ｉピクチャ抽出後は、図５（ｂ）に示すように
Ｉ1 Ｉ2 Ｉ3 Ｉ4 Ｉ5 Ｉ6 Ｉ7…
となる。
【００５４】
したがって、ＤＣＴ係数削減後のシーケンスは、図５（ｃ）に示すように
Ｉ1’Ｉ2’Ｉ3’Ｉ4’Ｉ5’Ｉ6’Ｉ7’…
となる。
【００５５】
ここで、Ｉ3’，Ｉ6’＞ＴＳであるものとすると、フレーム再構成部１０３内のフレーム間順方向予測画像挿入部１０３ｂでは、図５（ｄ）に示すように
Ｉ1’Ｉ2’Ｉ4’Ｉ5’Ｉ7’…
が特殊ＰピクチャＰｓの挿入の対象として抽出され、最終的には当該ＰピクチャＰｓの挿入により、図５（ｅ）に示すように
Ｉ1’ＰｓＩ2’ＰｓＩ4’ＰｓＩ5’ＰｓＩ7’…
というシーケンスが生成される。
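The worked example above (speed ratio 3, GOP size 6, I3' and I6' larger than TS) can be reproduced with a few lines of code. The picture sizes and the function name are hypothetical; only the thinning rule and the (M / N) − 1 insertion count come from the description.

```python
def build_trick_sequence(reduced, target_bits, p_num):
    """Thin out oversized I pictures, then put p_num special P pictures
    ("Ps") between each pair of remaining pictures."""
    kept = [label for label, bits in reduced if bits <= target_bits]
    sequence = []
    for index, label in enumerate(kept):
        if index > 0:
            sequence.extend(["Ps"] * p_num)
        sequence.append(label)
    return sequence


TS = 100_000                                  # hypothetical target size in bits
reduced = [("I1'", 90_000), ("I2'", 95_000), ("I3'", 120_000),
           ("I4'", 80_000), ("I5'", 99_000), ("I6'", 130_000),
           ("I7'", 70_000)]
sequence = build_trick_sequence(reduced, TS, p_num=6 // 3 - 1)
```

The resulting list matches the sequence of FIG. 5(e): I1' Ps I2' Ps I4' Ps I5' Ps I7'.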
【００５６】
［第２の実施形態］
前記第１の実施形態では、ＤＣＴ係数を削減しても目標とするサイズ以下にならないＩピクチャは間引いている。このため、Ｉピクチャの間引く頻度が高すぎる動画像の場合、シーンが飛びすぎてしまい、視覚的に違和感が生じる可能性がある。
【００５７】
そこで、間引いたＩピクチャに対し近傍のＰピクチャを代用するようにした本発明の第２の実施形態について図面を参照して説明する。
【００５８】
図６は本発明の第２の実施形態に係る画像処理装置の構成を示すブロック図である。
図６の画像処理装置６００は、抽出部６０１、フレーム内符号化情報削除部６０２、フレーム再構成部６０３、動画像属性情報設定部６０４、及びピクチャ挿入部６０５から構成される。抽出部６０１、フレーム内符号化情報削除部６０２、フレーム再構成部６０３、動画像属性情報設定部６０４は、図１の画像処理装置１００における抽出部１０１、フレーム内符号化情報削除部１０２、フレーム再構成部１０３、動画像属性情報設定部１０４に相当し、その機能もほぼ同様である。
【００５９】
図６の画像処理装置６００が図１の構成の画像処理装置１００と異なる点は、ピクチャ挿入部６０５が追加されていることである。このピクチャ挿入部６０５はフレーム内符号化情報削除部６０２とフレーム再構成部６０３の間に設けられ、次に述べるアルゴリズムに従うピクチャ挿入処理を行う。即ちピクチャ挿入部６０５は、フレーム内符号化情報削除部６０２によるＩピクチャの情報削減に失敗した場合、そのＩピクチャの代わりに近傍のＰピクチャで適したものがあればそれを挿入する。
【００６０】
本実施形態は、ＭＰＥＧの特性上、Ｉピクチャの近傍のＰピクチャならば、画像としてはＩピクチャに近い表示効果を得られる可能性が高いことに着目したものである。このとき近傍のＰピクチャの定義と選択対象については幾つかの手法が考えられる。例えば、カレントのＧＯＰ内でシーケンス上次のＰピクチャを近傍とする第１の定義、或いはカレントのＧＯＰ内のＰピクチャ全てを近傍のＰピクチャとする第２の定義、或いは直前のＧＯＰ内（早送り再生の場合）または直後のＧＯＰ内（逆早送り再生の場合）とカレントのＧＯＰ内のＰピクチャを全て近傍とする第３の定義などが考えられる。この第３の定義例を図７に示す。
【００６１】
また、近傍のＰピクチャが複数ある場合、そのうちのいずれを挿入対象とするか決める（選択する）手法としては、例えば、前記式（２）で求めた目標のサイズＴＳ以下であるＰピクチャのうち表示時刻的に（情報削減に失敗したＩピクチャに）最も近いものを選ぶ第１の手法、或いは目標のサイズＴＳ以下のＰピクチャのうちピクチャサイズの最大のものを選ぶ第２の手法などが適用できる。
【００６２】
上記のように近傍のＰピクチャの定義と選択を実施した後、もしも条件に見合うＰピクチャが全く存在しなかった場合には、近傍のＰピクチャはあきらめ、代わりに特殊Ｐピクチャを挿入するという手法を適用することもできる。但し、この場合、近傍のＰピクチャが利用できる場合と比較して映像の違和感は多少大きくなる。
【００６３】
以上の手法を用いてピクチャ挿入部６０５によるＰピクチャの挿入処理を実施した後、情報の削減されたＩピクチャ、または情報削減に失敗したＩピクチャに代えて挿入されたＰピクチャに対し、フレーム再構成部６０３中のパディング部６０３ａがパディングを施し、同じくフレーム間順方向予測画像挿入部６０３ｂが指定の倍速率に従い各ピクチャ間に特殊Ｐピクチャを挿入する。そして、動画像属性情報設定部６０４が、再生レートを動画像データのシーケンスヘッダへ格納し、各ピクチャ毎にvbv-delayの値を設定する。この一連の処理によりトリックプレイ用（早送りまたは逆早送り再生用）動画像データ６２０を作成する。
【００６４】
以上に述べた第２の実施形態では、指定の再生レートや倍速率での条件を満たせないようなＩピクチャがある場合に、それを間引くだけでなく、その代用として間引いたＩピクチャと画像的に近い可能性のあるＰピクチャを挿入することで、倍速率の揺れを減らし、より視覚的な違和感（飛び飛び感）を抑えることができ、また低い再生レートのトリックプレイ用動画像データを作成できる。
【００６５】
次に第２の実施形態の具体例について図８を参照して説明する。
今、倍速率３倍（Ｎ＝３）で、再生レートＲが指定され、目標ピクチャサイズをＴＳとし、抽出後のＩピクチャをＩi、ＤＣＴ係数削減後のＩピクチャをＩi’、挿入する特殊ＰピクチャをＰｓとする。また、元のシーケンスがＧＯＰサイズ「９」で構成されるものとする。
【００６６】
通常再生用動画像データ６１０のシーケンスが
ＩiＢＢＰi,1ＢＢＰi,2ＢＢＩi+1ＢＢＰi+1,1ＢＢＰi+1,2ＢＢＩi+2…の並びで、Ｉiの近傍のＰをＰi,k（ｋ≦２）と定義する。
【００６７】
ピクチャ挿入部６０５は、Ｉi’＞ＴＳのときＩi’を間引き、Ｐi,k≦ＴＳを満たすＰピクチャがあれば、それを挿入する。なければ特殊ＰピクチャＰｓ（Ｐｓ≦ＴＳ）を挿入する。
【００６８】
例えば、ソース映像としての通常再生用動画像データ６１０のシーケンスが図８（ａ）に示すように
Ｉ1ＢＢＰ1,1ＢＢＰ1,2ＢＢＩ2ＢＢＰ2,1ＢＢＰ2,2ＢＢＩ3ＢＢＰ3,1ＢＢＰ3,2ＢＢＩ4ＢＢＰ4,1…
の場合、Ｉピクチャ抽出後は、
Ｉ1 Ｉ2 Ｉ3 Ｉ4 Ｉ5…
となる。
【００６９】
したがって、ＤＣＴ係数削減後のシーケンスは、図８（ｂ）に示すように
Ｉ1’Ｉ2’Ｉ3’Ｉ4’Ｉ5’…
となる。
【００７０】
ここで、Ｉ2’，Ｉ4’＞ＴＳで、Ｐ2,1≦ＴＳで、Ｐ4,1，Ｐ4,2＞ＴＳであるものとすると、削減に失敗したＩ2’，Ｉ4’に代えてＰ2,1，Ｐｓがピクチャ挿入部６０５により挿入される。したがって、フレーム再構成部６０３（内のフレーム間順方向予測画像挿入部６０３ｂ）による特殊ＰピクチャＰｓ挿入後には、図８（ｃ）に示すように
Ｉ1’ＰｓＰｓＰ2,1ＰｓＰｓＩ3’ＰｓＰｓＰｓＰｓＰｓＩ5’…
というシーケンスが生成される。
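The substitution rule of the second embodiment can be sketched as below, using the conditions stated in paragraph 0070 (I2' and I4' exceed TS, P2,1 fits within TS, P4,1 and P4,2 do not). The sizes, function names, and the choice of "first fitting P picture in the current GOP" as the selection method are assumptions for illustration; the description allows other neighbourhood definitions and selection methods.

```python
def substitute_pictures(gops, target_bits):
    """gops: list of (i_label, i_bits, [(p_label, p_bits), ...]).
    Replace each oversized I picture by a nearby P picture that fits,
    or by a special P picture ("Ps") when none fits."""
    chosen = []
    for i_label, i_bits, p_pictures in gops:
        if i_bits <= target_bits:
            chosen.append(i_label)
            continue
        fitting = [label for label, bits in p_pictures if bits <= target_bits]
        chosen.append(fitting[0] if fitting else "Ps")
    return chosen


def interleave(chosen, p_num):
    """Insert p_num special P pictures between consecutive pictures."""
    out = []
    for index, label in enumerate(chosen):
        if index > 0:
            out.extend(["Ps"] * p_num)
        out.append(label)
    return out


TS = 100_000                                   # hypothetical target size in bits
gops = [
    ("I1'", 90_000, [("P1,1", 40_000), ("P1,2", 42_000)]),
    ("I2'", 120_000, [("P2,1", 60_000), ("P2,2", 65_000)]),
    ("I3'", 95_000, [("P3,1", 41_000), ("P3,2", 43_000)]),
    ("I4'", 130_000, [("P4,1", 110_000), ("P4,2", 105_000)]),
    ("I5'", 85_000, []),
]
chosen = substitute_pictures(gops, TS)
sequence = interleave(chosen, p_num=9 // 3 - 1)
```

The five consecutive `Ps` entries between I3' and I5' are the substitute for I4' plus the two special P pictures inserted on each side of it.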
【００７１】
［第３の実施形態］
前記第２の実施形態では、間引かれるＩピクチャが多く、近傍のＰピクチャが全くない動画像データの場合や、条件を満たさないＰピクチャしか存在しないケースが連続する動画像データの場合には、前記第１の実施形態と同様の視覚的違和感が生じる可能性が残る。
【００７２】
そこで、Ｉピクチャを間引かず、更に近傍のＰピクチャも挿入して再生レートを調整する手法を適用した、本発明の第３の実施形態について説明する。なお、本実施形態で適用される画像処理装置の基本構成は前記第２の実施形態で適用された図６の構成の画像処理装置と同様である。このため以下の説明では、便宜的に図６の構成を援用する。
【００７３】
抽出部６０１は、通常再生用動画像データ６１０からＩピクチャを抽出する。フレーム内符号化情報削除部６０２は、抽出部６０１により抽出された各Ｉピクチャから、当該Ｉピクチャのサイズが前記式（２）で求めた目標ピクチャサイズＴＳ以下になるようにＤＣＴ係数を削減する。このＤＣＴ係数削減（情報削減）によってもＴＳ以下にならない場合、フレーム内符号化情報削除部６０２はオーバーしたサイズのままそのＩピクチャを残し、その旨をピクチャ挿入部６０５に通知する。
【００７４】
ピクチャ挿入部６０５は、ＴＳを超えたサイズのＩピクチャに対しては、次のシーケンスとして近傍のＰピクチャで適切なものがあればそれを挿入する。近傍のＰピクチャの定義や選択手法（選択アルゴリズム）は前記第２の実施形態と同様である。但し、条件となるＰピクチャのサイズは、ＴＳではなく次式（６）
挿入条件となるＰピクチャのサイズ
＝ＴＳ−（Ｉ’−ＴＳ）＝２×ＴＳ−Ｉ’ …（６）
ＴＳ：式（２）で求めたＩピクチャサイズ
Ｉ’ ：情報削減後のＩピクチャサイズ
により求めた値を使用する。
【００７５】
上記式（６）でわかるように、ＴＳを超えたＩピクチャをそのまま残したことへの調整を次のＰピクチャを用いて行う。近傍のＰピクチャを利用する有効性は既に第２の実施形態で述べている。もし、近傍のＰピクチャで適切なものがなかった場合は、第２の実施形態と同様に、特殊ＰピクチャＰｓを代わりに挿入する。
【００７６】
フレーム再構成部６０３内のパディング部６０３ａは、上記のようにしてピクチャ挿入部６０５により挿入されたＰピクチャ、及びＴＳ以下に情報削減されたＩピクチャに対して前記式（３）に従うパディングを行う。なお、Ｐピクチャに対するパディングでは、式（３）中のＩ’を当該Ｐピクチャのサイズに置き換えればよい。またフレーム再構成部６０３内のフレーム間順方向予測画像挿入部６０３ｂは、指定の倍速率Ｎに応じて決定される数の特殊ＰピクチャＰｓを各ピクチャ間に挿入していく。
【００７７】
そして動画像属性情報設定部６０４が、前記式（４）で求められる再生レートを動画像データのシーケンスヘッダＳＨ（図４参照）へ格納し、各ピクチャ毎に前記式（５）で求められるvbv-delayの値を対応するピクチャ層のヘッダに設定する。
以上の一連の処理により、トリックプレイ用動画像データ６２０が作成される。
【００７８】
上記したように本実施形態では、指定した再生レートで要求されるピクチャサイズを満たすことができないＩピクチャはＤＣＴ係数を削減後の状態で、間引かずそのまま挿入する。そして、再生レートを調整するために、前のＩピクチャがオーバーしたサイズを考慮し、条件を満たすような近傍のＰピクチャがあれば挿入する。なければ、特殊Ｐピクチャで調整する。これにより、Ｐピクチャが挿入された場合、倍速率は若干揺れる場合があるが、視覚的違和感（飛び飛びの感覚）は解消され、より滑らかで、低再生レートのトリックプレイ用動画像データが作成できる。
【００７９】
次に第３の実施形態の具体例について図９を参照して説明する。
今、倍速率３倍で、再生レートＲが指定され、目標ピクチャサイズをＴＳとし、抽出後のＩピクチャをＩi、元のＰピクチャをＰi,j、ＤＣＴ係数削減後のＩピクチャをＩi’、挿入する特殊ＰピクチャをＰｓとする。元のシーケンスがＧＯＰサイズ「９」で構成されるものとする。
【００８０】
通常再生用動画像データ６１０のシーケンスが
ＩiＢＢＰi,1ＢＢＰi,2ＢＢＩi+1ＢＢＰi+1,1ＢＢＰi+1,2ＢＢＩi+2…の並びで、Ｉiの近傍のＰをＰi,k（ｋ≦２）と定義する。
【００８１】
ピクチャ挿入部６０５は、Ｉi’＞ＴＳのとき、Ｐi,k≦（２×ＴＳ−Ｉi’）を満たすＰピクチャがあれば、それを挿入する。なければ特殊ＰピクチャＰｓ（Ｐｓ≦（２×ＴＳ−Ｉi’））を挿入する。
【００８２】
例えば、ソース映像としての通常再生用動画像データ６１０のシーケンス（ソース画像）が図９（ａ）に示すように
Ｉ1ＢＢＰ1,1ＢＢＰ1,2ＢＢＩ2ＢＢＰ2,1ＢＢＰ2,2ＢＢＩ3ＢＢＰ3,1ＢＢＰ3,2ＢＢＩ4ＢＢＰ4,1…
の場合、Ｉピクチャ抽出後は、
Ｉ1 Ｉ2 Ｉ3 Ｉ4 Ｉ5…
となる。
【００８３】
したがって、ＤＣＴ係数削減後のシーケンスは、図９（ｂ）に示すように
Ｉ1’Ｉ2’Ｉ3’Ｉ4’Ｉ5’…
となる。
【００８４】
ここで、Ｉ2’，Ｉ4’＞ＴＳで、Ｐ2,2≦（２×ＴＳ−Ｉ2’）で、Ｐ4,1，Ｐ4,2＞（２×ＴＳ−Ｉ4’）であるものとすると、削減に失敗したＩ2’，Ｉ4’を間引かず、ピクチャ挿入部６０５により、次のシーケンスとしてＰ2,2，Ｐｓが挿入される。したがって、フレーム再構成部６０３（内のフレーム間順方向予測画像挿入部６０３ｂ）による特殊ＰピクチャＰｓ挿入後には、図９（ｃ）に示すように
Ｉ1’ＰｓＰｓＩ2’ＰｓＰｓＰ2,2ＰｓＰｓＩ3’ＰｓＰｓＩ4’ＰｓＰｓＰｓＰｓＰｓＩ5’…
となる。
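The third embodiment keeps every I picture and, for an oversized one, adds a compensating P picture whose size is at most 2 × TS − I' (formula (6)). The sketch below reproduces the example of paragraph 0084 under assumed picture sizes; the function names and the "first fitting P picture" selection are hypothetical, and the special P picture is assumed to be small enough to satisfy the limit.

```python
def compensate_oversized(gops, target_bits):
    """gops: list of (i_label, i_bits, [(p_label, p_bits), ...]).
    Keep every I picture; after an oversized one, add a P picture no larger
    than 2*TS - I' (formula (6)), or a special P picture ("Ps")."""
    chosen = []
    for i_label, i_bits, p_pictures in gops:
        chosen.append(i_label)                 # the I picture is never thinned out
        if i_bits <= target_bits:
            continue
        limit = 2 * target_bits - i_bits       # formula (6)
        fitting = [label for label, bits in p_pictures if bits <= limit]
        chosen.append(fitting[0] if fitting else "Ps")
    return chosen


def interleave(chosen, p_num):
    """Insert p_num special P pictures between consecutive pictures."""
    out = []
    for index, label in enumerate(chosen):
        if index > 0:
            out.extend(["Ps"] * p_num)
        out.append(label)
    return out


TS = 100_000                                   # hypothetical target size in bits
gops = [
    ("I1'", 90_000, [("P1,1", 40_000), ("P1,2", 42_000)]),
    ("I2'", 120_000, [("P2,1", 85_000), ("P2,2", 75_000)]),   # limit 80,000
    ("I3'", 95_000, [("P3,1", 41_000), ("P3,2", 43_000)]),
    ("I4'", 130_000, [("P4,1", 90_000), ("P4,2", 80_000)]),   # limit 70,000
    ("I5'", 85_000, []),
]
chosen = compensate_oversized(gops, TS)
sequence = interleave(chosen, p_num=9 // 3 - 1)
```

With these sizes the output follows the sequence given for FIG. 9(c): P2,2 follows I2', and a special P picture follows I4'.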
【００８５】
以上に述べた各実施形態では、トリックプレイ用動画像データ作成を例にとり説明してきたが、これに限るものではない。例えば本発明は、ＭＰＥＧの動画像情報をディスクへ格納し、ネットワークへストリーム配信するビデオサーバにも適用が可能である。この場合、トリックプレイ用動画像データをディスクへそのまま格納しておき、ビデオサーバが配信するという方法、或いは、ディスクへ格納された通常再生用動画像データからリアルタイムにトリックプレイ用動画像情報を作成しつつ、ネットワークへ配信するという方法のいずれも適用可能である。
【００８６】
一般的に、帯域に制限のあるネットワークへ配信する際には、再生レートは低く且つ高画質であることが、資源の有効利用と映像品質の保証という観点から最も望まれることである。本方式によるトリックプレイの実現がこのような用途に対し効果的であることがわかる。
【００８７】
なお、本発明は、上記各実施形態に限定されるものではなく、実施段階ではその要旨を逸脱しない範囲で種々に変形することが可能である。更に、上記実施形態には種々の段階の発明が含まれており、開示される複数の構成要件における適宜な組み合わせにより種々の発明が抽出され得る。例えば、実施形態に示される全構成要件から幾つかの構成要件が削除されても、発明が解決しようとする課題の欄で述べた課題が解決でき、発明の効果の欄で述べられている効果（の少なくとも１つ）が得られる場合には、この構成要件が削除された構成が発明として抽出され得る。
【００８８】
【発明の効果】
以上詳述したように本発明によれば、フレーム間予測を用いた動画像圧縮方式により符号化された固定レート或いは可変レートの動画像データを扱うものにおいて、通常再生の動画像データを用いて、様々な倍速率や再生レートでのトリックプレイ用動画像データを作成でき、しかもネットワーク送出時に有利となる低ビットレートでのトリックプレイ再生を実現できる。
【００８９】
また本発明によれば、画質的にもできる限り元の映像を劣化させない品質を実現し、高速なトリックプレイをより滑らかに見せることができる。
【図面の簡単な説明】
【図１】本発明の第１の実施形態に係る画像処理装置の構成を示すブロック図。
【図２】ＭＰＥＧ２による画像圧縮の原理を示すブロック図。
【図３】ＭＰＥＧ２画像データの階層構成を示す図。
【図４】同第１の実施形態にてパディングされた動画像データの構成を示す図。
【図５】同第１の実施形態の具体例を説明するための図。
【図６】本発明の第２の実施形態に係る画像処理装置の構成を示すブロック図。
【図７】同第２の実施形態における、情報削減に失敗したＩピクチャの近傍のＰピクチャの定義例を示す図。
【図８】同第２の実施形態の具体例を説明するための図。
【図９】本発明の第３の実施形態の具体例を説明するための図。
【符号の説明】
１００，６００…画像処理装置
１０１，６０１…抽出部
１０２，６０２…フレーム内符号化情報削除部（削除手段）
１０３，６０３…フレーム再構成部
１０３ａ，６０３ａ…パディング部
１０３ｂ，６０３ｂ…フレーム間順方向予測画像挿入部
１０４，６０４…動画像属性情報設定部
１０４ａ，６０４ａ…再生レート設定部
１０４ｂ，６０４ｂ…バッファ制御情報設定部
１１０，６１０…通常再生用動画像データ
１２０，６２０…トリックプレイ用動画像データ
６０５…ピクチャ挿入部（挿入手段）

[0001]
[Technical field to which the invention belongs]
The present invention relates to an image processing method and apparatus for transmitting, at a fixed rate or a variable rate, moving image data encoded by a moving image compression method using inter-frame prediction, and particularly to an image processing method and apparatus suitable for realizing trick play such as fast-forward and fast-reverse (rewind) playback.
[0002]
[Prior art]
Currently, the MPEG (Moving Picture Experts Group) international standard is the mainstream technology for encoding with a moving image compression method that uses inter-frame prediction. In a typical MPEG-based moving image processing system, a moving image recorded on an analog VTR tape or the like, or a real-time analog moving image from a video camera or the like, is encoded by a dedicated encoder and stored in an external storage device as a moving image file. The stored moving image file is then decoded by a dedicated decoder and played back. Data is sent from the file to the decoder in one of two ways: at a fixed rate or at a variable rate. Conventionally, one way to realize fast-forward / fast-reverse playback (hereinafter referred to as trick play) like that of an analog VTR in such an MPEG system is to create moving image data dedicated to trick play.
[0003]
For moving images compressed with MPEG, we have proposed a method of creating trick-play moving image data in which the speed ratio and the bit rate during playback (playback rate) can be adjusted, based on the view that trick play does not necessarily need high image quality (Japanese Patent Application No. 9-147846). This method has the excellent feature that trick-play moving image data with a specific speed ratio and playback rate can be created easily. However, it has the problem that trick-play moving image data cannot be created when the following event occurs.
[0004]
In the above method, the intra-frame encoded image (I picture) information extracted from the moving image data for normal reproduction is reduced until it becomes equal to or smaller than the size value calculated so as to satisfy the designated reproduction rate. At this time, even if the image information is reduced as much as possible, it may not be less than the target size value. However, the above method does not consider the occurrence of such an event. For this reason, when an event occurs in which the image information does not become the target size value or less, it is impossible to create trick-play moving image data at a specified playback rate.
[0005]
[Problems to be solved by the invention]
As described above, in technology that encodes with a moving image compression method using inter-frame prediction, the currently proposed method of creating trick-play moving image data has the problem that moving image data cannot be created when the intra-frame encoded image data does not fall to or below the target size value even after being reduced as much as possible.
[0006]
The present invention has been made in view of the above circumstances, and its object is to provide an image processing method and apparatus that, when creating trick-play moving image data from normal-playback moving image data encoded by a moving image compression method using inter-frame prediction, allow the speed ratio and playback rate for trick play to be set flexibly and reduce the number of cases in which creation is impossible.
[0007]
[Means for Solving the Problems]
The present invention is an image processing method for transmitting, at a fixed rate or a variable rate, moving image data encoded by moving image compression using inter-frame prediction, the method comprising: a step of extracting intra-frame encoded image data in order from whichever of the beginning or the end of the moving image data is determined by the playback direction; a step of reducing predetermined information from the extracted intra-frame encoded image data so that it becomes equal to or smaller than a target size that satisfies the specified playback bit rate; a step of extracting, from the intra-frame encoded image data whose information has been reduced, only the data that has become equal to or smaller than the target size; a step of inserting a padding code into the extracted intra-frame encoded image data so that it satisfies the specified playback bit rate; a step of setting, in the header of the moving image data into which the padding code has been inserted, a bit rate that reflects the specified playback bit rate; and a step of setting buffer control information so that the start of playback and random access are performed appropriately when the moving image data with the padding code inserted is played back. Here, the predetermined information reduced from the intra-frame encoded image data is preferably DCT (Discrete Cosine Transform) coefficients, more preferably the AC components of the DCT coefficients, and in particular the coefficients of the high-frequency (AC) components that influence image quality.
[0008]
As described above, the feature of the present invention is that, among the intra-frame encoded image data (I pictures) extracted from moving image data (generally, moving image data for normal playback), I pictures that do not fall to or below the target size even after predetermined information (for example, DCT coefficients) is reduced are thinned out and not displayed in trick play. As a result, the speed ratio fluctuates slightly when the number of pictures displayed during trick play decreases, but trick play can still be realized at a lower playback rate. Moreover, slight fluctuation in the speed ratio is not visually noticeable in high-speed trick play; rather, when the moving image data is sent as a stream over a network, being able to send trick-play moving image data at a low playback rate is a great advantage.
[0009]
Here, because only I pictures are extracted, the number of pictures decreases and the number of frames in a GOP (Group of Pictures) also changes. It is therefore preferable to insert inter-frame forward-predicted image data (P pictures) that carry no motion prediction data between the I pictures. The number of such P pictures (special P pictures) inserted between I pictures is preferably (M / N) − 1, where N is the specified speed ratio and M is the number of frames in a GOP of the original moving image (N and M are integers, N ≦ M).
[0010]
The present invention is also characterized in that an I picture that has not fallen to or below the target size despite information reduction is not merely thinned out; instead, a P picture that is at or below the target size is extracted from among the P pictures near that I picture in the moving image data and inserted into the moving image data in place of the I picture, and accordingly the padding code is inserted not only into I pictures at or below the target size but also into this P picture. Here, when no suitable P picture exists, a special P picture may be used.
[0011]
As a result, when there is an I picture that cannot satisfy the conditions for the specified playback bit rate or speed ratio, it is not merely thinned out: a P picture that is likely to be visually close to the thinned-out I picture is inserted as a substitute, which reduces fluctuation of the speed ratio and further suppresses visual discomfort (a jumpy impression). Trick-play moving image data with a low playback rate can also be created.
[0012]
The present invention is also characterized in that an I picture that has not fallen to or below the target size despite information reduction is not thinned out; instead, a P picture whose size compensates for the amount by which the I picture exceeds the target size is extracted from among the P pictures near that I picture in the moving image data and inserted immediately after the I picture, and accordingly the padding code is inserted not only into I pictures at or below the target size but also into this P picture. Here, the upper limit of the size that compensates for the excess is preferably the value obtained by subtracting the size of the I picture that did not fall to or below the target size from twice the target size. When no suitable P picture exists, a special P picture may be used.
[0013]
As a result, when there is an I picture that cannot satisfy the conditions for the specified playback bit rate or speed ratio, instead of thinning it out, a P picture that compensates for the amount by which the I picture is oversized is inserted immediately after the I picture. Although the speed ratio may fluctuate slightly, visual discomfort (a jumpy impression) is eliminated, and smoother trick-play moving image data with a low playback rate can be created.
[0014]
DETAILED DESCRIPTION OF THE INVENTION
Embodiments of the present invention will be described below with reference to the drawings.
[0015]
[First Embodiment]
First, the principle of the image compression method used to obtain the image data handled by the image processing apparatus according to the first embodiment of the present invention, that is, moving image data encoded by moving image compression using inter-frame prediction, will be described with reference to FIG. 2, taking image compression by MPEG2 as an example.
[0016]
In image compression by MPEG2, a DCT (Discrete Cosine Transform) is applied to the image data, followed by quantization. That is, as shown in FIG. 2, an input image (original image) 210 supplied to an MPEG2-compliant encoder 200 is first divided into blocks of 8 × 8 pixels. The DCT circuit 201 in the encoder 200 performs a DCT operation on each block, and the quantization circuit 202 quantizes the resulting DCT coefficients, handling the DC component (direct-current component) and the AC components (alternating-current components) independently. The quantization table 220 used for quantization consists of a quantization table for the luminance signal and a quantization table for the color-difference signals. Of the quantized DCT coefficients, the DC coefficient is encoded by the entropy encoding circuit 203 as a difference from a predicted value, which is the DC coefficient of the immediately preceding block. The remaining AC components are reordered within the block by zigzag scanning and then encoded by the circuit 203. Pixel values (for example, luminance) that were randomly distributed before the transform become, after the DCT, concentrated as large values in the low-frequency terms. Image data can therefore be compressed by dropping (removing) the high-frequency terms. The above is the principle of image compression by MPEG2.
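The energy-compaction idea can be illustrated with a small, self-contained sketch: transform one 8 × 8 block, then keep only the first few coefficients in zigzag order. This is not an MPEG-2 encoder (quantization and entropy coding are omitted), and the function names, the orthonormal scaling, and the sample block are assumptions chosen for the example.

```python
import math

N = 8  # block size used by MPEG-2


def dct2(block):
    """Naive 2-D DCT-II of an N x N block with orthonormal scaling."""
    def scale(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * total
    return out


def zigzag_order():
    """(row, col) positions in zigzag scan order, DC coefficient first."""
    cells = [(r, c) for r in range(N) for c in range(N)]
    return sorted(cells, key=lambda p: (p[0] + p[1],
                                        p[0] if (p[0] + p[1]) % 2 else p[1]))


def keep_low_frequencies(coeffs, keep):
    """Zero every coefficient after the first `keep` in zigzag order."""
    out = [[0.0] * N for _ in range(N)]
    for r, c in zigzag_order()[:keep]:
        out[r][c] = coeffs[r][c]
    return out


# A smooth gradient block: most of its energy lands in the DC term.
block = [[100 + 5 * x + 3 * y for y in range(N)] for x in range(N)]
coeffs = dct2(block)
reduced = keep_low_frequencies(coeffs, keep=10)
```

For this smooth block the DC coefficient dominates, so discarding the trailing coefficients changes the block very little, which is the property the description relies on.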
[0017]
One form of playing back the encoded moving image data obtained by MPEG2 image compression with the encoder 200 of FIG. 2, that is, moving image data encoded by moving image compression using inter-frame prediction, is trick play (fast-forward / fast-reverse playback), as mentioned in the prior art section.
[0018]
The image processing apparatus according to the present embodiment pays attention to how much importance should be given to the image quality during trick play in the MPEG2 moving image reproduction.
[0019]
High-speed trick play is mainly used to search for a specific scene in a video and jump quickly to the desired part, or to grasp the content of a video quickly. In such cases, the trick-play picture need not have the same image quality as normal playback, and a slight loss of image quality is considered to cause no particular problem. From this point of view, as one measure, Japanese Patent Application No. 9-147846 proposed a method of creating trick-play moving image data in which the speed ratio and the bit rate during playback (playback rate) can be adjusted.
[0020]
In this method, only the intra-frame encoded images (I pictures) are extracted from the MPEG moving image data, and among the DCT coefficients in the macroblocks of the extracted I pictures, the coefficients of the high-frequency components that influence image quality are reduced. Making each I picture smaller in this way allows the playback rate to be kept low.
[0021]
In addition, special inter-frame forward prediction images (P pictures) that carry no motion prediction data are inserted between successive I pictures. The double speed ratio can be varied through the number of inserted P pictures, and adjusting the number of DCT coefficients removed from the I pictures makes it possible to create a trick play moving image having the specified playback rate. Let the specified double speed ratio be N and the number of frames in a GOP (Group of Pictures) of the original moving image be M (N and M are integers, N ≦ M). The number of special P pictures to insert is then given by the following formula (1):
Number of inserted P pictures = (M / N) − 1 (1)
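As a minimal sketch of formula (1), the following Python function computes the number of special P pictures inserted after each I picture. The function name and the divisibility check are assumptions made for illustration; the text only states that N and M are integers with N ≦ M.

```python
def inserted_p_pictures(gop_frames: int, speed_ratio: int) -> int:
    """Formula (1): special P pictures inserted per I picture, (M / N) - 1."""
    if not 1 <= speed_ratio <= gop_frames:
        raise ValueError("requires 1 <= N <= M")
    if gop_frames % speed_ratio != 0:
        # Assumption of this sketch: only M divisible by N is handled.
        raise ValueError("this sketch assumes M is a multiple of N")
    return gop_frames // speed_ratio - 1
```

With the GOP sizes used in the specific examples later in this description, a GOP of 6 frames at triple speed gives one inserted P picture, and a GOP of 9 frames at triple speed gives two.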
[0022]
However, with this method, depending on the value of the designated reproduction rate (bit rate) R, the I picture size may not fall to the target size TS even when the DCT coefficients in the I picture are reduced as far as possible. The I picture size (target picture size) TS, which is the condition for realizing the specified reproduction rate R, is given by the following formula (2):
I picture size TS
= R * (P_num + 1) / Frame − (P_sz * P_num + Hdr_sz) (2)
where
R: Designated playback rate (bps)
P_sz: Size of P picture (bits)
Hdr_sz: Header size (bits) (only when there is a sequence header or GOP header)
P_num: Number of P pictures to be inserted (determines the double speed ratio)
Frame: Number of displayed pictures per second
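Formula (2) can be read as "bits available during the display time of one I picture plus its P pictures, minus the bits already spent on the P pictures and headers". The sketch below illustrates this; the function and parameter names, and the numbers in the usage note, are hypothetical.

```python
def target_i_picture_size(rate_bps, p_num, frame_rate, p_size_bits, header_bits=0):
    """Formula (2): target I picture size TS in bits."""
    # Bits that may be sent while one I picture and p_num P pictures are displayed.
    bits_per_group = rate_bps * (p_num + 1) / frame_rate
    # Subtract what the inserted P pictures and any headers consume.
    return bits_per_group - (p_size_bits * p_num + header_bits)
```

For instance, under the assumed values R = 1,500,000 bps, one inserted P picture of 800 bits, 30 pictures per second, and no header, the target size is 99,200 bits.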
[0023]
If the I picture cannot be reduced to the size TS given by the above formula (2), it is effectively impossible to create trick-play moving image data at the specified playback rate R. Although it depends on the original moving image data, this problem generally becomes more likely as the double speed ratio is raised and the playback rate is lowered.
[0024]
Therefore, in the present embodiment, the above problem is solved by using an image processing apparatus described below.
[0025]
FIG. 1 is a block diagram showing a configuration of an image processing apparatus according to the first embodiment of the present invention.
The image processing apparatus 100 in FIG. 1 includes an extraction unit 101, an intra-frame encoded information deletion unit 102, a frame reconstruction unit 103, and a moving image attribute information setting unit 104.
[0026]
The extraction unit 101 receives the normal playback moving image data 110 encoded by an MPEG2-compliant encoder (200) such as that shown in FIG. 2, and extracts the intra-frame encoded images (I pictures) from the moving image data 110. Here, the encoding format of the moving image data for normal reproduction may be either a fixed rate (CBR) or a variable rate (VBR).
The intra-frame encoded information deletion unit 102 reduces the information in each I picture extracted by the extraction unit 101 so that the size of the I picture satisfies the condition imposed by the playback rate R.
[0027]
The frame reconstruction unit 103 is composed of a padding unit 103a and an inter-frame forward prediction image insertion unit 103b. The padding unit 103a inserts padding codes into each I picture that the information reduction by the intra-frame encoded information deletion unit 102 has brought to or below the size satisfying the condition for the playback rate R, so that the I picture becomes exactly that size. The inter-frame forward prediction image insertion unit 103b inserts special inter-frame forward prediction images (P pictures) having no motion prediction data between I pictures.
[0028]
The moving image attribute information setting unit 104 creates the moving image data 120 for trick play (for fast-forward or reverse fast-forward reproduction) by setting moving image attribute information that ensures playback start and random access are performed appropriately when a decoder reproduces the moving image data containing the I pictures into which padding codes have been inserted by the frame reconstruction unit 103.
[0029]
The moving image attribute information setting unit 104 calculates a reproduction rate and sets it in the header of the moving image data (sequence header), and also calculates, for each picture, the vbv-delay (vbv: video buffering verifier) described later as buffer control information and sets it in the header of the corresponding picture layer.
[0030]
Next, the operation of the image processing apparatus configured as described above will be described. It is assumed that the playback rate and double speed ratio of the trick play moving image are designated (given) as parameters from the outside.
[0031]
First, the extraction unit 101 in the image processing apparatus 100 extracts an intra-frame encoded image (I picture) from the normal reproduction moving image data 110. The extraction direction of the I picture differs depending on whether trick-play moving image data 120 for fast-forward playback or reverse fast-forward playback is created. The extraction unit 101 starts extraction from the head side of moving image data for fast-forward playback, and starts extraction from the end side of moving image data for reverse-fast-forward playback.
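The direction-dependent extraction can be sketched as follows. Representing the stream as a list of picture-type letters is an assumption made purely for illustration; an actual extraction unit would parse picture headers in the MPEG2 stream.

```python
def extract_i_pictures(picture_types, reverse=False):
    """Return the indices of I pictures in the order they are extracted.

    picture_types: hypothetical list such as ["I", "B", "B", "P", ...].
    reverse=False models fast-forward (start from the head of the data);
    reverse=True models reverse fast-forward (start from the end of the data).
    """
    indices = [i for i, kind in enumerate(picture_types) if kind == "I"]
    return indices[::-1] if reverse else indices
```

For a stream laid out as IBBPBB IBBPBB I, the fast-forward order is positions 0, 6, 12 and the reverse fast-forward order is 12, 6, 0.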
[0032]
The intra-frame encoded information deletion unit 102 reduces the size of the I picture by reducing the information of the I picture for each I picture extracted by the extraction unit 101. In the MPEG2 example, the I picture information can be reduced by reducing the DCT coefficient obtained when the above-described DCT calculation is performed.
[0033]
Here, the DCT coefficient reduction method by the intraframe coding information deletion unit 102 will be described.
In MPEG2, DCT coefficients exist for each macroblock. The coefficients are divided into a DC component and AC components, which are quantized independently. The AC components are the ones to be deleted: among them, some of the coefficients corresponding to low frequencies are retained, and the coefficients corresponding to high frequencies are deleted. Various deletion algorithms are conceivable. For example, the same number of coefficients may be deleted from every macroblock constituting a picture, or the number deleted may be varied from macroblock to macroblock. A heuristic approach is also possible. In any case, any algorithm may be used as long as it removes high-frequency components from the AC components.
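One possible deletion algorithm, shown here only as a sketch, keeps the DC coefficient and the first few AC coefficients in zigzag scan order and zeroes the rest. It operates on an already decoded 8 × 8 coefficient array, uses the conventional zigzag scan, and ignores the variable-length coding of the actual bitstream; all of these are simplifying assumptions, and the function names are hypothetical.

```python
def zigzag_order(n=8):
    """(row, col) positions of an n x n block in conventional zigzag scan order."""
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        # Walk anti-diagonals; odd diagonals run top to bottom, even ones bottom to top.
        key=lambda rc: (rc[0] + rc[1], rc[0] if (rc[0] + rc[1]) % 2 else rc[1]),
    )


def drop_high_frequency(block, keep_ac):
    """Return a copy of block with every AC coefficient after the first keep_ac zeroed."""
    out = [row[:] for row in block]
    # Position 0 of the scan is the DC coefficient, which is always kept.
    for r, c in zigzag_order(len(block))[1 + keep_ac:]:
        out[r][c] = 0
    return out
```

Lowering `keep_ac` step by step gives progressively smaller pictures, which is one way to drive the size toward the target TS.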
[0034]
The intra-frame encoded information deletion unit 102 performs the above-described I picture information reduction until the size of the I picture falls within the I picture size TS obtained by the above equation (2). If the information reduction fails, it notifies the frame reconstruction unit 103 to that effect.
[0035]
The padding unit 103a in the frame reconstruction unit 103 then performs padding only on the I pictures that the intra-frame encoded information deletion unit 102 succeeded in reducing, that is, only on the I pictures whose information has been reduced to within the I picture size TS, according to the following equation (3):
Padding size = TS − I′ (3)
TS: I picture size obtained by equation (2)
I′: I picture size after reduction
That is, a padding code of the size calculated by equation (3) is inserted.
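The reduce-then-pad decision can be sketched as below. The input is a hypothetical list of sizes that one I picture would have after successively stronger reduction steps; how those sizes are produced is outside the scope of this sketch.

```python
def reduce_and_pad(sizes_by_step, target_size):
    """Return (I', padding size) for the first reduction step that fits, else None.

    sizes_by_step: hypothetical I picture sizes (bits) after each reduction step.
    """
    for size in sizes_by_step:
        if size <= target_size:
            # Equation (3): padding size = TS - I'
            return size, target_size - size
    # Reduction failed: the picture is not padded (it is handled separately).
    return None
```

With an assumed target of 99,200 bits, a picture that shrinks to 96,000 bits receives 3,200 bits of padding, while a picture that never gets below 110,000 bits is reported as a failure.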
[0036]
The reason why the padding code is inserted into an I picture whose information has been reduced to within the size TS is as follows. The size of an I picture is not fixed in the first place, and information reduction only brings it to within TS. If the pictures were left as they are, the amount of decoding processing would vary when the moving image data is reproduced at the specified bit rate (reproduction rate), and buffer overflow or underflow would be likely to occur on the decoder (image output) side. Therefore, padding codes are inserted to bring each I picture to the size TS calculated by the above equation (2), making the packet data length constant so that decoding proceeds normally when the data is reproduced at the specified bit rate. The reason for this is the same as that described in paragraphs 0026 to 0029 in the detailed description column of the specification originally attached to the application of Japanese Patent Application No. 9-147846.
[0037]
Now, the padding code is inserted into the I picture (padding process) by the padding section 103a as follows.
First, the padding code of the size (TS-I ′) calculated by the equation (3) is a kind of dummy data (for example, “0” data). This padding code (dummy data) is inserted (embedded) into an I picture (after information reduction) of size I ′ (I ′ <TS). FIG. 3 shows the hierarchical structure of MPEG2 image data. As shown in the figure, MPEG2 image data is composed of a sequence layer, a GOP layer, a picture layer, a slice layer, a macroblock layer, and a block layer. In the MPEG2 ES (Elementary Stream) specification, an arbitrary number of zeros can be inserted before a start code indicating the start of a slice layer. Therefore, a padding code having a size (TS-I ′) calculated by the equation (3), that is, a padding code PAD adapted to a designated bit rate is embedded in this portion of the I picture. FIG. 4 shows a sequence of moving image data in which the padding code PAD is embedded.
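The embedding step can be sketched on a byte string as follows. The sketch assumes that the padding size from equation (3) is a whole number of bytes and that the picture data starts with its picture header; it relies on slice start codes being the byte pattern 00 00 01 followed by a value from 01 to AF (hexadecimal). The function name is hypothetical.

```python
SLICE_START_VALUES = range(0x01, 0xB0)  # slice_start_code values 0x01 to 0xAF


def insert_padding(picture: bytes, padding_bytes: int) -> bytes:
    """Insert zero bytes immediately before the first slice start code."""
    pos = picture.find(b"\x00\x00\x01")
    while pos != -1:
        if pos + 3 < len(picture) and picture[pos + 3] in SLICE_START_VALUES:
            # Zero stuffing placed in front of the start code is skipped by a decoder.
            return picture[:pos] + bytes(padding_bytes) + picture[pos:]
        pos = picture.find(b"\x00\x00\x01", pos + 1)
    raise ValueError("no slice start code found")
```

The picture grows by exactly `padding_bytes` bytes, and the data before and after the insertion point is unchanged.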
[0038]
The moving image data padded by the padding unit 103a has the frame structure of trick play moving image data for fast-forward or reverse fast-forward reproduction. In this moving image data, of the I pictures extracted from the normal reproduction moving image data 110, only those whose size was reduced to TS or less by the intra-frame encoded information deletion unit 102 have been padded; the aim is to make the decoding processing amount constant by keeping the picture size constant.
[0039]
However, this alone is not always sufficient. Because only the I pictures are extracted from the normal playback moving image data 110, the number of pictures decreases and the GOP structure (the span from one I picture to the next) also changes. Therefore, the inter-frame forward prediction image insertion unit 103b in the frame reconstruction unit 103 inserts, between the I pictures, P pictures (special P pictures) having no data related to motion prediction. Since a special P picture has no motion prediction data, the picture reproduced for it is, for example, the same as the immediately preceding I picture. The number of special P pictures to be inserted is determined according to the designated double speed ratio N. Changing the number of inserted special P pictures changes the time interval at which I pictures are processed, and thus changes the double speed ratio.
[0040]
Here, the relationship between the double speed ratio (trick play magnification) N and the number of P pictures to be inserted will be described below.
As described above, in MPEG2, if the number of pictures from one I picture to the next, that is, the GOP size, is fixed, the ratio of the GOP sizes directly gives the trick play magnification. Therefore, when the GOP size of the original stream (the number of frames in the GOP) is M, the number of P pictures to be inserted in order to create fast-forward moving image data of N-times speed is (M / N) − 1.
[0041]
After inserting the number of special P pictures calculated by equation (1) between the I pictures, the inter-frame forward prediction image insertion unit 103b passes the resulting moving image data to the moving image attribute information setting unit 104.
[0042]
To stabilize the amount of data in the decoder-side buffer, the moving image attribute information setting unit 104 calculates the playback rate (bps) for the moving image data after P picture insertion according to the following equation (4):
Playback rate = (I_sz + P_sz × P_num + Hdr_sz) ÷ ((P_num + 1) / Frame) (4)
I_sz: Size of I picture (bits)
P_sz: Size of P picture (bits)
P_num: Number of P pictures to be inserted
Hdr_sz: Header size (bits)
Frame: Number of displayed pictures per second
The playback rate calculated by equation (4) reflects the specified playback rate but does not necessarily match it; it is the rate suited to the moving image data after the size adjustment, including P picture insertion, has been made.
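Equation (4) is equation (2) solved for the rate: the bits in one I picture, its P pictures, and the header are divided by the time taken to display them. A minimal sketch, with hypothetical names and values:

```python
def playback_rate(i_size_bits, p_size_bits, p_num, header_bits, frame_rate):
    """Equation (4): playback rate in bits per second."""
    group_bits = i_size_bits + p_size_bits * p_num + header_bits
    # Dividing by (p_num + 1) / frame_rate is the same as multiplying by
    # frame_rate and dividing by (p_num + 1).
    return group_bits * frame_rate / (p_num + 1)
```

When the I picture has been padded to exactly TS, the result equals the designated rate R. With the assumed values used earlier (TS = 99,200 bits, one 800-bit P picture, no header, 30 pictures per second), the computed rate is 1,500,000 bps.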
[0043]
Therefore, the moving image attribute information setting unit 104 uses the reproduction rate calculated by equation (4) instead of the designated reproduction rate, and sets the calculated value anew in the sequence header SH (see FIG. 4) of the moving image data. As a result, the playback rate in the sequence header SH (the playback rate for normal playback) is updated to the calculated playback rate. The playback rate is thus information that must be reset so that decoding can be performed normally after frame reconstruction.
[0044]
The moving image attribute information setting unit 104 also recalculates the value of vbv-delay (vbv: video buffering verifier) for each picture of the moving image data after P picture insertion, and sets it in the header of the corresponding picture layer. Like the playback rate, vbv-delay is information that must be reset so that decoding can be performed normally after frame reconstruction. It is a kind of buffer control information required for adjusting the amount of data in the decoder-side buffer (decoder buffer) at the time of random access. The MPEG standard defines vbv_delay as follows.
[0045]
vbv_delay: 16-bit unsigned integer. When the bit rate is fixed, vbv_delay is used to set the initial occupancy of the decoder buffer so that the decoder buffer does not overflow or underflow at the start of picture decoding. vbv_delay measures the time required to fill the decoder buffer (VBV buffer) from the initial empty state to the correct level just before the current picture is removed from the decoder buffer at the target bit rate R.
[0046]
The value of vbv_delay is the number of periods of the 90 kHz system clock that the VBV buffer must wait after receiving the last byte of the picture start code. Let vbv_delay_n be the value of vbv_delay for the nth (n > 0) picture (picture n) in the GOP. Then vbv_delay_n is calculated by the following equation (5):
vbv_delay_n = 90000 * B_n* / R (5)
[0047]
B_n* is the VBV occupancy measured immediately before picture n is removed from the buffer, but after the GOP layer data, the sequence header SH data, and the picture_start_code (picture layer start code) immediately preceding the data elements of picture n have been removed.
[0048]
R is the bit rate (reproduction rate) expressed in bits per second. The encoder's VBV model uses the fully accurate bit rate rather than the rounded value encoded in the bit_rate field of the sequence header SH. For non-fixed bit rate operation, vbv_delay has the value FFFF in hexadecimal.
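Equation (5) and the non-fixed-rate case can be sketched as follows. The function name, the truncation to an integer, and the range check are assumptions of this sketch; the occupancy is taken in bits so that the result is in 90 kHz clock periods.

```python
def vbv_delay(occupancy_bits, bit_rate_bps, constant_rate=True):
    """Equation (5): vbv_delay in 90 kHz clock periods for one picture."""
    if not constant_rate:
        # Non-fixed bit rate operation is signalled by FFFF (hexadecimal).
        return 0xFFFF
    ticks = int(90000 * occupancy_bits / bit_rate_bps)
    if ticks >= 0xFFFF:
        # Assumption: values that collide with or exceed the 16-bit marker are rejected.
        raise ValueError("value does not fit the 16-bit vbv_delay field")
    return ticks
```

For example, an assumed occupancy of 500,000 bits at 1,500,000 bps corresponds to one third of a second, that is, 30,000 clock periods.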
[0049]
For example, consider a case where a moving image is to be reproduced from the middle. At first no data is stored in the buffer, so reproduction cannot begin immediately. The decoder must therefore hold off decoding until an appropriate amount of data has accumulated in the buffer. The value directly related to this waiting time is vbv_delay. Because the value of vbv_delay depends on the flow of pictures and on the playback rate, trick play moving image data built from extracted I pictures may malfunction (buffer underflow or overflow) at random access if the previous values are left unchanged. Therefore, vbv_delay is recalculated and newly set for each picture. As a result, when the trick play moving image data is randomly accessed, the amount of data in the buffer is adjusted correctly and normal operation is possible.
[0050]
In this way, the moving image data 120 for trick play (for fast-forward or reverse fast-forward reproduction) is created. When the trick play moving image data 120 is decoded and reproduced by a decoder system, a moving image reproduced in fast-forward or reverse fast-forward is displayed on, for example, a television receiver. Because reproduction takes place at the reproduction rate obtained by the above equation (4), frames (I pictures and P pictures) having the same data amount are reproduced at the same reproduction rate, and the amount of data in the buffer can be stabilized.
[0051]
As described above, the feature of the first embodiment is that, among the I pictures extracted from the moving image data for normal reproduction, those that do not fall to or below the target size even when the DCT coefficients are reduced are thinned out rather than displayed during trick play. As a result, the number of pictures displayed during trick play may decrease, in which case the double speed ratio fluctuates slightly. On the other hand, trick play can be realized at a lower reproduction rate. Slight fluctuations in the double speed ratio are not a major visual concern in high-speed trick play; when moving image data is streamed over a network, the merit of being able to send trick play moving image data at a low playback rate is the greater consideration.
[0052]
Next, a specific example of the first embodiment will be described with reference to FIG.
Now, suppose that a reproduction rate R is specified at a triple speed ratio (N = 3), the target I picture size is TS, the extracted I pictures are Ii, the I pictures after DCT coefficient reduction are Ii′, and the special P picture to be inserted is Ps. Further, it is assumed that the original sequence has a GOP size of 6.
[0053]
The sequence of the normal playback moving image data 110 is as shown in FIG.
I1BBPBB I2BBPBB I3BBPBB I4BBPBB I5 ...
After the I pictures are extracted, the sequence becomes, as shown in FIG.
I1 I2 I3 I4 I5 I6 I7 ...
[0054]
The sequence after the DCT coefficient reduction is therefore, as shown in FIG.
I1' I2' I3' I4' I5' I6' I7' ...
[0055]
Here, assuming that I3′ > TS and I6′ > TS, the inter-frame forward prediction image insertion unit 103b in the frame reconstruction unit 103 extracts, as shown in FIG.
I1' I2' I4' I5' I7' ...
as the pictures between which the special P picture Ps is to be inserted. Finally, the P picture Ps is inserted, and the following sequence is generated, as shown in FIG.
I1' Ps I2' Ps I4' Ps I5' Ps I7' ...
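The first embodiment's thinning and insertion can be reproduced with a short sketch. The labels follow the example above, while the sizes and the target value are made-up numbers chosen only so that I3′ and I6′ exceed TS; the function name is hypothetical.

```python
def first_embodiment_sequence(reduced, target_size, p_num):
    """Thin out oversized I pictures, then follow each kept one with p_num Ps.

    reduced: list of (label, size in bits after DCT coefficient reduction).
    """
    sequence = []
    for label, size in reduced:
        if size > target_size:
            continue  # thinned out: not displayed in trick play
        sequence.append(label)
        sequence.extend(["Ps"] * p_num)
    return sequence
```

With a GOP size of 6 and N = 3, formula (1) gives one special P picture per kept I picture, so the result is I1' Ps I2' Ps I4' Ps I5' Ps I7' Ps.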
[0056]
[Second Embodiment]
In the first embodiment, I pictures that do not fall to or below the target size even after the DCT coefficients are reduced are thinned out. For this reason, in a moving image in which I pictures are thinned out too frequently, scenes may skip too much and the result may be visually unnatural.
[0057]
Accordingly, a second embodiment of the present invention in which a neighboring P picture is substituted for the thinned out I picture will be described with reference to the drawings.
[0058]
FIG. 6 is a block diagram showing a configuration of an image processing apparatus according to the second embodiment of the present invention.
The image processing apparatus 600 of FIG. 6 includes an extraction unit 601, an intra-frame encoded information deletion unit 602, a frame reconstruction unit 603, a moving image attribute information setting unit 604, and a picture insertion unit 605. The extraction unit 601, the intra-frame encoded information deletion unit 602, the frame reconstruction unit 603, and the moving image attribute information setting unit 604 correspond respectively to the extraction unit 101, the intra-frame encoded information deletion unit 102, the frame reconstruction unit 103, and the moving image attribute information setting unit 104 in the image processing apparatus 100 of FIG. 1, and their functions are almost the same.
[0059]
The image processing apparatus 600 in FIG. 6 is different from the image processing apparatus 100 having the configuration in FIG. 1 in that a picture insertion unit 605 is added. The picture insertion unit 605 is provided between the intra-frame encoded information deletion unit 602 and the frame reconstruction unit 603, and performs picture insertion processing according to the algorithm described below. That is, if the picture insertion unit 605 fails to reduce the information of the I picture by the intra-frame encoded information deletion unit 602, the picture insertion unit 605 inserts a suitable P picture in the vicinity instead of the I picture.
[0060]
The present embodiment focuses on the fact that, owing to the characteristics of MPEG, a P picture in the vicinity of an I picture is likely to give an image with a display effect close to that of the I picture. Several approaches are conceivable for defining which P pictures count as neighboring and are therefore selection candidates. Examples include a first definition in which the next P picture in the sequence within the current GOP is the neighboring P picture, a second definition in which all P pictures in the current GOP are neighboring P pictures, and a third definition in which all P pictures in the current GOP and in the immediately preceding GOP (in the case of fast-forward reproduction) or the immediately following GOP (in the case of reverse fast-forward reproduction) are neighboring P pictures. An example of the third definition is shown in FIG.
[0061]
When there are a plurality of neighboring P pictures, the one to insert can be determined (selected) by, for example, a first method of selecting, from among the P pictures no larger than the target size TS obtained by the above equation (2), the one whose display time is closest to the I picture whose information reduction failed, or a second method of selecting the P picture with the largest picture size from among the P pictures no larger than the target size TS.
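The two selection methods can be sketched as below. Each candidate is a hypothetical (display time, size in bits) pair; the function name, the `method` parameter, and the numbers in the usage note are assumptions for illustration.

```python
def select_neighbor_p(candidates, target_size, i_time, method="closest"):
    """Pick one neighboring P picture, or None if none fits the target size.

    candidates: list of (display_time, size_bits) for the neighboring P pictures.
    i_time: display time of the I picture whose information reduction failed.
    """
    fitting = [c for c in candidates if c[1] <= target_size]
    if not fitting:
        return None
    if method == "closest":
        # First method: display time closest to the failed I picture.
        return min(fitting, key=lambda c: abs(c[0] - i_time))
    # Second method: largest picture that still fits.
    return max(fitting, key=lambda c: c[1])
```

Given candidates at times 3, 6, and -3 with sizes 500, 900, and 1,200 bits and a target of 1,000 bits, the first method picks the 500-bit picture at time 3 and the second picks the 900-bit picture at time 6.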
[0062]
If no P picture meets the conditions after the neighboring P pictures have been defined and selected as described above, a method of giving up on a neighboring P picture and inserting a special P picture instead can also be applied. In this case, however, the video looks somewhat less natural than when a neighboring P picture can be used.
[0063]
After the picture insertion unit 605 has performed the P picture insertion by the above method, the padding unit 603a in the frame reconstruction unit 603 performs padding on each I picture whose information has been reduced and on each P picture inserted in place of an I picture whose information reduction failed, and the inter-frame forward prediction image insertion unit 603b likewise inserts special P pictures between the pictures according to the designated double speed ratio. The moving image attribute information setting unit 604 then stores the playback rate in the sequence header of the moving image data and sets the value of vbv-delay for each picture. Through this series of processing, moving image data 620 for trick play (for fast-forward or reverse fast-forward reproduction) is created.
[0064]
In the second embodiment described above, when there is an I picture that cannot satisfy the conditions of the designated playback rate and double speed ratio, the I picture is not simply thinned out; a P picture likely to be close in image content to the thinned-out I picture is inserted in its place. This reduces fluctuations in the double speed ratio, further suppresses visual unnaturalness (the feeling of skipping), and allows trick play moving image data with a low playback rate to be created.
[0065]
Next, a specific example of the second embodiment will be described with reference to FIG.
Now, a reproduction rate R is specified at a triple speed ratio (N = 3), the target picture size is TS, the extracted I picture is Ii, the I picture after the DCT coefficient reduction is Ii ′, and the special P Let Ps be a picture. Further, it is assumed that the original sequence is configured with a GOP size “9”.
[0066]
When the sequence of the moving image data 610 for normal reproduction is
Ii BB Pi,1 BB Pi,2 BB Ii+1 BB Pi+1,1 BB Pi+1,2 BB Ii+2 ...
the P pictures in the vicinity of Ii are defined as Pi,k (k ≦ 2).
[0067]
When Ii′ > TS, the picture insertion unit 605 thins out Ii′ and, if there is a P picture satisfying Pi,k ≦ TS, inserts it in place of Ii′. If there is none, a special P picture Ps (Ps ≦ TS) is inserted.
[0068]
For example, suppose that the sequence of the normal playback moving image data 610 serving as the source video is, as shown in FIG. 8A,
I1BBP1,1BBP1,2BB I2BBP2,1BBP2,2BB I3BBP3,1BBP3,2BB I4BBP4,1 ...
After I picture extraction, the sequence becomes
I1 I2 I3 I4 I5 ...
[0069]
The sequence after the DCT coefficient reduction is therefore, as shown in FIG.
I1' I2' I3' I4' I5' ...
[0070]
Here, suppose that I2′ > TS and I4′ > TS, that P2,1 ≦ TS, and that P4,1 > TS and P4,2 > TS. The picture insertion unit 605 then inserts P2,1 in place of I2′ and Ps in place of I4′. Therefore, after insertion of the special P picture Ps by the frame reconstruction unit 603 (inter-frame forward prediction image insertion unit 603b), as shown in FIG.
I1' PsPs P2,1 PsPs I3' PsPs PsPsPs I5' ...
is generated.
[0071]
[Third Embodiment]
Even in the second embodiment, with moving image data in which many I pictures are to be thinned out and no neighboring P pictures exist at all, or in which only P pictures that fail the condition exist, the same visual unnaturalness as in the first embodiment may still occur.
[0072]
Therefore, a third embodiment of the present invention will be described, to which a technique for adjusting a playback rate by inserting a neighboring P picture without thinning out I pictures is applied. The basic configuration of the image processing apparatus applied in the present embodiment is the same as that of the image processing apparatus having the configuration of FIG. 6 applied in the second embodiment. For this reason, in the following description, the structure of FIG. 6 is used for convenience.
[0073]
The extraction unit 601 extracts the I pictures from the normal playback moving image data 610. The intra-frame encoded information deletion unit 602 reduces the DCT coefficients of each I picture extracted from the normal playback moving image data 610 so that the size of the I picture becomes equal to or smaller than the target picture size TS obtained by the above equation (2). If the DCT coefficient reduction (information reduction) does not bring the size to TS or less, the intra-frame encoded information deletion unit 602 leaves the I picture at its oversized size and notifies the picture insertion unit 605 to that effect.
[0074]
For an I picture whose size exceeds TS, the picture insertion unit 605 inserts an appropriate neighboring P picture, if one exists, as the next picture in the sequence. The definition and selection method (selection algorithm) of neighboring P pictures are the same as in the second embodiment. However, the P picture size used as the condition is not TS but the value obtained by the following equation (6):
Size of P picture as insertion condition
= TS − (I′ − TS) = 2 * TS − I′ (6)
TS: I picture size obtained by equation (2)
I′: I picture size after information reduction
[0075]
As can be seen from the above equation (6), the excess left by an I picture that is kept even though it exceeds TS is offset by the P picture that follows it. The effectiveness of using neighboring P pictures has already been described in the second embodiment. If there is no appropriate neighboring P picture, the special P picture Ps is inserted instead, as in the second embodiment.
[0076]
The padding unit 603a in the frame reconstruction unit 603 performs padding according to equation (3) on the P pictures inserted by the picture insertion unit 605 as described above and on the I pictures whose information has been reduced to TS or less. When padding a P picture, I′ in equation (3) is replaced with the size of the P picture. Further, the inter-frame forward prediction image insertion unit 603b in the frame reconstruction unit 603 inserts, between the pictures, the number of special P pictures Ps determined by the designated double speed ratio N.
[0077]
Then, the moving image attribute information setting unit 604 stores the reproduction rate obtained by equation (4) in the sequence header SH (see FIG. 4) of the moving image data, and sets the vbv-delay value obtained by equation (5) for each picture in the header of the corresponding picture layer.
The trick play moving image data 620 is created by the series of processes described above.
[0078]
As described above, in this embodiment, an I picture that cannot meet the picture size required at the specified playback rate is not thinned out but is inserted as it stands after the DCT coefficient reduction. Then, to adjust the playback rate, the size of that preceding I picture is taken into account: if a neighboring P picture satisfying the condition exists, it is inserted, and if not, the adjustment is made with a special P picture. As a result, the double speed ratio may fluctuate slightly when a P picture is inserted, but the visual unnaturalness (feeling of skipping) is eliminated, and smoother trick play moving image data with a low reproduction rate can be created.
[0079]
Next, a specific example of the third embodiment will be described with reference to FIG.
Now, the reproduction rate R is specified at a triple speed rate, the target picture size is TS, the extracted I picture is Ii, the original P picture is Pi, j, and the I picture after the DCT coefficient reduction is Ii ′, The special P picture to be inserted is Ps. It is assumed that the original sequence is composed of GOP size “9”.
[0080]
When the sequence of the moving image data 610 for normal reproduction is
Ii BB Pi,1 BB Pi,2 BB Ii+1 BB Pi+1,1 BB Pi+1,2 BB Ii+2 ...
the P pictures in the vicinity of Ii are defined as Pi,k (k ≦ 2).
[0081]
When Ii′ > TS, the picture insertion unit 605 inserts a P picture satisfying Pi,k ≦ (2 × TS − Ii′), if one exists. If there is none, a special P picture Ps (Ps ≦ (2 × TS − Ii′)) is inserted.
[0082]
For example, suppose that the sequence of the normal playback moving image data 610 serving as the source video is, as shown in FIG.
I1BBP1,1BBP1,2BB I2BBP2,1BBP2,2BB I3BBP3,1BBP3,2BB I4BBP4,1 ...
After I picture extraction, the sequence becomes
I1 I2 I3 I4 I5 ...
[0083]
The sequence after the DCT coefficient reduction is therefore, as shown in FIG.
I1' I2' I3' I4' I5' ...
[0084]
Here, suppose that I2′ > TS and I4′ > TS, that P2,2 ≦ (2 × TS − I2′), and that P4,1 > (2 × TS − I4′) and P4,2 > (2 × TS − I4′). The picture insertion unit 605 then keeps I2′ and I4′, whose reduction failed, without thinning them out, and inserts P2,2 and Ps, respectively, as the next picture in the sequence. Therefore, after the special P picture Ps is inserted by the frame reconstruction unit 603 (inter-frame forward prediction image insertion unit 603b), the sequence becomes, as shown in FIG.
I1' PsPs I2' PsPs P2,2 PsPs I3' PsPs I4' PsPs PsPsPs I5' ...
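The third embodiment's example can be reproduced with the sketch below. The labels follow the example above, while all sizes and the target value are made-up numbers chosen to satisfy the stated inequalities. Taking the first neighboring P picture that fits the budget of equation (6) is an assumption of this sketch, not a selection rule stated in the text, and the function name is hypothetical.

```python
def third_embodiment_sequence(pictures, target_size, p_num):
    """Keep every I picture; follow each oversized one with a compensating picture.

    pictures: list of (i_label, reduced_i_size, [(p_label, p_size), ...]).
    """
    slots = []
    for i_label, i_size, neighbors in pictures:
        slots.append(i_label)
        if i_size > target_size:
            budget = 2 * target_size - i_size  # equation (6)
            fitting = [label for label, size in neighbors if size <= budget]
            # Fall back to the special P picture when no neighbor fits.
            slots.append(fitting[0] if fitting else "Ps")
    # Each slot is followed by p_num special P pictures (formula (1)).
    return " ".join(label + "Ps" * p_num for label in slots)
```

With a GOP size of 9 and N = 3, two special P pictures follow each slot, which yields I1'PsPs I2'PsPs P2,2PsPs I3'PsPs I4'PsPs PsPsPs I5'PsPs for the assumed sizes.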
[0085]
Each of the embodiments above has been described taking the creation of trick play moving image data as an example. However, the present invention is not limited to this. For example, the present invention can also be applied to a video server that stores MPEG moving image information on a disk and distributes it as a stream over a network. In this case, either of two methods is applicable: the trick play moving image data may be stored on the disk as it is and distributed by the video server, or the trick play moving image information may be created in real time from the normal playback moving image data stored on the disk and then distributed over the network.
[0086]
In general, when distributing to a network with limited bandwidth, it is most desirable that the playback rate is low and the image quality is high, from the viewpoint of effective use of resources and guarantee of video quality. It can be seen that the realization of trick play by this method is effective for such applications.
[0087]
The present invention is not limited to the above-described embodiments, and various modifications can be made at the implementation stage without departing from the scope of the invention. Further, the above embodiments include inventions at various stages, and various inventions can be extracted by appropriately combining a plurality of the disclosed constituent elements. For example, if the problem described in the section on the problems to be solved by the invention can be solved and (at least one of) the effects described in the section on the effects of the invention can be obtained even when some constituent elements are deleted from all the constituent elements shown in the embodiments, the configuration from which those constituent elements have been deleted can be extracted as an invention.
[0088]
[Effects of the invention]
As described above in detail, according to the present invention, fixed-rate or variable-rate moving image data encoded by a moving image compression method using inter-frame prediction can be used to create trick-play moving image data at various speeds and playback rates, and to realize trick-play playback at a low bit rate, which is advantageous for network transmission.
[0089]
In addition, according to the present invention, degradation of image quality relative to the original video can be kept to a minimum, and high-speed trick play can be made to appear smoother.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a configuration of an image processing apparatus according to a first embodiment of the present invention.
FIG. 2 is a block diagram showing the principle of image compression by MPEG2.
FIG. 3 is a diagram showing a hierarchical structure of MPEG2 image data.
FIG. 4 is a diagram showing the configuration of moving image data padded in the first embodiment.
FIG. 5 is a diagram for explaining a specific example of the first embodiment.
FIG. 6 is a block diagram showing a configuration of an image processing apparatus according to a second embodiment of the present invention.
FIG. 7 is a diagram showing a definition example of P pictures in the vicinity of an I picture for which information reduction has failed in the second embodiment.
FIG. 8 is a diagram for explaining a specific example of the second embodiment.
FIG. 9 is a diagram for explaining a specific example of the third embodiment of the present invention.
[Explanation of symbols]
100, 600 ... Image processing apparatus
101, 601 ... Extraction unit
102, 602 ... Intra-frame encoded information deletion unit (deletion means)
103, 603 ... Frame reconstruction unit
103a, 603a ... Padding unit
103b, 603b ... Inter-frame forward prediction image insertion unit
104, 604 ... Moving image attribute information setting unit
104a, 604a ... Playback rate setting unit
104b, 604b ... Buffer control information setting unit
110, 610 ... Normal-playback moving image data
120, 620 ... Trick-play moving image data
605 ... Picture insertion unit (insertion means)

Claims

An image processing method for transmitting, at a fixed rate or a variable rate, moving image data encoded by moving image compression using inter-frame prediction, the method comprising the steps of:
extracting intra-frame encoded image data in order from the beginning or the end of the moving image data, whichever side is determined by the reproduction direction;
reducing the AC components of the DCT coefficients of the extracted intra-frame encoded image data so that the intra-frame encoded image data becomes equal to or less than a target size that satisfies a designated bit rate at the time of reproduction;
when the intra-frame encoded image data whose information has been reduced does not become equal to or less than the target size, extracting, from among the inter-frame forward prediction image data in the vicinity of that intra-frame encoded image data in the moving image data, inter-frame forward prediction image data having a size that compensates for the amount by which the intra-frame encoded image data exceeds the target size, namely a size up to the value obtained by subtracting the size of the intra-frame encoded image data from twice the target size, and inserting it immediately after the intra-frame encoded image data;
inserting a padding code into each image data so that the intra-frame encoded image data whose information has been reduced and the inserted inter-frame forward prediction image data each satisfy the designated bit rate at the time of reproduction;
setting, in the header of the moving image data into which the padding code has been inserted, a bit rate that reflects the designated bit rate at the time of reproduction; and
setting, in the picture layer header of each image data included in the moving image data, buffer control information for ensuring that the start of reproduction and random access are performed appropriately when the moving image data into which the padding code has been inserted is reproduced.

The image processing method according to claim 1, further comprising the step of inserting inter-frame forward prediction image data having no motion prediction data between items of the intra-frame encoded image data, or between the intra-frame encoded image data and the inserted inter-frame forward prediction image data, in the moving image data into which the padding code has been inserted.

An image processing apparatus for transmitting, at a fixed rate or a variable rate, moving image data encoded by moving image compression using inter-frame prediction, the apparatus comprising:
extraction means for extracting intra-frame encoded image data in order from the beginning or the end of the moving image data, whichever side is determined by the reproduction direction;
reduction means for reducing the AC components of the DCT coefficients of the intra-frame encoded image data extracted by the extraction means so that the intra-frame encoded image data becomes equal to or less than a target size that satisfies a designated bit rate at the time of reproduction;
insertion means for, when the intra-frame encoded image data whose information has been reduced by the reduction means does not become equal to or less than the target size, extracting, from among the inter-frame forward prediction image data in the vicinity of that intra-frame encoded image data in the moving image data, inter-frame forward prediction image data having a size that compensates for the amount by which the intra-frame encoded image data exceeds the target size, namely a size up to the value obtained by subtracting the size of the intra-frame encoded image data from twice the target size, and inserting it immediately after the intra-frame encoded image data;
padding means for inserting a padding code into each image data so that the intra-frame encoded image data whose information has been reduced by the reduction means and the inter-frame forward prediction image data inserted by the insertion means each satisfy the designated bit rate at the time of reproduction;
reproduction rate setting means for setting, in the header of the moving image data into which the padding code has been inserted by the padding means, a bit rate that reflects the designated bit rate at the time of reproduction; and
buffer control information setting means for setting, in the picture layer header of each image data included in the moving image data, buffer control information for ensuring that the start of reproduction and random access are performed appropriately when the moving image data into which the padding code has been inserted by the padding means is reproduced.