JP4117866B2 - Wavelet transform device and encoding / decoding device

Info

Publication number: JP4117866B2
Application number: JP24827699A
Authority: JP
Inventors: 啓行 ▲高▼橋
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1999-09-02
Filing date: 1999-09-02
Publication date: 2008-07-16
Anticipated expiration: 2019-09-02
Also published as: JP2001078189A

Description

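[0001]
[Technical Field of the Invention]
The present invention relates generally to systems that use the wavelet transform, and in particular to a wavelet transform device and an encoding/decoding device used in compression/decompression systems for image data and the like.
[0002]
[Prior Art]
The wavelet transform has attracted attention because it can represent the frequency domain and the time domain simultaneously, a property that the Fourier transform and similar transforms lack, and its fields of application have been expanding in recent years. Its application to data compression is particularly useful for storing and transmitting large amounts of data. For example, the time needed for facsimile transmission of a document, or for transmission of images such as on the World Wide Web, is drastically shortened when compression is used to reduce the number of bits needed to reproduce the image.
[0003]
Many different data compression techniques exist. The most widespread compression scheme is JPEG (Joint Photographic Experts Group). In JPEG, input symbols or luminance data are quantized and then converted into output codewords. Quantization aims to preserve the important features of the data while removing the unimportant ones. Prior to quantization, a transform is used for energy compaction, and the transform adopted is the DCT (Discrete Cosine Transform). However, various drawbacks have been pointed out for DCT-based JPEG, for example the occurrence of block noise and mosquito noise. In image signal processing applications, interest has focused on efficient, high-accuracy data compression coding schemes that can eliminate these drawbacks. Wavelet processing is one such scheme.
[0004]
When the wavelet transform is applied to a two-dimensional signal, a horizontal low-pass filter HL (Horizontal Low) and a horizontal high-pass filter HH (Horizontal High) are used to separate the signal into a horizontal low-band signal (S (Smooth) coefficients) and a horizontal high-band signal (D (Detail) coefficients). A vertical low-pass filter VL (Vertical Low) and a vertical high-pass filter VH (Vertical High) are then applied to each of the S and D coefficients to separate them into a horizontal-low/vertical-low signal (SS coefficients), a horizontal-low/vertical-high signal (SD coefficients), a horizontal-high/vertical-low signal (DS coefficients), and a horizontal-high/vertical-high signal (DD coefficients). The output after one horizontal pass and one vertical pass is called the level-1 output. The four kinds of signals above are called frequency band signals. If output of level 2 or higher is desired, this processing is applied recursively to the SS coefficients. At level 2, seven frequency band signals are obtained: the SS coefficients, the 1SD and 2SD coefficients, the 1DS and 2DS coefficients, and the 1DD and 2DD coefficients. The case in which the filters are applied first horizontally and then vertically has been described, but the order may be reversed.
[0005]
In an encoding/decoding device that uses the wavelet transform, each frequency band signal obtained through the above process is compressed in the encoding/decoding unit. Compression is performed bit by bit for each frequency band signal. The MSB of the very first pixel of a frequency band signal is processed first. The state of this pixel itself, the states of the surrounding pixels, and the state at the next higher level are referenced to determine the output. Next, the MSB of the second pixel is processed, and at that point the state of the first processed pixel is also referenced. When this series of operations has finished for the region to be encoded, the next lower bit (MSB-1) of the first pixel is processed. At this stage, in addition to the states of the surrounding pixels at the same bit depth, the state of the MSB is also referenced. In this way, encoding proceeds down to the LSB for the region to be encoded. Decoding follows almost the same procedure.
[0006]
FIG. 37 shows the general configuration of a conventional wavelet transform device, which consists of a frame memory 400, a control unit 401, and a filter 402. The filter 402 may have any configuration, but here the low-pass filter is assumed to be a 2-tap filter that operates on two data items, and the high-pass filter is assumed to be a 6-tap filter that operates on three pairs of data in total, namely the pair at the current position of the S coefficients output by the low-pass filter and the pairs immediately before and after it.
[0007]
FIG. 38 shows an example of wavelet transform processing using the above filters; (a) shows horizontal processing and (b) shows vertical processing. In (a), for example, 00 denotes the data of pixel 0 on line 0, and 12 denotes the data of pixel 2 on line 1 (lines and pixels are both counted from 0). In horizontal processing, as shown in (a), the output S00 at pixel 0 of the horizontal low-pass filter HL is obtained from data 00 and 01, and the output S01 at pixel 1 is obtained from data 02 and 03. The output H00 at pixel 0 of the horizontal high-pass filter HH is obtained from the data two before and one before data 00 (which do not exist), data 00 and 01, and data 02 and 03. In vertical processing, as shown in (b), the output SS00 of the vertical low-pass filter VL is obtained from data S00 and S10. The output SD00 of the vertical high-pass filter VH is obtained from the data two before and one before data S00 (which do not exist), data S00 and S10, and data S20 and S30.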
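The filter structure described above can be sketched in one dimension as follows. This is only an illustrative sketch: the patent gives the tap counts and the data each output depends on, but not the coefficients, so the averaging low-pass, the difference-plus-correction high-pass, and the edge replication used here are assumptions, and the function names are hypothetical.

```python
def low_pass(x):
    """2-tap low-pass: one S coefficient per pair (x[2i], x[2i+1])."""
    return [(x[2 * i] + x[2 * i + 1]) // 2 for i in range(len(x) // 2)]


def high_pass(x):
    """6-tap high-pass: each D coefficient depends on the previous,
    current and next input pair, i.e. samples x[2i-2] .. x[2i+3]."""
    s = low_pass(x)
    last = len(s) - 1
    d = []
    for i in range(len(s)):
        # Pairs that do not exist at the edges are replaced by the
        # nearest existing pair (an assumed form of mirroring).
        prev_s = s[i - 1] if i > 0 else s[0]
        next_s = s[i + 1] if i < last else s[last]
        # Assumed coefficients: pair difference plus a correction
        # derived from the neighbouring S values.
        d.append(x[2 * i] - x[2 * i + 1] + (next_s - prev_s) // 4)
    return d


line = [10, 12, 20, 22, 30, 32, 40, 42]
print(low_pass(line))   # S coefficients, half the input length
print(high_pass(line))  # D coefficients, half the input length
```

Applying the two functions first along lines and then along columns gives the SS, SD, DS, and DD signals of one level.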
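[0008]
FIG. 39 shows the data in the frame memory before the wavelet transform is applied. Horizontal processing is first applied to this data, and the resulting S and D coefficients are written to the frame memory with the mapping shown in FIG. 40. In FIG. 40, for example, 1S00 denotes the level-1 S coefficient at address 00. FIG. 41 is an example of the mapping used when writing the coefficients after vertical processing. This completes the storage of the level-1 coefficients. FIG. 42 is an example of how the level-2 horizontal coefficients are stored. Note that because level-2 processing is applied only to the 1SS coefficients, the data in the hatched portion is not used. The level-2 coefficients are then stored with the mapping shown in FIG. 43, which completes level-2 processing. Processing continues in sequence until the frequency band signals of the desired level are obtained.
[0009]
FIG. 44 shows a timing chart of a conventional, general wavelet transform device. The processing time is described below for data with a pixel-direction size X of 1024 and a line-direction size Y of 1024, 1 MB in total, processed up to level 4.
[0010]
From time t0 to t1, level-1 processing is applied to all data in the frame memory; level-2 processing is performed from t1 to t2, level-3 processing from t2 to t3, and level-4 processing from t3 to t4. Since X = Y = 1024, the processing time is 1024 × 1024 × 4 at level 1 (the factor 4 covers the pixel-direction read, the pixel-direction write, the line-direction read, and the line-direction write), 512 × 512 × 4 at level 2, 256 × 256 × 4 at level 3, and 128 × 128 × 4 at level 4, for a total of 5440k clocks (1k = 1024).
[0011]
However, this figure is the minimum time, for the case in which the latency of line-direction reads and writes to the frame memory is 0. Recent memories such as SDRAM normally use burst transfer, so pixel-direction data transfer is fast, but line-direction data transfer incurs latency (delay) because the sense amplifiers must be precharged. Techniques such as bank switching are sometimes used to avoid this, but they cannot be expected to help when line-direction accesses occur in succession, as in the wavelet transform. If the latency were 2, the processing time would increase to 10880k clocks. For the reason given above, this latency has been very difficult to avoid.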
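The two clock counts quoted above can be reproduced with a short calculation. This is a sketch under one assumption that is not stated in the text: each line-direction access is taken to cost 1 + latency clocks, while each pixel-direction access costs 1 clock. The function name is hypothetical.

```python
K = 1024  # the text uses 1k = 1024


def conventional_clocks(x=1024, y=1024, levels=4, line_latency=0):
    """Clock count of the level-by-level transform on the frame memory."""
    total = 0
    for level in range(levels):
        # Level 1 works on the full image; each further level works on the
        # SS quadrant, whose side is half that of the previous level.
        pixels = (x >> level) * (y >> level)
        pixel_dir = 2 * 1                   # pixel-direction read + write
        line_dir = 2 * (1 + line_latency)   # line-direction read + write
        total += pixels * (pixel_dir + line_dir)
    return total


print(conventional_clocks() // K)                # 5440
print(conventional_clocks(line_latency=2) // K)  # 10880
```

With this model the latency-2 case costs exactly twice the latency-0 case, which matches the figures in the text.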
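[0012]
In an encoding/decoding device, after the wavelet transform has finished, each frequency band signal written in the frame memory is encoded by the encoder/decoder. For image signals, the compression ratio is raised by exploiting the high correlation between neighboring pixels, especially within the same bit plane. For this reason, encoding handles the data of a given region bit by bit (any one bit of the data of a pixel). Decoding is obtained by performing the operations described above in roughly the reverse order.
[0013]
More detailed information on encoding/decoding devices, wavelet transform devices, and filters for the wavelet transform related to the present invention can be found in, for example, Japanese Unexamined Patent Publication No. H8-139935. The encoder/decoder is described in detail in Japanese Unexamined Patent Publication No. H9-121168. Known documents on similar wavelet transform devices include Japanese Unexamined Patent Publication Nos. H3-27687, H5-167997, and H5-183886.
[0014]
[Problems to Be Solved by the Invention]
Conventionally, when the wavelet transform is performed via a frame memory, processing in the pixel (horizontal) direction can be done relatively fast, as described above, but processing in the line (vertical) direction incurs latency on frame memory access, so that the total processing time becomes very long. In addition, because processing proceeds level by level, it takes time until the high-level frequency band signals are obtained, and encoding/decoding cannot start until then.
[0015]
Accordingly, an object of the present invention is to provide a wavelet transform device that resolves the latency problem and enables high-speed wavelet transform processing even when a memory such as SDRAM is used as the frame memory. Another object of the present invention is to provide a high-speed wavelet transform device that obtains the frequency band signals of all levels at almost the same time. A further object of the present invention is to provide a high-speed encoding/decoding device that uses the wavelet transform.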
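[0016]
[Means for Solving the Problems]
The invention of claim 1 is a wavelet transform device that reads data block by block from a frame memory, performs n-level wavelet transform processing (n being an integer of 3 or more), and writes the resulting frequency band signals of the n-level wavelet transform coefficients back to the frame memory, the device comprising a memory that temporarily stores the data read from the frame memory, and an arithmetic unit consisting of at least one first arithmetic element and a plurality of second arithmetic elements, wherein the plurality of second arithmetic elements each compute the low-frequency band signal among the level-(n-1) wavelet transform coefficients from the data of the blocks surrounding a block of interest read from the frame memory; the first arithmetic element executes n-level wavelet transform processing on the data of the block of interest read from the frame memory, the level-n wavelet transform processing being performed using the level-(n-1) low-frequency band signal obtained in the first arithmetic element and the level-(n-1) low-frequency band signals computed by the plurality of second arithmetic elements, thereby computing all frequency band signals of the n-level wavelet transform coefficients; and the n-level frequency band signals obtained by the wavelet transform processing of the first arithmetic element are written back to the frame memory block by block.
[0017]
The invention of claim 2 is the wavelet transform device of claim 1, wherein the first arithmetic element has two planes of work memory for each level, each second arithmetic element has two planes of work memory, wavelet transform processing of the data of two adjacent blocks of the frame memory is executed continuously, and the wavelet transform processing is performed in parallel with reading data from the frame memory and writing frequency band signals back to the frame memory.
[0018]
The invention of claim 3 is the wavelet transform device of claim 2, wherein the first arithmetic element has a first filter used for level-1 wavelet transform processing and a second filter used for level-2 to level-n wavelet transform processing, together with means for computing the low-frequency band signal of each level, and in the first arithmetic element the level-1 wavelet transform processing and the level-2 to level-n wavelet transform processing of the data of the block of interest are executed in parallel.
[0019]
The invention of claim 4 is the wavelet transform device of claim 2, wherein the work memory for each level of the first arithmetic element is divided into four independently accessible memories, the first arithmetic element has a first filter used for level-1 wavelet transform processing and a second filter used for level-2 to level-n wavelet transform processing, and the wavelet transform processing is executed by parallel processing using the two filters.
[0020]
The invention of claim 5 is the wavelet transform device of claim 1, 2, or 3, further comprising a buffer memory for temporarily storing part of the frequency band signals obtained by the wavelet transform processing of the first arithmetic element, the buffer memory being accessible from outside.
[0021]
The invention of claim 6 is an encoding/decoding device comprising a frame memory, the wavelet transform device of claim 1, 2, 3, or 4 capable of accessing the frame memory, and an encoder/decoder capable of accessing the frame memory.
[0022]
The invention of claim 7 is an encoding/decoding device comprising a frame memory, the wavelet transform device of claim 5 capable of accessing the frame memory, and an encoder/decoder capable of accessing the frame memory and the buffer memory of the wavelet transform device.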
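[0023]
[Embodiments of the Invention]
Embodiments of the present invention are described below with reference to the accompanying drawings. To simplify the description, the same or similar reference numerals are used for identical or corresponding parts across the drawings.
[0024]
A 2-tap low-pass filter and a 6-tap high-pass filter are used for the horizontal and vertical processing of the wavelet transform. The wavelet transform is performed up to level 4, that is, n = 4. The number of levels may of course be set to any value of 3 or more. The more levels there are, the greater the benefit of the present invention.
[0025]
FIG. 1 shows an example of the overall configuration of an encoding/decoding device according to the present invention. During encoding, data ("data") is written from outside into a frame memory 100. The data is read block by block into a wavelet transform device 101 according to the present invention, wavelet transform processing is applied block by block, and the resulting frequency band signals of all wavelet levels are written back to the frame memory 100 block by block. These frequency band signals are encoded by an encoder/decoder 102 and output to the outside as a code stream ("code"). During decoding, the code stream "code" input from outside is decoded by the encoder/decoder 102, which restores the frequency band signals of the wavelet transform in the frame memory 100. The wavelet transform device 101 applies the inverse wavelet transform to these frequency band signals, the original data is restored in the frame memory 100, and it is output to the outside as "data".
[0026]
The wavelet transform device 101 according to the present invention performs wavelet transform processing block by block and writes the frequency band signals of all wavelet levels back to the frame memory 100 block by block. The encoder/decoder 102 can therefore start encoding soon after wavelet transform processing starts and execute encoding in parallel with it; unlike the conventional approach, it does not have to wait for the highest-level frequency band signals before starting. Combined with the high speed of the wavelet transform device 101 itself, described later, this allows the encoding/decoding device of the present invention to operate much faster than conventional devices.
[0027]
According to a first embodiment of the present invention, the wavelet transform device 101 has the internal configuration shown in FIG. 2. The wavelet transform device 101 consists of an arithmetic unit 200; a current memory 202 and cache memories (CM1, CM2, CM3, CM4) 203, 204, 205, 206 for temporarily storing data read from the frame memory 100; and a main control unit 207 that controls the timing of the whole device and the data transfers inside the device and with the outside. The arithmetic unit 200 consists of 3 × 3 arithmetic elements (A, B, C, D, E, F, G, H, I) 201a, 201b, 201c, 201d, 201e, 201f, 201g, 201h, 201i.
[0028]
According to the first embodiment, the arithmetic element (E) 201e is configured as shown in FIG. 3. It comprises a control unit 211 containing an SS calculation unit 210, a filter (1) 213 used for level-1 horizontal and vertical processing, and a filter (2) 214 used for horizontal and vertical processing at levels 2, 3, and 4, and it further has two planes of work memory for each level. That is, it has a plane-0 work memory (L1_0) 216 and a plane-1 work memory (L1_1) 217 for level 1, a plane-0 work memory (L2_0) 218 and a plane-1 work memory (L2_1) 219 for level 2, a plane-0 work memory (L3_0) 220 and a plane-1 work memory (L3_1) 221 for level 3, and a plane-0 work memory (L4_0) 222 and a plane-1 work memory (L4_1) 223 for level 4. The control unit 211 handles data transfer inside the arithmetic element (E) 201e, controls the operation timing of the SS calculation unit 210, and exchanges data with the main control unit 207. The SS calculation unit 210 computes the SS data of levels 1, 2, and 3.
[0029]
According to the first embodiment, the level-1 work memories (L1_0, L1_1) 216, 217 and the cache memories (CM1, CM3, CM4) 203, 205, 206 are memories of size a (pixel direction) × a (line direction), as shown in FIG. 4. The cache memory (CM2) 204 has the size shown in FIG. 5. The level-2 work memories (L2_0, L2_1) 218, 219 have size a/2 × a/2, as shown in FIG. 6. The level-3 work memories (L3_0, L3_1) 220, 221 have size a/4 × a/4, as shown in FIG. 7. The level-4 work memories (L4_0, L4_1) 222, 223 have size a/8 in both the pixel and line directions, as shown in FIG. 8. FIG. 9 shows the size of the frame memory 100: X in the pixel direction and Y in the line direction. FIG. 10 shows the size of the current memory 202: X in the pixel direction and a in the line direction.
[0030]
According to the first embodiment, each of the arithmetic elements other than the arithmetic element (E) 201e, namely the arithmetic elements (A-D, F-I) 201a-201d, 201f-201i, consists, as shown in FIG. 11, of a control unit 231 containing an SS calculation unit 230 and two planes of work memory of size a/8 × a/8, namely a plane-0 work memory (L4_0) 233 and a plane-1 work memory (L4_1) 234. The SS calculation unit 230 computes the SS data of wavelet level 3. The control unit 231 controls data transfer inside the arithmetic element, controls the operation timing of the SS calculation unit 230, and exchanges data with the main control unit 207. The SS data computed by the SS calculation unit 230 is stored in the work memories (L4_0, L4_1) 233, 234.
[0031]
The wavelet transform device 101 configured in this way processes data in units of blocks of size a × a. This basic size a is determined from the number of taps of the filters used for the wavelet transform and the desired number of wavelet levels. The smaller a is, the smaller each memory inside the wavelet transform device 101 can be made, which is an advantage, but processing speed must also be considered when determining a. Specifically, a is best determined along the following lines.
[0032]
Because the highest desired wavelet level is 4 and the high-pass filter has 6 taps, six level-3 SS data are enough to obtain one each of the level-4 SS, SD, DS, and DD data. The minimum size of the source data for this is 48 × 48 (3 × 3 blocks of size 16 × 16). That is, a = 16 is the minimum size. The following description assumes a = 16. If the high-pass filter had 10 taps and the highest desired level were 5, the minimum size of the source data would be 160 × 160 (5 × 5 blocks of size 32 × 32), giving a = 32.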
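The two sizing examples above are consistent with a simple rule, sketched below. The general formula is an inference from those two examples rather than something the text states: a block of side 2^n contributes two level-(n-1) SS values per direction, and a high-pass filter with T taps needs T such values, so T/2 blocks are needed per direction. The function name is hypothetical.

```python
def block_geometry(high_pass_taps, levels):
    """Return (a, blocks_per_side, source_side) for the minimum block size.

    Inferred rule (an assumption, not stated in the text): a = 2**levels.
    """
    a = 2 ** levels
    # Number of level-(levels-1) SS values one block yields per direction.
    ss_per_block_side = a // 2 ** (levels - 1)
    blocks_per_side = high_pass_taps // ss_per_block_side
    return a, blocks_per_side, a * blocks_per_side


print(block_geometry(6, 4))   # (16, 3, 48)
print(block_geometry(10, 5))  # (32, 5, 160)
```

The first result corresponds to the 3 × 3 arrangement of arithmetic elements used in the embodiment.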
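[0033]
With a = 16 and X = Y = 1024, the number of words of memory inside the wavelet transform device 101 (the bit depth depends on the filter configuration) is as follows. First, in the arithmetic element (E) 201e:
Work memories (L1_0, L1_1): 16 × 16 × 2 = 512 words
Work memories (L2_0, L2_1): 8 × 8 × 2 = 128 words
Work memories (L3_0, L3_1): 4 × 4 × 2 = 32 words
Work memories (L4_0, L4_1): 2 × 2 × 2 = 8 words
In each of the other arithmetic elements (A-D, F-I) 201a-201d, 201f-201i, the work memories (L4_0, L4_1) take 2 × 2 × 2 = 8 words. The work memories of the whole arithmetic unit therefore total 744 words. The cache memories (CM1, CM3, CM4) 203, 205, 206 take 16 × 16 × 3 = 768 words and the cache memory (CM2) 204 takes 32 × 64 - (16 × 16) = 1792 words, so the cache memories total 2560 words. The current memory 202 takes 1024 × 16 = 16384 words. The total number of memory words inside the wavelet transform device 101 is therefore 19688 words, which, with 1k = 1024, is slightly more than 19k words, a very small amount of memory. This figure is independent of the line-direction size of the frame memory 100 and depends only on its pixel-direction size.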
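As a check on the arithmetic, the word counts above can be tallied as follows. This is a sketch that only restates the sizes given in the text for a = 16; the CM2 size is taken literally from the text, and the function and key names are hypothetical.

```python
def internal_memory_words(x=1024):
    """Memory words inside the transform device for a = 16."""
    a = 16
    # Element E: two planes per level, sides a, a/2, a/4, a/8.
    element_e = sum(2 * (a >> level) ** 2 for level in range(4))
    # The eight surrounding elements: two planes of (a/8) x (a/8) each.
    others = 8 * 2 * (a // 8) ** 2
    work = element_e + others
    cache = 3 * a * a + (32 * 64 - 16 * 16)  # CM1, CM3, CM4 plus CM2
    current = x * a                          # X wide, a lines deep
    return {"work": work, "cache": cache, "current": current,
            "total": work + cache + current}


print(internal_memory_words())
# {'work': 744, 'cache': 2560, 'current': 16384, 'total': 19688}
```

Only the current memory term depends on X, which is why the total does not depend on the line-direction size of the frame memory.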
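[0034]
The operation of the wavelet transform device 101 is outlined as follows. The data in the frame memory 100 is divided into blocks of size 16 × 16 (a × a). Wavelet transform processing of the center block of a group of 3 × 3 blocks is executed by the arithmetic element (E) 201e, while the level-3 SS data of the eight surrounding blocks is computed by the arithmetic elements (A-D, F-I) 201a-201d, 201f-201i. The arithmetic element (E) 201e can perform wavelet transform processing up to level 3 using only the source data of its own block, but because the source data is 16 × 16, only 2 × 2 SS data are obtained at level 3 (FIG. 12 shows an example of the mapping of the frequency band signals up to level 3 for one block of size 16 × 16; it can be seen to contain 2 × 2 SS data). Because a 6-tap high-pass filter is used, at least 6 × 6 level-3 SS data are needed for the level-4 transform. The arithmetic element (E) 201e executes the level-4 transform by using, as the missing level-3 SS data, the SS data obtained in the internal work memories (L4_0 or L4_1) 233 or 234 of the arithmetic elements (A-D, F-I) 201a-201d, 201f-201i. The frequency band signals of levels 1 to 4 for one block obtained in this way in the work memories of the arithmetic element (E) 201e are written back to the corresponding region of the frame memory 100.
[0035]
In this embodiment, so that wavelet transform processing up to level 4 by the arithmetic element (E) 201e can be executed continuously on two horizontally adjacent blocks, the arithmetic element (E) 201e has two planes of work memory for each level and each of the other arithmetic elements has two planes of work memory of size 2 × 2, as described above. In addition, so that the arithmetic element (E) 201e can execute the level-1 transform and the transforms of the other levels at high speed by parallel processing, and can execute the transforms of levels 2 to 4 in any order, it has two independently operating filters 213, 214 for horizontal and vertical processing and the SS calculation unit 210 that computes the SS data of levels 1, 2, and 3, as described above.
[0036]
Because the transformed data obtained in the work memories of the arithmetic element (E) 201e is written back to the frame memory 100 block by block as it becomes available, part of the source data that the arithmetic elements need for processing other blocks gets overwritten. The current memory 202 and the cache memories (CM1-CM4) 203-206 described above are used to hold such source data temporarily and to avoid re-reading data that has already been read. The main control unit 207 operates so that 3 × 3 blocks of source data are always prepared (by mirroring for data that does not exist), and it distributes the data in these memories and in the frame memory 100 to the arithmetic elements 201a-201i. How the data is distributed depends on which block of the frame memory 100 is being processed, as described below.
[0037]
FIGS. 13 to 21 illustrate how processing proceeds. As described above, the arithmetic element (E) 201e has two planes of work memory for each level, and each of the other arithmetic elements (A-D, F-I) 201a-201d, 201f-201i has two planes of work memory. In FIGS. 13 to 21, the 3 × 3 solid-line grid indicates the positions of the 3 × 3 blocks corresponding to the arithmetic elements (A-I) 201a-201i for processing that uses the plane-0 work memories, and the 3 × 3 dotted-line grid indicates the positions of the blocks corresponding to the arithmetic elements (A-I) 201a-201i for processing that uses the plane-1 work memories. In FIGS. 13 to 21, 0_0, 1_0, and so on are block names; the first number is the block address in the line (vertical) direction and the second number is the block address in the pixel (horizontal) direction.
[0038]
FIG. 22 shows how the cache memories (CM1-CM4) 203-206 are allocated for caching the source data needed by the arithmetic elements (A-I) 201a-201i. A0-I0 denote the data needed when the arithmetic elements (A-I) 201a-201i use the plane-0 work memories, and A1-I1 denote the data needed when they use the plane-1 work memories.
[0039]
FIGS. 23 to 25 are timing charts of the wavelet transform device 101. In FIGS. 23 to 25, a portion marked only with a block name such as 0_0, or marked with R, represents a read (READ) from the frame memory 100, the current memory 202, or the cache memories 203-206; a portion with W above the block name represents a write (WRITE); and a portion marked OW represents an overwrite (OVERWRITE). The operation is described step by step below with reference to FIGS. 13 to 21 and the timing charts.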
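[0040]
<<FIG. 13>> Processing of the two leftmost blocks of the first block row. Between times t0 and t2 in FIG. 23, the data of block 0_0 is read from the frame memory 100, sent to the arithmetic elements (E, D) 201e, 201d, and written to the current memory 202. The arithmetic element (E) 201e reads the data of block 0_0 into the work memory (L1_0) 216. In the arithmetic element (D) 201d, the SS calculation unit 230 computes level-3 SS data from the data of block 0_0 and stores the resulting 2 × 2 SS data in the work memory (L4_1) 234. In the timing charts, "A0-I0" indicate the timing at which source data is written to the plane-0 work memory of an arithmetic element or computed SS data is written to its plane-0 work memory, and "A1-I1" indicate the corresponding timing for the plane-1 work memory.
[0041]
Next, the data of block 0_1 is read from the frame memory, sent to the arithmetic element (F) 201f and the arithmetic element (E) 201e, and written to the current memory 202 and the cache memory (CM2) 204. The arithmetic element (E) 201e reads the data into the work memory (L1_1) 217. In the arithmetic element (F) 201f, the SS calculation unit 230 computes level-3 SS data from the 0_1 data and stores it in the work memory (L4_0) 233.
[0042]
Next, the data of block 1_0 is read from the frame memory 100 and sent to the arithmetic element (G) 201g and the arithmetic element (H) 201h. The arithmetic element (G) 201g computes level-3 SS data and stores it in the work memory (L4_1) 234, and the arithmetic element (H) 201h computes level-3 SS data and stores it in the work memory (L4_0) 233.
[0043]
Next, the data of block 1_1 is read from the frame memory 100, sent to the arithmetic element (I) 201i and the arithmetic element (H) 201h, and written to the cache memory (CM2) 204. The arithmetic element (I) 201i computes level-3 SS data and stores it in the work memory (L4_0) 233, and the arithmetic element (H) 201h computes SS data and stores it in the work memory (L4_1) 234.
[0044]
Although not shown in the timing chart, in parallel with reading the above four blocks, mirroring data for the nonexistent corresponding blocks is written, under the control of the main control unit 207, to the plane-0 work memories (L4_0) 233 of the arithmetic elements (A-D, G) 201a-201d, 201g and to the plane-1 work memories (L4_1) 234 of the arithmetic elements (A, B) 201a, 201b.
[0045]
When the data for processing blocks 0_0 and 0_1 has been prepared in this way, the arithmetic element (E) 201e first starts the wavelet transform of block 0_0 using the plane-0 work memories. "Cal_0" in FIG. 23 indicates this timing; the details of the processing are described later.
[0046]
When the wavelet transform of block 0_0 finishes, the data obtained in the plane-0 work memories (L1_0, L2_0, L3_0, L4_0) 216, 218, 220, 222 of the arithmetic element (E) 201e is immediately written back to the frame memory 100. The data in those work memories and in the plane-0 work memories (L4_0) 233 of the other arithmetic elements (A-D, F-I) 201a-201d, 201f-201i is then discarded.
[0047]
In parallel with the wavelet transform of block 0_0, the data of block 0_2 is read from the frame memory 100, sent to the arithmetic element (F) 201f, and written to the current memory 202 and the cache memory (CM2) 204. The arithmetic element (F) 201f computes SS data and stores it in the work memory (L4_1) 234.
[0048]
Next, the data of block 1_2 is read from the frame memory 100 and sent to the arithmetic element (I) 201i, which computes SS data and writes it to its work memory (L4_1) 234; at the same time, the data of block 1_2 is written to the cache memory (CM2) 204. Although not shown, when block 0_2 is read, mirroring data for the nonexistent block corresponding to the arithmetic element (C) 201c is given to that arithmetic element, SS data is computed, and it is stored in its plane-1 work memory (L4_1) 234.
[0049]
When the wavelet transform of block 0_0 has finished, the arithmetic element (E) 201e starts the wavelet transform of block 0_1 using the plane-1 work memories (L1_1, L2_1, L3_1, L4_1) 217, 219, 221, 223. "Cal_1" in FIG. 23 indicates this timing. When the wavelet transform of block 0_1 finishes, the data obtained in the plane-1 work memories (L1_1, L2_1, L3_1, L4_1) 217, 219, 221, 223 of the arithmetic element (E) 201e is immediately written back to the frame memory 100. The data in the plane-1 work memories (L4_1) 234 of the other arithmetic elements (A-D, F-I) 201a-201d, 201f-201i is discarded.
[0050]
In parallel with the wavelet transform of block 0_1, and in preparation for the wavelet transform of the next block 0_2, the data of blocks 0_1 and 1_1 is read from the cache memory (CM2) 204 and transferred to the arithmetic elements (D, G) 201d, 201g respectively, and mirroring data of block 0_1 is transferred to the arithmetic element (A) 201a. The arithmetic element (D) 201d computes level-3 SS data from the data of block 0_1 and stores it in its internal work memory (L4_0) 233, and the arithmetic element (G) 201g computes SS data from the data of block 1_1 and stores it in its internal work memory (L4_0) 233. The arithmetic element (A) 201a computes the level-3 SS data of the mirroring data and stores it in its work memory (L4_0) 233.
[0051]
When the wavelet transform of block 0_1 finishes, in preparation for the wavelet transforms of blocks 0_2 and 0_3, the data of blocks 0_2 and 1_2 is read from the cache memory (CM2) 204 and transferred to the arithmetic elements (E, D) 201e, 201d and to the arithmetic elements (H, G) 201h, 201g respectively. Mirroring data of block 0_2 is transferred to the arithmetic elements (A, B) 201a, 201b. The arithmetic element (E) 201e reads the data of block 0_2 into the work memory (L1_0) 216; the arithmetic element (D) 201d computes SS data from the data of block 0_2 and stores it in the work memory (L4_1) 234; the arithmetic element (H) 201h computes SS data from the data of block 1_2 and stores it in the work memory (L4_0) 233; and the arithmetic element (G) 201g computes the SS data of block 1_2 and stores it in the work memory (L4_1) 234. The arithmetic element (B) 201b computes the SS data of the mirroring data and stores it in the work memory (L4_0) 233, and the arithmetic element (A) 201a computes the SS data of the mirroring data and stores it in the work memory (L4_1) 234. This is the processing up to time t3, at which point the wavelet transform data up to level 4 for blocks 0_0 and 0_1 has been obtained in the frame memory 100.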
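[0052]
<<FIG. 14>> Processing of blocks 0_2 and 0_3, two blocks to the right. Blocks whose processing has finished and whose data in the frame memory 100 has been rewritten are shown shaded in FIG. 14 (the same applies to FIGS. 15 to 21).
[0053]
From time t3, the data of block 0_3 is read from the frame memory 100, transferred to the arithmetic elements (E, F) 201e, 201f, and written to the cache memory (CM2) 204 and the current memory 202. Mirroring data of block 0_3 is also transferred to the arithmetic elements (B, C) 201b, 201c. The arithmetic elements (C, F) 201c, 201f store the SS data computed from their input data in the work memory (L4_0) 233, the arithmetic element (E) 201e stores its input data in the work memory (L1_1) 217, and the arithmetic element (B) 201b stores the SS data computed from its input data in the work memory (L4_1) 234. Next, the data of block 1_3 is read from the frame memory 100, transferred to the arithmetic elements (H, I) 201h, 201i, and written to the cache memory (CM2) 204. The arithmetic element (H) 201h stores the computed SS data in the work memory (L4_1) 234, and the arithmetic element (I) 201i stores the computed SS data in the work memory (L4_0) 233.
[0054]
At this point the wavelet transform of block 0_2 starts. In parallel, the data of blocks 0_4 and 1_4 is read from the frame memory 100, and mirroring data of block 0_4 is transferred to the arithmetic element (C) 201c.
[0055]
When the wavelet transform of block 0_2 finishes, the wavelet transform of block 0_3 starts, and the transformed data of block 0_2 is written back to the frame memory 100. In parallel with the processing of block 0_3, data transfers similar to those performed during the processing of block 0_1 are carried out for the next blocks. The write-back of the transformed data of block 0_3 to the frame memory 100 finishes by time t4. Between times t4 and t5, blocks 0_4 and 0_5 are processed in the same way.
[0056]
<<FIG. 15>> Processing of the two rightmost blocks of the first block row. The operation is almost the same as in FIGS. 13 and 14, so the details are omitted. However, as shown from time t7 to t8 in the timing chart of FIG. 24, the data corresponding to the arithmetic elements (F, I) 201f, 201i for the processing of block 0_63 does not exist and mirroring is performed instead, so less data is read from the frame memory 100.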
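[0057]
<<FIG. 16>> Processing of the first two blocks of the next block row. This corresponds to the period from time t8 to t9 in the timing chart of FIG. 24. The data of block 1_0 is read from the frame memory 100 and written to the work memory (L1_0) 216 of the arithmetic element (E) 201e and to the cache memory (CM3) 205; it is also sent to the arithmetic element (D) 201d, where SS data is computed and stored in its work memory (L4_0) 233. In parallel, the data of block 0_0 is read from the current memory 202 and sent to the arithmetic elements (A, B) 201a, 201b, where SS data is computed; the arithmetic element (A) 201a stores the SS data in the work memory (L4_1) 234 and the arithmetic element (B) 201b stores the SS data in the work memory (L4_0) 233. Mirroring data of block 0_0 is also sent to the arithmetic element (A) 201a, where SS data is computed and stored in its work memory (L4_0) 233.
[0058]
Next, the data of block 1_1 is read from the frame memory 100, sent to the arithmetic elements (E, F) 201e, 201f, and written to the cache memories (CM2, CM4) 204, 206. The arithmetic element (E) 201e stores the data in the work memory (L1_1) 217, and the arithmetic element (F) 201f computes the SS data of the data and stores it in the work memory (L4_0) 233. In parallel, the data of block 0_1 is read from the current memory 202, written to the cache memory (CM2) 204, and sent to the arithmetic elements (B, C) 201b, 201c. The arithmetic element (B) 201b computes SS data and stores it in the work memory (L4_1) 234, and the arithmetic element (C) 201c computes SS data and stores it in the work memory (L4_0) 233.
[0059]
Next, the data of block 2_0 is read from the frame memory 100, sent to the arithmetic elements (G, H) 201g, 201h, and written to the cache memory (CM2) 204; the data of block 0_2 is also read from the current memory 202, sent to the arithmetic element (C) 201c, and written to the cache memory (CM2) 204. The arithmetic elements (C, G) 201c, 201g each compute the SS data of their input data and store it in the work memory (L4_1) 234, and the arithmetic element (H) 201h computes the SS data of its input data and stores it in the work memory (L4_0) 233. Mirroring data of block 2_0 is also sent to the arithmetic element (G) 201g, and its SS data is stored in the work memory (L4_0) 233.
[0060]
Next, the data of block 2_1 is read from the frame memory 100, written to the cache memory (CM2) 204, and sent to the arithmetic elements (H, I) 201h, 201i. The data of block 1_0 is also read from the cache memory (CM3) 205 and overwritten into the current memory 202, and its mirroring data is sent to the arithmetic element (D) 201d. The arithmetic elements (D, I) 201d, 201i compute the SS data of their respective input data and store it in the work memory (L4_0) 233, and the arithmetic element (H) 201h computes the SS data of its input data and stores it in the work memory (L4_1) 234.
[0061]
From this point, the arithmetic element (E) 201e starts the wavelet transform of block 1_0 using the plane-0 work memories (L1_0, L2_0, L3_0, L4_0) 216, 218, 220, 222. In parallel, the data of block 1_2 is read from the frame memory 100, written to the cache memory (CM2) 204 and the current memory 202, and sent to the arithmetic element (F) 201f. The arithmetic element (F) 201f stores the computed SS data in the work memory (L4_1) 234. Next, the data of block 2_2 is read from the frame memory 100, written to the cache memory (CM2) 204, and sent to the arithmetic element (I) 201i; the data of block 1_1 is also read from the cache memory (CM4) 206 and overwritten into the current memory 202. The arithmetic element (I) 201i computes the SS data of its input data and stores it in the work memory (L4_1) 234.
[0062]
When the wavelet transform of block 1_0 finishes, its transformed data is written back to the frame memory 100, and the arithmetic element (E) 201e starts the wavelet transform of block 1_1 using the plane-1 work memories (L1_1, L2_1, L3_1, L4_1) 217, 219, 221, 223. In parallel with this transform, preparation for the next block 1_2 is carried out. That is, blocks 0_1, 1_1, and 2_1 are read in sequence from the cache memory (CM2) 204 and sent to the arithmetic elements (A, D, G) 201a, 201d, 201g respectively. These arithmetic elements compute the SS data of their respective input data and store it in the work memory (L4_0) 233. When the transformed data of block 1_1 has been written back to the frame memory 100, the processing of blocks 1_0 and 1_1 is complete.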
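[0063]
<<FIG. 17>> Processing at the position two blocks to the right of the position in FIG. 16. This corresponds to the period from time t9 to t10 in the timing chart of FIG. 25. This processing pattern accounts for most (about 88%) of the overall processing.
[0064]
First, the data of block 0_2 is read from the cache memory (CM2) 204 and sent to the arithmetic elements (B, A) 201b, 201a; the arithmetic element (B) 201b stores the computed SS data in the work memory (L4_0) 233, and the arithmetic element (A) 201a stores the computed SS data in the work memory (L4_1) 234. Next, the data of block 1_2 is read from the cache memory (CM2) 204, written to the work memory (L1_0) 216 of the arithmetic element (E) 201e, and sent to the arithmetic element (D) 201d, which stores the computed SS data in the work memory (L4_1) 234. Next, the data of block 2_2 is read from the cache memory (CM2) 204 and sent to the arithmetic elements (G, H) 201g, 201h; the arithmetic element (H) 201h stores the computed SS data in the work memory (L4_0) 233, and the arithmetic element (G) 201g stores the computed SS data in the work memory (L4_1) 234. Next, the data of block 0_3 is read from the current memory 202, written to the cache memory (CM2) 204, and sent to the arithmetic elements (B, C) 201b, 201c; the arithmetic element (B) 201b stores the computed SS data in the work memory (L4_1) 234, and the arithmetic element (C) 201c stores the computed SS data in the work memory (L4_0) 233.
[0065]
Next, the data of block 1_3 is read from the frame memory 100, written to the current memory 202, the cache memory (CM2) 204, and the work memory (L1_1) 217 of the arithmetic element (E) 201e, and sent to the arithmetic element (F) 201f. The arithmetic element (F) 201f stores the computed SS data in the work memory (L4_1) 234.
[0066]
Next, block 2_3 is read from the frame memory 100, written to the cache memory (CM2) 204, and sent to the arithmetic elements (H, I) 201h, 201i. The arithmetic element (I) 201i stores the computed SS data in the work memory (L4_0) 233, and the arithmetic element (H) 201h stores the computed SS data in the work memory (L4_1) 234.
[0067]
Once this processing has finished, the arithmetic element (E) 201e starts the wavelet transform of block 1_2 using the plane-0 work memories (L1_0, L2_0, L3_0, L4_0) 216, 218, 220, 222. In parallel with this wavelet transform, the data of block 1_4 is read from the frame memory 100, written to the cache memories (CM2, CM3) 204, 205, and sent to the arithmetic element (F) 201f, where SS data is computed and stored in its work memory (L4_1) 234. Next, the data of block 2_4 is read from the frame memory 100, sent to the arithmetic element (I) 201i, and written to the cache memory (CM2) 204. The arithmetic element (I) 201i stores the computed SS data in the work memory (L4_1) 234.
[0068]
The wavelet transform of block 1_2 then finishes and its transformed data is written back to the frame memory 100. In parallel, the data of block 0_4 is read from the current memory 202, written to the cache memory (CM2) 204, and sent to the arithmetic element (C) 201c, where its SS data is computed and stored in the work memory (L4_1) 234. Next, the data of block 1_4 is read from the cache memory (CM3) 205 and written to the current memory 202.
[0069]
The arithmetic element (E) 201e starts the wavelet transform of block 1_3 using the plane-1 work memories (L1_1, L2_1, L3_1, L4_1) 217, 219, 221, 223, and in parallel with this transform the operations that prepare for processing the next block 1_4 are carried out. That is, the data of blocks 0_3, 1_3, and 2_3 is read in sequence from the cache memory (CM2) 204 and sent to the arithmetic elements (A, D, G) 201a, 201d, 201g respectively, where SS data is computed. The computed SS data is stored in the work memory (L4_0) 233 of each of the arithmetic elements (A, D, G) 201a, 201d, 201g. The wavelet transform of block 1_3 then finishes, and its transformed data is written back to the frame memory 100.
[0070]
<<FIG. 18>> Processing of the two rightmost blocks of the second block row. The operation in this case should be clear from the description so far and is therefore omitted.
[0071]
The processing corresponding to FIGS. 16 to 18 above is repeated up to the block row immediately before the last block row.
[0072]
<<FIG. 19>> Processing of the two leftmost blocks of the last block row. The operation in this case is almost the same as in FIG. 13, so its description and timing chart are omitted.
[0073]
<<FIG. 20>> Processing of the next two blocks. The operation in this case is almost the same as in FIG. 14, so its description and timing chart are omitted.
[0074]
<<FIG. 21>> Processing of the two rightmost blocks. The operation in this case is almost the same as in FIG. 15, so its description and timing chart are omitted.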
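[0075]
FIG. 26 is a timing chart of the wavelet transform processing in the arithmetic element (E) 201e. From time t0 to t1, the filter (1) 213 is used to perform level-1 horizontal processing on the data written in the work memory (L1_0 or L1_1) 216 or 217. In parallel with this horizontal processing, the SS calculation unit 210 computes the SS data of levels 1, 2, and 3, and the resulting level-1, level-2, and level-3 SS data is written to the work memory (L2_0 or L2_1) 218 or 219, the work memory (L3_0 or L3_1) 220 or 221, and the work memory (L4_0 or L4_1) 222 or 223, respectively.
[0076]
The level-1 SS data is obtained by averaging two horizontally adjacent source data to obtain S data and then averaging two vertically adjacent S data. This amounts to averaging 2 × 2 source data. The level-2 SS data is obtained by averaging two horizontally adjacent level-1 SS data to obtain S data and then averaging two vertically adjacent S data. This amounts to averaging 4 × 4 source data. The level-3 SS data is obtained by averaging two horizontally adjacent level-2 SS data to obtain S data and then averaging two vertically adjacent S data. This amounts to averaging 8 × 8 source data. For a block of size 16 × 16, 2 × 2 level-3 SS data are obtained.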
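The repeated averaging described above can be sketched as follows. This is a minimal illustration rather than the device's arithmetic: the text does not say how the averages are rounded, so exact division is used here as an assumption, and the function names are hypothetical.

```python
def halve(grid):
    """Average horizontally adjacent pairs, then vertically adjacent pairs."""
    s = [[(row[2 * j] + row[2 * j + 1]) / 2 for j in range(len(row) // 2)]
         for row in grid]
    return [[(s[2 * i][j] + s[2 * i + 1][j]) / 2 for j in range(len(s[0]))]
            for i in range(len(s) // 2)]


def ss_levels(block, levels=3):
    """Return the SS data of levels 1..levels for one block."""
    out = []
    current = block
    for _ in range(levels):
        current = halve(current)
        out.append(current)
    return out


# A 16 x 16 test block whose value encodes its position.
block = [[16 * line + pixel for pixel in range(16)] for line in range(16)]
ss1, ss2, ss3 = ss_levels(block)
print(len(ss1), len(ss2), len(ss3))  # 8 4 2
print(ss3[0][0])                     # 59.5, the mean of the top-left 8 x 8 area
```

The level-3 result has 2 × 2 entries, which is why the 6 × 6 values needed for level 4 have to come from the neighbouring blocks as well.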
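[0077]
Because the filter (2) 214 for horizontal and vertical processing at levels 2, 3, and 4 is provided separately from the filter (1) 213 for level-1 horizontal and vertical processing, the transforms of levels 2, 3, and 4 can be performed in any order, in parallel with the level-1 transform, from the moment the SS data they need is ready.
[0078]
In the timing chart of FIG. 26, level-1 vertical processing starts at time t1, and level-2 horizontal processing starts at the same time. Level-2 horizontal processing is executed with the filter (2) 214 on the level-1 SS data in the work memory (L2_0 or L2_1) 218 or 219. The resulting S data and D data is written back to the work memory (L2_0 or L2_1) 218 or 219. Vertical processing is then executed on this S data and D data using the filter (2) 214, and the level-2 transformed data is obtained in the work memory (L2_0 or L2_1) 218 or 219.
[0079]
From time t2, the filter (2) 214 is used to execute horizontal processing and then vertical processing on the level-2 SS data in the work memory (L3_0 or L3_1) 220 or 221, and the level-3 transformed data is obtained in the work memory (L3_0 or L3_1) 220 or 221. From time t3, horizontal processing and vertical processing are executed in turn, using the filter (2) 214, on the level-3 SS data in the work memory (L4_0 or L4_1) 222 or 223 inside the arithmetic element (E) 201e and the SS data obtained in the work memories (L4_0 or L4_1) 233 or 234 inside the other arithmetic elements (A-D, F-I) 201a-201d, 201f-201i (6 × 6 SS data in all), and the level-4 transformed data is obtained in the work memory (L4_0 or L4_1) 222 or 223 of the arithmetic element (E) 201e.
[0080]
Because level 1 has a large amount of data, the total time needed for levels 2, 3, and 4 is less than the time needed for level-1 horizontal or vertical processing, so the transforms of levels 2, 3, and 4 are completed at time t4, before time t5 at which level-1 vertical processing ends.
[0081]
FIG. 27 divides the frame memory 100 into parts in which the processing is the same and gives each part an identification number from 1 to 15. As described with reference to FIGS. 13 to 25, with respect to reads and writes on the frame memory 100, almost the same processing is performed in parts 1 and 13, in parts 2 and 14, and in parts 3 and 15. Likewise, almost the same processing is performed in parts 4, 7, and 10, in parts 5, 8, and 11, and in parts 6, 9, and 12. The wavelet transform is different, however, in that the number of blocks processed differs from part to part. In parts 1, 3, 13, and 15, the number of blocks processed using the plane-0 (or plane-1) work memories of the arithmetic elements is 2 × 2, and the number processed using the plane-1 (or plane-0) work memories is 3 × 2. In parts 2 and 14, the number of blocks processed is 3 × 2 in both cases. In parts 4, 7, 10, 6, 9, and 12, the number of blocks processed using the plane-0 (or plane-1) work memories is 2 × 3, and the number processed using the plane-1 (or plane-0) work memories is 3 × 3. In parts 5, 8, and 11, the number of blocks processed is 3 × 3 in both cases.
[0082]
For simplicity, and to make comparison with the timing charts easier, the processing time is calculated using one block of size 16 × 16 as the unit of time. Because the wavelet transform operates on the data of the center block of the 3 × 3 solid-line or dotted-line grid shown in FIGS. 13 to 21, the memories are free with respect to the other blocks, and data can be written to the frame memory, or read in advance, in parallel with the wavelet transform. Taking this into account, the processing times for parts 1 to 15 in FIG. 27 are: part 1, 10 blocks; part 2, 12 blocks; part 3, 8 blocks; part 4, 10 blocks; part 5, 20 blocks; part 6, 8 blocks; part 7, 10 blocks; part 8, 20 blocks; part 9, 8 blocks; part 10, 10 blocks; part 11, 20 blocks; part 12, 8 blocks; part 13, 10 blocks; part 14, 12 blocks; and part 15, 8 blocks. With X = Y = 1024, the total processing time is 10 + 12 × 30 + 8 = 378 blocks for each of the top and bottom block rows and (10 + 20 × 30 + 8) × 62 = 38316 blocks for the remaining rows, for a total of 39072 blocks.
[0083]
Converted to clocks, since one block is 16 × 16, this is 9768k clocks (1k = 1024). This clock count assumes that the wavelet transform and the reads and writes on the frame memory run at the same speed; since the internal operation can usually be sped up easily, this figure is the worst case. For example, if the time needed for the wavelet transform in parts 5, 8, and 11, which take the longest (part 8 accounts for 88% of the whole), were halved, the processing time for those parts would be (20 - 9) + 9/2 = 15.5 blocks, and the overall processing time would be 30702 blocks. Converted to clocks this is 7675.5k clocks, so all processing can be completed in about 79% of the worst-case time.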
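The block and clock totals above can be checked with a few lines. The sketch below only re-adds the per-part figures given in the text; the split of a block row into one left pair, 30 middle pairs, and one right pair, and the count of 62 inner rows, follow the text's own arithmetic, and the function name is hypothetical.

```python
K = 1024          # the text uses 1k = 1024
BLOCK = 16 * 16   # one block of data, used as the unit of time


def total_blocks(edge_row, inner_row, middle_pairs=30, inner_rows=62):
    """Total time in blocks; each row is (left, middle, right) part times."""
    left, mid, right = edge_row
    edge = left + mid * middle_pairs + right
    left, mid, right = inner_row
    inner = (left + mid * middle_pairs + right) * inner_rows
    return 2 * edge + inner  # top and bottom rows plus the rows in between


worst = total_blocks((10, 12, 8), (10, 20, 8))
faster = total_blocks((10, 12, 8), (10, (20 - 9) + 9 / 2, 8))
print(worst, worst * BLOCK / K)      # 39072 9768.0
print(faster, faster * BLOCK / K)    # 30702.0 7675.5
print(round(100 * faster / worst))   # 79
```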
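[0084]
The latency on line-direction memory access that was a problem in the prior art can be avoided in the present invention by transferring data block by block. Specifically, all reads and writes become pixel-direction data transfers, and enough time to precharge the sense amplifiers of the memory can be secured during the pixel-direction transfers. In other words, the latency problem can be resolved using techniques such as bank switching.
[0085]
In each work memory inside the arithmetic element (E) 201e, the SS, DS, SD, and DD data of each level is stored with the mapping shown in FIG. 28, where x is a digit indicating the level. The SS data of levels 1, 2, and 3, however, has no meaning because it is replaced by the data of the next higher level. The data of each level can be written back to the frame memory 100 with the mapping shown in FIG. 45, as in the conventional approach. However, because the encoder/decoder 102 encodes or decodes bit plane by bit plane starting from the most important levels and frequency band signals, it is advantageous to write the frequency band signals back to the frame memory 100 already sorted with that processing order in mind. This is described below.
[0086]
FIG. 29 illustrates a concept called "alignment", in which the frequency band signals of each level obtained by the wavelet transform are reordered by importance. In FIG. 29, each rectangle represents one frequency band signal of one level, and its length represents its bit depth. The bit depth depends on the filter configuration; here the SD and DS coefficients are drawn with the same bit depth and the DD coefficients are drawn one bit deeper than the SD and DS coefficients. In the encoder/decoder 102, for example when handling images, data may be truncated in order to raise the compression ratio. Alignment is used as one means of deciding how to truncate, and the data of the less important bits is discarded.
[0087]
FIG. 30 illustrates the concept of a bit plane. If the data is image data, for example, a pixel has an address space represented by (x, y) and a bit depth. The hatched portion in the figure (for example, the MSB portion) is called a bit plane. In the encoder/decoder 102, processing is performed bit plane by bit plane, on clusters of several pixels. This raises the compression ratio by exploiting the fact that, in image data, a given pixel is highly correlated with its surrounding pixels.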
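A bit plane can be pictured with a small sketch. It assumes that coefficients are handled as sign and magnitude and that bit 0 is the LSB; neither detail is specified in the text, and the function name is hypothetical.

```python
def bit_plane(coeffs, bit):
    """Return the given magnitude bit of every coefficient in a 2-D region."""
    return [[(abs(value) >> bit) & 1 for value in row] for row in coeffs]


region = [[5, 2],
          [7, 0]]
print(bit_plane(region, 2))  # [[1, 0], [1, 0]]  plane of the highest bit here
print(bit_plane(region, 1))  # [[0, 1], [1, 0]]
print(bit_plane(region, 0))  # [[1, 0], [1, 0]]  LSB plane
```

Coding one such plane at a time over a region is what lets the coder use the state of neighbouring pixels at the same bit depth as context.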
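[0088]
Encoding or decoding proceeds bit plane by bit plane from the more important side, from left to right (the SS coefficients are excluded). With the alignment of FIG. 29, the order is the most significant bit (MSB) of the level-4 SD coefficients, the MSB of the DS coefficients, the MSB of the DD coefficients, the next lower bit (MSB-1) of the SD coefficients, and so on. Therefore, if the frequency band signals are written back to the frame memory 100 with the mapping shown in FIG. 45, the encoder/decoder 102 clearly needs scattered addressing on the frame memory 100, and if the frame memory 100 is a burst-transfer memory such as SDRAM, the processing speed cannot be raised because of latency.
[0089]
To avoid this, in this embodiment the frequency band signals are sorted in advance in descending order of importance before being written back to the frame memory 100, so that the arrangement allows continuous access for each level and each frequency band. That is, with the alignment of FIG. 29, the data of a 16 × 16 region is written back as shown in FIG. 46, starting from the upper left of the frame memory 100, in descending order of importance: 4SS, 4SD, 4DS, 4DD, then 3DS × 2, 3SD × 2, 3DD × 2, and so on, and finally the level-1 data (1DS, 1SD, 1DD) (the level-1 data occupies three quarters of the region). This sorting of the frequency band signals is done, for example, by having the control unit 211 control the order in which the frequency band signals are read from the work memories.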
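The sizes involved in such a sorted write-back can be sketched as follows for one 16 × 16 block. The linear word offsets and the fixed SD, DS, DD order within a level are assumptions made for illustration; the actual two-dimensional arrangement is the one in FIG. 46, and the order within a level depends on the alignment. The function name is hypothetical.

```python
def sorted_layout(a=16, levels=4, band_order=("SD", "DS", "DD")):
    """List (band name, word offset, word count) from highest level down."""
    layout = []
    offset = 0
    for level in range(levels, 0, -1):
        size = (a >> level) ** 2  # words per band at this level
        # Only the highest level keeps its SS band.
        bands = (("SS",) if level == levels else ()) + tuple(band_order)
        for band in bands:
            layout.append((f"{level}{band}", offset, size))
            offset += size
    return layout


layout = sorted_layout()
for name, offset, size in layout:
    print(name, offset, size)

level1_words = sum(size for name, _, size in layout if name.startswith("1"))
print(level1_words / (16 * 16))  # 0.75
```

The last line confirms the remark in the text that the level-1 data takes up three quarters of the region.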
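[0090]
When the data is written back sorted in this way, addressing of the frame memory 100 by the encoder/decoder 102 becomes easy whether or not truncation is performed, and latency is less likely to occur, so reads and writes from the encoder/decoder 102 on the frame memory 100 can be sped up.
[0091]
According to a second embodiment of the present invention, in the wavelet transform device 101 with the overall configuration shown in FIG. 2, the configuration of the arithmetic element (E) 201e is partly changed as shown in FIG. 31. That is, each of the work memories 216-223 is divided into four independent memories allocated to the SS, SD, DS, and DD data (sometimes called the SS memory, SD memory, DS memory, and DD memory). The control unit 211 does not have the SS calculation unit 210. The filter (1) 213 and the filter (2) 214 are not dedicated to the processing of particular levels; instead, their inputs and outputs are switched by the control unit 211 for the horizontal and vertical processing of each level.
[0092]
The overall operation of the wavelet transform device 101 of this embodiment is the same as in the first embodiment; only the operation of the arithmetic element (E) 201e differs in part. The operation of the arithmetic element (E) 201e is described below with reference to the memory map of FIG. 32, the connection diagrams of FIGS. 33 and 34, and the timing chart of FIG. 35.
[0093]
When data from the frame memory 100 is transferred to the work memory (L1_0 or L1_1) 216 or 217 of the arithmetic element (E) 201e, the control unit 211 distributes the data among the SS memory, SD memory, DS memory, and DD memory that make up the work memory, according to the rule shown in the memory map of FIG. 32. The rule is as follows. Pixel 0 of line 0 of the SS memory holds the data of pixel 0 of line 0 of the frame memory block, and pixel 0 of line 0 of the SD memory holds the data of pixel 1 of line 0 of the block. Pixel 0 of line 0 of the DS memory holds the data of pixel 0 of line 1 of the block, and pixel 0 of line 0 of the DD memory holds the data of pixel 1 of line 1 of the block. In other words, the even pixels of the even lines go to the SS memory, the odd pixels of the even lines go to the SD memory, the even pixels of the odd lines go to the DS memory, and the odd pixels of the odd lines go to the DD memory (line 0 and pixel 0 are counted as even).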
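The distribution rule above amounts to splitting the block by the parity of the line and pixel indices, which can be sketched as follows. The dictionary representation and the function name are assumptions made for illustration.

```python
def split_block(block):
    """Distribute a block over four memories by line and pixel parity."""
    return {
        "SS": [row[0::2] for row in block[0::2]],  # even lines, even pixels
        "SD": [row[1::2] for row in block[0::2]],  # even lines, odd pixels
        "DS": [row[0::2] for row in block[1::2]],  # odd lines, even pixels
        "DD": [row[1::2] for row in block[1::2]],  # odd lines, odd pixels
    }


# Each entry records its own (line, pixel) position in the source block.
block = [[(line, pixel) for pixel in range(4)] for line in range(4)]
memories = split_block(block)
print(memories["SS"][0][0])  # (0, 0)
print(memories["SD"][0][0])  # (0, 1)
print(memories["DS"][0][0])  # (1, 0)
print(memories["DD"][0][0])  # (1, 1)
```

Because the two samples of every horizontal pair and of every vertical pair end up in different memories, both can be fetched in the same cycle.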
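[0094]
Once the data has been stored in the work memory (L1_0 or L1_1) 216 or 217 in the form described above by time t0 of the timing chart of FIG. 35, level-1 horizontal processing starts. At this time, the inputs and outputs of the filter (1) 213 and the filter (2) 214 are connected to the SS memory, SD memory, DS memory, and DD memory of the work memory (L1_0 or L1_1) 216 or 217 as shown in FIG. 33. Because these four memories are independent, they can be read and written at the same time. That is, the even lines are processed by the filter (1) 213 and the odd lines by the filter (2) 214 in parallel, and the results are written simultaneously. When this horizontal processing ends, level-1 vertical processing starts. At this time, the connections of the inputs and outputs of the filter (1) 213 and the filter (2) 214 are changed as shown in FIG. 34. That is, the even pixels are processed by the filter (1) 213 and the odd pixels by the filter (2) 214 in parallel, and the results are written simultaneously. The level-1 SS data obtained by the vertical processing is written to the SS memory and is also distributed by the control unit 211 among the four memories that make up the work memory (L2_0 or L2_1) 218 or 219, according to a rule like the one described in connection with FIG. 32.
[0095]
Level-2 processing then starts at time t1. First, the inputs and outputs of the filters (1, 2) 213, 214 are connected, as shown in FIG. 33, to the four memories that make up the work memory (L2_0 or L2_1) 218 or 219, and horizontal processing is executed. When horizontal processing ends, the inputs and outputs of the filters (1, 2) 213, 214 are switched to the connection shown in FIG. 34, and vertical processing is executed. The resulting level-2 SS data is distributed among the four memories that make up the work memory (L3_0 or L3_1) 220 or 221, according to a rule like the one described in connection with FIG. 32. This takes the processing to time t2. Processing continues in the same way up to level 4 and is completed at time t4. As in the first embodiment, the level-3 SS data obtained by the other arithmetic elements is also used in level-4 processing.
[0096]
Thus, in this embodiment, levels 1, 2, 3, and 4 are processed sequentially, but the time needed for each level is one quarter of that in the first embodiment. This is because the four memories that make up each work memory can be accessed simultaneously.
[0097]
The processing time in this embodiment is as follows. In units of blocks, the processing times for parts 1 to 15 in FIG. 27 are: part 1, 4.65625 blocks; part 2, 3.984375 blocks; part 3, 2.65625 blocks; part 4, 4.65625 blocks; part 5, 7.9765625 blocks; part 6, 2.65625 blocks; part 7, 4.65625 blocks; part 8, 7.9765625 blocks; part 9, 2.65625 blocks; part 10, 4.65625 blocks; part 11, 7.9765625 blocks; part 12, 2.65625 blocks; part 13, 4.65625 blocks; part 14, 3.984375 blocks; and part 15, 2.65625 blocks. As shown in FIG. 35, the wavelet transform time in units of blocks is 0.25 × 2 blocks at level 1, 0.0625 × 2 blocks at level 2, 0.015625 × 2 blocks at level 3, and 0.00390625 × 2 blocks at level 4, so fractional values arise, but they are kept in the calculation here rather than dropped. With X = Y = 1024, the total processing time is 4.65625 + 3.984375 × 30 + 2.65625 = 126.84375 blocks for each of the top and bottom block rows and (4.65625 + 7.9765625 × 30 + 2.65625) × 60 = 14796.5625 blocks for the remaining rows, for a total of 15050.25 blocks. Converted to clocks, since one block is 16 × 16, this is 3762.5625k clocks (1k = 1024). This clock count assumes that the wavelet transform and the reads and writes on the frame memory run at the same speed. This is 49% of the 7675.5k clocks of the first embodiment and 35% of the prior-art figure for a latency of 2, so it can be seen that very fast processing is possible.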
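The totals above can be re-derived from the per-part times. The sketch below simply repeats the text's arithmetic, including the row count of 60 that the text uses for the middle rows; the variable names are hypothetical.

```python
K = 1024          # the text uses 1k = 1024
BLOCK = 16 * 16   # one block of data, used as the unit of time

# Wavelet transform time for one block (horizontal + vertical per level).
transform_per_block = 2 * (0.25 + 0.0625 + 0.015625 + 0.00390625)

edge_row = 4.65625 + 3.984375 * 30 + 2.65625    # top or bottom block row
inner_row = 4.65625 + 7.9765625 * 30 + 2.65625  # any other block row
total_blocks = 2 * edge_row + inner_row * 60    # 60 rows, as in the text
clocks_k = total_blocks * BLOCK / K

print(transform_per_block)             # 0.6640625
print(edge_row, inner_row * 60)        # 126.84375 14796.5625
print(total_blocks, clocks_k)          # 15050.25 3762.5625
print(round(100 * clocks_k / 7675.5))  # 49, versus the first embodiment
print(round(100 * clocks_k / 10880))   # 35, versus the prior art at latency 2
```

All the constants are exact binary fractions, so the floating-point results match the figures in the text exactly.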
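[0098]
As described above, the encoder/decoder 102 processes data bit by bit, so it clearly takes several times to more than ten times as long as the wavelet transform. If data is read from and written to the frame memory 100, the processing time becomes even longer. Also as described above, encoding and decoding proceed from the most important frequency band signals, and in general the higher the level of a frequency band signal, the more important it is. Conventionally, however, the wavelet transform proceeded from level 1 to successively higher levels, so obtaining the high-level frequency band signals required waiting for the wavelet transform to finish. In the present invention, by contrast, the wavelet transform is performed in units of small blocks, and the frequency band signals of all levels of a block are obtained almost simultaneously.
[0099]
In view of these characteristics of the wavelet transform device of the present invention, according to a third embodiment of the present invention, a buffer memory 250 that operates as a cache memory for the frame memory 100 is added to the main control unit 207 of a wavelet transform device 101 configured as in the first or second embodiment, as shown in simplified form in FIG. 36, in order to absorb the difference in processing time between the wavelet transform and the encoding/decoding. The buffer memory 250 is accessible by the encoder/decoder 102. The wavelet transform device 101 of this embodiment executes the block-by-block wavelet transform and writes part of the frequency band signal data (for example, the high-level frequency band signal data) to the buffer memory 250. In parallel with this wavelet transform, the encoder/decoder 102 executes encoding/decoding, reading from the buffer memory 250 the data that is available there and reading the other data from the frame memory 100.
[0100]
For example, let the size of the frame memory 100 be X = Y = 1024, and suppose that all the level-3 and level-4 data is written to the buffer memory 250. In this case the required size of the buffer memory 250 is only [128 × 128 (level 3) + 64 × 64 (level 4)] × 3 (the SS coefficients are excluded because they are not encoded or decoded) = 60 kB (1k = 1024).
[0101]
The processing time in this case is as follows. In the prior art, the wavelet transform takes 10880k clocks (for a latency of 2), and the number of data items processed by the encoder/decoder is 1024k - 16k (the SS data) = 1008k, so if encoding/decoding one data item takes 10 clocks, the processing time in the encoder/decoder is 10080k clocks. The total time for the wavelet transform and the encoding/decoding is therefore 20960k clocks.
[0102]
In this embodiment, by contrast, if the wavelet transform device 101 has the configuration of the second embodiment, the processing time of the wavelet transform is 3762.5625k clocks. When a buffer memory 250 sized for levels 3 and 4 is provided as described above, the amount of data is 240 kB, so its encoding/decoding takes 2400k clocks, but because the encoding/decoding is performed in parallel with the wavelet transform, these 2400k clocks are contained within the time for the wavelet transform. The amount of data to be processed for levels 1 and 2 is 960 kB, so its encoding/decoding takes 9600k clocks. The total processing time is therefore these 9600k clocks plus the wavelet transform time of 3762.5625k clocks, that is, 13362.5625k clocks, which is about 64% of the prior-art figure. In this way, the processing as a whole can be sped up considerably.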
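The buffer size and the end-to-end comparison can be checked as follows. The sketch takes the clock figures from the text as given (10 clocks per coded data item, 1008k items in the conventional case, 960k items for levels 1 and 2) and does not model the coder itself; the variable names are hypothetical.

```python
K = 1024  # the text uses 1k = 1024

# Buffer for all level-3 and level-4 bands except SS (three bands each).
buffer_words = (128 * 128 + 64 * 64) * 3
print(buffer_words // K)  # 60

CLOCKS_PER_ITEM = 10  # assumed coding cost per data item, as in the text

# Conventional: transform at latency 2, then coding of 1008k items.
conventional_k = 10880 + (1024 - 16) * CLOCKS_PER_ITEM

# Third embodiment with the second-embodiment transform: the level-3 and
# level-4 coding overlaps the transform, so only levels 1 and 2 add time.
proposed_k = 3762.5625 + 960 * CLOCKS_PER_ITEM

print(conventional_k)                           # 20960
print(proposed_k)                               # 13362.5625
print(round(100 * proposed_k / conventional_k)) # 64
```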
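[0103]
[Effects of the Invention]
The wavelet transform device according to the inventions of claims 1 to 5 can perform wavelet transform processing up to the desired level at high speed, block by block, using a plurality of arithmetic elements. Because both the reading of data from the frame memory and the writing of frequency band signals back to the frame memory are done block by block, the effect of latency can be avoided, so the processing as a whole, including memory access to the frame memory, can be made much faster than before, and the frequency band signals of all wavelet levels are obtained almost simultaneously for each block. The wavelet transform device according to the invention of claim 2 can raise the processing speed further by processing two adjacent blocks continuously while reading and writing the frame memory in parallel. The wavelet transform device according to the invention of claim 3 or 4 can speed up the wavelet transform processing in the first arithmetic element, which allows the processing speed to be raised still further. In the encoding/decoding device according to the invention of claim 6 or 7, the wavelet transform processing is accelerated and the frequency band signals of all wavelet levels are obtained in the frame memory simultaneously, block by block, so encoding can start soon after wavelet transform processing starts and can run in parallel with it, which allows much faster operation than before. In particular, when the wavelet transform device writes the frequency band signals back to the frame memory sorted in order of importance, the access of the encoder/decoder to the frame memory can be sped up, so very fast processing is possible. In addition, the encoding/decoding device according to the invention of claim 7 does not need to access the frame memory for data that is available from the buffer memory, so still faster operation is possible. These and other effects are obtained.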
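[Brief Description of the Drawings]
[FIG. 1] A diagram showing an example of the overall configuration of an encoding/decoding device according to the present invention.
[FIG. 2] A block diagram showing the internal configuration of a wavelet transform device according to the first embodiment of the present invention.
[FIG. 3] A block diagram showing the internal configuration of the arithmetic element (E) according to the first embodiment of the present invention.
[FIG. 4] An explanatory diagram of the size of the work memories (L1_0, L1_1) and the cache memories (CM1, CM3, CM4).
[FIG. 5] An explanatory diagram of the size of the cache memory (CM2).
[FIG. 6] An explanatory diagram of the size of the work memories (L2_0, L2_1).
[FIG. 7] An explanatory diagram of the size of the work memories (L3_0, L3_1).
[FIG. 8] An explanatory diagram of the size of the work memories (L4_0, L4_1).
[FIG. 9] An explanatory diagram of the size of the frame memory.
[FIG. 10] An explanatory diagram of the size of the current memory.
[FIG. 11] A block diagram showing the internal configuration of the arithmetic elements (A-D, F-I) according to the first embodiment of the present invention.
[FIG. 12] A diagram showing an example of the mapping of the frequency band signals up to level 3 for one block of size 16 × 16.
[FIG. 13] A diagram for explaining the processing of the two leftmost blocks of the first block row.
[FIG. 14] A diagram for explaining the processing of the next two blocks of the first block row.
[FIG. 15] A diagram for explaining the processing of the two rightmost blocks of the first block row.
[FIG. 16] A diagram for explaining the processing of the two leftmost blocks of the second block row.
[FIG. 17] A diagram for explaining the processing of the next two blocks of the second block row.
[FIG. 18] A diagram for explaining the processing of the two rightmost blocks of the second block row.
[FIG. 19] A diagram for explaining the processing of the two leftmost blocks of the last block row.
[FIG. 20] A diagram for explaining the processing of the next two blocks of the last block row.
[FIG. 21] A diagram for explaining the processing of the two rightmost blocks of the last block row.
[FIG. 22] A diagram for explaining the caching of data by the cache memories (CM1-CM4).
[FIG. 23] A timing chart of the wavelet transform device.
[FIG. 24] A timing chart of the wavelet transform device.
[FIG. 25] A timing chart of the wavelet transform device.
[FIG. 26] A timing chart of the arithmetic element (E) in the first embodiment of the present invention.
[FIG. 27] A diagram in which the frame memory is divided into a plurality of parts in order to explain the processing time.
[FIG. 28] A diagram showing the mapping of the frequency band signals obtained in each work memory of the arithmetic element (E).
[FIG. 29] An explanatory diagram of the alignment of the frequency band signals.
[FIG. 30] An explanatory diagram of a bit plane.
[FIG. 31] A block diagram showing the internal configuration of the arithmetic element (E) in the second embodiment of the present invention.
[FIG. 32] A diagram showing how data is distributed among the SS, DS, SD, and DD memories that make up the work memories in the arithmetic element (E) in the second embodiment of the present invention.
[FIG. 33] A diagram showing how the inputs and outputs of the filters are connected during horizontal processing.
[FIG. 34] A diagram showing how the inputs and outputs of the filters are connected during vertical processing.
[FIG. 35] A timing chart of the arithmetic element (E) in the second embodiment of the present invention.
[FIG. 36] A diagram showing the configuration of an encoding/decoding device and a wavelet transform device according to the third embodiment of the present invention.
[FIG. 37] A diagram showing the general configuration of a conventional wavelet transform device.
[FIG. 38] An explanatory diagram of the filter operations for the horizontal and vertical processing of the wavelet transform.
[FIG. 39] A diagram showing an example of the memory map of image data before the wavelet transform.
[FIG. 40] A diagram showing an example of the memory map for the 1S and 1D coefficients.
[FIG. 41] A diagram showing an example of the memory map for the 1SS, 1SD, 1DS, and 1DD coefficients.
[FIG. 42] A diagram showing an example of the memory map for the 2S and 2D coefficients.
[FIG. 43] A diagram showing an example of the memory map for the 2SS, 2SD, 2DS, and 2DD coefficients.
[FIG. 44] A timing chart of a conventional, general wavelet transform device.
[FIG. 45] A diagram showing a general memory map of the frequency band signals up to level 4.
[FIG. 46] A diagram showing an example of the memory map when the frequency band signals up to level 4 are written back in descending order of importance.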
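[Explanation of Reference Numerals]
100 Frame memory
101 Wavelet transform device
102 Encoder/decoder
200 Arithmetic unit
201a to 201i Calculation elements (A to I)
202 Current memory
203 Cache memory (CM1)
204 Cache memory (CM2)
205 Cache memory (CM3)
206 Cache memory (CM4)
207 Main control unit
210 SS calculation unit
211 Control unit
213 Filter (1)
214 Filter (2)
216 Work memory (L1_0)
217 Work memory (L1_1)
218 Work memory (L2_0)
219 Work memory (L2_1)
220 Work memory (L3_0)
221 Work memory (L3_1)
222 Work memory (L4_0)
223 Work memory (L4_1)
230 SS calculation unit
231 Control unit
233 Work memory (L4_0)
234 Work memory (L4_1)
250 Buffer memory
[0001]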
[Technical Field of the Invention]
The present invention relates generally to systems that use the wavelet transform, and more particularly to a wavelet transform device and an encoding/decoding device used in compression/decompression systems for image data and the like.
[0002]
[Prior art]
The wavelet transform has attracted attention because it can represent the frequency domain and the time domain simultaneously, a feature that the Fourier transform and similar transforms lack, and its fields of application have been expanding in recent years. In particular, its application to data compression is very useful for storing and transmitting large amounts of data. For example, the time required for facsimile transmission of a document, or for transmission of an image such as on the World Wide Web, is drastically shortened when compression is used to reduce the number of bits required to reproduce the image.
[0003]
Many different data compression techniques already exist. The most widely used compression method is JPEG (Joint Photographic Experts Group). In JPEG, input symbols or luminance data are quantized and then converted into output codewords. Quantization aims to preserve the important features of the data while removing the unimportant ones. Prior to quantization, a transform is used to concentrate energy, and the DCT (Discrete Cosine Transform) is adopted for this purpose. However, various drawbacks of DCT-based JPEG have been pointed out, for example the occurrence of block noise and mosquito noise. In image signal processing applications, interest is focused on finding efficient and highly accurate data compression coding methods that eliminate these drawbacks. Wavelet processing is one such method.
[0004]
When the wavelet transform is applied to a two-dimensional signal, a horizontal low-pass filter HL (Horizontal Low) and a horizontal high-pass filter HH (Horizontal High) are used to separate the signal into a horizontal low-frequency signal (S (Smooth) coefficient) and a horizontal high-frequency signal (D (Detail) coefficient). A vertical low-pass filter VL (Vertical Low) and a vertical high-pass filter VH (Vertical High) are then applied to each of the S and D coefficients to separate them into a horizontal-low/vertical-low signal (SS coefficient), a horizontal-low/vertical-high signal (SD coefficient), a horizontal-high/vertical-low signal (DS coefficient), and a horizontal-high/vertical-high signal (DD coefficient). The output obtained by performing horizontal processing and vertical processing once is called the level 1 output, and these four kinds of signals are called frequency band signals. If output of level 2 or higher is desired, this processing is applied recursively to the SS coefficient. At level 2, seven frequency band signals are obtained: the SS coefficient, the 1SD and 2SD coefficients, the 1DS and 2DS coefficients, and the 1DD and 2DD coefficients. Although the case of filtering first in the horizontal direction and then in the vertical direction has been described, the order may be reversed.
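The way the number of frequency band signals grows with the level can be sketched as follows. This is an illustrative helper only: the function name is hypothetical, and the band labels simply follow the level-prefixed naming (1SD, 2SD, and so on) used in the text.

```python
def subband_names(levels: int) -> list[str]:
    """List the frequency band signals left after `levels` decomposition levels."""
    names = []
    for level in range(1, levels + 1):
        # Every level contributes three detail bands; its SS band is
        # decomposed again by the next level.
        names += [f"{level}SD", f"{level}DS", f"{level}DD"]
    # Only the SS band of the final level survives.
    names.append(f"{levels}SS")
    return names


print(subband_names(2))
```

For level 2 this lists seven bands, matching the count given above; each further level adds three more.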
[0005]
In an encoding/decoding device that uses the wavelet transform, each frequency band signal obtained through the above process is compressed by an encoding/decoding unit. Compression is performed bit by bit for each frequency band signal. First, the MSB of the first pixel of a given frequency band signal is processed. The output is determined by referring to the state of the pixel itself, the states of the surrounding pixels, and the state at the level one above. Next, the MSB of the second pixel is processed, and at this point the state of the pixel processed first is also referred to. Once this series of processing has been completed for the region to be encoded, the next lower bit (MSB−1) of the first pixel is processed. At this point, in addition to the states of the surrounding pixels at the same bit depth, the state of the MSB is also referred to. In this way, encoding proceeds down to the LSB for the region to be encoded. Decoding follows almost the same procedure.
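The visiting order described above (all pixels at the MSB, then all pixels at MSB−1, down to the LSB) can be sketched as below. This is a simplification under stated assumptions: the function name is hypothetical, coefficients are treated as non-negative integers, and the context modelling that refers to neighbouring pixels and higher bits is omitted entirely.

```python
from typing import Iterator


def bitplane_order(coeffs: list[int], bit_depth: int) -> Iterator[tuple[int, int, int]]:
    """Yield (bit index, pixel index, bit value), scanning from MSB to LSB."""
    for bit in range(bit_depth - 1, -1, -1):      # MSB first
        for pixel, value in enumerate(coeffs):    # every pixel of this bit plane
            yield bit, pixel, (value >> bit) & 1


# Two 3-bit coefficients: plane 2 is visited for both pixels before plane 1.
for step in bitplane_order([5, 2], bit_depth=3):
    print(step)
```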
[0006]
FIG. 37 shows the general configuration of a conventional wavelet transform device, which includes a frame memory 400, a control unit 401, and a filter 402. The filter 402 may have any configuration, but here the low-pass filter is assumed to be a 2-tap filter that operates on two data, and the high-pass filter is assumed to be a 6-tap filter that uses three sets of data (the set at the current position and the sets before and after it) among the S coefficients output by the low-pass filter.
[0007]
FIG. 38 shows an example of wavelet transform processing using these filters; (a) shows processing in the horizontal direction, and (b) shows processing in the vertical direction. In (a), for example, 00 means the data of the 0th pixel of the 0th line, and 12 means the data of the 2nd pixel of the 1st line (both lines and pixels are counted from 0). In the horizontal processing, as shown in (a), the output S00 of the 0th pixel of the horizontal low-pass filter HL is obtained from data 00 and 01, and the output S01 of the 1st pixel is obtained from data 02 and 03. The output H00 of the 0th pixel of the horizontal high-pass filter HH is obtained from the two data preceding data 00 (which do not exist), data 00 and 01, and data 02 and 03. In the vertical processing, as shown in (b), the output SS00 of the vertical low-pass filter VL is obtained from data S00 and S10. The output SD00 of the vertical high-pass filter VH is obtained from the two data preceding S00 (which do not exist), data S00 and S10, and data S20 and S30.
[0008]
FIG. 39 shows the data in the frame memory before the wavelet transform is performed. This data is first processed in the horizontal direction, and the resulting S and D coefficients are written into the frame memory with the mapping shown in FIG. 40. In FIG. 40, for example, 1S00 means the level 1 S coefficient at address 00. FIG. 41 shows an example of the mapping used when writing each coefficient after vertical processing. This is how the level 1 coefficients are stored. FIG. 42 shows an example of how the level 2 coefficients are stored after horizontal processing. Because level 2 processing is performed only on the 1SS coefficients, the data in the hatched portion is not used. Next, the level 2 coefficients are stored with the mapping shown in FIG. 43, and the level 2 processing ends. Processing then continues in the same way until the frequency band signals of the desired level are obtained.
[0009]
FIG. 44 shows a timing chart of a conventional general wavelet transform device. The processing time for processing up to level 4 is described below, assuming a data size of X = 1024 in the pixel direction and Y = 1024 in the line direction, for a total of 1 MB.
[0010]
From time t0 to t1, level 1 processing is performed on all data in the frame memory; level 2 processing is performed from time t1 to t2, level 3 processing from time t2 to t3, and level 4 processing from time t3 to t4. Since X = Y = 1024, the processing time is 1024 × 1024 × 4 at level 1 (the factor of 4 comprising pixel-direction reading, pixel-direction writing, line-direction reading, and line-direction writing), 512 × 512 × 4 at level 2, 256 × 256 × 4 at level 3, and 128 × 128 × 4 at level 4. The total is 5440k clocks (1k = 1024).
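The total above can be reproduced with a short calculation. This is a sketch assuming one clock per memory access and zero latency; the function name is illustrative.

```python
K = 1024  # the text defines 1k = 1024


def conventional_clocks(x: int, y: int, levels: int) -> int:
    """Clock count when every level reads and writes the data in both directions."""
    total = 0
    for level in range(levels):
        width, height = x >> level, y >> level   # data halves per direction each level
        total += width * height * 4              # 2 reads + 2 writes per datum
    return total


total_clocks = conventional_clocks(1024, 1024, 4)
print(total_clocks / K)  # 5440.0
```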
[0011]
However, this figure is the minimum time, obtained when the read/write latency in the line direction of the frame memory is zero. Recent memories such as SDRAM normally use burst transfer, so data transfer in the pixel direction is fast, but data transfer in the line direction requires precharging of the sense amplifiers, which introduces latency (delay). Techniques such as bank switching can sometimes avoid this, but little benefit can be expected when line-direction accesses occur continuously, as they do in the wavelet transform. If the latency is 2, the processing time increases to 10880k clocks. For the reasons described above, this latency is very difficult to avoid.
[0012]
In the encoding/decoding device, after the wavelet transform is completed, each frequency band signal written in the frame memory is encoded by the encoder/decoder. For image signals, the compression ratio is raised by exploiting the fact that the correlation between adjacent pixels, and in particular the correlation within the same bit plane, is high. For this reason, at encoding time the data in a given region is handled in units of bits (any one bit of the data of a given pixel). Decoding is performed in almost the reverse order of the operations described above.
[0013]
More detailed information on the encoding/decoding device, the wavelet transform device, and the filters for the wavelet transform related to the present invention can be found in Japanese Patent Laid-Open No. 8-139935. The encoding/decoding device is described in detail in JP-A-9-121168. Known documents on similar wavelet transform devices include JP-A-3-27687, JP-A-5-167997, and JP-A-5-183886.
[0014]
[Problems to be solved by the invention]
Conventionally, when the wavelet transform is performed via a frame memory, processing in the pixel (horizontal) direction can be performed relatively quickly, as described above, but processing in the line (vertical) direction incurs latency on every frame memory access, so the total processing time becomes very long. In addition, because processing proceeds level by level, it takes time until the high-level frequency band signals are obtained, and the encoding/decoding processing cannot be started until then.
[0015]
Therefore, an object of the present invention is to provide a wavelet transform apparatus that can solve the latency problem and can perform high-speed wavelet transform processing even when a memory such as an SDRAM is used as the frame memory. Another object of the present invention is to provide a high-speed wavelet transform apparatus capable of obtaining frequency band signals of all levels at substantially the same time. Another object of the present invention is to provide a high-speed encoding / decoding device using wavelet transform.
[0016]
[Means for Solving the Problems]
The invention according to claim 1 is a wavelet transform device that reads data in units of blocks from a frame memory, performs wavelet transform processing of n levels (n is an integer of 3 or more), and writes the resulting frequency band signals of the n-level wavelet transform coefficients back to the frame memory, the device comprising: a memory for temporarily storing data read from the frame memory; and an arithmetic unit comprising at least one first calculation element and a plurality of second calculation elements. Each of the plurality of second calculation elements calculates the low-frequency band signal of the level (n−1) wavelet transform coefficients from the data of a block surrounding the block of interest read from the frame memory. The first calculation element performs wavelet transform processing of n levels on the data of the block of interest read from the frame memory; in the level n wavelet transform processing, all frequency band signals of the level n wavelet transform coefficients are calculated using the level (n−1) low-frequency band signals obtained by the first calculation element and by the plurality of second calculation elements. All frequency band signals of the n levels obtained by the wavelet transform processing of the first calculation element are written back to the frame memory in units of blocks.
[0017]
The invention according to claim 2 is the wavelet transform device according to claim 1, wherein the first calculation element includes two work memories corresponding to each level and each second calculation element includes two work memories, wavelet transform processing is performed continuously on the data of two adjacent blocks in the frame memory, and the wavelet transform processing, the reading of data from the frame memory, and the writing back of frequency band signals to the frame memory are performed in parallel.
[0018]
  According to a third aspect of the present invention, in the wavelet transform device according to the second aspect, the first calculation element is used for a first filter used for a level 1 wavelet transform process and a level 2 to n wavelet transform process. A second filter and means for calculating a low frequency band signal at each level, wherein in the first computing element, a level 1 wavelet transform process and a level 2 to n wavelet transform for the data of the block of interest The processes are executed in parallel.
[0019]
  According to a fourth aspect of the present invention, in the wavelet transform device according to the second aspect, the work memory corresponding to each level of the first arithmetic element is divided into four memories that can be accessed independently, and the first arithmetic element Includes a first filter used for level 1 wavelet transform processing and a second filter used for level 2 to n wavelet transform processing, and the wavelet transform processing is performed in parallel using the two filters. It is executed.
[0020]
The invention according to claim 5 is the wavelet transform device according to claim 1, 2, or 3, further comprising a buffer memory that temporarily stores a part of the frequency band signals obtained by the wavelet transform processing of the first calculation element, the buffer memory being accessible from the outside.
[0021]
The invention according to claim 6 is an encoding/decoding device comprising: a frame memory; the wavelet transform device according to claim 1, 2, 3, or 4, which can access the frame memory; and an encoder/decoder that can access the frame memory.
[0022]
The invention according to claim 7 is an encoding/decoding device comprising: a frame memory; the wavelet transform device according to claim 5, which can access the frame memory; and an encoder/decoder that can access the frame memory and the buffer memory of the wavelet transform device.
[0023]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, embodiments of the present invention will be described with reference to the accompanying drawings. For simplicity, the same or similar reference numerals are used for the same or corresponding parts in a plurality of drawings in the accompanying drawings.
[0024]
Note that a 2-tap low-pass filter and a 6-tap high-pass filter are used as the filters for horizontal and vertical processing of the wavelet transform. The wavelet transform is performed up to level 4, that is, n = 4. Of course, the number of levels can be set to any number of 3 or more. The present invention becomes more effective as the number of levels increases.
[0025]
FIG. 1 shows an example of the overall configuration of an encoding/decoding device according to the present invention. At the time of encoding, data (data) is written to the frame memory 100 from the outside. The data is read into the wavelet transform device 101 according to the present invention in units of blocks, wavelet transform processing is performed in units of blocks, and the resulting frequency band signals of all wavelet levels are written back to the frame memory 100 in units of blocks. These frequency band signals are encoded by the encoder/decoder 102 and output to the outside as a code stream (code). At the time of decoding, a code stream (code) input from the outside is decoded by the encoder/decoder 102, and the frequency band signals of the wavelet transform are restored in the frame memory 100. These frequency band signals are subjected to the inverse wavelet transform by the wavelet transform device 101, and the original data is restored in the frame memory 100 and output to the outside as data (data).
[0026]
The wavelet transform device 101 according to the present invention performs wavelet transform processing in units of blocks and writes the frequency band signals of all wavelet levels back to the frame memory 100 in units of blocks. The encoder/decoder 102 can therefore start the encoding processing immediately after the wavelet transform processing starts and execute it in parallel with the wavelet transform processing; the start of encoding does not have to wait until the frequency band signals have all been obtained, as it did conventionally. Combined with the high speed of the wavelet transform device 101 itself, described later, this allows the encoding/decoding device according to the present invention to operate much faster than before.
[0027]
According to the first embodiment of the present invention, the wavelet transform device 101 has the internal configuration shown in FIG. 2. The wavelet transform device 101 includes an arithmetic unit 200, a current memory 202 for temporarily storing data read from the frame memory 100, cache memories (CM1, CM2, CM3, CM4) 203, 204, 205, and 206, and a main control unit 207 that performs timing control of the entire device and data transfer control inside and outside the device. The arithmetic unit 200 consists of 3 × 3 calculation elements (A, B, C, D, E, F, G, H, and I) 201a, 201b, 201c, 201d, 201e, 201f, 201g, 201h, and 201i.
[0028]
According to the first embodiment of the present invention, the calculation element (E) 201e is configured as shown in FIG. 3. The calculation element (E) 201e includes an SS calculation unit 210, a control unit 211, a filter (1) 213 used for level 1 horizontal and vertical processing, a filter (2) 214 used for horizontal and vertical processing at levels 2, 3, and 4, and two planes of work memory for each level. That is, it has the 0th-plane work memory (L1_0) 216 and the 1st-plane work memory (L1_1) 217 for level 1, the 0th-plane work memory (L2_0) 218 and the 1st-plane work memory (L2_1) 219 for level 2, the 0th-plane work memory (L3_0) 220 and the 1st-plane work memory (L3_1) 221 for level 3, and the 0th-plane work memory (L4_0) 222 and the 1st-plane work memory (L4_1) 223 for level 4. The control unit 211 transfers data inside the calculation element (E) 201e, controls the operation timing of the SS calculation unit 210, and exchanges data with the main control unit 207. The SS calculation unit 210 calculates the SS data of levels 1, 2, and 3.
[0029]
According to the first embodiment of the present invention, the work memories (L1_0, L1_1) 216 and 217 corresponding to level 1 and the cache memories (CM1, CM3, CM4) 203, 205, and 206 are memories of size a (pixel direction) × a (line direction), as shown in FIG. 4. The cache memory (CM2) 204 is a memory of the size shown in FIG. 5. The work memories (L2_0, L2_1) 218 and 219 corresponding to level 2 are memories of size a/2 × a/2, as shown in FIG. 6. The work memories (L3_0, L3_1) 220 and 221 corresponding to level 3 are of size a/4 × a/4, as shown in FIG. 7. As shown in FIG. 8, the work memories (L4_0, L4_1) 222 and 223 corresponding to level 4 are of size a/8 in both the pixel direction and the line direction. FIG. 9 shows the size of the frame memory 100, where the size in the pixel direction is X and the size in the line direction is Y. FIG. 10 shows the size of the current memory 202, where the size in the pixel direction is X and the size in the line direction is a.
[0030]
According to the first embodiment of the present invention, the calculation elements (A to D, F to I) 201a to 201d and 201f to 201i other than the calculation element (E) 201e each include, as shown in FIG. 11, an SS calculation unit 230, a control unit 231, and two planes of work memory of size a/8 × a/8, namely the 0th-plane work memory (L4_0) 233 and the 1st-plane work memory (L4_1) 234. The SS calculation unit 230 calculates the wavelet level 3 SS data. The control unit 231 controls data transfer inside the calculation element, controls the operation timing of the SS calculation unit 230, and exchanges data with the main control unit 207. The SS data calculated by the SS calculation unit 230 is stored in the work memories (L4_0, L4_1) 233 and 234.
[0031]
The wavelet transform device 101 according to the present invention, configured as above, performs processing in units of blocks of size a × a. The basic size a is determined from the number of taps of the filters used for the wavelet transform and the desired number of wavelet levels. A smaller a has the merit of reducing the size of each memory in the wavelet transform device 101, but the processing speed must also be considered when choosing a. Specifically, a may be determined along the lines described below.
[0032]
Since the highest desired wavelet level is 4 and the high-pass filter has 6 taps, having six level 3 SS data in each direction makes it possible to obtain one each of the level 4 SS, SD, DS, and DD data. The minimum size of the original data for this is 48 × 48 (3 × 3 blocks of 16 × 16 size). That is, a = 16 is the minimum size. The following description assumes a = 16. If the high-pass filter has 10 taps and the desired maximum level is 5, the minimum size of the original data is 160 × 160 (5 × 5 blocks of 32 × 32 size), and a = 32.
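The two worked cases above are consistent with a simple rule, sketched below. The rule and the function name are assumptions inferred from those two cases rather than a formula stated in the text: the level n transform needs as many level (n−1) SS data per direction as the high-pass filter has taps, each of which covers 2^(n−1) original samples, and one block must yield at least one level n output.

```python
def block_geometry(taps: int, max_level: int) -> tuple[int, int, int]:
    """Return (block size a, blocks per side, minimum original size per side)."""
    region = taps * 2 ** (max_level - 1)   # original samples needed per direction
    a = 2 ** max_level                     # smallest block giving one level-n output
    blocks_per_side = region // a
    return a, blocks_per_side, region


print(block_geometry(taps=6, max_level=4))    # (16, 3, 48)
print(block_geometry(taps=10, max_level=5))   # (32, 5, 160)
```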
[0033]
When a = 16 and X = Y = 1024, the number of words of memory inside the wavelet transform device 101 (the bit depth varies depending on the filter configuration) is as follows. First, the calculation element (E) 201e has:
16 × 16 × 2 = 512 words in the work memories (L1_0, L1_1)
8 × 8 × 2 = 128 words in the work memories (L2_0, L2_1)
4 × 4 × 2 = 32 words in the work memories (L3_0, L3_1)
2 × 2 × 2 = 8 words in the work memories (L4_0, L4_1)
Each of the other calculation elements (A to D, F to I) 201a to 201d and 201f to 201i has 2 × 2 × 2 = 8 words in its work memories (L4_0, L4_1). The total number of words in the work memories of all calculation elements is therefore 744. The cache memories (CM1, CM3, CM4) 203, 205, and 206 hold 16 × 16 × 3 = 768 words and the cache memory (CM2) 204 holds 32 × 64 − (16 × 16) = 1792 words, so the cache memories as a whole hold 2560 words. The current memory 202 holds 1024 × 16 = 16384 words. The total number of memory words in the wavelet transform device 101 is therefore 19688, so, with 1k = 1024, a very small memory of just over 19k words suffices. This figure is independent of the size of the frame memory 100 in the line direction and depends only on its size in the pixel direction.
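The word counts above can be tallied as follows. This is a bookkeeping sketch only: the dictionary keys are illustrative names, and the CM2 size is taken as the figure quoted in the text for a = 16 rather than derived.

```python
A, X, LEVELS = 16, 1024, 4   # block size, frame width in pixels, wavelet levels

# Calculation element (E): two planes of work memory per level.
work_e = sum(2 * (A >> k) ** 2 for k in range(LEVELS))   # 512 + 128 + 32 + 8
# The eight other elements: two planes of (A/8) x (A/8) each.
work_others = 8 * 2 * (A >> (LEVELS - 1)) ** 2

words = {
    "work": work_e + work_others,
    "cache": 3 * A * A + (32 * 64 - A * A),   # CM1, CM3, CM4 plus CM2
    "current": X * A,                         # X pixels wide, A lines tall
}
words["total"] = sum(words.values())
print(words)
```

Only the current-memory term depends on the frame width X, which is why the total is independent of the frame height.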
[0034]
The outline of the operation of the wavelet transform device 101 is as follows. The data in the frame memory 100 is divided into blocks of 16 × 16 (a × a) size. The wavelet transform processing for the central block of a 3 × 3 group of blocks is executed by the calculation element (E) 201e, and the level 3 SS data for the eight surrounding blocks is calculated by the calculation elements (A to D, F to I) 201a to 201d and 201f to 201i. Up to level 3, the calculation element (E) 201e can perform the wavelet transform processing using only the original data of its own block. However, because the original data is 16 × 16, only 2 × 2 SS data are obtained at level 3 (FIG. 12 shows an example of the mapping of frequency band signals up to level 3 for one 16 × 16 block, and it can be seen that this includes 2 × 2 SS data). Because a 6-tap high-pass filter is used, at least 6 × 6 level 3 SS data are required for the level 4 transform. The calculation element (E) 201e therefore executes the level 4 transform by using, as the missing level 3 SS data, the SS data held in the work memories (L4_0 or L4_1) 233 or 234 of the calculation elements (A to D, F to I) 201a to 201d and 201f to 201i. The frequency band signals from level 1 to level 4 for one block, obtained in this way in the work memories of the calculation element (E) 201e, are written back to the corresponding area of the frame memory 100.
[0035]
In the present embodiment, so that the wavelet transform processing up to level 4 by the calculation element (E) 201e can be executed continuously on two horizontally adjacent blocks, the calculation element (E) 201e has two planes of work memory for each level, as described above, and each of the other calculation elements also has two planes of 2 × 2 work memory. In addition, so that the calculation element (E) 201e can speed up processing by performing the level 1 transform in parallel with the transforms of the other levels, and can perform the level 2 to level 4 transforms in any order, it is provided, as described above, with two independent filters 213 and 214 for horizontal and vertical processing and with an SS calculation unit 210 that calculates the SS data of levels 1, 2, and 3.
[0036]
Because the transformed data obtained in the work memories of the calculation element (E) 201e is written back to the frame memory 100 block by block, part of the original data that the calculation elements need for processing other blocks is overwritten. The current memory 202 and the cache memories (CM1 to CM4) 203 to 206 described above are used to temporarily hold original data that would otherwise be overwritten and to avoid rereading data that has already been read. The main control unit 207 ensures that the original data for 3 × 3 blocks is always available (using mirroring for data that does not exist) and distributes the data in these memories and in the frame memory 100 to the calculation elements 201a to 201i. As described below, the way the data is distributed depends on which block of the frame memory 100 is being processed.
[0037]
FIGS. 13 to 21 are diagrams for explaining the progress of the processing. As described above, the calculation element (E) 201e has two planes of work memory for each level, and the other calculation elements (A to D, F to I) 201a to 201d and 201f to 201i each have two planes of work memory. The 3 × 3 solid-line grid in FIGS. 13 to 21 represents the 3 × 3 blocks corresponding to the calculation elements (A to I) 201a to 201i in the processing that uses the 0th-plane work memories. The 3 × 3 dotted-line grid indicates the positions of the blocks corresponding to the calculation elements (A to I) 201a to 201i in the processing that uses the 1st-plane work memories. In FIGS. 13 to 21, 0_0, 1_0, and so on are block names; the first number in a block name is the block address in the line (vertical) direction, and the second number is the block address in the pixel (horizontal) direction.
[0038]
FIG. 22 shows how the cache memories (CM1 to CM4) 203 to 206 are allocated for caching the original data necessary for the arithmetic elements (A to I) 201a to 201i. A0 to I0 mean data required when the arithmetic elements (A to I) 201a to 201i use the work memory on the 0th plane, and A1 to I1 denote arithmetic elements (A to I) 201a to 201i. Means data required when the work memory of the first surface is used.
[0039]
FIGS. 23 to 25 are timing charts of the wavelet transform device 101. In FIGS. 23 to 25, a portion where only a block name such as 0_0 is written, or where R is appended, represents reading (READ) from the frame memory 100, the current memory 202, or the cache memories 203 to 206; a portion where W is appended to the name represents writing (WRITE); and a portion where OW is appended represents overwriting (OVERWRITE). The operation is described below in order with reference to FIGS. 13 to 21 and the timing charts.
[0040]
<FIG. 13> This is the processing of the two leftmost blocks in the first block row. In FIG. 23, the data of the 0_0 block is first read from the frame memory 100, sent to the calculation elements (E, D) 201e and 201d, and written to the current memory 202. In the calculation element (E) 201e, the 0_0 block data is read into the work memory (L1_0) 216. In the calculation element (D) 201d, the SS calculation unit 230 calculates the level 3 SS data from the 0_0 block data and stores the resulting 2 × 2 SS data in the work memory (L4_1) 234. In the timing chart, "A0 to I0" represents the timing at which original data is written to the 0th-plane work memory of a calculation element or calculated SS data is written to its 0th-plane work memory, and "A1 to I1" represents the timing at which original data is written to the 1st-plane work memory of a calculation element or calculated SS data is written to its 1st-plane work memory.
[0041]
Next, the 0_1 block data is read from the frame memory 100, sent to the calculation element (F) 201f and the calculation element (E) 201e, and written to the current memory 202 and the cache memory (CM2) 204. The calculation element (E) 201e reads the data into the work memory (L1_1) 217. In the calculation element (F) 201f, the SS calculation unit 230 calculates the level 3 SS data from the 0_1 block data and stores it in the work memory (L4_0) 233.
[0042]
Next, the 1_0 block data is read from the frame memory 100 and sent to the calculation element (G) 201g and the calculation element (H) 201h. The calculation element (G) 201g calculates the level 3 SS data and stores it in the work memory (L4_1) 234, and the calculation element (H) 201h calculates the level 3 SS data and stores it in the work memory (L4_0) 233.
[0043]
Next, the 1_1 block data is read from the frame memory 100, sent to the calculation element (I) 201i and the calculation element (H) 201h, and written to the cache memory (CM2) 204. The calculation element (I) 201i calculates the level 3 SS data and stores it in the work memory (L4_0) 233, and the calculation element (H) 201h calculates the SS data and stores it in the work memory (L4_1) 234.
[0044]
Although not clearly shown in the timing chart, in parallel with the reading of the above four blocks of data, under the control of the main control unit 207, the mirroring data of the corresponding blocks that do not actually exist is written into the work memory (L4_0) 233 on the 0th surface of the calculation elements (A to D, G) 201a to 201d and 201g and into the work memory (L4_1) 234 on the first surface of the arithmetic elements (A, B) 201a and 201b.
[0045]
In this way, when preparation of data for the processing of the 0_0 block and the 0_1 block is completed, the arithmetic element (E) 201e first starts wavelet transform processing for the 0_0 block using the work memory on the 0th surface. “Cal_0” in FIG. 23 represents this timing, and details of the processing will be described later.
[0046]
When the wavelet transform of the 0_0 block is completed, the data obtained in the work memory (L1_0, L2_0, L3_0, L4_0) 216, 218, 220, 222 on the 0th surface of the computation element (E) 201e is immediately written back to the frame memory 100. The data in these work memories and in the work memory (L4_0) 233 on the 0th surface of the other calculation elements (A to D, F to I) 201a to 201d and 201f to 201i is then discarded.
[0047]
In parallel with the wavelet transform of the 0_0 block, the data of the 0_2 block is read from the frame memory 100, sent to the computing element (F) 201f, and written to the current memory 202 and the cache memory (CM2) 204. In the calculation element (F) 201f, SS data is calculated and stored in the work memory (L4_1) 234.
[0048]
Subsequently, 1_2 block data is read from the frame memory 100 and sent to the computation element (I) 201i, which calculates the SS data and writes it to the work memory (L4_1) 234. At the same time, the 1_2 block data is written into the cache memory (CM2) 204. Although not shown in the figure, when the 0_2 block is read, the mirroring data of the nonexistent block corresponding to the calculation element (C) 201c is given to that calculation element, and the SS data is calculated and stored in the work memory (L4_1) 234.
[0049]
When the wavelet transform for the 0_0 block is completed, the computation element (E) 201e starts the wavelet transform for the 0_1 block using the work memory (L1_1, L2_1, L3_1, L4_1) 217, 219, 221, 223 on the first surface. “Cal_1” in FIG. 23 indicates this timing. When the wavelet transform of the 0_1 block is completed, the data obtained in the work memory (L1_1, L2_1, L3_1, L4_1) 217, 219, 221, 223 on the first surface of the computation element (E) 201e is immediately written back to the frame memory 100. Data in the work memory (L4_1) 234 on the first surface of the other arithmetic elements (A to D, F to I) 201a to 201d and 201f to 201i is discarded.
[0050]
In parallel with the wavelet transform of the 0_1 block, the data of the 0_1 block and the 1_1 block are read from the cache memory (CM2) 204 for the next wavelet transform of the 0_2 block and transferred to the calculation elements (D, G) 201d and 201g, respectively, and the mirroring data of the 0_1 block is transferred to the arithmetic element (A) 201a. The computation element (D) 201d calculates the level 3 SS data from the 0_1 block data and stores it in the internal work memory (L4_0) 233, and the computation element (G) 201g calculates the SS data from the 1_1 block data and stores it in the internal work memory (L4_0) 233. The arithmetic element (A) 201a calculates the level 3 SS data of the mirroring data and stores it in the work memory (L4_0) 233.
[0051]
When the wavelet transform of the 0_1 block is completed, the data of the 0_2 block and the 1_2 block are read from the cache memory (CM2) 204 for the wavelet transform of the 0_2 block and the 0_3 block and transferred to the calculation elements (E, D) 201e and 201d and the arithmetic elements (H, G) 201h and 201g, respectively. Also, the 0_2 block mirroring data is transferred to the arithmetic elements (A, B) 201a and 201b. The calculation element (E) 201e reads the data of the 0_2 block into the work memory (L1_0) 216, and the calculation element (D) 201d calculates the SS data from the data of the 0_2 block and stores it in the work memory (L4_1) 234. The calculation element (H) 201h calculates SS data from the data of the 1_2 block and stores it in the work memory (L4_0) 233, and the calculation element (G) 201g calculates SS data of the 1_2 block and stores it in the work memory (L4_1) 234. The arithmetic element (B) 201b calculates the SS data of the mirroring data and stores it in the work memory (L4_0) 233, and the arithmetic element (A) 201a calculates the SS data of the mirroring data and stores it in the work memory (L4_1) 234. The processing up to this point ends at time t3, at which wavelet transform data up to level 4 of the 0_0 block and the 0_1 block has been obtained on the frame memory 100.
[0052]
<FIG. 14> This is the processing of the 0_2 block and the 0_3 block, the two blocks immediately to the right of those processed in FIG. 13. Blocks for which processing has already been completed and whose data in the frame memory 100 has been rewritten are shown shaded in FIG. 14 (the same applies to FIGS. 15 to 21).
[0053]
From time t3, 0_3 block data is read from the frame memory 100, transferred to the computation elements (E, F) 201e and 201f, and written to the cache memory (CM2) 204 and the current memory 202. The 0_3 block mirroring data is also transferred to the computation elements (B, C) 201b and 201c. The calculation elements (C, F) 201c and 201f store the SS data calculated from the input data in the work memory (L4_0) 233, the calculation element (E) 201e stores the input data in the work memory (L1_1) 217, and the calculation element (B) 201b stores the SS data calculated from the input data in the work memory (L4_1) 234. Subsequently, 1_3 block data is read from the frame memory 100, transferred to the arithmetic elements (H, I) 201h and 201i, and written to the cache memory (CM2) 204. The arithmetic element (H) 201h stores the calculated SS data in the work memory (L4_1) 234, and the arithmetic element (I) 201i stores the calculated SS data in the work memory (L4_0) 233.
[0054]
At this point, wavelet transform of the 0_2 block starts. At the same time, the data of the 0_4 block and the 1_4 block are read from the frame memory 100, and the mirroring data of the 0_4 block is transferred to the calculation element (C) 201c.
[0055]
When the wavelet transform of the 0_2 block is completed, the wavelet transform of the 0_3 block is started, and the converted data of the 0_2 block is written back to the frame memory 100. In parallel with the 0_3 block processing, the same data transfer as in the 0_1 block processing is performed for the next block processing. By time t4, writing back of the converted data of the 0_3 block to the frame memory 100 is completed. Between time t4 and t5, the 0_4 block and the 0_5 block are processed in the same way.
[0056]
<FIG. 15> This is processing of two blocks at the right end of the first block row. Since the operation is almost the same as in FIGS. 13 and 14, the details are omitted. However, as shown from time t7 to time t8 in the timing chart shown in FIG. 24, data corresponding to the calculation elements (F, I) 201f and 201i for the processing of the 0_63 block does not actually exist, and the mirror processing is performed. Therefore, the data read from the frame memory 100 is small.
[0057]
<FIG. 16> This is the processing of the first two blocks in the next block row. It corresponds to the period from time t8 to t9 in the timing chart. 1_0 block data is read from the frame memory 100, written to the work memory (L1_0) 216 of the computation element (E) 201e and to the cache memory (CM3) 205, and also sent to the computation element (D) 201d, where SS data is calculated and stored in the work memory (L4_0) 233. In parallel with this, 0_0 block data is read from the current memory 202 and sent to the calculation elements (A, B) 201a and 201b to calculate SS data. The calculation element (A) 201a stores the SS data in the work memory (L4_1) 234, and the calculation element (B) 201b stores the SS data in the work memory (L4_0) 233. Further, the mirroring data of the 0_0 block is also sent to the arithmetic element (A) 201a, where SS data is calculated and stored in the work memory (L4_0) 233.
[0058]
Next, 1_1 block data is read from the frame memory 100 and sent to the arithmetic elements (E, F) 201e and 201f, and also written to the cache memories (CM2, CM4) 204 and 206. The calculation element (E) 201e stores the data in the work memory (L1_1) 217, and the calculation element (F) 201f calculates the SS data of the data and stores it in the work memory (L4_0) 233. In parallel with this, 0_1 block data is read from the current memory 202, written to the cache memory (CM2) 204, and sent to the computing elements (B, C) 201b, 201c. The computing element (B) 201b calculates SS data and stores it in the work memory (L4_1) 234, and the computing element (C) 201c calculates SS data and stores it in the work memory (L4_0) 233.
[0059]
Next, 2_0 block data is read from the frame memory 100, sent to the computation elements (G, H) 201g and 201h, and written to the cache memory (CM2) 204, and the 0_2 block data is read from the current memory 202, sent to the computing element (C) 201c, and written to the cache memory (CM2) 204. The calculation elements (C, G) 201c and 201g each calculate the SS data of their input data and store it in the work memory (L4_1) 234, and the calculation element (H) 201h calculates the SS data of its input data and stores it in the work memory (L4_0) 233. The 2_0 block mirroring data is also sent to the computing element (G) 201g, and its SS data is stored in the work memory (L4_0) 233.
[0060]
Next, 2_1 block data is read from the frame memory 100, written to the cache memory (CM2) 204, and sent to the arithmetic elements (H, I) 201h and 201i. Also, 1_0 block data is read from the cache memory (CM3) 205 and overwritten in the current memory 202, and its mirroring data is sent to the computing element (D) 201d. The calculation elements (D, I) 201d and 201i calculate the SS data of their respective input data and store it in the work memory (L4_0) 233, and the calculation element (H) 201h calculates the SS data of its input data and stores it in the work memory (L4_1) 234.
[0061]
From this point, the computing element (E) 201e starts the wavelet transform of the 1_0 block using the work memory (L1_0, L2_0, L3_0, L4_0) 216, 218, 220, 222 on the 0th surface. In parallel with this, 1_2 block data is read from the frame memory 100, written to the cache memory (CM2) 204 and the current memory 202, and sent to the computing element (F) 201f. The calculation element (F) 201f stores the calculated SS data in the work memory (L4_1) 234. Subsequently, 2_2 block data is read from the frame memory 100, written to the cache memory (CM2) 204, and sent to the arithmetic element (I) 201i, and 1_1 block data is read from the cache memory (CM4) 206 and overwritten in the current memory 202. The arithmetic element (I) 201i calculates the SS data of its input data and stores it in the work memory (L4_1) 234.
[0062]
When the wavelet transform of the 1_0 block is completed, the converted data is written back to the frame memory 100, and the computation element (E) 201e starts the wavelet transform of the 1_1 block using the work memory (L1_1, L2_1, L3_1, L4_1) 217, 219, 221, 223 on the first surface. In parallel with this conversion processing, preparation for the next 1_2 block is performed. That is, the 0_1 block, the 1_1 block, and the 2_1 block are sequentially read from the cache memory (CM2) 204 and sent to the calculation elements (A, D, G) 201a, 201d, and 201g, respectively. These calculation elements calculate the SS data of their input data and store it in the work memory (L4_0) 233. Then, the conversion data of the 1_1 block is written back to the frame memory 100, whereby the processing for the 1_0 block and the 1_1 block is completed.
[0063]
<FIG. 17> Processing is performed at a position moved two blocks to the right of the position processed in FIG. 16. This corresponds to time t9 to t10 in the timing chart. This processing pattern accounts for most of the entire processing (about 88%).
[0064]
First, 0_2 block data is read from the cache memory (CM2) 204 and sent to the computation elements (B, A) 201b and 201a. The computation element (B) 201b stores the calculated SS data in the work memory (L4_0) 233, and the arithmetic element (A) 201a stores the calculated SS data in the work memory (L4_1) 234. Next, 1_2 block data is read from the cache memory (CM2) 204, written to the work memory (L1_0) 216 of the computing element (E) 201e, and sent to the computing element (D) 201d. The computing element (D) 201d stores the calculated SS data in the work memory (L4_1) 234. Next, 2_2 block data is read from the cache memory (CM2) 204 and sent to the calculation elements (G, H) 201g and 201h. The calculation element (H) 201h stores the calculated SS data in the work memory (L4_0) 233, and the calculation element (G) 201g stores the calculated SS data in the work memory (L4_1) 234. Next, 0_3 block data is read from the current memory 202, written to the cache memory (CM2) 204, and sent to the computation elements (B, C) 201b and 201c. The computation element (B) 201b stores the calculated SS data in the work memory (L4_1) 234, and the computing element (C) 201c stores the calculated SS data in the work memory (L4_0) 233.
[0065]
Next, 1_3 block data is read from the frame memory 100, written to the current memory 202, the cache memory (CM2) 204, and the work memory (L1_1) 217 of the calculation element (E) 201e, and sent to the calculation element (F) 201f. The computing element (F) 201f stores the calculated SS data in the work memory (L4_1) 234.
[0066]
Next, the 2_3 block is read from the frame memory 100, written to the cache memory (CM2) 204, and sent to the arithmetic elements (H, I) 201h and 201i. The computing element (I) 201i stores the calculated SS data in the work memory (L4_0) 233, and the computing element (H) 201h stores the calculated SS data in the work memory (L4_1) 234.
[0067]
When this processing is completed, the calculation element (E) 201e starts the wavelet transform of the 1_2 block using the work memory (L1_0, L2_0, L3_0, L4_0) 216, 218, 220, and 222 on the 0th surface. In parallel with this wavelet transform, 1_4 block data is read from the frame memory 100, written to the cache memories (CM2, CM3) 204 and 205, and sent to the calculation element (F) 201f, where SS data is calculated and stored in the work memory (L4_1) 234. Next, 2_4 block data is read from the frame memory 100, sent to the arithmetic element (I) 201i, and written to the cache memory (CM2) 204. The computing element (I) 201i stores the calculated SS data in the work memory (L4_1) 234.
[0068]
Then, the wavelet transform for the 1_2 block is completed, and the converted data is written back to the frame memory 100. At the same time, the 0_4 block data is read from the current memory 202, written to the cache memory (CM2) 204, and sent to the arithmetic element (C) 201c, where the SS data is calculated and stored in the work memory (L4_1) 234. Subsequently, 1_4 block data is read from the cache memory (CM3) 205 and written to the current memory 202.
[0069]
The computing element (E) 201e starts the wavelet transform for the 1_3 block using the work memory (L1_1, L2_1, L3_1, L4_1) 217, 219, 221, and 223 on the first surface. In parallel, preparation for processing the next 1_4 block is performed. That is, data of the 0_3 block, the 1_3 block, and the 2_3 block are sequentially read from the cache memory (CM2) 204 and sent to the calculation elements (A, D, G) 201a, 201d, and 201g, respectively, and SS data is calculated. The calculated SS data is stored in the work memory (L4_0) 233 of the computation elements (A, D, G) 201a, 201d, and 201g, respectively. Then, the wavelet transform process for the 1_3 block is completed, and the converted data is written back to the frame memory 100.
[0070]
FIG. 18 is a process for the rightmost two blocks of the second block row. Since the operation in this case will be clear from the above description, the description is omitted.
[0071]
The processes corresponding to FIGS. 16 to 18 are repeated up to the block row immediately before the last block row.
[0072]
<FIG. 19> This is processing for the two leftmost blocks in the last block row. Since the operation in this case is almost the same as the case of FIG. 13, the description and timing chart are omitted.
[0073]
<FIG. 20> This is the processing for the next two blocks. Since the operation in this case is almost the same as that in FIG. 14, the description and the timing chart are omitted.
[0074]
<FIG. 21> This is processing for the two blocks at the right end, but the operation in this case is almost the same as in FIG. 15, so the description and timing chart are omitted.
[0075]
FIG. 26 is a timing chart of wavelet transform processing in the calculation element (E) 201e. During the period from time t0 to t1, level 1 horizontal processing is performed on the data written in the work memory (L1_0 or L1_1) 216 or 217 using the filter (1) 213. In parallel with this horizontal processing, SS data of levels 1, 2, and 3 is calculated using the SS calculation unit 210, and the obtained SS data of levels 1, 2, and 3 is the work memory (L2_0 or L2_1) 218. Or 219, work memory (L3_0 or L3_1) 220 or 221, and work memory (L4_0 or L4_1) 222 or 223, respectively.
[0076]
Level 1 SS data is obtained by calculating the average value of two original data adjacent in the horizontal direction to obtain S data, and then calculating the average value of two S data adjacent in the vertical direction. That is, an average value of 2 × 2 original data is calculated. Level 2 SS data is obtained by calculating the average value of two level 1 SS data adjacent in the horizontal direction to obtain S data, and then calculating the average value of two S data adjacent in the vertical direction. That is, an average value of 4 × 4 original data is calculated. Level 3 SS data is obtained by calculating the average value of two level 2 SS data adjacent in the horizontal direction to obtain S data, and then calculating the average value of two S data adjacent in the vertical direction. That is, the average value of 8 × 8 original data is calculated. In the case of a 16 × 16 size block, 2 × 2 pieces of level 3 SS data are obtained.
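The averaging described in this paragraph can be sketched as follows. This is only an illustrative sketch in Python: the function names are hypothetical, and plain floating-point averages are assumed because the rounding behavior of the SS calculation unit is not stated here.

```python
def ss_level(data):
    """Average each 2 x 2 group of `data` (a list of equal-length rows)."""
    rows, cols = len(data), len(data[0])
    # Horizontal step: average horizontally adjacent pairs to obtain S data.
    s = [[(row[x] + row[x + 1]) / 2 for x in range(0, cols, 2)] for row in data]
    # Vertical step: average vertically adjacent pairs of S data.
    return [[(s[y][x] + s[y + 1][x]) / 2 for x in range(cols // 2)]
            for y in range(0, rows, 2)]


def ss_pyramid(block, levels=3):
    """Return the SS data of levels 1..levels, each computed from the previous."""
    out, cur = [], block
    for _ in range(levels):
        cur = ss_level(cur)
        out.append(cur)
    return out


# Hypothetical 16 x 16 test block whose value at (y, x) is x + 16 * y.
block = [[x + 16 * y for x in range(16)] for y in range(16)]
l1, l2, l3 = ss_pyramid(block)
print(len(l1), len(l2), len(l3))  # side lengths of the level 1, 2, 3 SS data
```

For a 16 × 16 block the level 3 result is 2 × 2, and each entry equals the mean of the corresponding 8 × 8 region of the original data.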
[0077]
In addition to the filter (1) 213 for level 1 horizontal and vertical processing, a filter (2) 214 for level 2, 3, and 4 horizontal and vertical processing is provided. The conversion of levels 2, 3, and 4 can therefore be performed in parallel with the conversion of level 1, in an arbitrary order, from the time when the SS data required for it has been prepared.
[0078]
In the timing chart of FIG. 26, level 1 vertical processing starts from time t1, and level 2 horizontal processing starts simultaneously. Level 2 horizontal processing is performed on level 1 SS data in work memory (L2_0 or L2_1) 218 or 219 using filter (2) 214. The obtained S data and D data are written back to the work memory (L2_0 or L2_1) 218 or 219. Next, vertical processing is performed on the S data and D data using the filter (2) 214, and level 2 conversion data is obtained in the work memory (L2_0 or L2_1) 218 or 219.
[0079]
From time t2, using the filter (2) 214, horizontal processing and vertical processing of the level 2 SS data in the work memory (L3_0 or L3_1) 220 or 221 are executed in order, and level 3 conversion data is obtained in the work memory (L3_0 or L3_1) 220 or 221. From time t3, horizontal processing and vertical processing are sequentially executed using the filter (2) 214 on the level 3 SS data in the work memory (L4_0 or L4_1) 222 or 223 inside the computation element (E) 201e together with the SS data obtained in the work memory (L4_0 or L4_1) 233 or 234 inside the other computation elements (A to D, F to I) 201a to 201d and 201f to 201i (6 × 6 SS data in total), and the level 4 conversion data is obtained in the work memory (L4_0 or L4_1) 222 or 223 of the calculation element (E) 201e.
[0080]
Level 1 has a large amount of data, and the time required for level 2, 3, and 4 processing is less than the time required for level 1 horizontal or vertical processing. The conversion of levels 2, 3, and 4 is therefore completed at time t4, before time t5 at which the level 1 vertical processing ends.
[0081]
In FIG. 27, the frame memory 100 is divided into parts having the same processing contents, and identification numbers 1 to 15 are assigned to the respective parts. As described with reference to FIGS. 13 to 25, regarding the reading / writing with respect to the frame memory 100, substantially the same processing is performed in the parts 1 and 13, the parts 2 and 14, and the parts 3 and 15. Similarly, substantially the same processing is performed in the parts 4, 7, and 10, in the parts 5, 8, and 11, and in the parts 6, 9, and 12. On the other hand, in the wavelet transform, the number of blocks to be processed differs from part to part. In the parts 1, 3, 13, and 15, the number of blocks processed using the work memory on the 0th surface (or the first surface) of the calculation element is 2 × 2 blocks, and the number of blocks processed using the work memory on the first surface (or the 0th surface) is 3 × 2 blocks. In the parts 2 and 14, the number of blocks to be processed is 3 × 2 blocks. In the parts 4, 7, 10, 6, 9, and 12, the number of blocks processed using the work memory on the 0th surface (or the first surface) is 2 × 3 blocks, and the number of blocks processed using the work memory on the first surface (or the 0th surface) is 3 × 3 blocks. In the parts 5, 8, and 11, all processing covers 3 × 3 blocks.
[0082]
For the sake of simplicity and easy comparison with the timing chart, the processing time is calculated using the time for one block of 16 × 16 size as the time unit. Since the wavelet transform is performed on the data in the center block of the 3 × 3 solid-line grid or dotted-line grid shown in FIGS. 13 to 21, each memory is free with respect to the non-target blocks, and in parallel with the wavelet transform, data can be written back to the frame memory or read out in advance. Taking this into account, the processing time in each of the parts 1 to 15 in FIG. 27 is as follows. Part 1 is 10 blocks, part 2 is 12 blocks, part 3 is 8 blocks, part 4 is 10 blocks, part 5 is 20 blocks, part 6 is 8 blocks, part 7 is 10 blocks, part 8 is 20 blocks, part 9 is 8 blocks, part 10 is 10 blocks, part 11 is 20 blocks, part 12 is 8 blocks, part 13 is 10 blocks, part 14 is 12 blocks, and part 15 is 8 blocks. The total processing time (when X = Y = 1024) is 10 + 12 × 30 + 8 = 378 blocks in each of the uppermost and lowermost block rows, and (10 + 20 × 30 + 8) × 62 = 38316 blocks in the other block rows, for a total of 39072 blocks.
[0083]
When this is converted into the number of clocks, since the size of one block is 16 × 16, 9768k clocks (1k = 1024) are obtained. This number of clocks applies when the wavelet transform and the reading / writing with respect to the frame memory run at the same speed. Since the internal operation can usually be sped up easily, this value is the worst case. For example, if the time required for the wavelet transform in the parts 5, 8, and 11, which require the longest time for the wavelet transform (these parts account for about 88% of the total), is halved, the processing time of each such part becomes (20 - 9) + 9/2 = 15.5 blocks, and the total processing time becomes 30702 blocks. This is 7675.5k clocks in terms of the number of clocks, so all processing can be completed in about 79% of the worst-case time.
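The totals quoted above can be reproduced with a short calculation. The following Python sketch only restates the arithmetic of the text for X = Y = 1024 (64 × 64 blocks, processed two blocks at a time); the function and constant names are hypothetical.

```python
BLOCKS_PER_SIDE = 1024 // 16          # 64 blocks per block row and per column
PAIRS_PER_ROW = BLOCKS_PER_SIDE // 2  # 32 two-block steps per block row


def row_time(left, middle, right):
    """Time for one block row: left part, repeated middle parts, right part."""
    return left + middle * (PAIRS_PER_ROW - 2) + right


def total_blocks(middle_inner=20):
    """Total time in block units; `middle_inner` is the time of parts 5, 8, 11."""
    edge_rows = 2 * row_time(10, 12, 8)                    # top and bottom rows
    inner_rows = (BLOCKS_PER_SIDE - 2) * row_time(10, middle_inner, 8)
    return edge_rows + inner_rows


def to_kclocks(blocks):
    """One block is 16 x 16 clocks; 1k = 1024 clocks."""
    return blocks * 16 * 16 / 1024


worst = total_blocks()
faster = total_blocks(middle_inner=(20 - 9) + 9 / 2)  # transform time halved
print(worst, to_kclocks(worst))
print(faster, to_kclocks(faster))
print(round(100 * faster / worst), "% of the worst case")
```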
[0084]
In the present invention, the latency that occurs during memory access in the line direction, which is a problem in the prior art, can be avoided by transferring data in units of blocks. Specifically, all reading / writing is data transfer in the pixel direction, and a sufficient time for precharging the sense amplifier of the memory at the time of data transfer in the pixel direction can be secured. That is, the latency problem can be solved by using a method such as bank switching.
[0085]
Now, the SS, DS, SD, and DD data of each level is stored in each work memory inside the calculation element (E) 201e with the mapping shown in the drawing, where x is a number indicating the level. However, the SS data at levels 1, 2, and 3 is meaningless because it is replaced with the higher-level data. The data of each level can be written back to the frame memory 100 with the mapping shown in FIG. 45. However, since the encoding / decoding unit 102 performs encoding or decoding on a bit plane basis starting from the most important level and frequency band signal, it is advantageous to pre-sort the arrangement of the frequency band signals in consideration of the processing order before writing them back to the frame memory 100. This will be described below.
[0086]
FIG. 29 shows a concept called “alignment”, in which the frequency band signals of each level obtained by the wavelet transform are rearranged according to importance. In FIG. 29, one rectangle represents a certain frequency band signal at a certain level, and its length represents the bit depth. The bit depth varies depending on the configuration of the filter used. Here, the figure is drawn on the assumption that the SD coefficient and the DS coefficient have the same bit depth and that the DD coefficient is one bit deeper than the SD and DS coefficients. In the encoding / decoding unit 102, for example when an image is handled, data may be truncated to increase the compression rate. Alignment is used as one means of determining how to truncate, and the less important bits are cut off.
[0087]
FIG. 30 is a diagram for explaining the concept of a bit plane. For example, if the data is image data, a pixel is specified by an address (x, y) and a bit depth. In the drawing, the shaded portion (for example, the MSB portion) is called a bit plane. In the encoding / decoding unit 102, processing is performed in units of bit planes on blocks of several pixels. This is because, in image data, a given pixel is highly correlated with its surrounding pixels, and this correlation is exploited to increase the compression ratio.
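As a minimal illustration of the bit plane concept, the following Python sketch extracts the planes of a small coefficient array from the MSB downward. It assumes non-negative integer magnitudes and a known bit depth; how sign bits are handled is not described here, and the function names are hypothetical.

```python
def bit_plane(coeffs, bit):
    """Return bit plane `bit` (0 = LSB) of a 2-D list of non-negative ints."""
    return [[(v >> bit) & 1 for v in row] for row in coeffs]


def planes_msb_first(coeffs, depth):
    """List the bit planes from the MSB plane (depth - 1) down to the LSB."""
    return [bit_plane(coeffs, b) for b in range(depth - 1, -1, -1)]


# Hypothetical 2 x 2 coefficient array with a bit depth of 3.
coeffs = [[5, 2],
          [7, 0]]
planes = planes_msb_first(coeffs, depth=3)
for plane in planes:
    print(plane)
```

Each printed plane holds one bit per pixel position, which is the unit on which a bit-plane coder operates.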
[0088]
Encoding or decoding is performed in bit plane units from left to right, that is, starting from the more important data (the SS coefficients are excluded). In the case of the alignment shown in FIG. 29, the order is the most significant bit (MSB) of the level 4 SD coefficient, the most significant bit of the DS coefficient, the most significant bit of the DD coefficient, the next lower (MSB-1) bit of the SD coefficient, and so on. Therefore, when the frequency band signals are written back to the frame memory 100 with the mapping shown in FIG. 45, it is clear that the encoding / decoding unit 102 needs to perform complicated addressing on the frame memory 100. If the frame memory 100 is a memory that performs burst transfer, such as SDRAM, the processing speed cannot be increased because of latency.
[0089]
In order to avoid this, in this embodiment, the frequency band signals are pre-sorted in descending order of importance, so as to form an array that can be accessed continuously for each level and each frequency band, and are then written back to the frame memory 100. That is, in the case of the alignment shown in FIG. 29, the converted data of a 16 × 16 area is written back from the upper left of the frame memory 100 in order of importance, as shown in the drawing: 4SS, 4SD, 4DS, 4DD, then 3DS × 2, 3SD × 2, 3DD × 2, and so on, with the level 1 data (1DS, 1SD, 1DD) written last. The level 1 data occupies three-quarters of the area. Such sorting of the frequency band signals is performed, for example, by having the control unit 211 control the order in which the frequency band signals are read from each work memory.
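The statement that the level 1 data occupies three-quarters of the area follows from the subband sizes of a four-level transform of a 16 × 16 block. The Python sketch below lists those sizes from the most important level downward; the names are hypothetical, and the order of SD, DS, and DD within a level is chosen only for illustration.

```python
def subband_sizes(block=16, levels=4):
    """Return (name, coefficient count) pairs, highest level first."""
    top = block >> levels
    sizes = [(f"{levels}SS", top * top)]      # only the top-level SS survives
    for level in range(levels, 0, -1):
        side = block >> level                 # subband side length at this level
        for band in ("SD", "DS", "DD"):       # within-level order is illustrative
            sizes.append((f"{level}{band}", side * side))
    return sizes


sizes = subband_sizes()
total = sum(n for _, n in sizes)
level1 = sum(n for name, n in sizes if name.startswith("1"))
print(sizes)
print(total, level1, level1 / total)
```

The counts sum to the 256 coefficients of the block, of which the three level 1 subbands hold 192.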
[0090]
If the data is written back in such a sorted state, the encoding / decoding unit 102 can easily address the frame memory 100 whether or not the truncation is performed, and latency does not easily occur, so reading / writing from / to the frame memory 100 by the encoding / decoding unit 102 can be accelerated.
[0091]
According to the second embodiment of the present invention, in the wavelet transform apparatus 101 having the overall configuration shown in FIG. 2, the configuration of the calculation element (E) 201e is partially changed as shown in the drawing. That is, each of the work memories 216 to 223 is divided into four independent memories (sometimes referred to as SS memory, SD memory, DS memory, and DD memory) assigned to the SS, SD, DS, and DD data. The control unit 211 does not include the SS calculation unit 210. Further, the filter (1) 213 and the filter (2) 214 are not assigned exclusively to processing at a specific level; the control unit 211 switches their inputs and outputs so that both are used in the horizontal processing and vertical processing of each level.
[0092]
The overall operation of the wavelet transform apparatus 101 of this embodiment is the same as that of the first embodiment, and only the operation relating to the arithmetic element (E) 201e is partially different from that of the first embodiment. Hereinafter, the operation of the calculation element (E) 201e will be described with reference to the memory map of FIG. 32, the connection diagrams of FIGS. 33 and 34, and the timing chart of FIG. 35.
[0093]
When the data in the frame memory 100 is transferred to the work memory (L1_0 or L1_1) 216 or 217 of the computing element (E) 201e, the control unit 211 distributes the data, according to the rules shown in the memory map of FIG. 32, to the SS memory, SD memory, DS memory, and DD memory constituting the work memory. The rules for this distribution are as follows. The data of the 0th pixel of the 0th line of the block of the frame memory 100 is stored in the 0th pixel of the 0th line of the SS memory, and the data of the 1st pixel of the 0th line of the block is stored in the 0th pixel of the 0th line of the SD memory. The data of the 0th pixel of the 1st line of the block is stored in the 0th pixel of the 0th line of the DS memory, and the data of the 1st pixel of the 1st line of the block is stored in the 0th pixel of the 0th line of the DD memory. That is, the even-numbered pixels on the even-numbered lines (the 0th line is counted as an even-numbered line) are stored in the SS memory, and the odd-numbered pixels on the even-numbered lines are stored in the SD memory. The even-numbered pixels on the odd-numbered lines are stored in the DS memory, and the odd-numbered pixels on the odd-numbered lines are stored in the DD memory.
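The distribution rule can be sketched in Python as a split by line and pixel parity. The function name is hypothetical and the four "memories" are modeled simply as nested lists.

```python
def distribute(block):
    """Split a block (list of rows) into SS, SD, DS, DD by line/pixel parity."""
    ss = [row[0::2] for row in block[0::2]]  # even lines, even pixels
    sd = [row[1::2] for row in block[0::2]]  # even lines, odd pixels
    ds = [row[0::2] for row in block[1::2]]  # odd lines, even pixels
    dd = [row[1::2] for row in block[1::2]]  # odd lines, odd pixels
    return ss, sd, ds, dd


# Hypothetical 4 x 4 block in which each entry records its own (line, pixel).
block = [[(y, x) for x in range(4)] for y in range(4)]
ss, sd, ds, dd = distribute(block)
print(ss[0][0], sd[0][0], ds[0][0], dd[0][0])
```

Entry (0, 0) of each memory then holds pixel (0, 0), (0, 1), (1, 0), and (1, 1) of the block respectively, matching the rule stated above.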
[0094]
When data has been stored in the work memory (L1_0 or L1_1) 216 or 217 in the manner described above by time t0 in the timing chart of FIG. 35, level 1 horizontal processing starts. At this time, the inputs and outputs of the filter (1) 213 and the filter (2) 214 are connected to the SS memory, SD memory, DS memory, and DD memory of the work memory (L1_0 or L1_1) 216 or 217 as shown in FIG. 33. Since these four memories are independent, reading / writing can be performed on them at the same time. That is, the even lines are processed by the filter (1) 213 and the odd lines by the filter (2) 214 in parallel, and the results are written simultaneously. When this horizontal processing is finished, level 1 vertical processing starts next. At this time, the inputs and outputs of the filter (1) 213 and the filter (2) 214 are changed as shown in FIG. 34. That is, the even pixels are processed by the filter (1) 213 and the odd pixels by the filter (2) 214 in parallel, and the results are written simultaneously. Level 1 SS data obtained by the vertical processing is written to the SS memory and, according to the same rules as those described with reference to FIG. 32, is also distributed by the control unit 211 to the four memories constituting the work memory (L2_0 or L2_1) 218 or 219.
[0095]
Then, level 2 processing starts from time t1. First, as shown in FIG. 33, the inputs and outputs of the filters (1, 2) 213 and 214 are connected to the four memories constituting the work memory (L2_0 or L2_1) 218 or 219, and horizontal processing is executed. When the horizontal processing ends, the inputs and outputs of the filters (1, 2) 213 and 214 are switched to the connection shown in FIG. 34, and the vertical processing is executed. The obtained level 2 SS data is distributed among the four memories constituting the work memory (L3_0 or L3_1) 220 or 221 according to the same distribution rules as described above. This point is time t2. Thereafter, the processing is executed in the same way up to level 4 and is completed at time t4. However, as in the first embodiment, the level 4 processing also uses the level 3 SS data obtained by the other computing elements.
[0096]
As described above, in this embodiment the processing of levels 1, 2, 3, and 4 is executed sequentially, but the time required for the processing of each level is one-fourth that of the first embodiment. This is because the four memories constituting each work memory can be accessed simultaneously.
[0097]
The processing time in the present embodiment is as follows. Expressed in block units, the processing times of parts 1 to 15 in FIG. 27 are: part 1, 4.65625 blocks; part 2, 3.984375 blocks; part 3, 2.65625 blocks; part 4, 4.65625 blocks; part 5, 7.9765625 blocks; part 6, 2.65625 blocks; part 7, 4.65625 blocks; part 8, 7.9765625 blocks; part 9, 2.65625 blocks; part 10, 4.65625 blocks; part 11, 7.9765625 blocks; part 12, 2.65625 blocks; part 13, 4.65625 blocks; part 14, 7.9765625 blocks; and part 15, 2.65625 blocks. As shown in FIG. 35, the wavelet transform time in block units is 0.25 × 2 blocks for level 1, 0.0625 × 2 blocks for level 2, 0.015625 × 2 blocks for level 3, and 0.00390625 × 2 blocks for level 4; although fractional values arise, they are carried through the calculation here without rounding. The total processing time (when X = Y = 1024) is 4.65625 + 3.984375 × 30 + 2.65625 = 126.84375 blocks for each of the uppermost and lowermost block rows, and (4.65625 + 7.9765625 × 30 + 2.65625) × 60 = 14796.5625 blocks for the other rows, for a total of 15050.25 blocks. In terms of the number of clocks, since one block is 16 × 16, this is 3762.5625k clocks (1k = 1024). This clock count assumes that the wavelet transform and the reading / writing of the frame memory run at the same speed. This processing time is only about 49% of the 7675.5k clocks of the first embodiment and, when the latency is 2, about 35% of that of the prior art, so it will be understood that a substantial speed-up is possible.
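The total above can be rechecked with a short calculation. This is only an arithmetic sketch: the variable names are hypothetical, and the grouping (two edge block rows, 60 other rows, 30 middle parts per row) simply follows the total formula quoted in the text.

```python
from fractions import Fraction as F

# Per-part times in block units, taken from the values listed in the text.
LEFT = F("4.65625")          # left-end part of a block row
RIGHT = F("2.65625")         # right-end part of a block row
MID_EDGE = F("3.984375")     # middle part, uppermost/lowermost block rows
MID_OTHER = F("7.9765625")   # middle part, all other block rows

edge_row = LEFT + MID_EDGE * 30 + RIGHT
other_row = LEFT + MID_OTHER * 30 + RIGHT
total_blocks = 2 * edge_row + 60 * other_row

# One block is 16 x 16 pixels; 1k = 1024 clocks.
total_kclocks = total_blocks * (16 * 16) / 1024

# Ratio to the 7675.5k clocks quoted for the first embodiment.
ratio_to_first = float(total_kclocks / F("7675.5"))

print(float(edge_row), float(other_row * 60), float(total_blocks))
print(float(total_kclocks), round(ratio_to_first, 2))
```

Exact fractions are used so that the binary fractions in the text (multiples of 1/256) are summed without floating-point rounding.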
[0098]
As described above, since processing is performed in units of bits in the encoder / decoder 102, it is apparent that the processing time is several to ten times that of the wavelet transform. If data is read / written to / from the frame memory 100, the processing time is further increased. As described above, encoding / decoding is performed in order from a frequency band signal having a higher importance level. Generally, a higher level frequency band signal has a higher importance level. However, conventionally, wavelet transformation has been performed sequentially from level 1 to a higher level, and in order to obtain a high-level frequency band signal, the end of wavelet transformation had to be waited for. On the other hand, in the present invention, wavelet transform is performed in units of small blocks, and furthermore, frequency band signals at all levels of the blocks can be obtained almost simultaneously.
[0099]
In view of these characteristics of the wavelet transform device of the present invention, according to the third embodiment of the present invention, in order to absorb the difference in processing time between the wavelet transform and the encoding / decoding process, a buffer memory 250 that operates as a cache memory of the frame memory 100 is added to the main control unit 207 in the wavelet transform device 101 having the same configuration as that of the first or second embodiment, as shown in simplified form in FIG. 36. The buffer memory 250 can be accessed by the encoder / decoder 102. The wavelet transform device 101 according to the present embodiment performs the wavelet transform in units of blocks and writes a part of the frequency band signal data (for example, high-level frequency band signal data) to the buffer memory 250. In parallel with this wavelet transform, the encoder / decoder 102 executes the encoding / decoding process, reading from the buffer memory 250 the data that is available there and reading the other data from the frame memory 100.
[0100]
For example, assume that the size of the frame memory 100 is X = Y = 1024 and that all level 3 and level 4 data are written to the buffer memory 250. In this case, the required size of the buffer memory 250 is only [128 × 128 (for level 3) + 64 × 64 (for level 4)] × 3 (the SS coefficients are excluded because they are not encoded / decoded) = 60 kB (1 k = 1024).
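The buffer size can be reproduced with a small sketch. The helper name `buffer_entries` is hypothetical, and the code assumes a square frame whose side is halved at each wavelet level, with three non-SS bands per level, as the figures in the text imply.

```python
def buffer_entries(frame_size, levels, bands_per_level=3):
    """Count coefficients held for the given levels, excluding the SS band.

    Assumes a square frame whose side length halves at every level.
    """
    return sum((frame_size >> level) ** 2 for level in levels) * bands_per_level


entries = buffer_entries(1024, levels=(3, 4))
entries_k = entries / 1024  # in units of 1k = 1024
print(entries, entries_k)
```

Reading the result as kB assumes one byte per coefficient, which the text's use of "kB" suggests but does not state explicitly.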
[0101]
The processing time in this case will be described. In the prior art, the wavelet transform takes 10880k clocks (in the case of latency 2), and the number of data to be processed by the encoder / decoder is 1024k - 16k (for SS data) = 1008k. If the time required for encoding / decoding one datum is 10 clocks, the processing time in the encoder / decoder is 10080k clocks. Therefore, the total time of the wavelet transform and the encoding / decoding process is 20960k clocks.
[0102]
On the other hand, in this embodiment, if the wavelet transform device 101 has the configuration of the second embodiment, the processing time of the wavelet transform is 3762.5625k clocks. As described above, when the buffer memory 250 corresponding to the size of level 3 and level 4 is provided, the number of data is 240 kB, so the time required for the encoding / decoding process is 2400k clocks. However, since this encoding / decoding process is performed in parallel with the wavelet transform, these 2400k clocks are contained within the wavelet transform time. Since the number of data to be processed at level 1 and level 2 is 960 kB, the time required for their encoding / decoding process is 9600k clocks. Accordingly, the total processing time is 13362.5625k clocks, obtained by adding the wavelet transform processing time of 3762.5625k clocks to the 9600k clocks, which is a reduction to about 64% of that of the prior art. In this way, the entire process can be greatly speeded up.
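The comparison with the prior art can be sketched as follows, using only the figures quoted in the preceding paragraphs. The variable names are hypothetical, and modelling the overlapped phase as the longer of the two parallel activities is an assumption about how the text's "included in the time for the wavelet transform" should be read.

```python
CLOCKS_PER_DATUM = 10  # encoding / decoding time per datum, from the text

# Prior art: wavelet transform, then encoding / decoding of 1008k data.
prior_wavelet_k = 10880
prior_codec_k = (1024 - 16) * CLOCKS_PER_DATUM
prior_total_k = prior_wavelet_k + prior_codec_k

# Third embodiment: level 3/4 data are coded from the buffer memory while
# the wavelet transform runs; level 1/2 data are coded afterwards.
wavelet_k = 3762.5625
buffered_codec_k = 240 * CLOCKS_PER_DATUM
remaining_codec_k = 960 * CLOCKS_PER_DATUM

# Assumption: the overlapped phase lasts as long as the longer activity.
overlapped_k = max(wavelet_k, buffered_codec_k)
total_k = overlapped_k + remaining_codec_k

ratio = total_k / prior_total_k
print(prior_total_k, total_k, round(ratio, 2))
```

The overlap matters only because the buffered coding time is shorter than the wavelet transform time; if it were longer, the buffered coding would determine the length of the first phase in this model.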
[0103]
[Effects of the invention]
The wavelet transform device according to the invention described in each of claims 1 to 5 can execute wavelet transform processing up to a desired level at high speed in block units using a plurality of arithmetic elements. Moreover, because the reading of data from the frame memory and the writing back of the frequency band signals to the frame memory are performed in block units, the influence of latency can be avoided. The entire process, including memory access to the frame memory, can therefore be made significantly faster than before, and the frequency band signals of all wavelet levels can be obtained almost simultaneously in block units. The wavelet transform device according to the invention of claim 2 can further increase the processing speed by reading from and writing to the frame memory in parallel while continuously processing two adjacent blocks. The wavelet transform device according to the invention of claim 3 or 4 can speed up the wavelet transform processing in the first arithmetic element, so the processing speed can be increased still further. In the encoding / decoding device according to the invention of claim 6 or 7, the wavelet transform process is accelerated and the frequency band signals of all wavelet levels are obtained simultaneously in block units on the frame memory, so the encoding / decoding process can start immediately after the wavelet transform starts and can be executed in parallel with the wavelet transform process; the device can thus operate at a much higher speed than in the prior art. In particular, when the frequency band signals are sorted in order of importance by the wavelet transform device and written back to the frame memory, access to the frame memory by the encoder / decoder can be speeded up. In addition, the encoding / decoding device according to the invention of claim 7 does not need to access the frame memory for data that can be acquired from the buffer memory, so still higher-speed operation is possible.
[Brief description of the drawings]
FIG. 1 is a diagram illustrating an example of the overall configuration of an encoding / decoding device according to the present invention.
FIG. 2 is a block diagram showing an internal configuration of the wavelet transform device according to the first embodiment of the present invention.
FIG. 3 is a block diagram showing an internal configuration of a calculation element (E) according to the first embodiment of the present invention.
FIG. 4 is an explanatory diagram of the sizes of work memories (L1_0, L1_1) and cache memories (CM1, CM3, CM4).
FIG. 5 is an explanatory diagram of the size of a cache memory (CM2).
FIG. 6 is an explanatory diagram of the size of a work memory (L2_0, L2_1).
FIG. 7 is an explanatory diagram of the size of a work memory (L3_0, L3_1).
FIG. 8 is an explanatory diagram of a size of a work memory (L4_0, L4_1).
FIG. 9 is an explanatory diagram of the size of a frame memory.
FIG. 10 is an explanatory diagram of the size of a current memory.
FIG. 11 is a block diagram showing an internal configuration of arithmetic elements (A to D, F to I) according to the first embodiment of the present invention.
FIG. 12 is a diagram illustrating an example of mapping of frequency band signals up to level 3 per block of 16 × 16 size.
FIG. 13 is a diagram for explaining processing of two blocks at the left end of the first block row.
FIG. 14 is a diagram for explaining processing of the next two blocks of the first block row.
FIG. 15 is a diagram for explaining processing of two blocks at the right end of the first block row.
FIG. 16 is a diagram for explaining processing of two blocks at the left end of the second block row.
FIG. 17 is a diagram for explaining processing of the next two blocks of the second block row.
FIG. 18 is a diagram for explaining processing of two blocks at the right end of the second block row.
FIG. 19 is a diagram for explaining processing of two blocks at the left end of the last block row.
FIG. 20 is a diagram for explaining processing of the next two blocks of the last block row.
FIG. 21 is a diagram for explaining processing of two blocks at the right end of the last block row.
FIG. 22 is a diagram for explaining data caching by cache memories (CM1 to CM4).
FIG. 23 is a timing chart of the wavelet transform device.
FIG. 24 is a timing chart of the wavelet transform device.
FIG. 25 is a timing chart of the wavelet transform device.
FIG. 26 is a timing chart of a calculation element (E) in the first embodiment of the present invention.
FIG. 27 is a diagram in which a frame memory is divided into a plurality of parts in order to explain the processing time.
FIG. 28 is a diagram showing mapping of frequency band signals obtained in each work memory of the computation element (E).
FIG. 29 is an explanatory diagram of alignment of frequency band signals.
FIG. 30 is an explanatory diagram of a bit plane.
FIG. 31 is a block diagram showing an internal configuration of a calculation element (E) in the second embodiment of the present invention.
FIG. 32 is a diagram showing a method of distributing data to SS, DS, SD, and DD memories constituting the work memory in the calculation element (E) in the second embodiment of the present invention.
FIG. 33 is a diagram illustrating an input / output connection method of a filter during horizontal processing.
FIG. 34 is a diagram illustrating an input / output connection method of a filter during vertical processing.
FIG. 35 is a timing chart of the calculation element (E) in the second embodiment of the present invention.
FIG. 36 is a diagram showing a configuration of an encoding / decoding device and a wavelet transform device according to a third embodiment of the present invention.
FIG. 37 is a diagram illustrating a general configuration of a conventional wavelet transform device.
FIG. 38 is an explanatory diagram of filter operations for horizontal processing and vertical processing of wavelet transform.
FIG. 39 is a diagram illustrating an example of a memory map of image data before wavelet transform.
FIG. 40 is a diagram illustrating an example of a memory map for 1S coefficients and 1D coefficients.
FIG. 41 is a diagram illustrating an example of a memory map for a 1SS coefficient, a 1SD coefficient, a 1DS coefficient, and a 1DD coefficient.
FIG. 42 is a diagram illustrating an example of a memory map for 2S coefficients and 2D coefficients.
FIG. 43 is a diagram illustrating an example of a memory map for 2SS coefficients, 2SD coefficients, 2DS coefficients, and 2DD coefficients.
FIG. 44 is a timing chart of a conventional general wavelet transform device.
FIG. 45 is a diagram illustrating a general memory map of frequency band signals up to level 4.
FIG. 46 is a diagram showing an example of a memory map when frequency band signals up to level 4 are written back in descending order of importance.
[Explanation of symbols]
100 Frame memory
101 Wavelet transform device
102 Encoder / decoder
200 Calculation unit
201a to 201i Computing elements (A to I)
202 Current memory
203 Cache memory (CM1)
204 Cache memory (CM2)
205 Cache memory (CM3)
206 Cache memory (CM4)
207 Main control unit
210 SS calculator
211 Control unit
213 Filter (1)
214 Filter (2)
216 Work memory (L1_0)
217 Work memory (L1_1)
218 Work memory (L2_0)
219 Work memory (L2_1)
220 Work memory (L3_0)
221 Work memory (L3_1)
222 Work memory (L4_0)
223 Work memory (L4_1)
230 SS calculator
231 Control unit
233 Work memory (L4_0)
234 Work memory (L4_1)
250 Buffer memory

Claims

A wavelet transform device that reads data from a frame memory in block units, performs wavelet transform processing of n levels (n is an integer of 3 or more), and writes the obtained frequency band signals of the n-level wavelet transform coefficients back to the frame memory, the device comprising:
A memory for temporarily storing data read from the frame memory; and
A calculation unit including at least one first calculation element and a plurality of second calculation elements, wherein
The plurality of second calculation elements each calculate the low frequency band signal of the level (n−1) wavelet transform coefficients from the data of blocks around the target block read from the frame memory,
The first calculation element executes the n-level wavelet transform process on the data of the target block read from the frame memory, performing the level n wavelet transform process using the level (n−1) low frequency band signal obtained by the first calculation element and the level (n−1) low frequency band signals calculated by the plurality of second calculation elements, and thereby calculates all the frequency band signals of the n-level wavelet transform coefficients, and
All the n-level frequency band signals obtained by the wavelet transform process of the first calculation element are written back to the frame memory in units of blocks.

The wavelet transform device according to claim 1, wherein the first calculation element has two work memories corresponding to each level, each second calculation element has two work memories, wavelet transform processing is performed continuously on the data of two adjacent blocks in the frame memory, and the wavelet transform process, the reading of data from the frame memory, and the writing back of the frequency band signals to the frame memory are performed in parallel.

The wavelet transform device according to claim 2, wherein the first calculation element comprises a first filter used for the level 1 wavelet transform process, a second filter used for the level 2 to n wavelet transform processes, and means for calculating the low frequency band signal of each level, and wherein, in the first calculation element, the level 1 wavelet transform process and the level 2 to n wavelet transform processes for the data of the target block are performed in parallel.

The wavelet transform device according to claim 2, wherein the work memory corresponding to each level of the first calculation element is divided into four memories that can be accessed independently, the first calculation element comprises a first filter used for the level 1 wavelet transform process and a second filter used for the level 2 to n wavelet transform processes, and the wavelet transform process is executed by parallel processing using the two filters.

The wavelet transform device according to claim 1, 2, 3 or 4, further comprising a buffer memory for temporarily storing a part of the frequency band signals obtained by the wavelet transform process of the first calculation element, wherein the buffer memory is accessible from the outside.

5. An encoding / decoding apparatus comprising: a frame memory; a wavelet transform apparatus according to claim 1, 2, 3, or 4 that can access the frame memory; and an encoding / decoding apparatus that can access the frame memory.

6. A coding / decoding apparatus comprising: a frame memory; a wavelet transform apparatus capable of accessing the frame memory; and a coding / decoding apparatus capable of accessing the frame memory and a buffer memory of the wavelet transform apparatus.