JP3749826B2

JP3749826B2 - Inverse discrete cosine transform circuit

Info

Publication number: JP3749826B2
Application number: JP2000280824A
Authority: JP
Inventors: 義治上谷
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2000-09-14
Filing date: 2000-09-14
Publication date: 2006-03-01
Anticipated expiration: 2020-09-14
Also published as: JP2002091942A

Description

【０００１】
【発明の属する技術分野】
本発明は、画像データの伸長時や圧縮時に使用される逆離散コサイン変換回路に係り、特に８点の逆離散コサイン変換回路に関する。
【０００２】
【従来の技術】
現在、複数チャネルのＴＶ放送番組の同時表示が可能なＴＶ受信機が普及している。地上波や衛星によるディジタルＴＶ放送においても、ＴＶ受信機では複数チャンネルの放送番組の同時表示を可能とすることが望まれる。
【０００３】
ディジタルＴＶ放送においては、ＭＰＥＧ方式による画像データの圧縮が用いられ、ＴＶ受信機側では圧縮された画像データを伸長して元の画像データに戻す伸長処理装置(デコーダ)が用いられる。ディジタル放送用のＴＶ受信機において複数チャネルの放送番組の同時表示を実現しようとする場合、複数チャネルに対応して複数のデコーダを用意することが考えられるが、装置の規模が大きくなるという問題がある。従って、複数チャンネルの圧縮画像データを短時間に伸張処理可能な小規模なデコーダが望まれる。さらに、高精細ＴＶ放送(いわゆるハイビジョン放送)の複数画面の同時表示においても対応可能にするためには、デコーダにはさらに高速性も要求される。
【０００４】
ＭＰＥＧ方式は基本的に、動き補償予測と離散コサイン変換(以下、ＤＣＴという)及び可変長符号化の３つの要素の組み合わせで画像データの圧縮を行う技術であり、デコーダ側ではエンコーダ側とは逆に、可変長復号化と逆離散コサイン変換(以下、逆ＤＣＴという)及び動き補償予測の組み合わせで圧縮画像データの伸長を行う。逆ＤＣＴは、エンコーダの局部復号系にも用いられる。ＭＰＥＧ方式のエンコーダやデコーダでは、逆ＤＣＴ回路の高速化と回路規模削減が大きな課題の一つとなる。
【０００５】
この要求に対応した従来の技術として、特開平５−１８１８９６号公報に小規模で高速処理可能なＤＣＴ／逆ＤＣＴ回路の例が示されている。この回路は８点のＤＣＴと逆ＤＣＴを可能とするものであり、前段の９個の加減算器はＤＣＴ専用で、逆ＤＣＴのみの機能は７個の固定係数乗算器と９個の加減算器で構成可能である。しかし、この回路構成では固定係数乗算器の処理速度の２倍の画素レートの画像データにまでしか対応できず、高精細ＴＶ放送の複数画面の同時表示のためには、固定係数乗算器や加減算器の途中演算結果を保持するためのレジスタの挿入が必要になり、回路規模が増大するという問題があった。
【０００６】
【発明が解決しようとする課題】
上述したように、従来の技術では逆ＤＣＴに際して固定係数乗算器の処理速度の２倍の画素レートの画像データにまでしか対応できず、高精細ＴＶ放送の複数画面を同時表示しようとすると、固定係数乗算器や加減算器の途中演算結果を保持するためのレジスタが必要になるため回路規模が増大し、コストが高くなるという問題があった。
【０００７】
本発明は、複数の高精細画像等の復号に必要な高速処理を小規模な回路で実現可能にする逆離散コサイン変換回路を提供することを目的とする。
【０００８】
【課題を解決するための手段】
本発明は、高速演算アルゴリズムを使用して４つのパスに分けて並列処理することにより固定係数乗算器の使用を可能にし、４つの積和回路と９個の加減算回路からなる構成で、乗算器の動作速度の４倍の画素レートの画像処理を可能にするようにしたものである。
【０００９】
すなわち、本発明に係る逆離散コサイン変換回路は、入力される８点の離散コサイン変換係数データのうちの偶数番目のデータを２点ずつ同時に入力して逆離散コサイン変換処理を行う第１の部分逆ＤＣＴ回路と、８点の離散コサイン変換係数データのうちの奇数番目のデータを２点ずつ同時に入力して逆離散コサイン変換処理を行う第２の部分逆ＤＣＴ回路と、第１及び第２の部分逆ＤＣＴ回路からの出力データを加減算して８点の逆離散コサイン変換データを得る出力演算回路とを備えたことを特徴とする。
【００１０】
より具体的には、第１の部分逆ＤＣＴ回路は、第１及び第２の入力端子に同時に入力される２点の離散コサイン変換データの加減算を行う第１の加減算回路と、第１及び第２の入力端子に同時に入力される２点の離散コサイン変換データの積和演算を行う第１の積和回路と、第１の加減算回路の出力データと第１の積和回路の出力データとの加減算を互いに異なるタイミングで行う第２及び第３の加減算回路とにより構成される。
【００１１】
また、第２の部分逆ＤＣＴ回路は、第３及び第４の入力端子に同時に入力される２点の離散コサイン変換データを多重する多重処理回路と、第３及び第４の入力端子に同時に入力される２点の離散コサイン変換データの積和演算を行う第２の積和回路と、多重処理回路の出力データと第２の積和回路の出力データとの加減算を互いに異なるタイミングで行う第４及び第５の加減算回路と、第４及び第５の加減算回路の出力データの積和演算を互いに異なるタイミングで行う第３及び第４の積和回路とにより構成される。
【００１２】
さらに、出力演算回路は、第２の加減算回路の出力データと第３の積和回路の出力データとの加減算を互いに異なるタイミングで行う第６及び第７の加減算回路と、第３の加減算回路の出力データと第４の積和回路の出力データとの加減算を互いに異なるタイミングで行う第７及び第８の加減算回路とにより構成される。
【００１３】
また、第１乃至第４の積和回路は固定係数乗算器を有し、二つの入力データに対して該固定係数乗算器により固定係数の乗算と四捨五入処理を行うことを特徴とする。
【００１４】
【発明の実施の形態】
以下、図面を参照して本発明の実施の形態を説明する。
図１に、本発明の一実施形態に係る逆ＤＣＴ回路の構成を示す。この逆ＤＣＴ回路は、大きく分けて、第１の部分逆ＤＣＴ回路１０、第２の部分逆ＤＣＴ回路２０及び出力演算回路３０から構成される。
【００１５】
この逆ＤＣＴ回路は、図２の演算フローグラフに示す演算アルゴリズムを実行するものであり、第１の部分逆ＤＣＴ回路１０の処理は図２の１０Ａの部分、第２の部分逆ＤＣＴ回路２０の処理は図２の２０Ａの部分、出力演算回路３０の処理は図２の３０Ａの部分にそれぞれ相当している。図３(ａ)(ｂ)(ｃ)は、図２の演算フローグラフで使用している部分演算の内容の説明図である。この演算フローを数式で表すと、次の通りである。
【００１６】
cos(α＋β)＝cosα・cosβ−sinα・sinβ
cos(α−β)＝cosα・cosβ＋sinα・sinβ
2cosα・cosβ＝cos(α−β)＋cos(α＋β)
2sinα・sinβ＝cos(α−β)−cos(α＋β)
cos2α＋cos2β＝2cos(α−β)・cos(α＋β)
cos2α−cos2β＝−2sin(α−β)・sin(α＋β)
2・(cosα)²＝1＋cos2α
2・(sinα)²＝1−cos2α
cos(π/4+β)=cosπ/4・(cosβ-sinβ)
cos(π/4-β)=cosπ/4・(cosβ+sinβ)
cosπ/4=1/sq(2)
--------------dct--------------------
A=x0+x7
B=x3+x4
C=x1+x6
D=x2+x5
f0=(A+B)+(C+D)
f4=(A+B)-(C+D)
f2=sq(2)*{(A-B)cosπ/8＋(C-D)sinπ/8}
f6=sq(2)*{(A-B)sinπ/8−(C-D)cosπ/8}
J=x0-x7
K=x3-x4
L=x1-x6
M=x2-x5
f1＝sq(2)*(J・cosπ/16+K・sinπ/16+L・cos3π/16+M・sin3π/16)
f3＝sq(2)*(J・cos3π/16-K・sin3π/16-L・sinπ/16-M・cosπ/16)
f5＝sq(2)*(J・sin3π/16+K・cos3π/16-L・cosπ/16+M・sinπ/16)
f7＝sq(2)*(J・sinπ/16-K・cosπ/16-L・sin3π/16+M・cos3π/16)
--------------idct--------------------
f0+f4=2*(A+B)
f0-f4=2*(C+D)
f2*sq(2)*cosπ/8+f6*sq(2)*sinπ/8
=2*(A-B)*{(cosπ/8)²+(sinπ/8)²}=2*(A-B)
f2*sq(2)*sinπ/8-f6*sq(2)*cosπ/8
=2*(C-D)*{(sinπ/8)²+(cosπ/8)²}=2*(C-D)
2*(A+B)+2*(A-B)=4*A=4*(x0+x7)
2*(A+B)-2*(A-B)=4*B=4*(x3+x4)
2*(C+D)+2*(C-D)=4*C=4*(x1+x6)
2*(C+D)-2*(C-D)=4*D=4*(x2+x5)
(f3-f5)・cosπ/4
=sq(2)・cosπ/4・{J・(cos3π/16-sin3π/16)-K・(cos3π/16+sin3π/16)+L・(cosπ/16-sinπ/16)-M・(cosπ/16+sinπ/16)}
=sq(2)*{J・cos7π/16-K・cosπ/16+L・cos5π/16-M・cos3π/16}
=sq(2)*{J・sinπ/16-K・cosπ/16+L・sin3π/16-M・cos3π/16}
(f3+f5)・cosπ/4
=sq(2)*{J・cosπ/16+K・cos7π/16-L・cos3π/16-M・cos5π/16}
=sq(2)*{J・cosπ/16+K・sinπ/16-L・cos3π/16-M・sin3π/16}
f1+(f3+f5)・cosπ/4
＝sq(2)*(J・cosπ/16+K・sinπ/16+L・cos3π/16+M・sin3π/16)+sq(2)*(J・cosπ/16+K・sinπ/16-L・cos3π/16-M・sin3π/16)
＝2*sq(2)*(J・cosπ/16+K・sinπ/16)
f1-(f3+f5)・cosπ/4
＝sq(2)*(J・cosπ/16+K・sinπ/16+L・cos3π/16+M・sin3π/16)-sq(2)*(J・cosπ/16+K・sinπ/16-L・cos3π/16-M・sin3π/16)
＝2*sq(2)*(L・cos3π/16+M・sin3π/16)
(f3-f5)・cosπ/4+f7
＝sq(2)*(J・sinπ/16-K・cosπ/16+L・sin3π/16-M・cos3π/16)
+sq(2)*(J・sinπ/16-K・cosπ/16-L・sin3π/16+M・cos3π/16)
＝2*sq(2)*(J・sinπ/16-K・cosπ/16)
(f3-f5)・cosπ/4-f7
＝sq(2)*(J・sinπ/16-K・cosπ/16+L・sin3π/16-M・cos3π/16)
-sq(2)*(J・sinπ/16-K・cosπ/16-L・sin3π/16+M・cos3π/16)
＝2*sq(2)*(L・sin3π/16-M・cos3π/16)
2*sq(2)*(J・cosπ/16+K・sinπ/16)*sq(2)*cosπ/16
+2*sq(2)*(J・sinπ/16-K・cosπ/16)*sq(2)*sinπ/16
＝4*J*{(cosπ/16)²+(sinπ/16)²}
＝4*J=4*(x0-x7)
2*sq(2)*(J・cosπ/16+K・sinπ/16)*sq(2)*sinπ/16-2*sq(2)*(J・sinπ/16-K・cosπ/16)*sq(2)*cosπ/16
＝4*K*{(sinπ/16)²+(cosπ/16)²}
＝4*K=4*(x3-x4)
2*sq(2)*(L・cos3π/16+M・sin3π/16)*sq(2)*cos3π/16
+2*sq(2)*(L・sin3π/16-M・cos3π/16)*sq(2)*sin3π/16
＝4*L*{(cos3π/16)²+(sin3π/16)²}
＝4*L=4*(x1-x6)
2*sq(2)*(L・cos3π/16+M・sin3π/16)*sq(2)*sin3π/16
-2*sq(2)*(L・sin3π/16-M・cos3π/16)*sq(2)*cos3π/16
＝4*M*{(sin3π/16)²+(cos3π/16)²}
＝4*M=4*(x2-x5)
さらに、これらの和差の計算により、８倍の画素値が再生されるが、３ビットシフトにより正常な画素値となる。このビットシフトは、２次元処理完了後に纏めて行う。
【００１７】
本実施形態では、図示しないＤＣＴ回路により８点の画素配列{x0,x1,…,x6,x7}を１ブロックとしてブロック単位にＤＣＴを行って得られた８点のＤＣＴ係数データ{f0,f1,…,f6,f7}が逆ＤＣＴ回路に入力され、この８点ＤＣＴ係数データに対して逆ＤＣＴを行う場合について説明する。
【００１８】
ここで、f0,f1,…,f6,f7は１ブロック内の画像の各周波数成分のＤＣＴ係数データであり、f0は画像のＤＣ成分に対応し、f1,f2,…と順次高い周波数成分に対応し、f7が最高周波数成分に対応する。また、f0,f2,f4,f6,f8を偶数番目のデータ、f1,f3,f5,f7を奇数番目のデータとする。
【００１９】
図１において、入力端子Ａinには８点ＤＣＴ係数データのデータf2とf0が交互に入力され、入力端子Ｂinには８点ＤＣＴ係数データのデータf3とf1が交互に入力され、入力端子Ｃinには８点ＤＣＴ係数データのデータf6とf4が交互に入力され、そして入力端子Ｄinには８点ＤＣＴ係数データのデータf5とf7が交互に入力される。
【００２０】
第１の部分逆ＤＣＴ回路１０は入力端子Ａin，Ｃinに入力される偶数番目のデータに対する逆ＤＣＴを行い、第２の部分逆ＤＣＴ回路２０は、入力端子Ｂin，Ｄinに入力される奇数番目のデータに対する逆ＤＣＴを行う。出力演算回路３０は、第１及び第２の部分逆ＤＣＴ回路１０，２０からの出力データを加減算して、８点の逆ＤＣＴデータを出力端子Ａout，Ｂout，Ｃout，Ｄoutへ出力する。
【００２１】
以下、図１の各部の詳細な構成を図４〜図２９により説明する。図４〜図１８は図１の各部の詳細な構成を示し、図１９〜図２８は積和回路で使用される固定係数乗算器の構成を示し、図２９は積和回路で使用される加算器の構成を示している。
【００２２】
［第１の部分逆ＤＣＴ回路１０について］
第１の部分逆ＤＣＴ回路１０は、遅延回路１１，１２、加減算回路１３、積和回路１４、加減算回路１５及び加減算回路１６からなり、入力端子Ａin，Ｃinに入力される偶数番目のデータ(f0,f4またはf2,f6)に対する逆ＤＣＴを行う。この第１の部分逆ＤＣＴ回路１０の１次元目の入力順序及び出力順序を表１及び表２に示す。
【００２３】
【表１】

【００２４】
【表２】

【００２５】
また、表２を数式で表すと次のようになる。
jd15a＝{jf0＋jf4}＋{jf6*√2*sin(π/8)＋jf2*√2*cos(π/8)}
jd15b＝{jf0＋jf4}−{jf6*√2*sin(π/8)＋jf2*√2*cos(π/8)}
jd16a＝{jf0−jf4}＋{jf2*√2*sin(π/8)−jf6*√2*cos(π/8)}
jd16b＝{jf0−jf4}−{jf2*√2*sin(π/8)−jf6*√2*cos(π/8)}
(j＝0, 1, ・・・, 7)
(遅延回路１１及び１２)
遅延回路１１及び１２は、例えば図４に示すようにいずれも３個のＤ型フリップフロップ(ＤＦＦ)１０１〜１０３を縦続接続して構成され、それぞれ入力端子Ａin，Ｃinから入力されるデータ(f0,f4またはf2,f6)を３クロック期間遅延させて出力する。これらの遅延回路１１及び１２は、第１の部分逆ＤＣＴ回路０の出力データを後述する第２の部分逆ＤＣＴ回路２０の出力データとタイミングを合わせて出力するためのいわゆる遅延補償用である。遅延回路１１及び１２の入出力及び内部状態の関係をそれぞれ表３及び表４に示す。
【００２６】
【表３】

【００２７】
【表４】

【００２８】
(加減算回路１３)
加減算回路１３は、図５に示すように切替器１１１，１１２、ＤＦＦ１１３，１１４、ビット反転器１１５及び加算器１１６から構成され、遅延回路１１及び１２により３クロック期間遅延されて同時に入力されるデータf0,f4またはf2,f6を切替器１１１，１１２を介してＤＦＦ１１３，１１４により２クロック期間保持し、それらのデータの和(f0+f4＝d13a)と差(f0-f4＝d13b)をビット反転器１１５を介して加算器１１６により時分割で演算し、和(d13a)、差(d13b)を順次出力する。
【００２９】
切替器１１１，１１２は、制御信号en4eが高レベルのときＨ側に入力される遅延回路１１１，１１２の出力データを選択し、en4eが低レベルのときＬ側に入力されるＤＦＦ１１３，１１４の出力データを選択する。ビット反転器１１５は、制御信号Tg2eが高レベルのときビット反転を行って加算器１１６から差のデータを出力させ、Tg2eが低レベルのときはビット反転を行わず、加算器１１６から和のデータを出力させる。
【００３０】
この加減算回路１３は、１次元目の演算ではＤＣＴ係数データの量子化誤差を考慮してダイナミックレンジが増加するため、１２ビット入力に対して１３ビット出力となるが、２次元目の演算ではその量子化誤差の考慮が不要なので、ダイナミックレンジは変化せず、１６ビット入力に対して１６ビットの出力となる。加減算回路１３の入出力及び内部状態の関係を表５に示す。
【００３１】
【表５】

【００３２】
(積和回路１４)
積和回路１４(Ｔ1/8)は、図６に示すように切替器１２１，１２２、ＤＦＦ１２３，１２４、切替器１２５，１２６、固定乗算器１２７，１２８、ＤＦＦ１２９，１３０、ビット反転器１３１及び加算器１３２により構成される。
【００３３】
この積和回路１４では、遅延回路１１及び１２により３クロック期間遅延されて同時に入力されるデータｆ２，ｆ６を切替器１２１，１２２を介してＤＦＦ１２３，１２４により２クロック期間保持し、それらのデータに対して切替器１２５，１２６を介して固定乗算器１２７，１２８により固定の乗算係数√2*sin(π/8)，√2*cos(π/8)の乗算を交互に行って、各乗算結果をＤＦＦ１２９，１３０に保持する。
【００３４】
そして、ビット反転器１３１及び加算器１３２により各々の乗算結果の和(f6*√2*sin(π/8)＋f2*√2*cos(π/8)＝ｄ14a)及び差(f2*√2*sin(π/8)-f6*√2*cos(π/8)＝ｄ14b)を時分割で演算し、和(ｄ14a)、差(ｄ14b)を順次出力する。
【００３５】
切替器１２１，１２２，１２５，１２６及びビット反転器１３１の動作は、図５で説明した切替器１１１，１１１２及びビット反転器１１５の動作と同様である(以下、同様とする)。
【００３６】
【表６】

【００３７】
固定乗算器１２７，１２８は、固定乗算係数を表１に示す様な１５ビットのデータとすると、１次元目の演算ではＤＣＴ係数データの量子化誤差を考慮して図１９及び図２１に示す様な回路構成になるが、２次元目の演算ではその量子化誤差の考慮が不要なので、図１９や図２１の回路に比べ演算結果の最上位ビットの位置が下がり、図２０や図２２に示す様な回路構成になる。
【００３８】
また、ビット反転器１３１及び加算器１３２による加減算においてはダイナミックレンジの変化は無いが、ここでは図２９に示すように２０ビットの演算結果に対して正方向の四捨五入を行い、１６ビットで出力する加算回路を加算器１３２に用いるものとする。この積和回路１４の入出力及び内部状態の関係を表７に示す。
【００３９】
【表７】

【００４０】
(加減算回路１５)
加減算回路１５は、図５に示した加減算回路１３と同様、図７に示すように切替器１４１，１４２、ＤＦＦ１４３，１４４、ビット反転器１４５及び加算器１４６からなる。
【００４１】
この加減算回路１５では、加減算回路１３及び積和回路１４から同時に入力される加算結果ｄ13a，ｄ14aを切替器１４１，１４２を介してＤＦＦ１４３，１４４で２クロック期間保持し、それらのデータの和(ｄ13a＋ｄ14a＝ｄ15a)と差(ｄ13a−ｄ14a＝ｄ15b)をビット反転器１４５を介して加算器１４６により時分割で演算し、和(ｄ15a)、差(ｄ15b)を順次出力する。
【００４２】
この加減算においては、１次元目の演算では加減算回路１３からの入力データが１３ビットであり、３ビット左シフト(８倍)して積和回路１４からの入力データを加減算するが、２次元目の演算では加減算回路１３及び積和回路１４からの入力データはいずれも１６ビットであるので、ビットシフトせずにそのまま加減算を行う。尚、これらの演算によるダイナミックレンジの変化は無い。加減算回路１５の入出力及び内部状態の関係を表８に示す。
【００４３】
【表８】

【００４４】
(加減算回路１６)
加減算回路１６も同様に、図８に示すように切替器１５１，１５２、ＤＦＦ１５３，１５４、ビット反転器１５５及び加算器１５６から構成される。
この加減算回路１６では、加減算回路１３及び積和回路１４から同時に入力される減算結果ｄ13b，ｄ14bを２クロック期間保持し、それらのデータの和(ｄ13b＋ｄ14b＝ｄ16a)と差(ｄ13b-ｄ14b＝ｄ16b)をビット反転器１５５を介して加算器１５６により時分割で演算し、和(ｄ16a)、差(ｄ16b)を順次出力する。このように加減算回路１６の加減算の処理内容は加減算回路１５と同じであり、演算タイミングのみが加減算回路１５と異なる。加減算回路１６の入出力及び内部状態の関係を表９に示す。
【００４５】
【表９】

【００４６】
［第２の部分逆ＤＣＴ回路２０について］
第２の部分逆ＤＣＴ回路２０は、多重処理回路２１、積和回路２２、加減算回路２３、加減算回路２４、積和回路２５及び積和回路２６からなり、入力端子Ｂin，Ｄinに入力される奇数番目のデータ(f1,f7またはf3,f7)に対する逆ＤＣＴを行う。この第２の部分逆ＤＣＴ回路２０の１次元目の入力順序及び出力順序を表１０及び表１１に示す。
【００４７】
【表１０】

【００４８】
【表１１】

【００４９】
(多重処理回路２１)
多重処理回路２１は、図９に示すように切替器１６１，１６２、ＤＦＦ１６３及び切替器１６４により構成され、入力端子Ｂin，Ｄinから同時に入力されるデータf1,f7を２クロック期間保持し、f1、f7の順序で時分割出力する。この多重処理回路２１の入出力及び内部状態の関係を表１２に示す。
【００５０】
【表１２】

【００５１】
(積和回路２２)
積和回路２２(Ｔ1/4)は、図１０に示すように切替器１７１，１７２、ＤＦＦ１７３，１７４、ビット反転器１７５、加算器１７６、ＤＦＦ１７７及び固定乗算器１７８により構成される。
【００５２】
この積和回路２２では、入力端子Ｂin及びＤinから同時に入力されるデータｆ３，ｆ５を切替器１７１，１７２を介してＤＦＦ１７３，１７４により２クロック期間保持し、これらのデータの和(f3＋f5)と差(f3−f5)をビット反転器１７５を介して加算器１７６により時分割で演算する。そして、これらの演算結果をＤＦＦ１７７により保持して、固定乗算器１７７で乗算係数cos(π/4)を乗じ、（ｆ３＋f5)*cos(π/4)、(f3−f5)*cos(π/4)を順次出力する。
【００５３】
この１次元目のＩＤＣＴ演算では、入力が１２ビットであり、加減算結果(f3＋f5),(f3−f5)は１３ビットであるが、２次元目のＩＤＣＴ演算では、演算精度を確保するために入力が１６ビット程度必要であり、加減算結果は１７ビット程度になる。
【００５４】
また、固定乗算器１７８は乗算結果に対して正方向の四捨五入を含む機構にすることが可能であり、乗算係数を表６に示したような１５ビットのデータとすると、１次元目の演算ではＤＣＴ係数の量子化誤差を考慮して図２３に示す様な回路構成になるが、２次元目の演算ではその量子化誤差の考慮が不要なので、図２３の回路に比べ演算結果の最上位ビットの位置が下がり、図２４に示す様な回路構成になる。積和回路２２の入出力及び内部状態の関係を表１３に示す。
【００５５】
【表１３】

【００５６】
(加減算回路２３)
加減算回路２３は、図１１に示すよう切替器１８１，１８２、ＤＦＦ１８３，１８４、ビット反転器１８５、加算器１８６及びＤＦＦ１８７から構成され、多重処理回路２１及び積和回路２２から同時に入力されるデータｆ１と乗算結果(f3＋f5)*cos(π/4)を切替器１８１，１８２を介してＤＦＦ１８３，１８４により２クロック期間保持し、それらのデータの和(f1＋(f3＋f5)*cos(π/4)＝ｄ23a)と差(f1−(f3＋f5)*cos(π/4)＝ｄ23b)をビット反転器１８５を介して加算器１８６により時分割で演算し、ＤＦＦ１８７を介して和(ｄ23a)、差(ｄ23b)を順次出力する。
【００５７】
この加減算においては、１次元目の演算では多重処理回路２１からの入力データが１２ビットであり、このデータを１ビット符号拡張し、さらに３ビット左シフト(８倍)した後に積和回路２２からの入力データと加減算するが、２次元目の演算では多重処理回路２１及び積和回路２２からの入力データはいずれも１６ビットであるので、符号拡張やビットシフトを行わずに、そのまま加減算を行う。なお、これらの演算によるダイナミックレンジの変化は無い。また、後述する加減算回路２４の出力タイミングと合わせるために、加減算結果はＤＦＦ１８７により１クロック期間遅延されて出力される。加減算回路２３の入出力及び内部状態の関係を表１４に示す。
【００５８】
【表１４】

【００５９】
(加減算回路２４)
加減算回路２４は、図１２に示すように切替器１９１，１９２、ＤＦＦ１９３，１９４、ビット反転器１９５及び加算器１９６から構成され、多重処理回路２１及び積和回路２２から同時に入力されるデータf7と乗算結果(f3−f5)*cos(π/4)を切替器１９１，１９２を介してＤＦＦ１９３，１９４により２クロック期間保持し、それらのデータの和(f7＋(f3−f5)*cos(π/4)＝ｄ24a)と差(f7−(f3−f5)*cos(π/4)＝ｄ24b)をビット反転器１９５を介して加算器１９６により時分割で演算し、和(ｄ24a)、差(ｄ24b)を順次出力する。
【００６０】
この加減算においては、１次元目の演算では多重処理回路２１からの入力データが１２ビットであり、このデータを１ビット符号拡張し、積和回路２２からの１６ビットの入力データの上位１３ビットと加減算し、積和回路２２からの入力データの下位３ビットと合わせて出力するが、２次元目の演算では多重処理回路２１及び積和回路２２からの入力データはいずれも１６ビットであるため、ビットシフトを行わずに、そのまま加減算を行う。なお、これらの演算によるダイナミックレンジの変化は無い。この加減算回路２４の入出力及び内部状態の関係を表１５に示す。
【００６１】
【表１５】

【００６２】
(積和回路２５)
積和回路２５(Ｔ1/16)は、図１３に示すように切替器２０１，２０２、ＤＦＦ２０３，２０４、切替器２０５，２０６、固定乗算器２０７，２０８、ＤＦＦ２０９，２１０、ビット反転器２１１及び加算器２１２により構成される。
【００６３】
この積和回路２５では、加減算回路２３及び２４から同時に入力される加算結果ｄ23a，ｄ24aを切替器２０１，２０２を介してＤＦＦ２０３，２０４により２クロック期間保持し、それらのデータに対して切替器２０５，２０６を介して固定乗算器２０７，２０８により固定の乗算係数√2*sin(π/16),√2*cos(π/16)の乗算を交互に行って、各乗算結果をＤＦＦ２０９，２１０に保持する。
【００６４】
そして、ビット反転器２１１及び加算器２１２により各々の乗算結果の和(ｄ24a*√2*sin(π/16)＋ｄ23a*√2*cos(π/16)＝ｄ25a)及び差(ｄ23a*√2*sin(π/16)−ｄ24a*√2*cos(π/16)＝ｄ25b)を時分割で演算し、和(ｄ25a)、差(ｄ25b)を順次出力する。
【００６５】
固定乗算器２０７，２０８は、ＤＣＴ係数データの量子化誤差の考慮が不要なので、固定乗算係数を表１に示す様な１５ビットのデータとすると、図２５及び図２６に示す様な回路構成になる。
【００６６】
また、ビット反転器２１１及び加算器２１２による加減算においてはダイナミックレンジの変化は無く、ここでも図２９に示すように２０ビットの演算結果に対して正方向の四捨五入を行い、１６ビットで出力する加算回路を加算器２１２に用いている。積和回路２５の入出力及び内部状態の関係を表１６に示す。
【００６７】
【表１６】

【００６８】
(積和回路２６)
積和回路２６(Ｔ3/16)は、図１３に示した積和回路２５と同様、図１４に示すように切替器２２１，２２２、ＤＦＦ２２３，２２４、切替器２２５，２２６、固定乗算器２２７，２２８、ＤＦＦ２２９，２３０、ビット反転器２３１及び加算器２３２により構成される。
【００６９】
この積和回路２６では、加減算回路２３及び２４から同時に入力される減算結果ｄ23b，ｄ24bを切替器２２１，２２２を介してＤＦＦ２２３，２２４により２クロック期間保持し、それらのデータに対して切替器２２５，２２６を介して固定乗算器２２７，２２８により固定の乗算係数√2*sin(3π/16)と√2*cos(3π/16)の乗算を交互に行い、各乗算結果をＤＦＦ２２９，２３０に保持する。
【００７０】
そして、ビット反転器２３１及び加算器２３２により各々の乗算結果の和(ｄ24b*√2*sin(3π/16)＋ｄ23b*√2*cos(3π/16)＝ｄ26a)及び差(ｄ23b*√2*sin(3π/16)−ｄ24b*√2*cos(3π/16)＝ｄ26b)を時分割で演算し、和(ｄ26a)、差(ｄ26b)を順次出力する。
【００７１】
固定乗算器２２７，２２８は、ＤＣＴ係数データの量子化誤差の考慮が不要なので、固定乗算係数を表１に示す様な１５ビットのデータとすると、図２７及び図２８に示す様な回路構成になる。
【００７２】
また、ビット反転器２３１及び加算器２３２による加減算においてはダイナミックレンジの変化は無く、ここでも図２９に示すように２０ビットの演算結果に対して正方向の四捨五入を行い、１６ビットで出力する加算回路を加算器２１２に用いている。積和回路２６の入出力及び内部状態の関係を表１７に示す。
【００７３】
【表１７】

【００７４】
［出力演算回路３０について］
出力演算回路３０は、４つの加減算回路３１，３２，３３，３４からなり、第１及び第２の部分逆ＤＣＴ回路１０，２０からの出力データに対して加減算処理を行うことにより、最終的な逆ＤＣＴデータとして８点の画素配列{x0,x1,…,x6,x7}を出力端子Aout,Bout,Cout,Doutへ出力する。この出力演算回路３０の１次元目の入力順序及び出力順序を表１８及び表１９に示す。
【００７５】
【表１８】

【００７６】
【表１９】

【００７７】
また、表１９を数式で表すと次のようになる。

(加減算回路３１)
加減算回路３１は、図１５に示すように切替器２４１，２４２、ＤＦＦ２４３，２４４、ビット反転器２４５、加算器２４６及びＤＦＦ２４７，２４８から構成され、加減算回路１５及び積和回路２５から同時に入力される加算結果ｄ15a，ｄ16aを切替器２４１，２４２を介してＤＦＦ２４３，２４４により２クロック期間保持し、それらのデータの和(ｄ15a＋ｄ16a＝ｘ０)と差(ｄ15a−ｄ16a＝ｘ７)をビット反転器２４５を介して加算器２４６により時分割で演算し、和(ｘ０)、差(ｘ７)を順次出力する。この加減算結果は、後述する加減算回路３４の出力タイミングと合わせるためにＤＦＦ２４７，２４８により２クロック期間遅延されて出力される。加減算回路３１の入出力及び内部状態の関係を表２０に示す。
【００７８】
【表２０】

【００７９】
(加減算回路３２)
加減算回路３２は、図１６に示すように切替器２５１，２５２、ＤＦＦ２５３，２５４、ビット反転器２５５、加算器２５６及びＤＦＦ２５７から構成され、加減算回路１５及び積和回路２５から同時に入力される減算結果ｄ15b，ｄ16bを切替器２５１，２５２を介してＤＦＦ２５３，２５４により２クロック期間保持し、それらのデータの和(ｄ15b＋ｄ16b＝ｘ３)と差(ｄ15b−ｄ16b＝ｘ４)をビット反転器２５５を介して加算器２５６により時分割で演算し、和(ｘ３)、差(ｘ４)を順次ＤＦＦ２４７，２４８を介して出力する。この加減算結果は、後述する加減算回路３４の出力タイミングと合わせるためにＤＦＦ２５７により１クロック期間遅延されて出力される。この加減算回路３２の入出力及び内部状態の関係を表２１に示す。
【００８０】
【表２１】

【００８１】
(加減算回路３３)
加減算回路３３は、図１７に示すように切替器２６１，２６２、ＤＦＦ２６３，２６４、ビット反転器２６５、加算器２６６及びＤＦＦ２６７から構成され、加減算回路１６及び積和回路２６から同時に入力される加算結果ｄ16a，ｄ26aを切替器２６１，２６２を介してＤＦＦ２６３，２６４により２クロック期間保持し、それらのデータの和(ｄ16a＋ｄ26a＝ｘ１)と差(ｄ16a−ｄ26a＝ｘ６)をビット反転器２６５を介して加算器２６６により時分割で演算し、和(x1)、差(x6)を順次出力する。この加減算結果は、後述する加減算回路３４の出力タイミングと合わせるためにＤＦＦ２６７により１クロック期間遅延されて出力される。加減算回路３３の入出力及び内部状態の関係を表２２に示す。
【００８２】
【表２２】

【００８３】
(加減算回路３４)
加減算回路３４は、図１８に示すように切替器２７１，２７２、ＤＦＦ２７３，２７４、ビット反転器２７５及び加算器２７６から構成され、加減算回路１６及び積和回路２６から同時に入力される減算結果ｄ16b，ｄ26bを切替器２７１，２７２を介してＤＦＦ２７３，２７４により２クロック期間保持し、それらのデータの和(ｄ16b＋ｄ26b＝x2)と差(ｄ16b−ｄ26b＝x5)をビット反転器２７５を介して加算器２７６により時分割で演算し、和(x2)、差(x5)を順次出力する。加減算回路３４の入出力及び内部状態の関係を表２３に示す。
【００８４】
【表２３】

【００８５】
図３０〜図３３は、本実施形態の動作をタイミングチャートで表したものであり、図３０は遅延回路１１，１２と多重処理回路２１及び積和回路２２の動作タイミング、図３１は加減算回路１３と積和回路１４及び加減算回路２３，２４の動作タイミング、図３２は加減算回路１５，１６と積和回路２５，２６の動作タイミング、図３３は加減算回路４１〜４４の動作タイミングをそれぞれ示している。
【００８６】
上述した本実施形態に係る逆離散コサイン変換回路の演算方法によれば、乗算器の演算速度の４倍の速度で変換処理が可能になり、表２４に示す様な順序で入力される２次元ＤＣＴ係数に対して、１次元目のＩＤＣＴ結果が、表２５に示すような順序で出力される。
【００８７】
【表２４】

【００８８】
【表２５】

【００８９】
２次元ＩＤＣＴ回路は、図３４に示すように制御部１００による制御の下で第１の１次元ＩＤＣＴ回路１０１により１次元（８点）ＩＤＣＴを８ライン処理した後、転置処理部１０２で行列を転置して、第２の１次元ＩＤＣＴ回路１０３により再度１次元（８点）ＩＤＣＴを８ライン処理することによって実現される。ここで、上述の式で使用したｊはライン番号となる。
【００９０】
【発明の効果】
以上説明したように本発明によれば、高速演算アルゴリズムを使用し、４つのパスに分けて並列処理することにより、回路規模の小さな固定係数乗算器の使用を可能にし、４つの積和回路と９個の加減算回路からなる小規模の回路構成で、固定乗算器の動作速度の４倍の画素レートの逆ＤＣＴ処理が可能になる。
【図面の簡単な説明】
【図１】本発明の一実施形態に係る逆ＤＣＴ回路の全体構成を示すブロック図
【図２】同実施形態における逆ＤＣＴ演算フローを示す図
【図３】図２の逆ＤＣＴ演算フロー図で使用した詳細な演算内容を示す図
【図４】同実施形態における遅延回路１１及び１２の詳細回路を示す図
【図５】同実施形態における加減算回路１３の詳細回路を示す図
【図６】同実施形態における積和回路１４の詳細回路を示す図
【図７】同実施形態における加減算回路１５の詳細回路を示す図
【図８】同実施形態における加減算回路１６の詳細回路を示す図
【図９】同実施形態における多重処理回路２１の詳細回路を示す図
【図１０】同実施形態における積和回路２２の詳細回路を示す図
【図１１】同実施形態における加減算回路２３の詳細回路を示す図
【図１２】同実施形態における加減算回路２４の詳細回路を示す図
【図１３】同実施形態における積和回路２５の詳細回路を示す図
【図１４】同実施形態における積和回路２６の詳細回路を示す図
【図１５】同実施形態における加減算回路３１の詳細回路を示す図
【図１６】同実施形態における加減算回路３２の詳細回路を示す図
【図１７】同実施形態における加減算回路３３の詳細回路を示す図
【図１８】同実施形態における加減算回路３４の詳細回路を示す図
【図１９】同実施形態における１次元目の積和回路１４内の第１の固定乗算器を示す図
【図２０】同実施形態における２次元目の積和回路１４内の第１の固定乗算器を示す図
【図２１】同実施形態における１次元目の積和回路１４内の第２の固定乗算器を示す図
【図２２】同実施形態における２次元目の積和回路１４内の第２の固定乗算器を示す図
【図２３】同実施形態における１次元目の積和回路２２内の固定乗算器を示す図
【図２４】同実施形態における２次元目の積和回路２２内の固定乗算器を示す図
【図２５】同実施形態における積和回路２５内の第１の固定乗算器を示す図
【図２６】同実施形態における積和回路２５内の第２の固定乗算器を示す図
【図２７】同実施形態における積和回路２６内の第１の固定乗算器を示す図
【図２８】同実施形態における積和回路２６内の第２の固定乗算器を示す図
【図２９】同実施形態における積和回路１４，２５及び２６で使用される加算器を示す図
【図３０】同実施形態における遅延回路１１，１２と多重処理回路２１及び積和回路２２の動作タイミングを示す図
【図３１】同実施形態における加減算回路１３と積和回路１４及び加減算回路２３，２４の動作タイミングを示す図
【図３２】同実施形態における加減算回路１５，１６と積和回路２５，２６の動作タイミングを示す図
【図３３】同実施形態における加減算回路４１〜４４の動作タイミングを示す図
【図３４】２次元逆ＤＣＴ回路の概略構成を示すブロック図
【符号の説明】
１０…第１の部分逆ＤＣＴ回路
１１，１２…３クロック遅延回路
１３，１５，１６…加減算回路
１４…２つの加減算器と１つの固定乗算器で構成される積和回路
２０…第２の部分逆ＤＣＴ回路
２１…多重処理回路
２２…１つの加減算器と１つの固定乗算器で構成される積和回路
２３，２４…加減算回路
２５，２６…２つの固定乗算器と１つの加減算器で構成される積和回路
３０…出力演算回路
３１〜３３…加減算回路
[0001]
BACKGROUND OF THE INVENTION
The present invention relates to an inverse discrete cosine transform circuit used when decompressing or compressing image data, and more particularly to an 8-point inverse discrete cosine transform circuit.
[0002]
[Prior art]
Currently, TV receivers capable of simultaneously displaying TV broadcast programs of a plurality of channels are in widespread use. In terrestrial and satellite digital TV broadcasting as well, it is desirable that a TV receiver be able to display broadcast programs of a plurality of channels simultaneously.
[0003]
In digital TV broadcasting, image data is compressed by the MPEG method, and the TV receiver uses a decompression device (decoder) that decompresses the compressed image data and restores the original image data. To display broadcast programs of a plurality of channels simultaneously on a digital broadcast TV receiver, one could provide a plurality of decoders, one per channel, but this increases the scale of the apparatus. Accordingly, a small-scale decoder capable of decompressing compressed image data of a plurality of channels in a short time is desired. Furthermore, to support simultaneous display of a plurality of screens of high-definition TV broadcasting (so-called Hi-Vision broadcasting), the decoder must be faster still.
[0004]
The MPEG method is basically a technique that compresses image data by combining three elements: motion-compensated prediction, the discrete cosine transform (hereinafter referred to as DCT), and variable-length coding. The decoder, conversely to the encoder, decompresses the compressed image data by combining variable-length decoding, the inverse discrete cosine transform (hereinafter referred to as inverse DCT), and motion-compensated prediction. The inverse DCT is also used in the local decoding system of the encoder. In MPEG encoders and decoders, speeding up the inverse DCT circuit and reducing its circuit scale is one of the major issues.
[0005]
As a conventional technique addressing this requirement, Japanese Patent Application Laid-Open No. 5-181896 shows an example of a small-scale DCT / inverse DCT circuit capable of high-speed processing. This circuit performs both the 8-point DCT and the 8-point inverse DCT; its nine front-stage adder/subtractors are dedicated to the DCT, and the inverse DCT function alone can be built from seven fixed coefficient multipliers and nine adder/subtractors. However, this circuit configuration can handle image data only up to a pixel rate twice the processing speed of the fixed coefficient multipliers. For simultaneous display of a plurality of screens of high-definition TV broadcasting, registers must be inserted to hold the intermediate results of the fixed coefficient multipliers and adder/subtractors, which increases the circuit scale.
[0006]
[Problems to be solved by the invention]
As described above, in the inverse DCT the conventional technique can handle image data only up to a pixel rate twice the processing speed of the fixed coefficient multipliers. Displaying a plurality of screens of high-definition TV broadcasting simultaneously requires registers that hold the intermediate results of the fixed coefficient multipliers and adder/subtractors, so the circuit scale and the cost both increase.
[0007]
An object of the present invention is to provide an inverse discrete cosine transform circuit that enables high-speed processing necessary for decoding a plurality of high-definition images or the like with a small-scale circuit.
[0008]
[Means for Solving the Problems]
The present invention uses a high-speed arithmetic algorithm and divides the processing into four paths that run in parallel, which makes it possible to use fixed coefficient multipliers. With a configuration of four product-sum circuits and nine addition/subtraction circuits, it enables image processing at a pixel rate four times the operating speed of the multipliers.
[0009]
That is, the inverse discrete cosine transform circuit according to the present invention comprises: a first partial inverse DCT circuit that receives the even-numbered data of the 8 points of input discrete cosine transform coefficient data two points at a time and performs inverse discrete cosine transform processing; a second partial inverse DCT circuit that receives the odd-numbered data of the 8 points of discrete cosine transform coefficient data two points at a time and performs inverse discrete cosine transform processing; and an output arithmetic circuit that adds and subtracts the output data of the first and second partial inverse DCT circuits to obtain 8 points of inverse discrete cosine transform data.
[0010]
More specifically, the first partial inverse DCT circuit comprises: a first addition/subtraction circuit that adds and subtracts the two points of discrete cosine transform data input simultaneously to the first and second input terminals; a first product-sum circuit that performs a product-sum operation on the two points of discrete cosine transform data input simultaneously to the first and second input terminals; and second and third addition/subtraction circuits that add and subtract the output data of the first addition/subtraction circuit and the output data of the first product-sum circuit at mutually different timings.
[0011]
The second partial inverse DCT circuit comprises: a multiplexing circuit that multiplexes the two points of discrete cosine transform data input simultaneously to the third and fourth input terminals; a second product-sum circuit that performs a product-sum operation on the two points of discrete cosine transform data input simultaneously to the third and fourth input terminals; fourth and fifth addition/subtraction circuits that add and subtract the output data of the multiplexing circuit and the output data of the second product-sum circuit at mutually different timings; and third and fourth product-sum circuits that perform product-sum operations on the output data of the fourth and fifth addition/subtraction circuits at mutually different timings.
[0012]
Further, the output arithmetic circuit comprises: sixth and seventh addition/subtraction circuits that add and subtract the output data of the second addition/subtraction circuit and the output data of the third product-sum circuit at mutually different timings; and seventh and eighth addition/subtraction circuits that add and subtract the output data of the third addition/subtraction circuit and the output data of the fourth product-sum circuit at mutually different timings.
[0013]
Also, each of the first to fourth product-sum circuits has a fixed coefficient multiplier, and uses it to multiply the two input data by fixed coefficients and to round the results.
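As a rough illustration of what "multiplication by a fixed coefficient followed by rounding" can look like in integer arithmetic, the following Python sketch quantizes a coefficient once and then multiplies with a half-LSB offset before shifting. The names `quantize_coeff` and `fixed_mul_round`, the choice of 14 fractional bits, and the rounding of ties toward positive infinity are assumptions made for this example; the actual bit formats are defined by the tables and figures of the description, which are not reproduced here.

```python
import math

FRAC_BITS = 14  # assumed number of fractional bits, for illustration only


def quantize_coeff(value, frac_bits=FRAC_BITS):
    """Convert a real coefficient to a fixed-point integer (done once, at design time)."""
    return int(round(value * (1 << frac_bits)))


def fixed_mul_round(x, coeff_q, frac_bits=FRAC_BITS):
    """Multiply integer x by a quantized coefficient and round to an integer.

    Adding half an LSB before the arithmetic right shift rounds to nearest,
    with ties going toward positive infinity (Python's >> floors).
    """
    return (x * coeff_q + (1 << (frac_bits - 1))) >> frac_bits


COS_PI_4 = quantize_coeff(math.cos(math.pi / 4))  # coefficient used for cos(pi/4)

print(fixed_mul_round(100, COS_PI_4))   # close to 100 * 0.7071
print(fixed_mul_round(-100, COS_PI_4))  # close to -100 * 0.7071
```

Because the coefficient is a constant, a hardware multiplier of this kind can be reduced to a fixed network of shifts and additions, which is the usual reason a fixed coefficient multiplier is smaller than a general one.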
[0014]
DETAILED DESCRIPTION OF THE INVENTION
Embodiments of the present invention will be described below with reference to the drawings.
FIG. 1 shows a configuration of an inverse DCT circuit according to an embodiment of the present invention. This inverse DCT circuit is roughly composed of a first partial inverse DCT circuit 10, a second partial inverse DCT circuit 20, and an output arithmetic circuit 30.
[0015]
This inverse DCT circuit executes the arithmetic algorithm shown in the computation flow graph of FIG. 2. The processing of the first partial inverse DCT circuit 10 corresponds to portion 10A of FIG. 2, the processing of the second partial inverse DCT circuit 20 corresponds to portion 20A of FIG. 2, and the processing of the output arithmetic circuit 30 corresponds to portion 30A of FIG. 2. FIGS. 3A, 3B, and 3C explain the partial operations used in the computation flow graph of FIG. 2. Expressed as formulas, this computation flow is as follows.
[0016]
cos(α+β) = cosα・cosβ - sinα・sinβ
cos(α-β) = cosα・cosβ + sinα・sinβ
2cosα・cosβ = cos(α-β) + cos(α+β)
2sinα・sinβ = cos(α-β) - cos(α+β)
cos2α + cos2β = 2cos(α-β)・cos(α+β)
cos2α - cos2β = -2sin(α-β)・sin(α+β)
2・(cosα)² = 1 + cos2α
2・(sinα)² = 1 - cos2α
cos(π/4+β) = cos(π/4)・(cosβ - sinβ)
cos(π/4-β) = cos(π/4)・(cosβ + sinβ)
cos(π/4) = 1/sq(2)  (sq(2) denotes √2)
-------------- dct --------------------
A = x0+x7
B = x3+x4
C = x1+x6
D = x2+x5
f0 = (A+B) + (C+D)
f4 = (A+B) - (C+D)
f2 = sq(2)*{(A-B)*cos(π/8) + (C-D)*sin(π/8)}
f6 = sq(2)*{(A-B)*sin(π/8) - (C-D)*cos(π/8)}
J = x0-x7
K = x3-x4
L = x1-x6
M = x2-x5
f1 = sq(2)*(J・cos(π/16) + K・sin(π/16) + L・cos(3π/16) + M・sin(3π/16))
f3 = sq(2)*(J・cos(3π/16) - K・sin(3π/16) - L・sin(π/16) - M・cos(π/16))
f5 = sq(2)*(J・sin(3π/16) + K・cos(3π/16) - L・cos(π/16) + M・sin(π/16))
f7 = sq(2)*(J・sin(π/16) - K・cos(π/16) - L・sin(3π/16) + M・cos(3π/16))
-------------- idct --------------------
f0 + f4 = 2*(A+B)
f0 - f4 = 2*(C+D)
f2*sq(2)*cos(π/8) + f6*sq(2)*sin(π/8)
= 2*(A-B)*{(cos(π/8))² + (sin(π/8))²} = 2*(A-B)
f2*sq(2)*sin(π/8) - f6*sq(2)*cos(π/8)
= 2*(C-D)*{(sin(π/8))² + (cos(π/8))²} = 2*(C-D)
2*(A+B) + 2*(A-B) = 4*A = 4*(x0+x7)
2*(A+B) - 2*(A-B) = 4*B = 4*(x3+x4)
2*(C+D) + 2*(C-D) = 4*C = 4*(x1+x6)
2*(C+D) - 2*(C-D) = 4*D = 4*(x2+x5)
(f3-f5)・cos(π/4)
= sq(2)・cos(π/4)・{J・(cos(3π/16) - sin(3π/16)) - K・(cos(3π/16) + sin(3π/16)) + L・(cos(π/16) - sin(π/16)) - M・(cos(π/16) + sin(π/16))}
= sq(2)*{J・cos(7π/16) - K・cos(π/16) + L・cos(5π/16) - M・cos(3π/16)}
= sq(2)*{J・sin(π/16) - K・cos(π/16) + L・sin(3π/16) - M・cos(3π/16)}
(f3+f5)・cos(π/4)
= sq(2)*{J・cos(π/16) + K・cos(7π/16) - L・cos(3π/16) - M・cos(5π/16)}
= sq(2)*{J・cos(π/16) + K・sin(π/16) - L・cos(3π/16) - M・sin(3π/16)}
f1 + (f3+f5)・cos(π/4)
= sq(2)*(J・cos(π/16) + K・sin(π/16) + L・cos(3π/16) + M・sin(3π/16)) + sq(2)*(J・cos(π/16) + K・sin(π/16) - L・cos(3π/16) - M・sin(3π/16))
= 2*sq(2)*(J・cos(π/16) + K・sin(π/16))
f1 - (f3+f5)・cos(π/4)
= sq(2)*(J・cos(π/16) + K・sin(π/16) + L・cos(3π/16) + M・sin(3π/16)) - sq(2)*(J・cos(π/16) + K・sin(π/16) - L・cos(3π/16) - M・sin(3π/16))
= 2*sq(2)*(L・cos(3π/16) + M・sin(3π/16))
(f3-f5)・cos(π/4) + f7
= sq(2)*(J・sin(π/16) - K・cos(π/16) + L・sin(3π/16) - M・cos(3π/16))
+ sq(2)*(J・sin(π/16) - K・cos(π/16) - L・sin(3π/16) + M・cos(3π/16))
= 2*sq(2)*(J・sin(π/16) - K・cos(π/16))
(f3-f5)・cos(π/4) - f7
= sq(2)*(J・sin(π/16) - K・cos(π/16) + L・sin(3π/16) - M・cos(3π/16))
- sq(2)*(J・sin(π/16) - K・cos(π/16) - L・sin(3π/16) + M・cos(3π/16))
= 2*sq(2)*(L・sin(3π/16) - M・cos(3π/16))
2*sq(2)*(J・cos(π/16) + K・sin(π/16))*sq(2)*cos(π/16)
+ 2*sq(2)*(J・sin(π/16) - K・cos(π/16))*sq(2)*sin(π/16)
= 4*J*{(cos(π/16))² + (sin(π/16))²}
= 4*J = 4*(x0-x7)
2*sq(2)*(J・cos(π/16) + K・sin(π/16))*sq(2)*sin(π/16)
- 2*sq(2)*(J・sin(π/16) - K・cos(π/16))*sq(2)*cos(π/16)
= 4*K*{(sin(π/16))² + (cos(π/16))²}
= 4*K = 4*(x3-x4)
2*sq(2)*(L・cos(3π/16) + M・sin(3π/16))*sq(2)*cos(3π/16)
+ 2*sq(2)*(L・sin(3π/16) - M・cos(3π/16))*sq(2)*sin(3π/16)
= 4*L*{(cos(3π/16))² + (sin(3π/16))²}
= 4*L = 4*(x1-x6)
2*sq(2)*(L・cos(3π/16) + M・sin(3π/16))*sq(2)*sin(3π/16)
- 2*sq(2)*(L・sin(3π/16) - M・cos(3π/16))*sq(2)*cos(3π/16)
= 4*M*{(sin(3π/16))² + (cos(3π/16))²}
= 4*M = 4*(x2-x5)
Further, taking the sums and differences of these results reproduces eight times each pixel value; a 3-bit right shift then restores the normal pixel value. This bit shift is performed once, after the two-dimensional processing is complete.
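The derivation above can be checked numerically. The following Python sketch applies the forward DCT as defined in the "dct" part, then follows the "idct" steps, and confirms that the result is eight times the original pixel values. It uses floating point rather than the fixed-point arithmetic of the circuit, and the function names `dct8` and `idct8_times8` are invented for this example.

```python
import math

SQ2 = math.sqrt(2.0)


def c(k, n):
    return math.cos(k * math.pi / n)


def s(k, n):
    return math.sin(k * math.pi / n)


def dct8(x):
    """Forward 8-point DCT with the scaling used in the formulas above."""
    A, B, C, D = x[0] + x[7], x[3] + x[4], x[1] + x[6], x[2] + x[5]
    J, K, L, M = x[0] - x[7], x[3] - x[4], x[1] - x[6], x[2] - x[5]
    f = [0.0] * 8
    f[0] = (A + B) + (C + D)
    f[4] = (A + B) - (C + D)
    f[2] = SQ2 * ((A - B) * c(1, 8) + (C - D) * s(1, 8))
    f[6] = SQ2 * ((A - B) * s(1, 8) - (C - D) * c(1, 8))
    f[1] = SQ2 * (J * c(1, 16) + K * s(1, 16) + L * c(3, 16) + M * s(3, 16))
    f[3] = SQ2 * (J * c(3, 16) - K * s(3, 16) - L * s(1, 16) - M * c(1, 16))
    f[5] = SQ2 * (J * s(3, 16) + K * c(3, 16) - L * c(1, 16) + M * s(1, 16))
    f[7] = SQ2 * (J * s(1, 16) - K * c(1, 16) - L * s(3, 16) + M * c(3, 16))
    return f


def idct8_times8(f):
    """Inverse transform following the 'idct' steps; returns 8 * x."""
    # Even half: 4A, 4B, 4C, 4D
    e = SQ2 * (f[2] * c(1, 8) + f[6] * s(1, 8))
    o = SQ2 * (f[2] * s(1, 8) - f[6] * c(1, 8))
    a4, b4 = (f[0] + f[4]) + e, (f[0] + f[4]) - e
    c4, d4 = (f[0] - f[4]) + o, (f[0] - f[4]) - o
    # Odd half: 4J, 4K, 4L, 4M
    p = (f[3] + f[5]) * c(1, 4)
    q = (f[3] - f[5]) * c(1, 4)
    u1, u2 = f[1] + p, f[1] - p
    v1, v2 = q + f[7], q - f[7]
    j4 = SQ2 * (u1 * c(1, 16) + v1 * s(1, 16))
    k4 = SQ2 * (u1 * s(1, 16) - v1 * c(1, 16))
    l4 = SQ2 * (u2 * c(3, 16) + v2 * s(3, 16))
    m4 = SQ2 * (u2 * s(3, 16) - v2 * c(3, 16))
    # Final sums and differences give 8*x0 ... 8*x7
    return [a4 + j4, c4 + l4, d4 + m4, b4 + k4,
            b4 - k4, d4 - m4, c4 - l4, a4 - j4]


pixels = [10, -3, 7, 0, 25, 4, -8, 1]
restored = [v / 8 for v in idct8_times8(dct8(pixels))]
print([round(v, 6) for v in restored])
```

Dividing by 8 here plays the role of the 3-bit shift; in a two-dimensional transform the description defers that shift until both passes are done.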
[0017]
In the present embodiment, a DCT circuit (not shown) performs the DCT block by block, taking an 8-point pixel array {x0, x1, ..., x6, x7} as one block, and the resulting 8-point DCT coefficient data {f0, f1, ..., f6, f7} is input to the inverse DCT circuit. The following describes the case where the inverse DCT is applied to this 8-point DCT coefficient data.
[0018]
Here, f0, f1, ..., f6, f7 are the DCT coefficient data of the frequency components of the image in one block: f0 corresponds to the DC component of the image, f1, f2, ... correspond to successively higher frequency components, and f7 corresponds to the highest frequency component. Further, f0, f2, f4, and f6 are referred to as the even-numbered data, and f1, f3, f5, and f7 as the odd-numbered data.
[0019]
In FIG. 1, data f2 and f0 of the 8-point DCT coefficient data are alternately input to the input terminal Ain, data f3 and f1 are alternately input to the input terminal Bin, data f6 and f4 are alternately input to the input terminal Cin, and data f5 and f7 are alternately input to the input terminal Din.
[0020]
The first partial inverse DCT circuit 10 performs the inverse DCT on the even-numbered data input to the input terminals Ain and Cin, and the second partial inverse DCT circuit 20 performs the inverse DCT on the odd-numbered data input to the input terminals Bin and Din. The output arithmetic circuit 30 adds and subtracts the output data of the first and second partial inverse DCT circuits 10 and 20 and outputs 8 points of inverse DCT data to the output terminals Aout, Bout, Cout, and Dout.
[0021]
The detailed configuration of each part of FIG. 1 is described below with reference to FIGS. 4 to 29. FIGS. 4 to 18 show the detailed configuration of each part of FIG. 1, FIGS. 19 to 28 show the configuration of the fixed coefficient multipliers used in the product-sum circuits, and FIG. 29 shows the configuration of the adder used in the product-sum circuits.
[0022]
[First Partial Inverse DCT Circuit 10]
The first partial inverse DCT circuit 10 consists of delay circuits 11 and 12, an addition/subtraction circuit 13, a product-sum circuit 14, an addition/subtraction circuit 15, and an addition/subtraction circuit 16, and performs the inverse DCT on the even-numbered data (f0, f4 or f2, f6) input to the input terminals Ain and Cin. Tables 1 and 2 show the input order and output order of the first dimension of the first partial inverse DCT circuit 10.
[0023]
[Table 1]

[0024]
[Table 2]

[0025]
Table 2 is expressed as follows.
jd15a = {jf0 + jf4} + {jf6 * √2 * sin (π / 8) + jf2 * √2 * cos (π / 8)}
jd15b = {jf0 + jf4} − {jf6 * √2 * sin (π / 8) + jf2 * √2 * cos (π / 8)}
jd16a = {jf0−jf4} + {jf2 * √2 * sin (π / 8) −jf6 * √2 * cos (π / 8)}
jd16b = {jf0−jf4} − {jf2 * √2 * sin (π / 8) −jf6 * √2 * cos (π / 8)}
(j = 0, 1, ..., 7)
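To make the four formulas concrete, the sketch below evaluates them stage by stage for one line (the line index j is dropped) and checks the results against the relations 4*(x0+x7), 4*(x3+x4), 4*(x1+x6), and 4*(x2+x5) from the derivation. It is a floating-point illustration only; `even_dct` and `partial_idct_even` are hypothetical helper names, and the intermediate names d13a, d13b, d14a, d14b follow the signal names the description uses for circuits 13 and 14.

```python
import math

SQ2 = math.sqrt(2.0)
C8 = math.cos(math.pi / 8)
S8 = math.sin(math.pi / 8)


def even_dct(x):
    """f0, f2, f4, f6 computed from the DCT definitions in the description."""
    A, B, C, D = x[0] + x[7], x[3] + x[4], x[1] + x[6], x[2] + x[5]
    f0 = (A + B) + (C + D)
    f4 = (A + B) - (C + D)
    f2 = SQ2 * ((A - B) * C8 + (C - D) * S8)
    f6 = SQ2 * ((A - B) * S8 - (C - D) * C8)
    return f0, f2, f4, f6


def partial_idct_even(f0, f2, f4, f6):
    """Even half of the inverse DCT, split along circuits 13 to 16."""
    d13a, d13b = f0 + f4, f0 - f4            # addition/subtraction circuit 13
    d14a = f6 * SQ2 * S8 + f2 * SQ2 * C8     # product-sum circuit 14, sum
    d14b = f2 * SQ2 * S8 - f6 * SQ2 * C8     # product-sum circuit 14, difference
    d15a, d15b = d13a + d14a, d13a - d14a    # addition/subtraction circuit 15
    d16a, d16b = d13b + d14b, d13b - d14b    # addition/subtraction circuit 16
    return d15a, d15b, d16a, d16b


x = [10, -3, 7, 0, 25, 4, -8, 1]
d15a, d15b, d16a, d16b = partial_idct_even(*even_dct(x))
print(round(d15a, 6), round(d15b, 6), round(d16a, 6), round(d16b, 6))
```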
(Delay circuits 11 and 12)
For example, as shown in FIG. 4, each of the delay circuits 11 and 12 includes three D-type flip-flops (DFFs) 101 to 103 connected in cascade, and delays the data (f0, f4 or f2, f6) input from the input terminals Ain and Cin, respectively, by three clock periods before outputting it. These delay circuits 11 and 12 provide so-called delay compensation, so that the output data of the first partial inverse DCT circuit 10 is output in synchronization with the output data of the second partial inverse DCT circuit 20 described later. Tables 3 and 4 show the relationship between the input/output and the internal states of the delay circuits 11 and 12, respectively.
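The behavior of such a three-stage delay can be sketched in software as follows. This is an illustrative model only: the class name `DelayLine`, the `clock` method and the initial register value 0 are assumptions, and each call to `clock` stands for one clock period.

```python
class DelayLine:
    """Cascade of D-type flip-flops; each clock() call models one clock period."""

    def __init__(self, stages=3, initial=0):
        self.regs = [initial] * stages

    def clock(self, data_in):
        data_out = self.regs[-1]                 # value leaving the last flip-flop
        self.regs = [data_in] + self.regs[:-1]   # shift every stage by one
        return data_out
```

With three stages, a value presented at one clock appears at the output three clocks later, which is the delay compensation described above.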
[0026]
[Table 3]

[0027]
[Table 4]

[0028]
(Addition / subtraction circuit 13)
As shown in FIG. 5, the addition/subtraction circuit 13 is composed of switches 111 and 112, DFFs 113 and 114, a bit inverter 115 and an adder 116. The data f0, f4 or f2, f6, delayed by three clock periods by the delay circuits 11 and 12 and input simultaneously, are held by the DFFs 113 and 114 via the switches 111 and 112 for two clock periods. The sum (f0 + f4 = d13a) and the difference (f0 − f4 = d13b) of these data are calculated in a time-division manner by the adder 116 via the bit inverter 115, and the sum (d13a) and the difference (d13b) are output sequentially.
[0029]
The switches 111 and 112 select the output data of the delay circuits 11 and 12, input to the H side, when the control signal en4e is at the high level, and select the output data of the DFFs 113 and 114, input to the L side, when en4e is at the low level. The bit inverter 115 performs bit inversion when the control signal Tg2e is at the high level, so that the adder 116 outputs the difference data, and does not perform bit inversion when Tg2e is at the low level, so that the adder 116 outputs the sum data.
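How a single adder can produce both the sum and the difference may be sketched as follows. The text states only that the bit inverter is enabled for the difference; the carry-in of 1 used here is the usual way to complete a two's-complement negation and is an assumption, as are the function name `add_sub` and the 16-bit default width.

```python
def add_sub(a, b, tg2e, width=16):
    """Return a + b when tg2e is 0 and a - b when tg2e is 1, as a signed width-bit value."""
    mask = (1 << width) - 1
    operand = (~b if tg2e else b) & mask      # bit inverter: invert b only for subtraction
    carry_in = 1 if tg2e else 0               # assumed: +1 completes the two's complement of b
    raw = ((a & mask) + operand + carry_in) & mask
    # Reinterpret the width-bit pattern as a signed integer.
    return raw - (1 << width) if raw & (1 << (width - 1)) else raw
```

Calling the function twice with the same held operands, first with `tg2e=0` and then with `tg2e=1`, mirrors the time-division output of the sum followed by the difference.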
[0030]
In the first-dimensional operation, the addition/subtraction circuit 13 increases the dynamic range in consideration of the quantization error of the DCT coefficient data, so that a 13-bit output is obtained for a 12-bit input. In the second-dimensional operation, the quantization error need not be considered, so the dynamic range does not change and a 16-bit output is obtained for a 16-bit input. Table 5 shows the relationship between the input/output and the internal state of the addition/subtraction circuit 13.
[0031]
[Table 5]

[0032]
(Product-sum circuit 14)
As shown in FIG. 6, the product-sum circuit 14 (T1/8) is composed of switches 121 and 122, DFFs 123 and 124, switches 125 and 126, fixed multipliers 127 and 128, DFFs 129 and 130, a bit inverter 131, and an adder 132.
[0033]
In the product-sum circuit 14, the data f2 and f6, delayed by three clock periods by the delay circuits 11 and 12 and input simultaneously, are held by the DFFs 123 and 124 via the switches 121 and 122 for two clock periods. These data are alternately multiplied by the fixed multiplication coefficients √2 * sin(π/8) and √2 * cos(π/8) in the fixed multipliers 127 and 128 via the switches 125 and 126, and the multiplication results are held in the DFFs 129 and 130.
[0034]
Then, the sum (f6 * √2 * sin(π/8) + f2 * √2 * cos(π/8) = d14a) and the difference (f2 * √2 * sin(π/8) − f6 * √2 * cos(π/8) = d14b) of the respective multiplication results are calculated in a time-division manner by the bit inverter 131 and the adder 132, and the sum (d14a) and the difference (d14b) are output sequentially.
[0035]
The operations of the switches 121, 122, 125 and 126 and the bit inverter 131 are the same as those of the switches 111 and 112 and the bit inverter 115 described with reference to FIG. 5 (the same applies hereinafter).
[0036]
[Table 6]

[0037]
If the fixed multiplication coefficients are 15-bit data as shown in Table 1, the fixed multipliers 127 and 128 have, in the first-dimensional operation, the circuit configurations shown in FIGS. 19 and 21, which take the quantization error of the DCT coefficient data into consideration. In the second-dimensional operation, however, the quantization error need not be considered, so the position of the most significant bit of the operation result is lower than in the circuits of FIGS. 19 and 21, and the circuit configurations are as shown in FIGS. 20 and 22.
[0038]
There is no change in the dynamic range in the addition/subtraction by the bit inverter 131 and the adder 132. Here, however, the adder 132 is assumed to be an adder circuit that, as shown in FIG. 29, rounds the 20-bit calculation result in the positive direction and outputs it in 16 bits. Table 7 shows the relationship between the input/output and the internal state of the product-sum circuit 14.
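One common reading of "rounded in the positive direction" is rounding to the nearest value with ties going toward positive infinity. The sketch below illustrates that reading only; that exactly the four low-order bits are discarded when going from 20 bits to 16 bits is an assumption, as is the function name `round_positive`.

```python
def round_positive(value, drop_bits=4):
    """Drop drop_bits low-order bits, rounding halves toward positive infinity."""
    half = 1 << (drop_bits - 1)
    return (value + half) >> drop_bits   # Python's >> floors, also for negative integers
```

Adding half of the discarded weight before the shift is what makes a single adder with a constant input sufficient for this kind of rounding.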
[0039]
[Table 7]

[0040]
(Addition / subtraction circuit 15)
Like the addition / subtraction circuit 13 shown in FIG. 5, the addition / subtraction circuit 15 includes switches 141 and 142, DFFs 143 and 144, a bit inverter 145, and an adder 146 as shown in FIG. 7.
[0041]
In the addition/subtraction circuit 15, the addition results d13a and d14a, input simultaneously from the addition/subtraction circuit 13 and the product-sum circuit 14, are held in the DFFs 143 and 144 via the switches 141 and 142 for two clock periods. The sum (d13a + d14a = d15a) and the difference (d13a − d14a = d15b) of these data are calculated in a time-division manner by the adder 146 via the bit inverter 145, and the sum (d15a) and the difference (d15b) are output sequentially.
[0042]
In this addition/subtraction, the input data from the addition/subtraction circuit 13 is 13 bits in the first-dimensional operation, so it is shifted left by 3 bits (multiplied by 8) before being added to or subtracted from the input data from the product-sum circuit 14. In the second-dimensional operation, the input data from the addition/subtraction circuit 13 and the product-sum circuit 14 are both 16 bits, so the addition/subtraction is performed as is, without a bit shift. There is no change in the dynamic range due to these operations. Table 8 shows the relationship between the input/output and the internal state of the addition/subtraction circuit 15.
[0043]
[Table 8]

[0044]
(Addition / subtraction circuit 16)
Similarly, as shown in FIG. 8, the addition/subtraction circuit 16 is composed of switches 151 and 152, DFFs 153 and 154, a bit inverter 155, and an adder 156.
The addition/subtraction circuit 16 holds the subtraction results d13b and d14b, input simultaneously from the addition/subtraction circuit 13 and the product-sum circuit 14, for two clock periods. The sum (d13b + d14b = d16a) and the difference (d13b − d14b = d16b) of these data are calculated in a time-division manner by the adder 156 via the bit inverter 155, and the sum (d16a) and the difference (d16b) are output sequentially. Thus, the processing content of the addition/subtraction circuit 16 is the same as that of the addition/subtraction circuit 15, and only the operation timing differs. Table 9 shows the relationship between the input/output and the internal state of the addition/subtraction circuit 16.
[0045]
[Table 9]

[0046]
[About Second Partial Inverse DCT Circuit 20]
The second partial inverse DCT circuit 20 includes a multiprocessing circuit 21, a product-sum circuit 22, an addition/subtraction circuit 23, an addition/subtraction circuit 24, a product-sum circuit 25, and a product-sum circuit 26, and performs inverse DCT on the odd-numbered data (f1, f7 or f3, f5) input to the input terminals Bin and Din. Tables 10 and 11 show the first-dimension input order and output order of the second partial inverse DCT circuit 20.
[0047]
[Table 10]

[0048]
[Table 11]

[0049]
(Multiprocessing circuit 21)
As shown in FIG. 9, the multiprocessing circuit 21 includes switches 161 and 162, a DFF 163, and a switch 164. It holds the data f1 and f7 input simultaneously from the input terminals Bin and Din for two clock periods and outputs them in a time-division manner in the order f1, f7. Table 12 shows the relationship between the input/output and the internal state of the multiprocessing circuit 21.
[0050]
[Table 12]

[0051]
(Product-sum circuit 22)
As shown in FIG. 10, the product-sum circuit 22 (T1/4) is composed of switches 171 and 172, DFFs 173 and 174, a bit inverter 175, an adder 176, a DFF 177, and a fixed multiplier 178.
[0052]
In the product-sum circuit 22, the data f3 and f5 input simultaneously from the input terminals Bin and Din are held by the DFFs 173 and 174 via the switches 171 and 172 for two clock periods, and the sum (f3 + f5) and the difference (f3 − f5) of these data are calculated in a time-division manner by the adder 176 via the bit inverter 175. These calculation results are held by the DFF 177 and multiplied by the multiplication coefficient cos(π/4) in the fixed multiplier 178, and (f3 + f5) * cos(π/4) and (f3 − f5) * cos(π/4) are output sequentially.
[0053]
In the first-dimensional IDCT operation, the input is 12 bits and the addition/subtraction results (f3 + f5) and (f3 − f5) are 13 bits. In the second-dimensional IDCT operation, the input is about 16 bits in order to ensure the calculation accuracy, and the addition/subtraction results are about 17 bits.
[0054]
Further, the fixed multiplier 178 can be configured to include rounding of the multiplication result in the positive direction. If the multiplication coefficient is 15-bit data as shown in Table 6, the circuit configuration for the first-dimensional operation, which takes the quantization error of the DCT coefficients into consideration, is as shown in FIG. 23. In the second-dimensional operation, however, the quantization error need not be considered, so the position of the most significant bit of the operation result is lower than in the circuit of FIG. 23, and the circuit configuration is as shown in FIG. 24. Table 13 shows the relationship between the input/output and the internal state of the product-sum circuit 22.
[0055]
[Table 13]

[0056]
(Addition / subtraction circuit 23)
As shown in FIG. 11, the addition/subtraction circuit 23 is composed of switches 181 and 182, DFFs 183 and 184, a bit inverter 185, an adder 186, and a DFF 187. The data f1 and the multiplication result (f3 + f5) * cos(π/4), input simultaneously from the multiprocessing circuit 21 and the product-sum circuit 22, are held by the DFFs 183 and 184 via the switches 181 and 182 for two clock periods. The sum (f1 + (f3 + f5) * cos(π/4) = d23a) and the difference (f1 − (f3 + f5) * cos(π/4) = d23b) of these data are calculated in a time-division manner by the adder 186 via the bit inverter 185, and the sum (d23a) and the difference (d23b) are output sequentially via the DFF 187.
[0057]
In this addition/subtraction, the input data from the multiprocessing circuit 21 is 12 bits in the first-dimensional operation; this data is sign-extended by 1 bit and further shifted left by 3 bits (multiplied by 8) before the addition/subtraction. In the second-dimensional operation, the input data from the multiprocessing circuit 21 and the product-sum circuit 22 are both 16 bits, so the addition/subtraction is performed without sign extension or bit shift. There is no change in the dynamic range due to these operations. Further, in order to match the output timing of the addition/subtraction circuit 24 described later, the addition/subtraction result is delayed by one clock period by the DFF 187 before being output. Table 14 shows the relationship between the input/output and the internal state of the addition/subtraction circuit 23.
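The alignment step for the first-dimensional operation can be sketched as follows. Treating the three low-order bits of the 16-bit product-sum data as fractional bits is an interpretation made for this example, not something the text states, and the names `sign_extend`, `align_first_dimension` and `add_sub_23` are hypothetical.

```python
def sign_extend(raw, bits):
    """Interpret the low `bits` bits of raw as a two's-complement signed integer."""
    raw &= (1 << bits) - 1
    return raw - (1 << bits) if raw & (1 << (bits - 1)) else raw

def align_first_dimension(raw12):
    """Sign-extend a 12-bit coefficient and shift it left by 3 bits (multiply by 8)."""
    return sign_extend(raw12, 12) << 3

def add_sub_23(raw_f1, product_q3):
    """Sum and difference of the aligned f1 and a product assumed to carry 3 fractional bits."""
    f1_q3 = align_first_dimension(raw_f1)
    return f1_q3 + product_q3, f1_q3 - product_q3
```

After the shift both operands have the same weight per bit, so an ordinary integer adder can combine them directly.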
[0058]
[Table 14]

[0059]
(Addition / subtraction circuit 24)
As shown in FIG. 12, the addition/subtraction circuit 24 is composed of switches 191 and 192, DFFs 193 and 194, a bit inverter 195, and an adder 196. The data f7 and the multiplication result (f3 − f5) * cos(π/4), input simultaneously from the multiprocessing circuit 21 and the product-sum circuit 22, are held by the DFFs 193 and 194 via the switches 191 and 192 for two clock periods. The sum (f7 + (f3 − f5) * cos(π/4) = d24a) and the difference (f7 − (f3 − f5) * cos(π/4) = d24b) of these data are calculated in a time-division manner by the adder 196 via the bit inverter 195, and the sum (d24a) and the difference (d24b) are output sequentially.
[0060]
In this addition/subtraction, the input data from the multiprocessing circuit 21 is 12 bits in the first-dimensional operation; this data is sign-extended by 1 bit and added to or subtracted from the upper 13 bits of the 16-bit input data from the product-sum circuit 22, and the result is output together with the lower 3 bits of the input data from the product-sum circuit 22. In the second-dimensional operation, the input data from the multiprocessing circuit 21 and the product-sum circuit 22 are both 16 bits, so the addition/subtraction is performed without a bit shift. There is no change in the dynamic range due to these operations. Table 15 shows the relationship between the input/output and the internal state of the addition/subtraction circuit 24.
[0061]
[Table 15]

[0062]
(Product-sum circuit 25)
As shown in FIG. 13, the product-sum circuit 25 (T1/16) is composed of switches 201 and 202, DFFs 203 and 204, switches 205 and 206, fixed multipliers 207 and 208, DFFs 209 and 210, a bit inverter 211, and an adder 212.
[0063]
In the product-sum circuit 25, the addition results d23a and d24a, input simultaneously from the addition/subtraction circuits 23 and 24, are held by the DFFs 203 and 204 via the switches 201 and 202 for two clock periods. These data are alternately multiplied by the fixed multiplication coefficients √2 * sin(π/16) and √2 * cos(π/16) in the fixed multipliers 207 and 208 via the switches 205 and 206, and the multiplication results are held in the DFFs 209 and 210.
[0064]
Then, the sum (d24a * √2 * sin(π/16) + d23a * √2 * cos(π/16) = d25a) and the difference (d23a * √2 * sin(π/16) − d24a * √2 * cos(π/16) = d25b) of the respective multiplication results are calculated in a time-division manner by the bit inverter 211 and the adder 212, and the sum (d25a) and the difference (d25b) are output sequentially.
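The path from the odd-numbered coefficients to d25a and d25b can be checked numerically. The following floating-point sketch is illustrative only: the names `odd_part_25` and `odd_reference` are hypothetical, and the reference is the textbook cosine sum over the odd-numbered coefficients scaled by √2. Under these assumptions d25a and d25b correspond to the odd-coefficient contributions to output samples 0 and 3, respectively.

```python
import math

SQRT2 = math.sqrt(2.0)
C4 = math.cos(math.pi / 4)

def odd_part_25(f1, f3, f5, f7):
    """Return (d25a, d25b) from the odd coefficients, following the equations in the text."""
    p_sum = (f3 + f5) * C4           # product-sum circuit 22, sum term
    p_diff = (f3 - f5) * C4          # product-sum circuit 22, difference term
    d23a = f1 + p_sum                # addition/subtraction circuit 23, sum
    d24a = f7 + p_diff               # addition/subtraction circuit 24, sum
    s = SQRT2 * math.sin(math.pi / 16)
    c = SQRT2 * math.cos(math.pi / 16)
    d25a = d24a * s + d23a * c       # product-sum circuit 25, sum
    d25b = d23a * s - d24a * c       # product-sum circuit 25, difference
    return d25a, d25b

def odd_reference(f1, f3, f5, f7, n):
    """Odd-index terms of the textbook cosine sum for output sample n, scaled by sqrt(2)."""
    coeffs = {1: f1, 3: f3, 5: f5, 7: f7}
    return SQRT2 * sum(v * math.cos((2 * n + 1) * k * math.pi / 16)
                       for k, v in coeffs.items())
```

The sketch uses only three distinct multiplications per output pair (one by cos(π/4) and two by the π/16 coefficients), which reflects why fixed coefficient multipliers suffice.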
[0065]
Since the fixed multipliers 207 and 208 do not need to take the quantization error of the DCT coefficient data into consideration, if the fixed multiplication coefficients are 15-bit data as shown in Table 1, their circuit configurations are as shown in FIGS. 25 and 26.
[0066]
There is no change in the dynamic range in the addition/subtraction by the bit inverter 211 and the adder 212. Here, as shown in FIG. 29, an adder circuit that rounds the 20-bit calculation result in the positive direction and outputs it in 16 bits is used for the adder 212. Table 16 shows the relationship between the input/output and the internal state of the product-sum circuit 25.
[0067]
[Table 16]

[0068]
(Product-sum circuit 26)
Like the product-sum circuit 25 shown in FIG. 13, the product-sum circuit 26 (T3/16) is composed of switches 221 and 222, DFFs 223 and 224, switches 225 and 226, fixed multipliers 227 and 228, DFFs 229 and 230, a bit inverter 231, and an adder 232, as shown in FIG. 14.
[0069]
In the product-sum circuit 26, the subtraction results d23b and d24b, input simultaneously from the addition/subtraction circuits 23 and 24, are held by the DFFs 223 and 224 via the switches 221 and 222 for two clock periods. These data are alternately multiplied by the fixed multiplication coefficients √2 * sin(3π/16) and √2 * cos(3π/16) in the fixed multipliers 227 and 228 via the switches 225 and 226, and the multiplication results are held in the DFFs 229 and 230.
[0070]
Then, the sum (d24b * √2 * sin(3π/16) + d23b * √2 * cos(3π/16) = d26a) and the difference (d23b * √2 * sin(3π/16) − d24b * √2 * cos(3π/16) = d26b) of the respective multiplication results are calculated in a time-division manner by the bit inverter 231 and the adder 232, and the sum (d26a) and the difference (d26b) are output sequentially.
[0071]
Since the fixed multipliers 227 and 228 do not need to take the quantization error of the DCT coefficient data into consideration, if the fixed multiplication coefficients are 15-bit data as shown in Table 1, their circuit configurations are as shown in FIGS. 27 and 28.
[0072]
There is no change in the dynamic range in the addition/subtraction by the bit inverter 231 and the adder 232. Here, as shown in FIG. 29, an adder circuit that rounds the 20-bit calculation result in the positive direction and outputs it in 16 bits is used for the adder 232. Table 17 shows the relationship between the input/output and the internal state of the product-sum circuit 26.
[0073]
[Table 17]

[0074]
[About the output arithmetic circuit 30]
The output arithmetic circuit 30 includes four addition/subtraction circuits 31, 32, 33, and 34. By performing addition/subtraction on the output data from the first and second partial inverse DCT circuits 10 and 20, it outputs the final eight-point pixel array {x0, x1, ..., x6, x7} to the output terminals Aout, Bout, Cout, and Dout as inverse DCT data. Tables 18 and 19 show the first-dimension input order and output order of the output arithmetic circuit 30.
[0075]
[Table 18]

[0076]
[Table 19]

[0077]
Table 19 is represented by the following formula.

(Addition / subtraction circuit 31)
As shown in FIG. 15, the addition/subtraction circuit 31 is composed of switches 241 and 242, DFFs 243 and 244, a bit inverter 245, an adder 246, and DFFs 247 and 248. The addition results d15a and d25a, input simultaneously from the addition/subtraction circuit 15 and the product-sum circuit 25, are held by the DFFs 243 and 244 via the switches 241 and 242 for two clock periods. The sum (d15a + d25a = x0) and the difference (d15a − d25a = x7) of these data are calculated in a time-division manner by the adder 246 via the bit inverter 245, and the sum (x0) and the difference (x7) are output sequentially. In order to match the output timing of the addition/subtraction circuit 34 described later, the addition/subtraction results are delayed by two clock periods by the DFFs 247 and 248 before being output. Table 20 shows the relationship between the input/output and the internal state of the addition/subtraction circuit 31.
[0078]
[Table 20]

[0079]
(Addition / subtraction circuit 32)
As shown in FIG. 16, the addition/subtraction circuit 32 is composed of switches 251 and 252, DFFs 253 and 254, a bit inverter 255, an adder 256, and a DFF 257. The subtraction results d15b and d25b, input simultaneously from the addition/subtraction circuit 15 and the product-sum circuit 25, are held by the DFFs 253 and 254 via the switches 251 and 252 for two clock periods. The sum (d15b + d25b = x3) and the difference (d15b − d25b = x4) of these data are calculated in a time-division manner by the adder 256 via the bit inverter 255, and the sum (x3) and the difference (x4) are output sequentially. In order to match the output timing of the addition/subtraction circuit 34 described later, the addition/subtraction result is delayed by one clock period by the DFF 257 before being output. Table 21 shows the relationship between the input/output and the internal state of the addition/subtraction circuit 32.
[0080]
[Table 21]

[0081]
(Addition / subtraction circuit 33)
As shown in FIG. 17, the addition/subtraction circuit 33 is composed of switches 261 and 262, DFFs 263 and 264, a bit inverter 265, an adder 266, and a DFF 267. The addition results d16a and d26a, input simultaneously from the addition/subtraction circuit 16 and the product-sum circuit 26, are held by the DFFs 263 and 264 via the switches 261 and 262 for two clock periods. The sum (d16a + d26a = x1) and the difference (d16a − d26a = x6) of these data are calculated in a time-division manner by the adder 266 via the bit inverter 265, and the sum (x1) and the difference (x6) are output sequentially. In order to match the output timing of the addition/subtraction circuit 34 described later, the addition/subtraction result is delayed by one clock period by the DFF 267 before being output. Table 22 shows the relationship between the input/output and the internal state of the addition/subtraction circuit 33.
[0082]
[Table 22]

[0083]
(Addition / subtraction circuit 34)
As shown in FIG. 18, the addition/subtraction circuit 34 is composed of switches 271 and 272, DFFs 273 and 274, a bit inverter 275, and an adder 276. The subtraction results d16b and d26b, input simultaneously from the addition/subtraction circuit 16 and the product-sum circuit 26, are held by the DFFs 273 and 274 via the switches 271 and 272 for two clock periods. The sum (d16b + d26b = x2) and the difference (d16b − d26b = x5) of these data are calculated in a time-division manner by the adder 276 via the bit inverter 275, and the sum (x2) and the difference (x5) are output sequentially. Table 23 shows the relationship between the input/output and the internal state of the addition/subtraction circuit 34.
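The pairing of partial results in the output arithmetic circuit 30 can be summarized in a short sketch. It assumes that the addition/subtraction circuits 31 and 32 combine the outputs of the addition/subtraction circuit 15 with those of the product-sum circuit 25 (d25a, d25b), in line with the stated sources of their inputs; the function name `output_stage` is hypothetical, and the timing and delay registers are not modeled.

```python
def output_stage(d15a, d15b, d16a, d16b, d25a, d25b, d26a, d26b):
    """Combine the partial results into x0..x7 with one sum and one difference per pair."""
    x = [0] * 8
    x[0], x[7] = d15a + d25a, d15a - d25a   # addition/subtraction circuit 31
    x[3], x[4] = d15b + d25b, d15b - d25b   # addition/subtraction circuit 32
    x[1], x[6] = d16a + d26a, d16a - d26a   # addition/subtraction circuit 33
    x[2], x[5] = d16b + d26b, d16b - d26b   # addition/subtraction circuit 34
    return x
```

Each circuit yields a mirrored pair of samples (x0 and x7, x3 and x4, x1 and x6, x2 and x5), so four time-shared adders are enough for all eight outputs.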
[0084]
[Table 23]

[0085]
FIGS. 30 to 33 are timing charts showing the operation of this embodiment. FIG. 30 shows the operation timings of the delay circuits 11 and 12, the multiprocessing circuit 21 and the product-sum circuit 22; FIG. 31 shows the operation timings of the addition/subtraction circuit 13, the product-sum circuit 14 and the addition/subtraction circuits 23 and 24; FIG. 32 shows the operation timings of the addition/subtraction circuits 15 and 16 and the product-sum circuits 25 and 26; and FIG. 33 shows the operation timings of the addition/subtraction circuits 31 to 34.
[0086]
According to the inverse discrete cosine transform circuit of the present embodiment described above, the transform can be processed at four times the operating speed of the multipliers. For two-dimensional DCT coefficients input in the order shown in Table 24, the first-dimension IDCT results are output in the order shown in Table 25.
[0087]
[Table 24]

[0088]
[Table 25]

[0089]
As shown in FIG. 34, the two-dimensional IDCT circuit is realized, under the control of the control unit 100, by processing one-dimensional (8-point) IDCT for eight lines in the first one-dimensional IDCT circuit 101, transposing the result, and processing one-dimensional (8-point) IDCT for eight lines again in the second one-dimensional IDCT circuit 103. Here, j used in the above equations is the line number.
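The row-column decomposition described here can be sketched in software as follows. This is a reference model only: `idct_1d` is the textbook orthonormal 8-point inverse DCT rather than the fast algorithm of the circuit, the function names are hypothetical, and the final transpose that restores the original orientation is a choice made for this example.

```python
import math

def idct_1d(coeffs):
    """Textbook orthonormal 8-point inverse DCT (not the fast algorithm of the circuit)."""
    out = []
    for n in range(8):
        total = 0.0
        for k, f in enumerate(coeffs):
            scale = math.sqrt(0.5) if k == 0 else 1.0
            total += scale * f * math.cos((2 * n + 1) * k * math.pi / 16)
        out.append(total / 2.0)
    return out

def transpose(block):
    """Swap rows and columns of an 8x8 block given as a list of lists."""
    return [list(row) for row in zip(*block)]

def idct_2d(block):
    """Row-column 2-D inverse DCT: 1-D pass over 8 lines, transpose, second 1-D pass."""
    first = [idct_1d(line) for line in block]               # first one-dimensional pass
    second = [idct_1d(line) for line in transpose(first)]   # second pass on transposed data
    return transpose(second)                                # restore the original orientation
```

For example, a block whose only nonzero coefficient is a DC value of 8 yields a flat block in which every sample equals 1.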
[0090]
[Effect of the Invention]
As described above, according to the present invention, a high-speed arithmetic algorithm is used and the processing is divided into four paths that operate in parallel, which makes it possible to use fixed coefficient multipliers having a small circuit scale. With a small circuit configuration composed of four product-sum circuits and nine addition/subtraction circuits, inverse DCT processing can be performed at a pixel rate four times the operating speed of the fixed multipliers.
[Brief description of the drawings]
FIG. 1 is a block diagram showing the overall configuration of an inverse DCT circuit according to an embodiment of the present invention.
FIG. 2 is a diagram showing an inverse DCT calculation flow in the embodiment;
FIG. 3 is a diagram showing detailed calculation contents used in the inverse DCT calculation flowchart of FIG. 2;
FIG. 4 is a diagram showing a detailed circuit of delay circuits 11 and 12 in the same embodiment;
FIG. 5 is a diagram showing a detailed circuit of an addition / subtraction circuit 13 in the same embodiment;
FIG. 6 is a view showing a detailed circuit of a product-sum circuit 14 in the same embodiment;
FIG. 7 is a diagram showing a detailed circuit of an addition / subtraction circuit 15 in the same embodiment;
FIG. 8 is a diagram showing a detailed circuit of an addition / subtraction circuit 16 in the same embodiment;
FIG. 9 is a view showing a detailed circuit of a multiprocessing circuit 21 in the embodiment;
FIG. 10 is a diagram showing a detailed circuit of a product-sum circuit 22 in the same embodiment;
FIG. 11 is a diagram showing a detailed circuit of an addition / subtraction circuit 23 in the same embodiment;
FIG. 12 is a diagram showing a detailed circuit of an addition / subtraction circuit 24 in the same embodiment;
FIG. 13 is a diagram showing a detailed circuit of a product-sum circuit 25 in the same embodiment;
FIG. 14 is a diagram showing a detailed circuit of a product-sum circuit 26 in the same embodiment;
FIG. 15 is a diagram showing a detailed circuit of an addition / subtraction circuit 31 in the same embodiment;
FIG. 16 is a diagram showing a detailed circuit of an addition / subtraction circuit 32 in the same embodiment;
FIG. 17 is a diagram showing a detailed circuit of an addition / subtraction circuit 33 in the same embodiment;
FIG. 18 is a diagram showing a detailed circuit of an addition / subtraction circuit 34 in the same embodiment;
FIG. 19 is a diagram showing a first fixed multiplier in the first-dimensional product-sum circuit 14 in the same embodiment;
FIG. 20 is a diagram showing a first fixed multiplier in the second-dimensional product-sum circuit 14 in the same embodiment;
FIG. 21 is a diagram showing a second fixed multiplier in the first-dimensional product-sum circuit 14 in the same embodiment;
FIG. 22 is a diagram showing a second fixed multiplier in the second-dimensional product-sum circuit 14 in the same embodiment;
FIG. 23 is a diagram showing a fixed multiplier in the first-dimensional product-sum circuit 22 in the same embodiment;
FIG. 24 is a diagram showing a fixed multiplier in the second-dimensional product-sum circuit 22 in the same embodiment;
FIG. 25 is a diagram showing a first fixed multiplier in the product-sum circuit 25 in the same embodiment;
FIG. 26 is a diagram showing a second fixed multiplier in the product-sum circuit 25 in the same embodiment;
FIG. 27 is a diagram showing a first fixed multiplier in the product-sum circuit 26 in the same embodiment;
FIG. 28 is a diagram showing a second fixed multiplier in the product-sum circuit 26 in the same embodiment;
FIG. 29 is a diagram showing an adder used in the product-sum circuits 14, 25 and 26 in the same embodiment;
FIG. 30 is a diagram showing operation timings of the delay circuits 11 and 12, the multiprocessing circuit 21, and the product-sum circuit 22 in the same embodiment;
FIG. 31 is a diagram showing operation timings of the addition / subtraction circuit 13, the product-sum circuit 14, and the addition / subtraction circuits 23 and 24 in the same embodiment;
FIG. 32 is a diagram showing operation timings of the addition / subtraction circuits 15 and 16 and the product-sum circuits 25 and 26 in the same embodiment;
FIG. 33 is a diagram showing operation timings of the addition / subtraction circuits 31 to 34 in the same embodiment;
FIG. 34 is a block diagram showing a schematic configuration of a two-dimensional inverse DCT circuit.
[Explanation of symbols]
10 ... First partial inverse DCT circuit
11, 12 ... 3-clock delay circuit
13, 15, 16 ... Addition / subtraction circuit
14 ... Product-sum circuit composed of two fixed multipliers and one adder / subtracter
20 ... Second partial inverse DCT circuit
21 ... Multiprocessing circuit
22 ... Product-sum circuit composed of one adder / subtracter and one fixed multiplier
23, 24 ... Addition / subtraction circuit
25, 26 ... Product-sum circuit composed of two fixed multipliers and one adder / subtracter
30 ... Output arithmetic circuit
31 to 34 ... Addition / subtraction circuit

Claims

An inverse discrete cosine transform circuit comprising: a first partial inverse DCT circuit that performs inverse discrete cosine transform processing by simultaneously inputting, two points at a time, the even-numbered data of 8 points of input discrete cosine transform coefficient data; a second partial inverse DCT circuit that performs inverse discrete cosine transform processing by simultaneously inputting, two points at a time, the odd-numbered data of the 8 points of discrete cosine transform coefficient data; and an output arithmetic circuit that adds and subtracts the output data from the first and second partial inverse DCT circuits to obtain 8 points of inverse discrete cosine transform data,
wherein the first partial inverse DCT circuit includes a first addition/subtraction circuit that performs addition/subtraction of two points of discrete cosine transform data input simultaneously to first and second input terminals, a first product-sum circuit that performs a product-sum operation on the two points of discrete cosine transform data input simultaneously to the first and second input terminals, and second and third addition/subtraction circuits that perform addition/subtraction of the output data of the first addition/subtraction circuit and the output data of the first product-sum circuit at mutually different timings, and wherein the first product-sum circuit includes a fixed coefficient multiplier that performs multiplication by a fixed coefficient and rounding.

An inverse discrete cosine transform circuit comprising: a first partial inverse DCT circuit that performs inverse discrete cosine transform processing by simultaneously inputting, two points at a time, the even-numbered data of 8 points of input discrete cosine transform coefficient data; a second partial inverse DCT circuit that performs inverse discrete cosine transform processing by simultaneously inputting, two points at a time, the odd-numbered data of the 8 points of discrete cosine transform coefficient data; and an output arithmetic circuit that adds and subtracts the output data from the first and second partial inverse DCT circuits to obtain 8 points of inverse discrete cosine transform data,
wherein the second partial inverse DCT circuit includes a multiprocessing circuit that multiplexes two points of discrete cosine transform data input simultaneously to third and fourth input terminals, a second product-sum circuit that performs a product-sum operation on two points of discrete cosine transform data input simultaneously to the third and fourth input terminals, fourth and fifth addition/subtraction circuits that perform addition/subtraction of the output data of the multiprocessing circuit and the output data of the second product-sum circuit at mutually different timings, and third and fourth product-sum circuits that perform product-sum operations on the output data of the fourth and fifth addition/subtraction circuits at mutually different timings, and wherein each of the third and fourth product-sum circuits includes a fixed coefficient multiplier that performs multiplication by a fixed coefficient and rounding.

An inverse discrete cosine transform circuit comprising: a first partial inverse DCT circuit that performs inverse discrete cosine transform processing by simultaneously inputting, two points at a time, the even-numbered data of 8 points of input discrete cosine transform coefficient data; a second partial inverse DCT circuit that performs inverse discrete cosine transform processing by simultaneously inputting, two points at a time, the odd-numbered data of the 8 points of discrete cosine transform coefficient data; and an output arithmetic circuit that adds and subtracts the output data from the first and second partial inverse DCT circuits to obtain 8 points of inverse discrete cosine transform data,
wherein the first partial inverse DCT circuit includes a first addition/subtraction circuit that performs addition/subtraction of two points of discrete cosine transform data input simultaneously to first and second input terminals, a first product-sum circuit that performs a product-sum operation on the two points of discrete cosine transform data input simultaneously to the first and second input terminals, and second and third addition/subtraction circuits that perform addition/subtraction of the output data of the first addition/subtraction circuit and the output data of the first product-sum circuit at mutually different timings,
wherein the second partial inverse DCT circuit includes a multiprocessing circuit that multiplexes two points of discrete cosine transform data input simultaneously to third and fourth input terminals, a second product-sum circuit that performs a product-sum operation on two points of discrete cosine transform data input simultaneously to the third and fourth input terminals, fourth and fifth addition/subtraction circuits that perform addition/subtraction of the output data of the multiprocessing circuit and the output data of the second product-sum circuit at mutually different timings, and third and fourth product-sum circuits that perform product-sum operations on the output data of the fourth and fifth addition/subtraction circuits at mutually different timings, and
wherein each of the first, second, third and fourth product-sum circuits has a fixed coefficient multiplier that performs multiplication by a fixed coefficient and rounding.

The inverse discrete cosine transform circuit according to claim 3, wherein the output arithmetic circuit includes sixth and seventh addition/subtraction circuits that perform addition/subtraction of the output data of the second addition/subtraction circuit and the output data of the third product-sum circuit at mutually different timings, and seventh and eighth addition/subtraction circuits that perform addition/subtraction of the output data of the third addition/subtraction circuit and the output data of the fourth product-sum circuit at mutually different timings.