JP4198331B2 - Recording device

Info

Publication number: JP4198331B2
Application number: JP2001132418A
Authority: JP
Inventors: 圭二日室
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2001-04-27
Filing date: 2001-04-27
Publication date: 2008-12-17
Anticipated expiration: 2021-04-27
Also published as: JP2002330390A

Description

[0001]
[Technical field to which the invention belongs]
The present invention relates to a recording apparatus, for example one that records broadcast programs containing video and audio on a recording medium such as a hard disk (HDD) or a digital video disc (DVD).
[0002]
[Prior art]
In recent years, recording and storage media, their peripheral devices, and image processing technology have advanced rapidly. As a result, next-generation video recording devices have become available that maintain the quality of current television broadcasts and let individuals easily save and edit video data (broadcast content) on recording media such as an HDD (hard disk) or a DVD (digital video disc).
[0003]
In this recording technology environment, for example, Japanese Patent Application Laid-Open No. 7-182365, "Multimedia conference record creation support apparatus and method", discloses recognizing keywords, speakers, and the like by image or voice recognition, determining their importance, and creating a digest version according to the result.
[0004]
In addition, Japanese Patent Application Laid-Open No. 11-196385, "Storage type information broadcasting system and receiving terminal device of this system", discloses a technique in which a digest version of TV content is received locally as an EPG (electronic program guide) and, after preference analysis or keyword search, the main content to be received is determined and stored.
[0005]
[Problems to be solved by the invention]
However, although the conventional techniques described above create a digest version by identifying speakers through voice recognition, they cannot extract high-interest scenes from, for example, a sports broadcast to produce a digest version, and they do not create a digest version with a simple configuration and a small recording area.
[0006]
The present invention has been made in view of the above problems, and an object thereof is to enable output of a digest version with a simple configuration.
[0007]
[Means for Solving the Problems]
In order to solve the above-described problem, a recording apparatus according to the present invention is a recording apparatus that records content consisting of video information and audio information. The apparatus divides the content into a plurality of scenes at regular time intervals and, using information created based on the audio levels of the divided scenes, extracts as digest scenes those scenes whose audio level is relatively higher, by a predetermined audio level or more, than the audio level of the preceding or following scene, together with the scenes before and after them. The apparatus comprises means for changing the number of preceding and following scenes to be cut out when doing so.
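The extraction described in this paragraph can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function name `digest_scene_indices` and the parameters `min_rise` and `neighbors` are hypothetical, and "relatively higher by a predetermined audio level or more" is assumed here to mean a level difference.

```python
from typing import List


def digest_scene_indices(levels: List[float], min_rise: float,
                         neighbors: int = 1) -> List[int]:
    """Return the sorted indices of the scenes kept in the digest.

    levels    -- one audio level per fixed-length scene (assumed input)
    min_rise  -- how much louder than the previous OR next scene a scene
                 must be to count as a peak (assumed to be a difference)
    neighbors -- number of scenes kept on each side of every peak; this is
                 the adjustable "number of scenes to be cut out"
    """
    selected = set()
    for i, level in enumerate(levels):
        rises_over_prev = i > 0 and level - levels[i - 1] >= min_rise
        rises_over_next = (i + 1 < len(levels)
                           and level - levels[i + 1] >= min_rise)
        if rises_over_prev or rises_over_next:
            first = max(0, i - neighbors)
            last = min(len(levels) - 1, i + neighbors)
            selected.update(range(first, last + 1))
    return sorted(selected)


# Eight scenes; only scene 2 is much louder than its neighbours.
scene_levels = [2, 2, 7, 3, 2, 2, 2, 2]
print(digest_scene_indices(scene_levels, min_rise=4, neighbors=1))
print(digest_scene_indices(scene_levels, min_rise=4, neighbors=2))
```

Raising `neighbors` widens the context kept around each loud scene without changing which scenes count as peaks.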
[0008]
Here, the apparatus can be configured to perform the operation of extracting the digest scenes in response to an instruction from the user.
[0009]
The apparatus can also comprise means for storing the content and be configured to perform the operation of extracting the digest scenes on the stored content. In this case, the format of the stored content can be the MPEG format.
[0010]
The division of the content into the plurality of scenes can further include division at commercials, and can further include division at scene changes.
[0011]
The apparatus can also comprise means for joining the extracted digest scenes and outputting them. The number of scenes to be extracted can be changed based on at least one of the number of scenes and time.
[0012]
Further, the content can be edited based on the digest scene extraction result, the edited result can be saved, and the content before editing can be deleted.
[0013]
[Embodiments of the invention]
Preferred embodiments of a recording apparatus according to the present invention will be described below in detail with reference to the accompanying drawings. The present invention is not limited to this embodiment.
[0014]
First, the configuration of the recording apparatus will be described. FIG. 1 is a block diagram showing the configuration of a recording apparatus according to an embodiment of the present invention. The recording apparatus 10 provides a recording environment for recording moving image information such as television programs, in the same way as an ordinary VTR (VCR). For this purpose, the recording apparatus 10 is provided with a controller 11 that controls the entire apparatus. As will be described later, an external input unit 12, a broadcast tuner 13, an image capture compression unit 14, a recording unit 15, an editing unit 16, another operation SW unit 17, and a SW 18 are connected to the controller 11.
[0015]
The controller 11 is composed of a highly functional microcomputer system. That is, the controller 11 includes a CPU 20 that performs overall control according to a control program, a ROM 21 that stores the control program and the like, a RAM 22 that is used as working memory, and a timer 23 that is used for reserved recording and the like.
[0016]
The external input unit 12 consists of a group of input keys, a display panel using liquid crystal, LEDs, or the like, and similar components, so that the user can perform various input operations via the controller 11. That is, the external input unit 12 includes a remote controller or switches provided on each device, and is configured to set a start signal, an interruption signal, a program start time, a program end time, and the like.
[0017]
The image capture compression unit 14, for example, captures a moving image (that is, takes it in as a file) and then compresses it in the MPEG format. Note that MPEG is an abbreviation for Moving Picture Experts Group / Moving Picture Image Experts Group, and is an encoding method standardized by an organization that promotes standardization work for color moving image encoding methods.
[0018]
The moving image compression encoding method uses DCT (Discrete Cosine Transform), an algorithm created for video conferencing, and can encode in real time. MPEG also has three popular schemes, H.261, MPEG1, and MPEG2, which are selected according to the recording medium, input/output functions, broadcast medium, and so on. Any of these may be used, and another video compression scheme may be used as well.
[0019]
The broadcast tuner 13 functions in the same way as an ordinary television, and an ordinary television may be used in its place. When the compression method is the MPEG format, the decompression unit 19 decodes the data into an ordinary TV signal (NTSC (National Television System Committee) format), outputs a signal that can be viewed on a television, reproduces the decoded image, and sends it to a display device (not shown).
[0020]
The recording unit (storage device) 15 is a storage medium such as an HDD (hard disk) or a DVD (digital video disc), and is a device that stores compressed program data (images, audio, and the like). The recording unit (storage device) 15 is provided with a normal recording area 25 and a digest version recording area 26. Although this embodiment provides the normal recording area 25 and the digest version recording area 26, in some cases these two recording areas need not be provided separately.
[0021]
The editing unit 16 is a block that performs functions such as cutting MPEG data, searching the audio level, creating tag information, storing temporary data, and merging MPEG data. The SW 18 is a switch for creating a digest version.
[0022]
Next, the operation of the recording apparatus configured as described above will be described.
In normal program recording, the output of the broadcast tuner 13 is captured by the image capture compression unit 14 and then compressed in a predetermined MPEG format and stored in the recording unit 15 in the same manner as the VTR. In the case of timer reservation, recording is performed in the normal recording area 25.
[0023]
Next, the operation that characterizes the present invention will be described. FIG. 2 is a flowchart showing an operation example of the recording apparatus according to the present invention. First, when the user presses the SW 18 (step S11), the editing unit 16 searches (scans) the audio of the entire designated content (step S12). Subsequently, tag information is created for ranges at or above a predetermined threshold level (step S13). That is, tag (index) information is created for portions with a high audio level (portions exceeding the specified value). In other words, scene division is performed here on the stored content.
[0024]
Subsequently, for the time data of each tag, the start time minus Z and the end time plus Z are calculated (step S14). The tagged regions are then merged based on the tag information (step S15), the result is saved under a file name (step S16), and the main content is deleted (step S17).
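Steps S14 and S15 can be illustrated with a short sketch. It assumes that tag information is held as (start, end) times in seconds and that padded ranges that touch or overlap are joined into one clip. The function name `pad_and_merge` and its parameters are hypothetical and do not appear in the source.

```python
from typing import List, Tuple

TimeRange = Tuple[float, float]


def pad_and_merge(tags: List[TimeRange], z: float,
                  duration: float) -> List[TimeRange]:
    """Widen every tagged range by z seconds on both sides (step S14) and
    join ranges that overlap after widening (one reading of step S15)."""
    # Clamp so that padding never runs past the start or end of the content.
    padded = sorted((max(0.0, start - z), min(duration, end + z))
                    for start, end in tags)
    merged: List[TimeRange] = []
    for start, end in padded:
        if merged and start <= merged[-1][1]:
            # Overlaps the previous clip: extend it instead of adding a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Three loud ranges in a 525-second recording, padded by Z = 15 seconds.
print(pad_and_merge([(100, 130), (150, 160), (500, 520)], z=15, duration=525))
```

The first two ranges overlap once padded, so they become a single clip; the last one is clamped at the end of the recording.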
[0025]
That is, scenes determined by a fixed time before and after the tag information, or by a number of pre-divided scenes (for example, the tagged scene ± 1 scene), are cut out, joined together, and saved in the digest version recording area 26 of the recording unit 15. The specific content is selected in the same way as on a conventional VTR, CD player, or the like.
[0026]
Further, after the tag information is created, the tag information + α portion is cut out and merged (one ordered list is created). The digest version merged and completed is stored with a different name in the digest version recording area 26. As a method of creating the tag information + α portion, a method by specifying the number of scenes and a method by specifying time are used.
[0027]
As the tag information creation method, the apparatus adopts either a method that creates tag information for ranges where the audio level exceeds a certain level, or a method that creates tag information for scenes where the ratio of the audio level of a specific scene (instantaneous or average) to the overall average audio level exceeds a certain level.
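The two tagging methods named here differ only in what the scene level is compared with. The sketch below shows both side by side, assuming one level value per scene; the function names and the `min_ratio` parameter are illustrative and not taken from the source.

```python
from statistics import mean
from typing import List


def tag_by_absolute(levels: List[float], threshold: float) -> List[int]:
    """Tag scenes whose level exceeds a fixed threshold."""
    return [i for i, level in enumerate(levels) if level > threshold]


def tag_by_ratio_to_average(levels: List[float],
                            min_ratio: float) -> List[int]:
    """Tag scenes whose level, divided by the overall average level,
    exceeds min_ratio."""
    if not levels:
        return []
    overall = mean(levels)
    if overall == 0:
        return []  # silent content: no ratio can be formed
    return [i for i, level in enumerate(levels) if level / overall > min_ratio]


normal = [1, 1, 1, 5]
quiet = [0.1, 0.1, 0.1, 0.5]   # same shape, recorded at a lower volume

print(tag_by_absolute(normal, 4), tag_by_absolute(quiet, 4))
print(tag_by_ratio_to_average(normal, 2.0), tag_by_ratio_to_average(quiet, 2.0))
```

The ratio method still finds the loud scene in the quietly recorded content, whereas the absolute threshold misses it, which is one practical reason to offer both.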
[0028]
Next, examples of creating a digest version will be described with reference to FIGS. 3 and 4. FIG. 3 is an explanatory diagram showing a digest version creation example (part 1) according to the embodiment of the present invention. In FIG. 3, reference numeral 100a denotes the main content divided every three minutes, and reference numeral 110a denotes the digest version. In this example, the average audio level in each divided section is searched, and the sections whose level is at or above the audio threshold level of 5 are assembled into the digest version 110a.
[0029]
That is, the main content 100a is divided in advance into units of a fixed time (here, 3 minutes), and the average audio level of each unit is calculated. When a digest version is created, a fixed audio threshold level is set (5 or more in this example), and tag information is added to the places at or above that level. Tags are added by storing, in a separate area, pairs of tag information and either a section number or a time range. Subsequently, only the tagged parts are merged (one ordered list is created) to create the digest version 110a, which is saved in a separate area under a different name.
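The FIG. 3 procedure can be sketched as follows, assuming the audio is available as a flat list of level samples. The `Tag` record, the function `tag_sections`, and the `samples_per_section` parameter are hypothetical; the source specifies only 3-minute sections, a threshold of 5, and tags stored with a section number or time range.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class Tag:
    section_no: int    # index of the fixed-length section
    start_s: int       # section start, in seconds
    end_s: int         # nominal section end, in seconds
    avg_level: float   # average audio level of the section


def tag_sections(samples: List[float], samples_per_section: int,
                 section_seconds: int = 180,
                 threshold: float = 5) -> List[Tag]:
    """Split the samples into fixed-length sections, average each one, and
    record a tag for every section at or above the threshold."""
    tags = []
    starts = range(0, len(samples), samples_per_section)
    for no, start in enumerate(starts):
        avg = mean(samples[start:start + samples_per_section])
        if avg >= threshold:
            tags.append(Tag(no, no * section_seconds,
                            (no + 1) * section_seconds, avg))
    return tags


# Four 3-minute sections with two level samples each (toy resolution).
for tag in tag_sections([2, 2, 6, 6, 3, 3, 5, 5], samples_per_section=2):
    print(tag)
```

Because the tags are produced in section order, joining the tagged sections in list order already yields the "one ordered list" the text describes.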
[0030]
The scene division may also use methods other than fixed time units that are not shown (such as scene change recognition or the intervals between commercials). Audio level detection may also use the ratio between adjacent audio levels, for example creating tag information where the ratio of a scene's average audio level to that of the previous scene is ≧ 2.
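The previous-scene ratio test mentioned here takes only a few lines. The sketch assumes one average level per scene; the function name is illustrative, and scenes that follow a silent scene are skipped because no ratio can be formed for them.

```python
from typing import List


def tag_by_rise_over_previous(avg_levels: List[float],
                              min_ratio: float = 2.0) -> List[int]:
    """Tag every scene whose average level is at least min_ratio times
    the average level of the scene immediately before it."""
    tagged = []
    for i in range(1, len(avg_levels)):
        previous = avg_levels[i - 1]
        if previous > 0 and avg_levels[i] / previous >= min_ratio:
            tagged.append(i)
    return tagged


print(tag_by_rise_over_previous([2, 4, 4, 3, 9]))
```

Unlike a fixed threshold, this test reacts to a sudden jump in level, so a scene that stays loud after the jump is tagged only once, at the point where the level rises.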
[0031]
FIG. 4 is an explanatory diagram showing a digest version creation example (part 2) according to the embodiment of the present invention. In FIG. 4, reference numeral 100b denotes the main content divided every three minutes, and reference numeral 110b denotes the digest version. Here, the audio of the main content 100b is scanned continuously (in an analog manner), and each region where the audio level 101 exceeds the audio threshold level 102, extended by a fixed time before and after, is extracted as tag information. After that, as described above, only the tagged parts are merged (one ordered list is created) to create the digest version 110b, which is saved in a separate area under a different name.
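The FIG. 4 approach works on a continuous level trace rather than on pre-divided sections. A minimal sketch follows, assuming the trace is sampled at a fixed period; the function name and the parameters `sample_period_s` and `pad_s` are assumptions made for illustration.

```python
from typing import List, Tuple


def regions_above(levels: List[float], threshold: float,
                  sample_period_s: float = 1.0,
                  pad_s: float = 0.0) -> List[Tuple[float, float]]:
    """Find contiguous runs of samples above the threshold and return them
    as (start, end) times, widened by pad_s seconds on each side."""
    runs = []
    run_start = None
    for i, level in enumerate(levels):
        if level > threshold and run_start is None:
            run_start = i                   # a loud run begins
        elif level <= threshold and run_start is not None:
            runs.append((run_start, i))     # the run ended before sample i
            run_start = None
    if run_start is not None:
        runs.append((run_start, len(levels)))  # run lasts to the end

    total_s = len(levels) * sample_period_s
    return [(max(0.0, start * sample_period_s - pad_s),
             min(total_s, end * sample_period_s + pad_s))
            for start, end in runs]


trace = [1, 1, 6, 7, 1, 1, 1, 8, 1, 1]   # one sample per second
print(regions_above(trace, threshold=5, pad_s=1.0))
```

With a larger `pad_s`, neighbouring regions can overlap; they would then need to be joined before the clips are concatenated, which this sketch does not do.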
[0032]
Alternatively, instead of providing the SW 18 described above, the apparatus may be configured so that a program is recorded with a simple digest version recording mode selected, after which a simple digest version is created and saved and the original content is deleted, thereby reducing the recording area used.
[0033]
The audio-level tagging described above may be based not only on the absolute level but also on the ratio of the audio level of a specific portion to the average level over the whole content. Alternatively, the whole content may be divided into fine segments in advance, and tags may be set using the absolute value or ratio of the audio level of a scene relative to the scenes before and after it.
[0034]
Although not shown, the fine division of the specific portions and of the content is realized using any conventional method, such as time division, commercial detection, or fine scene division by image recognition, zooming detection, and the like. Clipping before and after the tagged portion may be based on time, on the number of scenes divided by the above methods, or on a combination of time and scenes.
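When scenes come from commercial or scene-change detection they have unequal lengths, so clipping "by number of scenes" has to be converted to a time range through the scene boundaries. The sketch below shows one way to do that; the boundary list layout and the function name `clip_by_scene_count` are assumptions, not something the source specifies.

```python
from typing import List, Tuple


def clip_by_scene_count(boundaries: List[float], tagged: int,
                        n_before: int, n_after: int) -> Tuple[float, float]:
    """Return the (start, end) time covering the tagged scene plus
    n_before scenes ahead of it and n_after scenes behind it.

    boundaries -- sorted cut times; scene k spans
                  boundaries[k] .. boundaries[k + 1]
    """
    last_scene = len(boundaries) - 2
    if not 0 <= tagged <= last_scene:
        raise ValueError("tagged scene index out of range")
    first = max(0, tagged - n_before)
    last = min(last_scene, tagged + n_after)
    return boundaries[first], boundaries[last + 1]


# Five scenes of unequal length, cut at these times (seconds).
cuts = [0, 40, 95, 180, 200, 310]
print(clip_by_scene_count(cuts, tagged=2, n_before=1, n_after=1))
print(clip_by_scene_count(cuts, tagged=0, n_before=2, n_after=1))
```

Keeping `n_before` and `n_after` separate lets the lead-in and the aftermath of a loud scene be sized independently.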
[0035]
Therefore, the recording apparatus described above can create a digest version with a simple configuration. In particular, when creating a digest version of a sports broadcast or the like, the audio level from the announcer, commentators, and spectators rises in high-interest scenes, so using these high-audio-level portions makes it easy to create a high-quality simple digest version.
[0036]
Also, by changing the scene cut-out time before and after the tag, the digest creation function can be customized to the user's preference. Furthermore, compared with conventional methods such as keyword detection and image analysis, a digest version for viewing only the exciting portions can be created by a simpler method.
[0037]
As described above, the recording apparatus according to the present embodiment records content supplied from programs containing video and audio that are broadcast wirelessly or by wire, and comprises recording storage means for storing the content and editing means for detecting the audio level of the content stored in the recording storage means and creating tag information according to that audio level. When a program or other recording target is recorded on a storage medium such as an HDD or DVD, the audio level is therefore searched over the entire recording, the portions where the audio level is higher than their surroundings are extracted, and tag (index) information can be created for the extracted portions, for example scenes of interest in a sports program where the announcer, commentators, or cheering spectators are loud. A simple, high-quality digest version can thus be created, and the recording area can be reduced by deleting the original content.
[0038]
Also, in this case, the editing means automatically edits the scenes near the tag information to create a simple digest version. Because it uses tag information obtained by treating high audio levels, for example from announcers, commentators, and cheering spectators in a sports program, as the criterion for scenes of interest, a high-quality digest version of the scenes of interest can be created.
[0039]
Further, the editing means detects the audio level using the absolute value of the volume and adds, as tag information, the ranges where the absolute audio level exceeds a predetermined threshold, so tag information for scenes of interest can be created by a simple method.
[0040]
Further, the editing means uses, as the audio level, the ratio to the average volume near the tag information or over the whole content. Because the average audio level and the ratio to the scenes before and after the tag information are used when scanning the audio level and creating tag information, scenes of interest in sports programs and the like can be captured more accurately.
[0041]
In addition, because the scenes near the tag information are automatically edited according to a preset number of pre-divided scenes, the digest creation function can be customized to the user's preference.
[0042]
In addition, because the scenes near the tag information are automatically edited according to a specified time before and after the tag information, the digest creation function can be customized to the user's preference.
[0043]
[Effect of the invention]
As described above, the recording apparatus according to the present invention divides the content into a plurality of scenes at regular time intervals and, using information created based on the audio levels of the divided scenes, extracts as digest scenes those scenes whose audio level is relatively higher, by a predetermined audio level or more, than that of the preceding or following scene, together with the scenes before and after them. Because it comprises means for changing the number of preceding and following scenes to be cut out, a high-quality digest version can be created with a simple configuration.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a configuration of a recording apparatus according to an embodiment of the present invention.
FIG. 2 is a flowchart showing an operation example of the recording apparatus according to the present invention.
FIG. 3 is an explanatory diagram showing a digest version creation example (part 1) according to the embodiment of the present invention.
FIG. 4 is an explanatory diagram showing a digest version creation example (part 2) according to the embodiment of the present invention.
[Explanation of symbols]
10 Recording apparatus
11 Controller
12 External input unit
13 Broadcast tuner
14 Image capture compression unit
15 Recording unit (storage device)
16 Editing unit
18 SW
25 Normal recording area
26 Digest version recording area

Claims

1. A recording apparatus that records content consisting of video information and audio information, wherein the content is divided into a plurality of scenes at regular time intervals, and the recording apparatus comprises means for changing the number of preceding and following scenes to be cut out when, using information created based on the audio levels of the divided scenes, a scene whose audio level is relatively higher, by a predetermined audio level or more, than the audio level of the scene before or after it is extracted from the plurality of scenes as a digest scene together with the scenes before and after it.

2. The recording apparatus according to claim 1, wherein the operation of extracting the digest scene is performed in response to an instruction from a user.

3. The recording apparatus according to claim 1, further comprising means for storing the content, and performing an operation of extracting the digest scene from the stored content.

4. The recording apparatus according to claim 3, wherein a format of the stored content is an MPEG format.

5. The recording apparatus according to any one of claims 1 to 4, wherein the division of the content into the plurality of scenes further includes division at commercials.

6. The recording apparatus according to any one of claims 1 to 5, wherein the division of the content into the plurality of scenes further includes division at scene changes.

7. The recording apparatus according to any one of claims 1 to 6, further comprising means for joining the extracted digest scenes and outputting them.

8. The recording apparatus according to claim 1, wherein the number of preceding and following scenes to be extracted is changed based on at least one of the number of scenes and time.

9. The recording apparatus according to any one of claims 1 to 8, wherein the content is edited based on the digest scene extraction result, the edited result is saved, and the content before editing is deleted.