JP4335087B2 - Sound playback device

Info

Publication number: JP4335087B2
Application number: JP2004221107A
Authority: JP
Inventors: 敏雄茂出木
Original assignee: Dai Nippon Printing Co Ltd
Current assignee: Dai Nippon Printing Co Ltd
Priority date: 2004-07-29
Filing date: 2004-07-29
Publication date: 2009-09-30
Anticipated expiration: 2024-07-29
Also published as: JP2006042091A

Description

The present invention relates to a technique for reproducing audio and video signals that is suitable for packaged video and music playback for consumer and business viewing using CDs, DVDs, and the like, and for environmental video and environmental music distributed for commercial purposes by broadcasters, operators of public facilities, and the like.

Techniques for displaying a plurality of videos on the same screen have long been in use. When a plurality of videos are to be displayed simultaneously, the computer in practice processes the video files in order, starting from the first; when its processing capacity cannot keep up, frames of the later video files may go unprocessed. In particular, when the video recorded in a later video file contains fast motion, unprocessed frames cause the motion to appear jerky.

For displaying a plurality of videos, there is also a technique that assigns a priority to the video to be displayed preferentially and displays it on a large screen (see, for example, Patent Document 1).
Patent Document 1: JP 2001-34250 A

However, the above conventional techniques cannot provide smooth multi-screen display under the constraints of limited hardware resources.

Accordingly, an object of the present invention is to provide a video signal playback apparatus capable of smooth playback even when a plurality of videos are played back in a limited hardware environment.

To solve the above problem, the present invention provides a sound-added video playback apparatus that plays back a plurality of sound-added video files, each of which has a plurality of video frames with an audio data block attached to each video frame. The apparatus comprises: an inter-frame difference table recording inter-frame difference feature amounts based on the differences between video frames of the sound-added video files; audio data block synthesis means for extracting, from the plurality of sound-added video files, the audio data blocks attached to the respective corresponding video frames and synthesizing them into one synthesized audio data block; audio output means for writing the synthesized audio data block to an audio output device for audio playback; processing control means for acquiring the playback time corresponding to the synthesized audio data block being played back by the audio output means; priority calculation means for referring to the inter-frame difference table for the video frame corresponding to the acquired playback time in each sound-added video file and calculating a priority for that video frame; and display control means for determining, based on the priorities of the video frames of the sound-added video files corresponding to the acquired playback time, whether to display each video frame, and for writing each video frame determined to be displayed to the memory of a video display device for display on the screen.

According to the present invention, when a plurality of sound-added video files are played back simultaneously, the priority of each video frame is calculated by referring to the inter-frame difference, that is, the difference between video frames in each file being played back, and whether to display each video frame is determined according to this priority. Video with large differences and fast motion is therefore displayed preferentially, and smooth playback is possible even in a limited hardware environment.

(1. Structure of the sound-added video file)
Embodiments of the present invention are described in detail below with reference to the drawings. First, the structure of the sound-added video files to be played back by the sound-added video playback apparatus according to the present invention is described. FIG. 1 schematically shows the structure of a general-purpose sound-added video file. In FIG. 1, V denotes a video frame, A denotes an audio data block, and 0 to N denote the frame numbers of the video frames. A frame number indicates the position of a frame counted from the beginning; when the total number of frames is N + 1, the numbers 0 to N are assigned. A general-purpose video format consists of 30 frames per second, so 3 minutes of video data, for example, consists of 5400 frames. The video frames V are normally compressed; depending on the compression method, a still image can in some cases be restored from a single video frame V, and in other cases cannot be restored without using other video frames V. The audio data is divided in units of frames, that is, in units of 1/30 second, and recorded as audio data blocks A. For example, when a stereo audio signal is sampled at a sampling frequency of 48 kHz, one audio data block holds the 3200 samples corresponding to 1/30 second (1600 samples for each of the two channels).
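The frame and sample counts quoted above follow directly from the frame rate and sampling frequency. The short Python sketch below reproduces that arithmetic; the constant and function names are illustrative and do not appear in the patent.

```python
FPS = 30               # video frames per second (from the text)
SAMPLE_RATE = 48_000   # audio sampling frequency in Hz (from the text)
CHANNELS = 2           # stereo


def frame_count(duration_seconds: int) -> int:
    """Number of video frames in a clip of the given length."""
    return FPS * duration_seconds


def samples_per_block() -> int:
    """Samples stored in one audio data block (1/30 second, all channels)."""
    # 48000 is divisible by 30, so the integer division is exact.
    return (SAMPLE_RATE // FPS) * CHANNELS


print(frame_count(3 * 60))   # 5400 frames for 3 minutes
print(samples_per_block())   # 3200 samples per block
```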

A sound-added video file such as that shown in FIG. 1 can be created by known methods. That is, a sound-added video file in which video data and audio data obtained by shooting with a video camera or the like are associated with each other may be used as is, or only the captured video data may be used, with separately prepared audio data additionally recorded in association with that video data (the latter method is more common for reasons of sound quality). In the present embodiment, the files are created by recording separately obtained video data and audio data in association with each other.

(2. Creation of the sound-added video files)
Next, creation of the sound-added video files in the present embodiment is described. The description assumes that the system is used in an apparatus that synthesizes and plays back the audio data recorded as material in sound-added video files. The number of sound-added video files to be synthesized can be changed as appropriate; the present embodiment describes the case where up to five can be selected. In this case, the audio data is assigned to five tracks for playback, and if, for example, five pieces of music are selectable for each track, 25 pieces of audio data are required in total. First, therefore, analog audio data obtained by recording or the like is digitized to obtain 25 pieces of digital audio data. The analog audio signals are digitized using the conventional, general PCM method. Specifically, the analog audio signal is sampled at a predetermined sampling frequency, and the amplitude is converted into digital data using a predetermined number of quantization bits. Audio data digitized in this way is a time series of samples whose values depend on the number of quantization bits. For example, with a sampling frequency of 48 kHz and 16 quantization bits, a 1-second analog audio signal is converted into digital audio data consisting of 48000 samples, each taking a value from −32768 to 32767.
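As a rough illustration of the PCM step, the Python sketch below samples a sine tone at 48 kHz and quantizes each value to 16 bits. The test tone, the function names, and the particular scaling (multiply by 32768, then clamp) are assumptions made for this example; the patent only states the sampling frequency, the bit depth, and the resulting value range.

```python
import math

SAMPLE_RATE = 48_000  # Hz, as in the text


def quantize_pcm16(x: float) -> int:
    """Map an amplitude in [-1.0, 1.0] to a signed 16-bit sample value.

    The scaling convention is an assumption for illustration.
    """
    x = max(-1.0, min(1.0, x))
    return max(-32768, min(32767, round(x * 32768)))


def sample_tone(freq_hz: float, seconds: float) -> list[int]:
    """Sample a sine tone (a stand-in for the analog signal) and quantize it."""
    n = int(SAMPLE_RATE * seconds)
    return [
        quantize_pcm16(math.sin(2 * math.pi * freq_hz * k / SAMPLE_RATE))
        for k in range(n)
    ]


samples = sample_tone(440.0, 1.0)
print(len(samples))                 # 48000 samples for one second
print(min(samples), max(samples))   # stays within -32768..32767
```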

To synthesize a plurality of pieces of audio data and play them back as one piece of playback audio data, the audio data to be synthesized must be processed so that their playback times are identical. This is done by taking one piece of audio data as a reference and adjusting the others so that each of their samples (the value quantized with a predetermined number of bits at each time) is temporally and musically synchronized with the reference material audio signal. In addition, in the present embodiment, each piece of audio data is created as a separate part, such as melody, chords, or rhythm, so that the user can freely change the composition of the music during playback. Each piece of audio data is an analog audio signal converted into digital data by PCM or a similar method, as described above.

Here, the audio data set as the five tracks is described. FIG. 2 schematically shows the signal waveforms of the audio data of each track. The example in FIG. 2 shows the case where each piece of audio data is stereo audio data consisting of two channels, left and right (L and R). To simplify the explanation, FIG. 2 shows portions where the signal amplitude is at or above a certain level, that is, non-silent portions, as waveforms of uniform amplitude, and shows silent portions with no waveform. When the music of a plurality of tracks is synthesized and played back, and the music for each track is selectable from several candidates, each piece of audio data must be created according to predetermined rules so that the synthesized music is coherent whatever the combination. Each track is therefore configured so that, whichever piece of music is selected, the temporal positions of the non-silent and silent portions are in principle the same. For example, the five pieces of audio data prepared for track 1 in principle have their non-silent (sounding) portions and silent portions at the same positions; however, to avoid a lack of musical variation, the lengths of the non-silent and silent portions are also varied somewhat, within a range that does not violate musical rules.

When the audio data with the waveforms shown in FIG. 2 is synthesized and played back, the sound from tracks 1 and 5 is heard first; next, the sound from track 5 stops and the sound from track 3 is heard; next, the sound from tracks 1 and 3 stops and the sound from tracks 2 and 4 is heard; next, the sound from tracks 2 and 4 stops and the sound from tracks 1 and 3 is heard; next, the sound from track 3 stops and the sound from track 5 is heard; finally, the sound from tracks 1 and 5 stops.

Meanwhile, video data is shot to match the content of the sound recorded in the audio data, and the captured data is compression-encoded by a predetermined method. For example, when audio data of musical instruments representative of various countries is set for the tracks, footage of the scenery of each country is shot as the video data. The separately obtained video data and audio data are then integrated into one sound-added video file by recording, for each frame of the video data, the audio data block for the corresponding time span. For example, when the video data is at 30 fps (frames per second) and the audio data is sampled in stereo at 48 kHz, 3200 samples are recorded as one audio data block in association with one video frame. The result is recorded in the form shown in FIG. 1.
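The pairing of each video frame with its 1/30-second audio data block can be sketched as follows. This is a minimal Python illustration, not the patent's file format: frames are represented by placeholder strings, the audio is a flat list of samples, and the function names are hypothetical.

```python
SAMPLES_PER_BLOCK = 3200  # 1/30 s of 48 kHz stereo audio (from the text)


def split_into_blocks(samples, block_size=SAMPLES_PER_BLOCK):
    """Cut a flat sample sequence into consecutive audio data blocks."""
    return [samples[i:i + block_size] for i in range(0, len(samples), block_size)]


def attach_audio(frames, samples, block_size=SAMPLES_PER_BLOCK):
    """Pair video frame n with audio data block n, as in the V/A layout of FIG. 1."""
    blocks = split_into_blocks(samples, block_size)
    if len(blocks) != len(frames):
        raise ValueError("audio length does not match the number of video frames")
    return list(zip(frames, blocks))


# Three placeholder frames and exactly three blocks' worth of samples.
pairs = attach_audio(["V0", "V1", "V2"], list(range(3 * SAMPLES_PER_BLOCK)))
print([(frame, len(block)) for frame, block in pairs])
```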

In the present embodiment, data is additionally processed and recorded for a menu screen. Specifically, the first 10 seconds are extracted from each of the 25 sound-added video files, and a menu video file and demo playback audio files are created. For the menu video file, the 10 seconds of video data extracted from the sound-added video files are combined, 25 frames at a time (one from each file), into menu composite frames. This yields a menu video file consisting of 300 menu composite frames. A demo playback audio file, on the other hand, is obtained by recording consecutively the 10 seconds of audio data blocks extracted from a sound-added video file. In addition, if a background is to be shown behind the synthesized video during playback, a background video file containing video data for the background is prepared separately. Structurally, the background video file is an ordinary video file of 30 frames per second.
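One way to picture a menu composite frame is as a 5 x 5 tiling of the 25 source frames. The Python sketch below builds such a tiling from frames held as nested lists of pixel values. The row-major tile order, the absence of any scaling, and the function name are assumptions for illustration; the patent does not specify how the 25 frames are combined beyond the grid shown on the menu screen.

```python
def compose_menu_frame(frames, grid=5):
    """Tile grid*grid equally sized frames into one composite frame.

    Each frame is a list of pixel rows. Tiles are placed in row-major
    order (an assumption made for this sketch).
    """
    if len(frames) != grid * grid:
        raise ValueError("expected exactly grid*grid frames")
    height = len(frames[0])
    composite = []
    for tile_row in range(grid):
        for y in range(height):
            row = []
            for tile_col in range(grid):
                row.extend(frames[tile_row * grid + tile_col][y])
            composite.append(row)
    return composite


# 25 tiny 2x2 frames; frame i is filled with the value i.
tiny_frames = [[[i, i], [i, i]] for i in range(25)]
menu = compose_menu_frame(tiny_frames)
print(len(menu), len(menu[0]))  # 10 rows by 10 columns
```

Repeating this for each of the 300 frame positions in the first 10 seconds would give the 300 menu composite frames mentioned above.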

(3. Creation of the inter-frame difference table)
First, as advance preparation, an inter-frame difference table is created that records the feature amounts of the differences between the video frames in each sound-added video file. The inter-frame difference feature amount indicates how much motion there is in the recorded video between two consecutive video frames. To calculate it, the difference d(x, y) at the same coordinates (x, y) in consecutive frames is calculated for every pixel (x, y) in the frame. The difference d(x, y) is calculated by [Formula 1] below.

[Formula 1]
$$d(x,y) = \sqrt{\{R(n,x,y)-R(n-1,x,y)\}^{2} + \{G(n,x,y)-G(n-1,x,y)\}^{2} + \{B(n,x,y)-B(n-1,x,y)\}^{2}}$$

In [Formula 1], n − 1 and n are frame numbers. The average of the pixel differences d(x, y) calculated over all pixels is recorded in the inter-frame difference table as the inter-frame difference feature amount. FIG. 3 shows an example of the information recorded in the inter-frame difference table. As shown in FIG. 3, the difference feature amounts are recorded for every pair of consecutive frames in all of the sound-added video files.
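The calculation in [Formula 1] and the averaging step can be sketched in Python as below. Frames are assumed to be decoded into nested lists of (R, G, B) tuples, and the table is assumed to be a simple list with one entry per pair of consecutive frames; both representations and all function names are illustrative choices, not taken from the patent.

```python
import math


def pixel_difference(prev_pixel, cur_pixel):
    """d(x, y): Euclidean distance between the RGB values of one pixel."""
    return math.sqrt(sum((c - p) ** 2 for p, c in zip(prev_pixel, cur_pixel)))


def frame_difference_feature(prev_frame, cur_frame):
    """Average of d(x, y) over all pixels of two consecutive frames."""
    total = 0.0
    count = 0
    for prev_row, cur_row in zip(prev_frame, cur_frame):
        for prev_pixel, cur_pixel in zip(prev_row, cur_row):
            total += pixel_difference(prev_pixel, cur_pixel)
            count += 1
    return total / count


def build_difference_table(frames):
    """Entry n - 1 holds the feature between frame n - 1 and frame n."""
    return [
        frame_difference_feature(frames[n - 1], frames[n])
        for n in range(1, len(frames))
    ]


black = [[(0, 0, 0), (0, 0, 0)], [(0, 0, 0), (0, 0, 0)]]
changed = [[(3, 4, 0), (0, 0, 0)], [(0, 0, 0), (0, 0, 0)]]
print(build_difference_table([black, changed, changed]))  # [1.25, 0.0]
```

In the example, one of four pixels changes by a distance of 5, so the feature is 5 / 4 = 1.25; two identical frames give 0.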

(4. System configuration)
FIG. 4 is a system configuration diagram showing an embodiment of the video signal playback apparatus according to the present invention. In FIG. 4, 10 is sound-added video file storage means, 11 is the inter-frame difference table, 12 is demo playback audio file storage means, 13 is menu video file storage means, 14 is background video file storage means, 20 is file selection means, 30 is audio data block synthesis means, 40 is processing control means, 50 is priority calculation means, 60 is display control means, 70 is audio output means, and 80 is a video display device.

In FIG. 4, the sound-added video file storage means 10 is a storage device that stores the sound-added video files. The demo playback audio file storage means 12 is a storage device in which the demo playback audio files are recorded. The menu video file storage means 13 is a storage device in which the menu video file is recorded. The background video file storage means 14 is a storage device in which the background video file is recorded. The audio data block synthesis means 30 extracts audio data blocks from the sound-added video files and synthesizes them. The processing control means 40 controls the processing timing of audio playback and video playback. The priority calculation means 50 calculates the display priorities of the video frames of the plurality of sound-added video files. The display control means 60 writes video frames to the memory of the video display device 80 for display on the screen according to the calculated display priorities. The audio output means 70 outputs the synthesized audio data block as sound at the timing instructed by the processing control means 40; specifically, it writes the synthesized audio data block to a sound device installed in the computer for audio playback. In practice, the system shown in FIG. 4 is realized by hardware such as a computer and its peripheral devices, together with dedicated software installed on the computer. In particular, the audio data block synthesis means 30, the processing control means 40, the priority calculation means 50, and the display control means 60 are realized by the CPU executing instructions from the dedicated software.

(5. Processing operation)
Next, the processing operation of the system shown in FIG. 4 is described. When the system is started, a menu screen such as that shown in FIG. 5 is displayed. In FIG. 5, E is the video display area, which is divided into 25 video sections, E11 to E55, each displaying video. c1 to c5 are cursors indicating which video section is selected, one per row. That is, one video can be selected per row, and in the initial state the leftmost video section of each row is selected, as shown in FIG. 5. Although 25 videos appear to be displayed on this menu screen, the system is merely playing back the menu video file; in reality a single video is being displayed at 30 frames per second. Because each of its frames was composed from 25 different video frames, however, the viewer sees different video content in each video section. Specifically, this display is performed by the display control means 60 extracting the menu video file from the menu video file storage means 13 and playing it back.

On this menu screen, the user selects the sound-added video files to be synthesized and played back by clicking the displayed video sections. The user selects one video section per row, so five video sections are selected in total. When the user clicks another video section, the cursor moves onto the clicked video, and the demo playback audio file corresponding to that video section is extracted from the demo playback audio file storage means 12 and played back by the audio output means 70. From the user's point of view, the sound corresponding to the clicked video section can be heard on the spot. As described above, the demo playback audio file is about 10 seconds long and lets the user check the content, such as a performance, that will be synthesized and played back. In this way, the user selects one video section for each row. FIG. 6 shows the state of the screen after the user has made the selections. In the example of FIG. 6, video section E12 is selected in the first row, E23 in the second row, E35 in the third row, E41 in the fourth row, and E53 in the fifth row.

The structure of this menu screen is as follows. The menu screen consists of three layers, as shown in FIG. 7: a group of material selection buttons, the menu video frame, and a cursor overlay window. The menu video frame is overlaid on the material selection button group, and the cursor overlay window is overlaid on top of that, producing the display in the video display area shown in FIGS. 5 and 6. When the user clicks on a video section, the material selection button beneath it responds, and the cursor in the corresponding row of the cursor layer moves. For example, when the user clicks the video section in the third row and third column on the menu screen, as shown in FIG. 7(d), the cursor in the third row moves from the first column to the third column, as shown in FIG. 7(e). At the same time, the demo playback audio file corresponding to the selected video section is played back. When the play button is clicked with five video sections selected, the sound-added video files corresponding to the selected video sections are extracted from the sound-added video file storage means 10 and played back.

Playback of the selected sound-added video files is described below. When a playback instruction is given, the audio data block synthesis means 30 extracts the audio data block corresponding to the first video frame from each of the five selected sound-added video files and synthesizes them. Specifically, one synthesized audio data block is created by summing, across the audio data blocks attached to the video frames of the sound-added video files, the sample values that correspond to the same time. If this processing were performed in units of the audio data for one frame (1/30 second), the units would be too short to process efficiently and would leave too little time for the video playback processing described later; in the embodiment, therefore, audio data blocks are extracted and synthesized 15 frames (0.5 second) at a time. The audio output means 70 then plays back the synthesized audio data block as sound in accordance with instructions from the processing control means 40.
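The summation described above can be sketched in a few lines of Python. The clamp to the 16-bit range is an assumption added for this example (the patent describes only the sum), and the names are hypothetical.

```python
SAMPLES_PER_BLOCK = 3200   # one frame of 48 kHz stereo audio (from the text)
FRAMES_PER_CHUNK = 15      # the embodiment mixes 0.5 second at a time
CHUNK_SAMPLES = FRAMES_PER_CHUNK * SAMPLES_PER_BLOCK  # 48000 samples


def mix_blocks(blocks, clip=True):
    """Sum same-time samples from several tracks into one synthesized block.

    Clamping to the signed 16-bit range is an assumption for this sketch.
    """
    length = len(blocks[0])
    if any(len(block) != length for block in blocks):
        raise ValueError("all blocks must have the same length")
    mixed = [sum(same_time) for same_time in zip(*blocks)]
    if clip:
        mixed = [max(-32768, min(32767, s)) for s in mixed]
    return mixed


print(mix_blocks([[1, 2, 3], [10, 20, 30]]))   # [11, 22, 33]
print(mix_blocks([[30000], [30000]]))          # [32767] after clamping
```

In the embodiment each call would receive five blocks of `CHUNK_SAMPLES` samples, one per selected file.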

Once the audio output means 70 has started playing back a synthesized audio data block and the buffer inside the audio output means 70 that holds the next synthesized audio data blocks to be played is also full, the CPU cannot process the audio data blocks of the following frames and write them to the audio output means 70, and would otherwise simply wait. The processing control means 40 uses this waiting time to switch from audio data processing to video frame processing, which is carried out by the priority calculation means 50 and the display control means 60. For the first video frame (frame 0) of each sound-added video file, the display control means 60 unconditionally displays the frame on the video display device 80. At this point, a background video frame can also be extracted from the background video file stored in the background video file storage means 14, with the video frames displayed on top of it. A moving image could be used as the background, but this would increase the display load and run counter to the purpose of the present invention, so in the embodiment the background is drawn as a still image only when the first frame is displayed. FIG. 8 shows the video display screen at this time. The display control means 60 outputs the video frame of each selected sound-added video file offset by a predetermined pixel address, so that the video frames of the five selected sound-added video files do not overlap one another.

From the next frame (frame 1) onward, priorities are calculated and display is performed accordingly. Specifically, the playback time T corresponding to the synthesized audio data block being played back is first acquired. Then, to extract the video frames to be played back, the frame number n corresponding to playback time T is calculated. Because the video frames in a sound-added video file run at 30 frames per second while the audio data runs at 48000 samples per second, the audio playback time T and the frame number n generally do not correspond one to one. For example, if the sample of the synthesized audio data block currently being played by the sound device is the k-th sample and the sampling frequency is 48000 Hz, the frame number is calculated as n = 30·k/48000, with the fractional part discarded. If video frames are being displayed quickly and only 1000 samples have elapsed since the previous video frame was played back, the frame number changes by 0, and the video frame with the same frame number is played back again. Conversely, if video frames are being displayed slowly and 10000 samples have elapsed since the previous video frame was played back, the frame number changes by 6, so five frames are skipped and the sixth frame is played back.
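The mapping from the audio sample position to the video frame number can be checked with a small Python sketch. It assumes k counts sample positions in time (48000 per second) and that the division truncates, which is what the 1000-sample and 10000-sample examples imply; the function name is illustrative.

```python
FPS = 30
SAMPLE_RATE = 48_000


def frame_number_for_sample(k: int) -> int:
    """Frame number n = 30 * k / 48000, with the fractional part discarded."""
    return (FPS * k) // SAMPLE_RATE


print(frame_number_for_sample(1_000))    # 0: the same frame is shown again
print(frame_number_for_sample(10_000))   # 6: five frames are skipped
print(frame_number_for_sample(48_000))   # 30: one second of audio
```

Deriving the frame number from the audio position keeps the picture locked to the sound: when display falls behind, frames are skipped rather than the audio being delayed.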
Then, using the acquired frame number n, the inter-frame difference feature amounts D1 to D5 for the five selected sound-added video files are obtained from the inter-frame difference table. From D1 to D5, the display priorities P1 to P5 are calculated using [Formula 2] below.

[Formula 2]
When Dm/Di > 2.0, Pi = 6
When 1.8 < Dm/Di ≤ 2.0, Pi = 5
When 1.6 < Dm/Di ≤ 1.8, Pi = 4
When 1.4 < Dm/Di ≤ 1.6, Pi = 3
When 1.2 < Dm/Di ≤ 1.4, Pi = 2
When Dm/Di ≤ 1.2, Pi = 1

In [Formula 2], Dm is the maximum of D1 to D5, and i takes the values 1 to 5. That is, according to [Formula 2], the smaller the value of Dm divided by Di, the smaller the value of Pi and the higher the display priority. Because the display priorities P1 to P5 are allowed to take duplicate values, P1 to P5 may, for example, all be the same.
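The thresholds of [Formula 2] can be written as a small function. This is only a sketch: the name `display_priorities` is hypothetical, and the code assumes every Di is greater than zero, since the text does not say how a zero feature amount is handled.

```python
def display_priorities(d: list[float]) -> list[int]:
    """Map inter-frame difference feature amounts D1..Dn to priorities P1..Pn."""
    dm = max(d)  # Dm: the largest feature amount among the tracks
    priorities = []
    for di in d:
        ratio = dm / di  # assumes di > 0 (not specified in the source)
        if ratio > 2.0:
            p = 6
        elif ratio > 1.8:
            p = 5
        elif ratio > 1.6:
            p = 4
        elif ratio > 1.4:
            p = 3
        elif ratio > 1.2:
            p = 2
        else:
            p = 1
        priorities.append(p)
    return priorities


# Tracks with identical feature amounts all receive the highest priority.
print(display_priorities([100, 100, 100]))  # [1, 1, 1]
```

Because the priority depends only on the ratio to the largest feature amount, the track with the most motion always receives Pi = 1, regardless of the absolute magnitudes.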

Once the display priority of each sound-added video file has been obtained for a given frame number, the frame display counter value C is used to determine whether each video frame is to be displayed. Specifically, the determination follows [Formula 3] below.

[Formula 3]
ON when C % Pi = Pi - 1
OFF otherwise

That is, the remainder of dividing the frame display counter value C by Pi is calculated; if it equals Pi - 1, display is set to ON, and otherwise display is set to OFF. The video frames are then displayed. Specifically, only the video frames set to ON are output to the video display device 80. If the video frames are compressed, each video frame is extracted, decoded, and then output. For a video frame set to OFF, the data of the previously displayed video frame is retained on screen as a still image. When this processing ends, the frame display counter value C is incremented by 1. The above processing is repeated until the sound output means 70 finishes waiting for processing of the synthesized audio data block and a reproduction stop instruction is received from the processing control means 40.
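The ON/OFF decision of [Formula 3] and the "keep the previous frame as a still image" behaviour can be sketched as below. The function names `should_display` and `update_screen`, and the string frame labels, are hypothetical stand-ins; decoding and the actual write to the video display device are omitted.

```python
def should_display(counter: int, priority: int) -> bool:
    """[Formula 3]: ON (True) when C % Pi == Pi - 1, otherwise OFF (False)."""
    return counter % priority == priority - 1


def update_screen(shown: list, incoming: list, priorities: list[int], counter: int) -> list:
    """Return what each track shows after one step.

    ON tracks take the incoming frame; OFF tracks keep the previously
    shown frame, which stays on screen as a still image.
    """
    return [
        new if should_display(counter, p) else old
        for old, new, p in zip(shown, incoming, priorities)
    ]


# Hypothetical frame labels for three tracks with priorities 1, 2 and 3.
shown = ["a0", "b0", "c0"]
counter = 0
for incoming in (["a1", "b1", "c1"], ["a2", "b2", "c2"], ["a3", "b3", "c3"]):
    shown = update_screen(shown, incoming, [1, 2, 3], counter)
    counter += 1  # the counter advances once per processed frame
print(shown)  # ['a3', 'b2', 'c3']
```

In this sketch the priorities are held fixed for brevity; in the described apparatus they are recalculated for each frame number before the decision is made.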

Here, how video frames are processed by this apparatus is described using a concrete example. FIG. 9 shows the first through sixth frames when the five selected sound-added video files are processed as the first through fifth tracks. In FIG. 9, the value in each rectangle is the inter-frame difference feature amount. For example, "frame 01 difference 200" in the figure indicates that the inter-frame difference feature amount between the 0th frame and the first frame is 200. In the display processing, the 0th frame, which is the first frame of every track, is first displayed unconditionally. Next, the priorities Pi (i = 1 to 5) are calculated by [Formula 2] above from the inter-frame difference feature amounts Di (i = 1 to 5) between the 0th frame and the first frame. In the example of FIG. 9, D1 = 200, D2 = 150, D3 = 130, D4 = 115, and D5 = 105, so Dm = D1 = 200. Accordingly, by [Formula 2], Dm/D1 = 1.0 gives P1 = 1, Dm/D2 ≈ 1.33 gives P2 = 2, Dm/D3 ≈ 1.54 gives P3 = 3, Dm/D4 ≈ 1.74 gives P4 = 4, and Dm/D5 ≈ 1.90 gives P5 = 5.

The frame display counter value C is set to 0 in the initial state and thereafter is incremented by 1 each time the frame number increases by 1, so C = 0 when the first frame is processed. By [Formula 3] above, only the first track is then ON, and the second through fifth tracks are OFF. After the first frame is processed, C is incremented by 1, so C = 1 when the second frame is processed. By [Formula 3], the first and second tracks are then ON, and the third through fifth tracks are OFF. The frames to be displayed and not displayed are determined in the same way, so with the inter-frame difference feature amounts shown in FIG. 9, display/non-display is determined as shown in FIG. 10. In FIG. 10, frames drawn with a rectangular border are the frames that are displayed. According to [Formula 3], a frame with Pi = 1 is always displayed, a frame with Pi = 2 is displayed when the counter value C is odd, and frames with Pi = 3 to 6 are displayed when the remainder of dividing C by Pi is 2 to 5, respectively. This means that the display rates for Pi = 1 to 6 are 30 fps, 15 fps, 10 fps, 7.5 fps, 6 fps, and 5 fps, respectively.
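The display/non-display pattern for fixed priorities P1 to P5 = 1 to 5 can be reproduced with a short simulation. This is an illustrative sketch only: the helper names `on_off_table` and `display_rate` are hypothetical, and the priorities are assumed to stay constant over the six frames, as in the FIG. 9 example.

```python
def on_off_table(priorities: list[int], num_frames: int) -> list[list[bool]]:
    """ON/OFF decision per track for counter values C = 0 .. num_frames - 1."""
    return [[c % p == p - 1 for p in priorities] for c in range(num_frames)]


def display_rate(priority: int, fps: float = 30.0) -> float:
    """Effective display rate: one frame in every `priority` frames is shown."""
    return fps / priority


table = on_off_table([1, 2, 3, 4, 5], 6)
for c, row in enumerate(table):
    # One text row per processed frame; 'O' marks a displayed frame, '-' a held one.
    print(f"C={c}: " + " ".join("O" if on else "-" for on in row))

print([display_rate(p) for p in range(1, 7)])  # [30.0, 15.0, 10.0, 7.5, 6.0, 5.0]
```

The first two printed rows correspond to the cases worked through in the text: at C = 0 only the first track is ON, and at C = 1 the first and second tracks are ON.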

The example of FIG. 9 shows a case where the inter-frame difference feature amount does not vary within each track, but normally the inter-frame difference feature amount varies even within the same track. FIG. 11 shows an example in which the inter-frame difference feature amount varies. With the inter-frame difference feature amounts shown in FIG. 11, the display/non-display of each frame is as shown in FIG. 12.

The processing described above by the audio data block synthesizing means 30, the priority calculation means 50, the display control means 60, and the sound output means 70 is repeated, in accordance with instructions from the processing control means 40, until all audio data blocks of the sound-added video files have been reproduced. As a result, on the video display screen shown in FIG. 8, the corresponding video data is reproduced as moving images while the audio data is continuously reproduced as sound. As described above, among the selected sound-added video files, display processing is skipped from the outset for a predetermined number of frames of low-priority video, which frees up CPU processing capacity. Conversely, high-priority, fast-moving video can be displayed without dropping video frames. Therefore, even when the CPU's processing capacity is too low to display all video frames smoothly, motion that is smooth overall can be presented.

In addition, because the present system processes sound-added video files in which audio data blocks are recorded in association with video frames, a user need only prepare sound-added video files in a format commonly handled by computers and store them in the sound-added video file storage means 10 to use them with the present system. In other words, simply by preparing sound-added video files as material and swapping the plurality of sound-added video files to be reproduced as needed, the user can vary the video and music content that is reproduced, and can also provide environmental video and background music (BGM) that change over a long period of time.

(6. When audio is recorded longer than video)
In the above embodiment, the case where video frames and audio data blocks are recorded for the same length of time in a sound-added video file was described; however, in some sound-added video files the video frames and audio data blocks are not recorded for the same length of time. Next, the case where the recording time of the audio data blocks is longer than that of the video frames is described. FIG. 13(a) schematically shows the state of the video frames and audio data blocks in a sound-added video file in which the recording time of the audio data blocks is longer than that of the video frames.

As shown in FIG. 13(a), the audio data blocks are recorded over the entire sound-added video file, whereas the video frames are recorded only up to the reproduction time Te; after the reproduction time Te there are only blank frames. In such a case, the present system performs the same processing as in the above embodiment up to time Te, and from time Te onward performs processing again from the 0th frame. Specifically, when the processing control means 40 first reads the sound-added video file from the sound-added video file storage means 10, it recognizes that video is recorded only up to the reproduction time Te; when the reproduction time T acquired for that sound-added video file exceeds Te, it passes the remainder of dividing T by Te to the priority calculation means 50 as the reproduction time. At this time, as shown in FIG. 13(b), the video frames reproduced up to time Te are stored in high-speed storage means such as a buffer memory, and the display control means 60 reads them from there and reproduces them repeatedly. In the actual computation, the times T and Te are handled as frame numbers.
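The wrap-around of the reproduction time can be sketched as follows, working in frame numbers as the text indicates. The class name `LoopingVideo`, its method `frame_at`, and the use of a plain list as a stand-in for the high-speed storage means are all assumptions made for illustration.

```python
class LoopingVideo:
    """Video track whose frames end at Te while the audio continues."""

    def __init__(self, frames: list):
        # Buffered copy of the frames recorded up to Te; a plain list
        # stands in here for the buffer memory mentioned in the text.
        self.frames = list(frames)

    def frame_at(self, n: int):
        """Frame to show for frame number n, substituting n % Te once n reaches Te."""
        te = len(self.frames)  # Te expressed as a number of frames
        return self.frames[n % te]


# Hypothetical 3-frame video played against longer audio.
video = LoopingVideo(["f0", "f1", "f2"])
print([video.frame_at(n) for n in range(7)])
# ['f0', 'f1', 'f2', 'f0', 'f1', 'f2', 'f0']
```

For n below Te the modulo leaves the frame number unchanged, so the same lookup serves both the normal case and the looped case.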

The preferred embodiment of the present invention has been described above; however, the present invention is not limited to the above embodiment, and various modifications are possible. For example, in the above embodiment the selected videos are displayed together with a background video, but the background video may be omitted. Also, in the above embodiment, when the menu screen for selecting sound-added video files is displayed, a menu video frame file in which a plurality of videos are combined is created in advance and displayed, but each video frame may instead be displayed as it is.

In the above embodiment, the audio data is not compressed, but compressed audio data may instead be recorded in the sound-added video file in association with each video frame. In this case, the sound output means 70 needs a function for decoding the data according to the compression encoding method before reproducing it. However, because audio data is small compared with video data, this poses little problem in the present system, which handles a large number of video frames.

Further, in the above embodiment, the video frames are displayed side by side as they are, as shown in FIG. 7, but processing such as masking may be applied to each video. In this case, mask data identifying, for each pixel, whether the pixel is to be overwritten is attached to the video frames of the sound-added video file. When displaying, the display control means 60 then uses the mask data to suppress display of some of the pixels of the video frames of each selected sound-added video file.

FIG. 1 is a diagram showing the structure of a sound-added video file to be reproduced by the present invention.
FIG. 2 is a diagram showing the signal waveform of each audio signal to be synthesized.
FIG. 3 is a diagram showing an example of the inter-frame difference table.
FIG. 4 is a functional block diagram showing one embodiment of the apparatus for reproducing sound-added video according to the present invention.
FIG. 5 is a diagram showing the initial state of the menu screen for selecting the items to be reproduced in the present system.
FIG. 6 is a diagram showing the menu screen in a state where items to be reproduced have been selected by the user.
FIG. 7 is a diagram showing the structure of the layers that make up the video display area of the menu screen.
FIG. 8 is a diagram showing a state in which a plurality of selected videos are displayed.
FIG. 9 is a diagram showing the inter-frame difference feature amounts of the five sound-added video files to be reproduced.
FIG. 10 is a diagram showing the display/non-display of the frames shown in FIG. 9.
FIG. 11 is a diagram showing a case where the inter-frame difference feature amounts of the sound-added video files vary.
FIG. 12 is a diagram showing the display/non-display of the frames shown in FIG. 11.
FIG. 13 is a diagram showing the processing for a sound-added video file in which the recording time of the audio data blocks is longer than that of the video frames.

Explanation of symbols

10 ... Sound-added video file storage means
11 ... Inter-frame difference table
12 ... Demo reproduction audio file storage means
13 ... Menu video file storage means
14 ... Background video file storage means
20 ... File selection means
30 ... Audio data block synthesizing means
40 ... Processing control means
50 ... Priority calculation means
60 ... Display control means
70 ... Sound output means
80 ... Video display device
C ... Frame display counter value
c1 to c5 ... Video section selection cursors
d(x, y) ... Pixel difference
Di ... Inter-frame difference feature amount
E1 to E5 ... Video sections
F ... Sampling frequency
n ... Frame number
Pi ... Display priority

Claims

An apparatus for reproducing sound-added video, which reproduces a plurality of sound-added video files each having a plurality of video frames with an audio data block associated with each video frame, the apparatus comprising:
An inter-frame difference table that records inter-frame difference feature amounts based on differences between video frames of each sound-added video file;
Audio data block synthesizing means for extracting, from the plurality of sound-added video files, the audio data blocks attached to the corresponding video frames and synthesizing them into one synthesized audio data block;
Sound output means for writing the synthesized audio data block to a sound output device and reproducing it as sound;
Processing control means for acquiring the reproduction time corresponding to the synthesized audio data block being reproduced by the sound output means;
Priority calculation means for calculating, for the video frame of each sound-added video file corresponding to the acquired reproduction time, a priority of the video frame with reference to the inter-frame difference table; and
Display control means for counting the number of display executions, determining whether to display each video frame based on the display execution count and the priority of the video frame of each sound-added video file corresponding to the acquired reproduction time, and writing each video frame determined to be displayed to a memory of a video display device to display it on a screen.

An apparatus for reproducing sound-added video, which reproduces a plurality of sound-added video files each having a plurality of video frames with an audio data block associated with each video frame, the apparatus comprising:
An inter-frame difference table that records inter-frame difference feature amounts based on differences between video frames of each sound-added video file;
Audio data block synthesizing means for extracting, from the plurality of sound-added video files, the audio data blocks attached to the corresponding video frames and synthesizing them into one synthesized audio data block;
Sound output means for writing the synthesized audio data block to a sound output device and reproducing it as sound;
Processing control means for acquiring the reproduction time corresponding to the synthesized audio data block being reproduced by the sound output means;
Priority calculation means for calculating, for the video frame of each sound-added video file corresponding to the acquired reproduction time, a priority of the video frame with reference to the inter-frame difference table; and
Display control means for determining whether to display each video frame based on the priority of the video frame of each sound-added video file corresponding to the acquired reproduction time, and writing each video frame determined to be displayed to a memory of a video display device to display it on a screen,
Wherein, when no video frame exists in a sound-added video file after a predetermined reproduction time Te and only audio data blocks are recorded, the display control means, when the acquired reproduction time T is greater than the reproduction time Te, replaces the reproduction time T with T % Te.

The apparatus for reproducing sound-added video according to claim 2,
Wherein the video frames up to the reproduction time Te in the sound-added video file are stored in high-speed storage means that can be accessed at high speed, and the display control means reads the video frames from the high-speed storage means.

An apparatus for reproducing sound-added video, which reproduces a plurality of sound-added video files each having a plurality of video frames with an audio data block associated with each video frame, the apparatus comprising:
Sound-added video file storage means for storing a plurality of the sound-added video files;
File selection means for selecting, from the sound-added video file storage means, a plurality of sound-added video files to be reproduced;
An inter-frame difference table that records inter-frame difference feature amounts based on differences between video frames of each sound-added video file;
Audio data block synthesizing means for extracting, from the selected plurality of sound-added video files, the audio data blocks attached to the corresponding video frames and synthesizing them into one synthesized audio data block;
Sound output means for writing the synthesized audio data block to a sound output device and reproducing it as sound;
Processing control means for acquiring the reproduction time corresponding to the synthesized audio data block being reproduced by the sound output means;
Priority calculation means for calculating, for the video frame of each sound-added video file corresponding to the acquired reproduction time, a priority of the video frame with reference to the inter-frame difference table; and
Display control means for determining whether to display each video frame based on the priority of the video frame of each sound-added video file corresponding to the acquired reproduction time, and writing each video frame determined to be displayed to a memory of a video display device to display it on a screen.

The apparatus for reproducing sound-added video according to claim 4,
Wherein the file selection means writes a menu composite frame, created by synthesizing video frames extracted from predetermined sound-added video files among the sound-added video files stored in the sound-added video file storage means, to the memory of the video display device to display it on the screen, and selects the sound-added video files to be reproduced through selection of a video section on the screen.

The apparatus for reproducing sound-added video according to claim 5,
Wherein demo reproduction audio files, each containing only the audio data extracted from a sound-added video file, are prepared in advance, and the apparatus has a function of reproducing the demo reproduction audio file corresponding to the video section selected on the screen.

The apparatus for reproducing sound-added video according to any one of claims 1 to 6,
Wherein the audio data block synthesizing means and the sound output means continue processing until writing to the sound output device has to wait, and the processing control means, when writing is waiting, causes the processing of the priority calculation means and the display control means to be executed and then causes the processing by the audio data block synthesizing means and the sound output means to be performed again.

The apparatus for reproducing sound-added video according to any one of claims 1 to 7,
Wherein the audio data block synthesizing means creates one synthesized audio data block by calculating the sum of the corresponding values of the audio data blocks attached to the video frames of the plurality of sound-added video files.

The apparatus for reproducing sound-added video according to any one of claims 1 to 8,
Wherein the video frames of the sound-added video files stored in the sound-added video file storage means are recorded in a state compression-encoded by a predetermined method, and the display control means extracts a video frame from the sound-added video file, decodes it, and writes the decoded video frame to the memory of the video display device.

The apparatus for reproducing sound-added video according to any one of claims 1 to 9,
Wherein the audio data blocks recorded in the sound-added video files stored in the sound-added video file storage means are recorded in a state compression-encoded by a predetermined method, and the audio data block synthesizing means extracts the audio data blocks attached to the video frames of the selected plurality of sound-added video files, decodes them, and then creates one synthesized audio data block.

A program for causing a computer to function as the apparatus for reproducing sound-added video according to any one of claims 1 to 10.