JP7651835B2

JP7651835B2 - Information processing method, information processing system, and program

Info

Publication number: JP7651835B2
Application number: JP2020174321A
Authority: JP
Inventors: 直之安立; 克己石川; 大智井芹; 祐二小池; 謙一良齋藤; 康之介加藤
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2020-10-16
Filing date: 2020-10-16
Publication date: 2025-03-27
Anticipated expiration: 2040-10-16
Also published as: JP2022065694A

Description

本開示は、動画と音とを含むコンテンツを生成するための技術に関する。 This disclosure relates to technology for generating content that includes video and audio.

動画と音とを含むコンテンツを作成するための各種の技術が従来から提案されている。例えば特許文献１には、楽曲の曲調が変化する時点において動画が切替わるようにスライドショー動画を生成する技術が開示されている。 Various techniques have been proposed for creating content that includes video and sound. For example, Patent Document 1 discloses a technique for generating a slideshow video in which the video switches when the melody of the music changes.

Patent Document 1: JP 2007-188561 A

特許文献１の技術においては、楽曲の曲調が変化する時点において動画が強制的に切替わるため、作成者が意図した動画を含むコンテンツを作成することは実際には困難である。以上の事情を考慮して、本開示のひとつの態様は、動画データが表す動画に対する影響を抑制しながら、当該動画と音との間に統一感があるコンテンツを生成することを目的とする。 In the technology of Patent Document 1, the video is forcibly switched when the melody of the music changes, making it difficult to actually create content that includes the video intended by the creator. In consideration of the above circumstances, one aspect of the present disclosure aims to generate content that has a sense of unity between the video and the sound while suppressing the impact on the video represented by the video data.

以上の課題を解決するために、本開示のひとつの態様に係る情報処理方法は、複数の動画区間を含む動画データと複数の音区間を含む音データとを処理する情報処理方法であって、前記複数の動画区間のうち第１動画区間と当該第１動画区間に後続する第２動画区間との境界点において、前記複数の音区間のうちの第１音区間から当該第１音区間以外の第２音区間に切替わるように、前記音データを処理する。本開示の他の態様に係る情報処理方法は、複数の動画区間を含む動画データと音を表す音データとを処理する情報処理方法であって、前記複数の動画区間のうち第１動画区間と当該第１動画区間に後続する第２動画区間との境界点を含む遷移期間内において音量が減少するように、前記音データを処理する。 In order to solve the above problems, an information processing method according to one aspect of the present disclosure is an information processing method for processing video data including a plurality of video segments and sound data including a plurality of sound segments, and processes the sound data so as to switch from a first sound segment among the plurality of sound segments to a second sound segment other than the first sound segment at a boundary point between a first video segment among the plurality of video segments and a second video segment subsequent to the first video segment. An information processing method according to another aspect of the present disclosure is an information processing method for processing video data including a plurality of video segments and sound data representing sound, and processes the sound data so as to reduce the volume within a transition period including a boundary point between a first video segment among the plurality of video segments and a second video segment subsequent to the first video segment.

本開示のひとつの態様に係る情報処理システムは、複数の動画区間を含む動画データと複数の音区間を含む音データとを処理する情報処理システムであって、前記複数の動画区間のうち第１動画区間と当該第１動画区間に後続する第２動画区間との境界点において、前記複数の音区間のうちの第１音区間から当該第１音区間以外の第２音区間に切替わるように、前記音データを処理する音データ処理部を具備する。本開示の他の態様に係る情報処理システムは、複数の動画区間を含む動画データと音を表す音データとを処理する情報処理システムであって、前記複数の動画区間のうち第１動画区間と当該第１動画区間に後続する第２動画区間との境界点を含む遷移期間内において音量が減少するように、前記音データを処理する音データ処理部を具備する。 An information processing system according to one aspect of the present disclosure is an information processing system that processes video data including a plurality of video segments and sound data including a plurality of sound segments, and includes a sound data processing unit that processes the sound data so as to switch from a first sound segment among the plurality of sound segments to a second sound segment other than the first sound segment at a boundary point between a first video segment among the plurality of video segments and a second video segment subsequent to the first video segment. An information processing system according to another aspect of the present disclosure is an information processing system that processes video data including a plurality of video segments and sound data representing sound, and includes a sound data processing unit that processes the sound data so as to reduce the volume within a transition period that includes a boundary point between a first video segment among the plurality of video segments and a second video segment subsequent to the first video segment.

本開示のひとつの態様に係るプログラムは、複数の動画区間を含む動画データと複数の音区間を含む音データとを処理するためのプログラムであって、コンピュータを、前記複数の動画区間のうち第１動画区間と当該第１動画区間に後続する第２動画区間との境界点において、前記複数の音区間のうちの第１音区間から当該第１音区間以外の第２音区間に切替わるように、前記音データを処理する音データ処理部として機能させる。本開示の他の態様に係るプログラムは、複数の動画区間を含む動画データと音を表す音データとを処理するためのプログラムであって、コンピュータを、前記複数の動画区間のうち第１動画区間と当該第１動画区間に後続する第２動画区間との境界点を含む遷移期間内において音量が減少するように、前記音データを処理する音データ処理部として機能させる。 A program according to one aspect of the present disclosure is a program for processing video data including multiple video segments and sound data including multiple sound segments, and causes a computer to function as a sound data processing unit that processes the sound data so as to switch from a first sound segment among the multiple sound segments to a second sound segment other than the first sound segment at a boundary point between a first video segment among the multiple video segments and a second video segment subsequent to the first video segment. A program according to another aspect of the present disclosure is a program for processing video data including multiple video segments and sound data representing sound, and causes a computer to function as a sound data processing unit that processes the sound data so as to reduce the volume within a transition period that includes a boundary point between a first video segment among the multiple video segments and a second video segment subsequent to the first video segment.

FIG. 1 is a block diagram illustrating the configuration of an information system according to the first embodiment.
FIG. 2 is a block diagram illustrating the configuration of the editing system.
FIG. 3 is a block diagram illustrating the functional configuration of the editing system.
FIG. 4 is a flowchart of the operation executed by the control device of the editing system.
FIG. 5 is an explanatory diagram of the editing process in the first embodiment.
FIG. 6 is a flowchart of the editing process in the first embodiment.
FIG. 7 is an explanatory diagram of the editing process in the second embodiment.
FIG. 8 is a flowchart of the editing process in the second embodiment.
FIG. 9 is an explanatory diagram of the editing process in the third embodiment.
FIG. 10 is a flowchart of the editing process in the third embodiment.
FIG. 11 is an explanatory diagram of the editing process in the fourth embodiment.
FIG. 12 is a block diagram illustrating the configuration of the terminal device in the fifth embodiment.
FIG. 13 is a block diagram illustrating the functional configuration of the terminal device in the fifth embodiment.

A: First embodiment
FIG. 1 is a block diagram illustrating the configuration of an information system 100 according to the first embodiment. The information system 100 of the first embodiment includes a terminal device 10 and an editing system 20. The terminal device 10 and the editing system 20 communicate with each other via a communication network 30 such as the Internet.

端末装置１０は、例えば携帯電話機、スマートフォン、タブレット端末またはパーソナルコンピュータ等の情報端末である。端末装置１０は、素材データＤを編集システム２０に送信する。素材データＤは、動画データＸ1と音データＹ1とを含む。動画データＸ1は、動画を表すデータである。例えば、端末装置１０に搭載された撮像装置により動画データＸ1が生成される。音データＹ1は、動画データＸ1の動画に対して並行に再生されるべき音を表すデータである。具体的には、第１実施形態の音データＹ1は、動画データＸ1の動画の背景音楽として再生される楽曲の演奏音（楽器音または歌唱音）を表すデータである。 The terminal device 10 is an information terminal such as a mobile phone, a smartphone, a tablet terminal, or a personal computer. The terminal device 10 transmits material data D to the editing system 20. The material data D includes video data X1 and sound data Y1. The video data X1 is data representing a video. For example, the video data X1 is generated by an imaging device mounted on the terminal device 10. The sound data Y1 is data representing a sound to be played in parallel to the video of the video data X1. Specifically, the sound data Y1 in the first embodiment is data representing the performance sound (instrument sound or singing sound) of a piece of music played as background music for the video of the video data X1.

編集システム２０は、端末装置１０から受信した素材データＤを利用してコンテンツＣを生成するコンピュータシステムである。コンテンツＣは、動画データＸ2と音データＹ2とを含む映像コンテンツである。動画データＸ2は、動画データＸ1の編集により生成される。音データＹ2は、音データＹ1の編集により生成される。すなわち、素材データＤは、コンテンツＣの素材となるデータである。編集システム２０は、コンテンツＣを端末装置１０に送信する。端末装置１０は、編集システム２０から受信したコンテンツＣを再生する。すなわち、動画データＸ2が表す動画と音データＹ2が表す音（具体的には楽曲の演奏音）とが並行に再生される。 The editing system 20 is a computer system that generates content C using material data D received from the terminal device 10. Content C is video content including video data X2 and sound data Y2. Video data X2 is generated by editing video data X1. Sound data Y2 is generated by editing sound data Y1. In other words, material data D is data that becomes the material of content C. The editing system 20 transmits content C to the terminal device 10. The terminal device 10 plays content C received from the editing system 20. In other words, the video represented by video data X2 and the sound represented by sound data Y2 (specifically, the sound of a musical piece being played) are played in parallel.

図２は、編集システム２０の構成を例示するブロック図である。編集システム２０は、制御装置２１と記憶装置２２と通信装置２３とを具備する。なお、編集システム２０は、単体の装置で実現されるほか、相互に別体で構成された複数の装置でも実現される。 Figure 2 is a block diagram illustrating the configuration of an editing system 20. The editing system 20 includes a control device 21, a storage device 22, and a communication device 23. Note that the editing system 20 may be realized by a single device, or may be realized by multiple devices configured separately from each other.

The control device 21 consists of one or more processors that control each element of the editing system 20. Specifically, the control device 21 is made up of one or more types of processor, such as a CPU (Central Processing Unit), an SPU (Sound Processing Unit), a DSP (Digital Signal Processor), an FPGA (Field Programmable Gate Array), or an ASIC (Application Specific Integrated Circuit).

The storage device 22 consists of one or more memories that store the program executed by the control device 21 and the various data used by the control device 21. The storage device 22 is composed of a known recording medium, such as a magnetic recording medium or a semiconductor recording medium, or a combination of multiple types of recording media. A portable recording medium that can be attached to and detached from the editing system 20, or a recording medium that the control device 21 can write to or read from via the communication network 30 (for example, cloud storage), may also be used as the storage device 22.

通信装置２３は、端末装置１０との間で通信網３０を介して通信する。具体的には、通信装置２３は、端末装置１０から送信された素材データＤを受信する。また、通信装置２３は、素材データＤから生成したコンテンツＣを端末装置１０に送信する。 The communication device 23 communicates with the terminal device 10 via the communication network 30. Specifically, the communication device 23 receives material data D transmitted from the terminal device 10. The communication device 23 also transmits content C generated from the material data D to the terminal device 10.

FIG. 3 is a block diagram illustrating the functional configuration of the editing system 20. By executing a program stored in the storage device 22, the control device 21 of the editing system 20 realizes multiple functions (a material data acquisition unit 51, a video data processing unit 52, a sound data processing unit 53, and a content providing unit 54) for generating content C from material data D and providing it. The material data acquisition unit 51 acquires, via the communication device 23, the material data D transmitted from the terminal device 10.

動画データ処理部５２は、素材データＤの動画データＸ1から動画データＸ2を生成する。具体的には、動画データ処理部５２は、動画データＸ1に対して画像処理を実行することで動画データＸ2を生成する。画像処理は、例えば動画のうち特定の区間の抽出または画質の調整等の各種の処理を含む。なお、動画データＸ1および動画データＸ2の形式は任意である。 The video data processing unit 52 generates video data X2 from video data X1 of the material data D. Specifically, the video data processing unit 52 generates video data X2 by executing image processing on the video data X1. The image processing includes various processes such as extracting a specific section of a video or adjusting image quality. Note that the video data X1 and video data X2 may be in any format.

音データ処理部５３は、素材データＤの音データＹ1から音データＹ2を生成する。第１実施形態における音データＹ1および音データＹ2は、例えば音の波形を表すサンプルの時系列で構成される。動画データ処理部５２が生成した動画データＸ2と音データ処理部５３が生成した音データＹ2とによりコンテンツＣが構成される。すなわち、動画データ処理部５２および音データ処理部５３は、素材データＤからコンテンツＣを生成する要素として機能する。コンテンツ提供部５４は、動画データＸ2と音データＹ2とを含むコンテンツＣを、通信装置２３から端末装置１０に送信する。 The sound data processing unit 53 generates sound data Y2 from sound data Y1 of the material data D. In the first embodiment, the sound data Y1 and sound data Y2 are composed of a time series of samples representing sound waveforms, for example. Content C is composed of the video data X2 generated by the video data processing unit 52 and the sound data Y2 generated by the sound data processing unit 53. In other words, the video data processing unit 52 and the sound data processing unit 53 function as elements that generate content C from material data D. The content providing unit 54 transmits content C including the video data X2 and sound data Y2 from the communication device 23 to the terminal device 10.

図４は、制御装置２１が実行する動作の具体的な手順を例示するフローチャートである。端末装置１０の利用者からの指示を契機として図４の処理が開始される。処理が開始されると、制御装置２１（素材データ取得部５１）は、端末装置１０から送信された素材データＤを通信装置２３により受信する（Ｓa）。動画データ処理部５２は、素材データＤの動画データＸ1から動画データＸ2を生成する（Ｓb）。音データ処理部５３は、素材データＤの音データＹ1から音データＹ2を生成する（Ｓc：編集処理）。コンテンツ提供部５４は、動画データＸ2と音データＹ2とを含むコンテンツＣを通信装置２３から端末装置１０に送信する（Ｓd）。 Figure 4 is a flow chart illustrating the specific steps of the operation executed by the control device 21. The process in Figure 4 is started in response to an instruction from the user of the terminal device 10. When the process is started, the control device 21 (material data acquisition unit 51) receives material data D transmitted from the terminal device 10 via the communication device 23 (Sa). The video data processing unit 52 generates video data X2 from video data X1 of the material data D (Sb). The sound data processing unit 53 generates sound data Y2 from sound data Y1 of the material data D (Sc: editing process). The content providing unit 54 transmits content C including the video data X2 and sound data Y2 from the communication device 23 to the terminal device 10 (Sd).

FIG. 5 is an explanatory diagram of the editing process Sc. The video data X2 includes a plurality of (M) video sections V1 to VM arranged on the time axis (M is a natural number of 2 or more). Each video section Vm (m = 1 to M) is a period obtained by dividing the video represented by the video data X2 on the time axis scene by scene. The M video sections V1 to VM follow one another on the time axis without gaps and form a continuous story related to a single theme. For example, the M video sections V1 to VM make up a conceptually unified story, such as a video introducing a company or a video introducing the products that company handles. Any one video section Vm is, for example, a section corresponding to one semantically coherent scene of the video, or a section captured in a single imaging operation by the imaging device. However, one video section Vm may include multiple scenes, or may include periods captured in multiple imaging operations. Note that temporal or conceptual continuity among the M video sections V1 to VM is not essential. For example, the video sections Vm may be separated from one another by intervals, or two or more video sections Vm may have different themes. Semantic coherence within a single video section Vm is not essential either.

Each video section Vm has a variable length. The durations of the video sections Vm generally differ from one another, although any two or more video sections Vm may share the same duration. In the video data X2, a time point Pm (hereinafter referred to as a "boundary point") is specified at the boundary between video section Vm and video section Vm+1, which are adjacent on the time axis. The boundary point Pm can also be described as an endpoint (specifically, the end point) of the video section Vm.
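As a minimal sketch of how the boundary points relate to the section durations, the following Python fragment assumes that the video sections are contiguous and start at time zero, which the description treats as typical but not essential. The function name `boundary_points` and the list-of-seconds input format are hypothetical and do not come from the disclosure.

```python
from itertools import accumulate


def boundary_points(durations):
    """Return boundary points P_1..P_{M-1} in seconds.

    durations: lengths of video sections V_1..V_M in seconds
    (hypothetical input format; sections are assumed contiguous from t = 0).
    P_m is the end point of V_m, i.e. the cumulative duration through V_m.
    """
    ends = list(accumulate(durations))
    # The end of the last section V_M is not a boundary between two sections.
    return ends[:-1]


points = boundary_points([4.0, 2.5, 6.0])
```

With three sections there are two boundary points, one at the end of V1 and one at the end of V2.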

The sound data Y1 includes a plurality of (M) sound sections A1 to AM arranged on the time axis. Each video section Vm of the video data X2 corresponds to a sound section Am of the sound data Y1. Specifically, the material data acquisition unit 51 acquires from the terminal device 10 material data D in which each video section Vm of the video data X1 is associated with a sound section Am of the sound data Y1. For example, the sound section Am corresponding to each video section Vm is selected in response to an instruction from the user of the terminal device 10. Note that the sound data Y1 is either a single piece of data that is continuous across the M sound sections A1 to AM, or a set of multiple pieces of data corresponding to different sound sections Am.

Each sound section Am is a structural section obtained by dividing the piece of music represented by the sound data Y1 on the time axis according to its musical meaning. For example, each sound section Am corresponds to a period such as an intro, a verse, a bridge, a chorus, or an outro. Data specifying the endpoints (start point or end point) of each sound section Am is set in the sound data Y1. This data corresponds to rehearsal marks that indicate the start point of each structural section. Note that the duration of each sound section Am exceeds the duration of the video section Vm corresponding to that sound section Am.

The sound data processing unit 53 generates the sound data Y2 by processing the sound data Y1 so that each sound section Am starts within its corresponding video section Vm among the M video sections V1 to VM of the video data X2. That is, the sound data processing unit 53 generates the sound data Y2 from the sound data Y1 so that, at the boundary point Pm between video section Vm and the immediately following video section Vm+1, the sound switches from sound section Am to the immediately following sound section Am+1. For example, the switch from sound section A1 to sound section A2 occurs at the boundary point P1 between video sections V1 and V2, and the switch from sound section A2 to sound section A3 occurs at the boundary point P2 between video sections V2 and V3. Specifically, the sound data processing unit 53 of the first embodiment processes the sound data Y1 so that the sound switches to the immediately following sound section Am+1 from the point partway through sound section Am that coincides with the boundary point Pm.

Specifically, the sound data processing unit 53 of the first embodiment extracts, from each sound section Am of the sound data Y1, a partial period Bm that includes the start point (hereinafter referred to as a "specific section"), and generates the sound data Y2 by concatenating the M specific sections B1 to BM, which correspond to different sound sections Am, in chronological order. The specific section Bm is the part of sound section Am that begins at the start point of that sound section and lasts for the duration of the video section Vm. The remaining part of sound section Am, which includes its end point (that is, everything other than the specific section Bm), is removed. As understood from the above, the tail of each sound section Am is removed so that its duration matches the duration of the video section Vm.

図６は、編集処理Ｓcの具体的な手順を例示するフローチャートである。動画データ処理部５２による動画データＸ2の生成を契機として編集処理Ｓcが開始される。 Figure 6 is a flowchart illustrating the specific steps of the editing process Sc. The editing process Sc is started when the video data processing unit 52 generates video data X2.

編集処理Ｓcが開始されると、音データ処理部５３は、動画データＸ2のＭ個の動画区間Ｖ1～ＶMから１個の動画区間Ｖmを選択する（Ｓc11）。各動画区間Ｖmが時系列の順番で順次に選択される。音データ処理部５３は、音データＹ1のＭ個の音区間Ａ1～ＡMのうち動画区間Ｖmに対応する１個の音区間Ａmを選択する（Ｓc12）。音データ処理部５３は、選択中の音区間Ａmのうち当該音区間Ａmの始点から動画区間Ｖmの時間長にわたる特定区間Ｂmを抽出する（Ｓc13：抽出処理）。音データ処理部５３は、抽出処理Ｓc13で抽出した特定区間Ｂmを、直前の抽出処理Ｓc13で抽出した特定区間Ｂm-1の末尾に連結する（Ｓc14：連結処理）。なお、最初の音区間Ａ1から抽出された特定区間Ｂ1は、音データＹ2の先頭に配置される。 When the editing process Sc is started, the sound data processing unit 53 selects one video section Vm from the M video sections V1 to VM in the video data X2 (Sc11). Each video section Vm is selected sequentially in chronological order. The sound data processing unit 53 selects one sound section Am corresponding to the video section Vm from the M sound sections A1 to AM in the sound data Y1 (Sc12). The sound data processing unit 53 extracts a specific section Bm from the start point of the selected sound section Am and spanning the duration of the video section Vm (Sc13: extraction process). The sound data processing unit 53 concatenates the specific section Bm extracted in the extraction process Sc13 to the end of the specific section Bm-1 extracted in the previous extraction process Sc13 (Sc14: concatenation process). Note that the specific section B1 extracted from the first sound section A1 is placed at the beginning of the sound data Y2.

The sound data processing unit 53 determines whether the above processing (Sc11 to Sc14) has been executed for all of the M sound sections A1 to AM (Sc15). If an unprocessed sound section Am remains (Sc15: NO), the sound data processing unit 53 selects the video section Vm+1 immediately following the currently selected video section Vm as the new video section Vm to be processed (Sc11), and executes the selection of the sound section Am (Sc12), the extraction process Sc13, and the concatenation process Sc14 for the updated video section Vm. If, on the other hand, all of the M sound sections A1 to AM have been processed (Sc15: YES), the sound data processing unit 53 ends the editing process Sc. As understood from the above, the video data X2 is not edited in the editing process Sc.
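The loop Sc11 to Sc15 can be sketched in Python as follows. This is only an illustration under stated assumptions: each sound section is assumed to be available as its own sequence of samples, video section lengths are assumed to be given in samples at the same sampling rate, and the function name `edit_by_truncation` is hypothetical. The disclosure does not prescribe any particular data layout.

```python
def edit_by_truncation(video_lengths, sound_sections):
    """Build Y2 by keeping only the head (specific section B_m) of each A_m.

    video_lengths: length of each video section V_m, in samples (assumed format).
    sound_sections: one sample sequence per sound section A_m (assumed format).
    """
    if len(video_lengths) != len(sound_sections):
        raise ValueError("each video section needs exactly one sound section")
    y2 = []
    for length, section in zip(video_lengths, sound_sections):  # Sc11, Sc12
        if len(section) < length:
            # The first embodiment assumes A_m is longer than V_m.
            raise ValueError("sound section shorter than its video section")
        specific = section[:length]  # Sc13: extract B_m from the start of A_m
        y2.extend(specific)          # Sc14: append B_m after B_{m-1}
    return y2                        # the loop ending corresponds to Sc15: YES


y2 = edit_by_truncation([3, 2], [[1, 2, 3, 4, 5], [10, 20, 30]])
```

Because only the sound sequences are sliced and joined, the video section lengths pass through unchanged, which mirrors the point that the video data X2 is not edited.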

As illustrated above, the sound data processing unit 53 processes the sound data Y1 so that the sound switches from sound section Am to sound section Am+1 at the boundary point Pm between video section Vm and the subsequent video section Vm+1. Content C is therefore generated in which each video section Vm runs in parallel with its sound section Am, and in which sound section Am starts at the start point of video section Vm. That is, at the boundary point Pm where the video played by the terminal device 10 transitions from video section Vm to video section Vm+1, the sound played by the terminal device 10 switches from a point partway through sound section Am (the end point of the specific section Bm) to sound section Am+1. Meanwhile, the duration of each video section Vm specified by the video data X2 is not changed. As understood from the above, the first embodiment can generate content C in which changes in the video and changes in the sound have a sense of unity, while suppressing any effect on the video represented by the video data X2.

Furthermore, in the first embodiment, the simple process of starting sound section Am+1 at the boundary point Pm of each video section Vm makes it possible to switch from sound section Am to sound section Am+1 at the boundary point Pm between video sections Vm and Vm+1. Note that video section Vm is an example of the "first video section" and video section Vm+1 is an example of the "second video section". Likewise, sound section Am is an example of the "first sound section" and the immediately following sound section Am+1 is an example of the "second sound section".

B: Second embodiment
A second embodiment will now be described. In each of the embodiments illustrated below, elements whose functions are the same as in the first embodiment are denoted by the reference signs used in the description of the first embodiment, and their detailed descriptions are omitted as appropriate.

FIG. 7 is an explanatory diagram of the editing process Sc in the second embodiment. In the first embodiment, the endpoints of each sound section Am were made to coincide on the time axis with the endpoints of each video section Vm by deleting the part of each sound section Am that includes its end point. The sound data processing unit 53 of the second embodiment instead shortens or lengthens each sound section Am on the time axis to make the endpoints coincide. Specifically, the sound data processing unit 53 adjusts the progression speed (for example, the tempo) of each sound section Am so that the sound section is shortened or lengthened to the duration of the video section Vm.

図８は、第２実施形態における編集処理Ｓcの具体的な手順を例示するフローチャートである。編集処理Ｓcが開始されると、音データ処理部５３は、第１実施形態と同様に、動画区間Ｖmの選択（Ｓc21）と音区間Ａmの選択（Ｓc22）とを実行する。 Figure 8 is a flowchart illustrating the specific steps of the editing process Sc in the second embodiment. When the editing process Sc is started, the sound data processing unit 53 selects the video section Vm (Sc21) and the sound section Am (Sc22) in the same manner as in the first embodiment.

The sound data processing unit 53 of the second embodiment executes a stretching process Sc23 in place of the extraction process Sc13 of the first embodiment. The stretching process Sc23 shortens or lengthens sound section Am so that its duration matches the duration of video section Vm. Any known time-stretching technique may be used for the stretching process Sc23. The sound data processing unit 53 concatenates the sound section Am produced by the stretching process Sc23 to the end of the sound section Am-1 produced by the preceding stretching process Sc23 (Sc24). Note that the first sound section A1 is placed at the beginning of the sound data Y2. The above processing is repeated for all of the M sound sections A1 to AM (Sc25). As in the first embodiment, the video data X2 is not edited in the editing process Sc of the second embodiment.
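The following Python sketch illustrates the flow Sc21 to Sc25 under a deliberately simple assumption: the "known stretching technique" is replaced by plain linear-interpolation resampling. That stand-in is not taken from the disclosure, and unlike dedicated time-stretching methods it also shifts the pitch, so it serves only to show how each sound section ends up exactly as long as its video section. The function names and the per-section sample lists are hypothetical.

```python
def stretch_to_length(section, target_len):
    """Resample `section` to exactly `target_len` samples (linear interpolation).

    Naive stand-in for the stretching process Sc23; it changes pitch as well as length.
    """
    if target_len <= 0 or not section:
        raise ValueError("need a non-empty section and a positive target length")
    if target_len == 1 or len(section) == 1:
        return [float(section[0])] * target_len
    step = (len(section) - 1) / (target_len - 1)
    out = []
    for i in range(target_len):
        pos = i * step
        lo = min(int(pos), len(section) - 2)  # keep lo + 1 inside the section
        frac = pos - lo
        out.append(section[lo] * (1.0 - frac) + section[lo + 1] * frac)
    return out


def edit_by_stretching(video_lengths, sound_sections):
    """Build Y2 by fitting each sound section A_m to the length of V_m."""
    y2 = []
    for length, section in zip(video_lengths, sound_sections):  # Sc21, Sc22
        y2.extend(stretch_to_length(section, length))            # Sc23, Sc24
    return y2                                                    # Sc25: all done


y2 = edit_by_stretching([5, 2], [[0.0, 2.0, 4.0], [1.0, 3.0, 5.0]])
```

In this usage the first section is lengthened from 3 to 5 samples and the second is shortened from 3 to 2, so both the short-section and long-section cases are covered.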

以上に例示した編集処理Ｓcにより、各動画区間Ｖmと各音区間Ａmとが並行し、かつ、動画区間Ｖmの始点において音区間Ａmが開始するコンテンツＣが生成される。したがって、第１実施形態と同様に、動画データＸ2が表す動画に対する影響を抑制しながら、当該動画の変化と音の変化との間に統一感があるコンテンツＣを生成できる。また、第２実施形態においては、各音区間Ａmが伸縮されるから、各音区間Ａmが途中の時点で不連続に途切れる可能性が低減される。また、音区間Ａmの時間長が動画区間Ｖmの時間長を下回る場合に、音区間Ａmを伸長することで、音区間Ａmが動画区間Ｖmに対して不足する可能性が低減される。 The editing process Sc illustrated above generates content C in which each video section Vm and each sound section Am run in parallel, and in which the sound section Am starts at the start point of the video section Vm. Therefore, as in the first embodiment, content C can be generated that has a sense of unity between changes in the video and changes in the sound, while suppressing the impact on the video represented by the video data X2. Furthermore, in the second embodiment, each sound section Am is expanded or contracted, reducing the possibility that each sound section Am will be interrupted discontinuously at some point along the way. Furthermore, when the duration of the sound section Am is shorter than the duration of the video section Vm, extending the sound section Am reduces the possibility that the sound section Am will be insufficient for the video section Vm.

第１実施形態および第２実施形態における音データ処理部５３は、動画区間Ｖmと動画区間Ｖm+1との境界点Ｐmにおいて音区間Ａmから音区間Ａm+1に切替わるように音データＹ1を処理する要素として包括的に表現される。 The sound data processing unit 53 in the first and second embodiments is collectively expressed as an element that processes the sound data Y1 so that the sound switches from the sound section Am to the sound section Am+1 at the boundary point Pm between the video section Vm and the video section Vm+1.

Ｃ：第３実施形態 C: Third embodiment
図９は、第３実施形態の音データ処理部５３が実行する編集処理Ｓcの説明図である。第３実施形態においては、音データＹ1が表す音の各音区間Ａmの時間長が、動画データＸ2が表す動画の各動画区間Ｖmと同等の時間長に設定された場合を想定する。なお、第３実施形態および第４実施形態においては、各音区間Ａmの区別は必須ではない。 Fig. 9 is an explanatory diagram of the editing process Sc executed by the sound data processing unit 53 of the third embodiment. In the third embodiment, it is assumed that the duration of each sound section Am of the sound represented by the sound data Y1 is set equal to the duration of the corresponding video section Vm of the video represented by the video data X2. Note that in the third and fourth embodiments, it is not essential to distinguish between the individual sound sections Am.

図９の遷移期間Ｑは、動画データＸ2が表す動画のＭ個の動画区間Ｖ1～ＶMのうち相前後する動画区間Ｖmと動画区間Ｖm+1との境界点Ｐmに対応する期間である。具体的には、遷移期間Ｑは境界点Ｐmを含む期間である。第３実施形態においては、境界点Ｐmを終点とする期間を遷移期間Ｑとして例示する。遷移期間Ｑは所定の時間長に設定される。ただし、遷移期間Ｑの時間長を、例えば端末装置１０の利用者からの指示に応じた可変長としてもよい。 The transition period Q in FIG. 9 is a period corresponding to the boundary point Pm between adjacent video sections Vm and Vm+1 among the M video sections V1 to VM of the video represented by the video data X2. Specifically, the transition period Q is a period that includes the boundary point Pm. In the third embodiment, the transition period Q is exemplified as a period whose end point is the boundary point Pm. The transition period Q is set to a predetermined time length. However, the time length of the transition period Q may be variable, for example, according to an instruction from the user of the terminal device 10.

第３実施形態の音データ処理部５３は、遷移期間Ｑ内において遷移期間Ｑ外よりも音量が減少するように音データＹ1を処理することで、音データＹ2を生成する。具体的には、音データ処理部５３は、遷移期間Ｑの始点ｑ1から終点ｑ2にかけて音量が減少し、かつ、遷移期間Ｑの終点ｑ2において音量が増加するように、音データＹ1を処理する。例えば、音データ処理部５３は、音データＹ1を構成する各サンプルに調整値Ｇを乗算することで音データＹ2を生成する。音データ処理部５３は、基準値ｇHと最小値ｇLとの間の範囲内で調整値Ｇを経時的に変化させる。基準値ｇHは、最小値ｇLを上回る数値である。例えば、基準値ｇHは１に設定され、最小値ｇLは０に設定される。 The sound data processing unit 53 of the third embodiment processes the sound data Y1 so that the volume is lower within the transition period Q than outside the transition period Q, thereby generating sound data Y2. Specifically, the sound data processing unit 53 processes the sound data Y1 so that the volume decreases from the start point q1 to the end point q2 of the transition period Q, and increases at the end point q2 of the transition period Q. For example, the sound data processing unit 53 generates sound data Y2 by multiplying each sample constituting the sound data Y1 by an adjustment value G. The sound data processing unit 53 changes the adjustment value G over time within the range between the reference value gH and the minimum value gL. The reference value gH is a numerical value that exceeds the minimum value gL. For example, the reference value gH is set to 1, and the minimum value gL is set to 0.

第３実施形態の音データ処理部５３は、第１に、遷移期間Ｑ外においては調整値Ｇを基準値ｇHに維持する。第２に、音データ処理部５３は、遷移期間Ｑの始点ｑ1から終点ｑ2にかけて調整値Ｇを基準値ｇHから最小値ｇLまで経時的に減少させる。遷移期間Ｑ内において、調整値Ｇは、例えば所定の変化率で直線的に減少する。ただし、調整値Ｇは、例えば可変の変化率で曲線的に変化してもよい。第３に、音データ処理部５３は、遷移期間Ｑの終点ｑ2において調整値Ｇを最小値ｇLから基準値ｇHまで増加させる。 First, the sound data processing unit 53 of the third embodiment maintains the adjustment value G at the reference value gH outside the transition period Q. Second, the sound data processing unit 53 decreases the adjustment value G over time from the reference value gH to the minimum value gL from the start point q1 to the end point q2 of the transition period Q. Within the transition period Q, the adjustment value G decreases linearly, for example, at a predetermined rate of change. However, the adjustment value G may change curvilinearly, for example, at a variable rate of change. Third, the sound data processing unit 53 increases the adjustment value G from the minimum value gL to the reference value gH at the end point q2 of the transition period Q.
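The three rules above can be written as a single function of time. The following is a minimal sketch assuming the linear case with gH = 1 and gL = 0 as in the example values given in the text; the function name `gain_third` and the half-open interval convention [q1, q2) are assumptions made for illustration.

```python
def gain_third(t, q1, q2, g_high=1.0, g_low=0.0):
    """Adjustment value G at time t for one transition period [q1, q2).

    Outside the period G stays at g_high; inside it G falls linearly from
    g_high toward g_low; at t == q2 it is back at g_high (the jump at the
    end point q2).
    """
    if q1 <= t < q2:
        progress = (t - q1) / (q2 - q1)  # 0.0 at q1, approaching 1.0 at q2
        return g_high + (g_low - g_high) * progress
    return g_high


if __name__ == "__main__":
    for t in (0.0, 1.0, 1.5, 1.9, 2.0):
        print(t, gain_third(t, q1=1.0, q2=2.0))
```

Each sample of the sound data Y1 at time t would then be multiplied by `gain_third(t, q1, q2)` to obtain the corresponding sample of Y2.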

図１０は、第３実施形態における編集処理Ｓcの具体的な手順を例示するフローチャートである。編集処理Ｓcが開始されると、音データ処理部５３は、動画データＸ2を参照することで、相異なる境界点Ｐmに対応する複数の遷移期間Ｑを時間軸上に設定する（Ｓc31）。音データ処理部５３は、複数の遷移期間Ｑの何れかを選択する（Ｓc32）。 Figure 10 is a flow chart illustrating the specific steps of the editing process Sc in the third embodiment. When the editing process Sc is started, the sound data processing unit 53 sets multiple transition periods Q corresponding to different boundary points Pm on the time axis by referring to the video data X2 (Sc31). The sound data processing unit 53 selects one of the multiple transition periods Q (Sc32).

音データ処理部５３は、選択中の遷移期間Ｑ内において音量が減少するように音データＹ1の音量を調整する（Ｓc33：調整処理）。具体的には、音データ処理部５３は、遷移期間Ｑの始点ｑ1から終点ｑ2にかけて調整値Ｇを基準値ｇHから最小値ｇLまで経時的に減少させ、当該終点ｑ2において調整値Ｇを最小値ｇLから基準値ｇHまで増加させる。 The sound data processing unit 53 adjusts the volume of the sound data Y1 so that the volume decreases during the selected transition period Q (Sc33: adjustment process). Specifically, the sound data processing unit 53 decreases the adjustment value G over time from the reference value gH to the minimum value gL from the start point q1 to the end point q2 of the transition period Q, and increases the adjustment value G from the minimum value gL to the reference value gH at the end point q2.

音データ処理部５３は、複数の遷移期間Ｑの全部について調整処理Ｓc33を実行したか否かを判定する（Ｓc34）。未処理の遷移期間Ｑが残存する場合（Ｓc34：NO）、音データ処理部５３は、複数の遷移期間Ｑのうち現時点で選択している遷移期間の直後の遷移期間を選択し（Ｓc32）、更新後の遷移期間Ｑについて調整処理Ｓc33を実行する。他方、複数の遷移期間Ｑの全部について調整処理Ｓc33を実行した場合（Ｓc34：YES）、音データ処理部５３は編集処理Ｓcを終了する。以上の説明から理解される通り、編集処理Ｓcにおいて動画データＸ2は編集されない。 The sound data processing unit 53 determines whether or not the adjustment process Sc33 has been performed for all of the multiple transition periods Q (Sc34). If an unprocessed transition period Q remains (Sc34: NO), the sound data processing unit 53 selects the transition period immediately following the currently selected transition period from among the multiple transition periods Q (Sc32) and performs the adjustment process Sc33 for the updated transition period Q. On the other hand, if the adjustment process Sc33 has been performed for all of the multiple transition periods Q (Sc34: YES), the sound data processing unit 53 ends the editing process Sc. As can be understood from the above explanation, the video data X2 is not edited in the editing process Sc.
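The flow of steps Sc31 to Sc34 can be sketched as a loop over the transition periods. This is an illustrative sketch only: positions are expressed as sample indices, the transition length is a fixed number of samples, and the names `set_transition_periods`, `adjust_period` and `editing_process` are hypothetical.

```python
def set_transition_periods(boundaries, length):
    """Sc31: one period [P_m - length, P_m) per boundary point P_m."""
    return [(max(0, p - length), p) for p in boundaries]


def adjust_period(samples, q1, q2, g_high=1.0, g_low=0.0):
    """Sc33: fade samples[q1:q2] linearly from g_high toward g_low, in place.

    Samples from index q2 onward are left untouched, which corresponds to the
    adjustment value returning to g_high at the end point q2.
    """
    span = q2 - q1
    for n in range(q1, min(q2, len(samples))):
        g = g_high + (g_low - g_high) * (n - q1) / span
        samples[n] *= g


def editing_process(y1, boundaries, length):
    """Sc31 to Sc34: return sound data Y2; Y1 and the video data are untouched."""
    y2 = list(y1)
    periods = set_transition_periods(boundaries, length)  # Sc31
    for q1, q2 in periods:         # Sc32 selects; the loop end plays the role of Sc34
        adjust_period(y2, q1, q2)  # Sc33
    return y2


if __name__ == "__main__":
    y1 = [1.0] * 8
    print(editing_process(y1, boundaries=[4, 8], length=2))
```

With a discrete ramp of this kind the last sample before each boundary does not reach gL exactly; a real implementation might choose the ramp endpoints differently.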

以上に説明した通り、第３実施形態においては、動画区間Ｖmと動画区間Ｖm+1との境界点Ｐmに対応する遷移期間Ｑ内において音量が減少するように音データＹ1が処理される。したがって、第１実施形態と同様に、動画データＸ2が表す動画に対する影響を抑制しながら、当該動画の変化と音の変化との間に統一感があるコンテンツＣを生成できる。 As described above, in the third embodiment, the sound data Y1 is processed so that the volume decreases within the transition period Q corresponding to the boundary point Pm between the video section Vm and the video section Vm+1. Therefore, as in the first embodiment, it is possible to generate content C that has a sense of unity between changes in the video and changes in the sound while suppressing the effect on the video represented by the video data X2.

第３実施形態においては特に、遷移期間Ｑの始点ｑ1から終点ｑ2（境界点Ｐm）にかけて音量が減少し、遷移期間Ｑの終点ｑ2（動画区間Ｖm+1の始点）において音量が増加する。したがって、動画区間Ｖmの終点にかけて音量が経時的に減少し、かつ、動画区間Ｖm+1の開始とともに充分な音量で音が再生されるコンテンツＣを生成できる。 In particular, in the third embodiment, the volume decreases from the start point q1 to the end point q2 (boundary point Pm) of the transition period Q, and increases at the end point q2 of the transition period Q (the start point of the video section Vm+1). Therefore, it is possible to generate content C in which the volume decreases over time toward the end point of the video section Vm, and sound is played at a sufficient volume from the start of the video section Vm+1.

Ｄ：第４実施形態 D: Fourth embodiment
図１１は、第４実施形態における編集処理Ｓcの説明図である。第４実施形態においては、第３実施形態と同様に、音データＹ1が表す音の各音区間Ａmの時間長が、動画データＸ2が表す動画の各動画区間Ｖmと同等の時間長に設定された場合を想定する。第４実施形態においては、第３実施形態と同様に、動画データＸ2が表す動画の相異なる境界点Ｐmを含む複数の遷移期間Ｑが設定される。各遷移期間Ｑは、境界点Ｐmを終点とする期間である。 Fig. 11 is an explanatory diagram of the editing process Sc in the fourth embodiment. In the fourth embodiment, as in the third embodiment, it is assumed that the time length of each sound section Am of the sound represented by the sound data Y1 is set to be equal to the time length of each video section Vm of the video represented by the video data X2. In the fourth embodiment, as in the third embodiment, a plurality of transition periods Q including different boundary points Pm of the video represented by the video data X2 are set. Each transition period Q is a period having a boundary point Pm as an end point.

第４実施形態の音データ処理部５３は、遷移期間Ｑ内において遷移期間Ｑ外よりも音量が減少するように音データＹ1を処理することで、音データＹ2を生成する。具体的には、音データ処理部５３は、遷移期間Ｑの始点ｑ1において音量が減少し、かつ、遷移期間Ｑの始点ｑ1から終点ｑ2にかけて音量が増加するように、音データＹ1を処理する。例えば、音データ処理部５３は、第３実施形態と同様に、音データＹ1の各サンプルに乗算される調整値Ｇを、基準値ｇHと最小値ｇLとの間の範囲内で経時的に変化させる。 The sound data processing unit 53 of the fourth embodiment generates sound data Y2 by processing the sound data Y1 so that the volume is lower within the transition period Q than outside the transition period Q. Specifically, the sound data processing unit 53 processes the sound data Y1 so that the volume drops at the start point q1 of the transition period Q and then increases from the start point q1 to the end point q2 of the transition period Q. For example, as in the third embodiment, the sound data processing unit 53 varies the adjustment value G, by which each sample of the sound data Y1 is multiplied, over time within the range between the reference value gH and the minimum value gL.

第４実施形態の音データ処理部５３は、第１に、遷移期間Ｑ外においては調整値Ｇを基準値ｇHに維持する。第２に、音データ処理部５３は、遷移期間Ｑの始点ｑ1において調整値Ｇを基準値ｇHから最小値ｇLまで減少させる。第３に、音データ処理部５３は、遷移期間Ｑの始点ｑ1から終点ｑ2にかけて調整値Ｇを最小値ｇLから基準値ｇHまで経時的に増加させる。遷移期間Ｑ内において、調整値Ｇは、例えば所定の変化率で直線的に増加する。ただし、調整値Ｇは、例えば可変の変化率で曲線的に変化してもよい。 First, the sound data processing unit 53 of the fourth embodiment maintains the adjustment value G at the reference value gH outside the transition period Q. Second, the sound data processing unit 53 decreases the adjustment value G from the reference value gH to the minimum value gL at the start point q1 of the transition period Q. Third, the sound data processing unit 53 increases the adjustment value G over time from the minimum value gL to the reference value gH from the start point q1 to the end point q2 of the transition period Q. Within the transition period Q, the adjustment value G increases linearly, for example, at a predetermined rate of change. However, the adjustment value G may change curvilinearly, for example, at a variable rate of change.
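The envelope of the fourth embodiment can be sketched in the same way. In this illustration the name `gain_fourth` and the `exponent` parameter are assumptions: `exponent=1.0` gives the linear increase described in the text, while other values are one possible way to obtain a curved change with a variable rate.

```python
def gain_fourth(t, q1, q2, g_high=1.0, g_low=0.0, exponent=1.0):
    """Adjustment value G at time t for one transition period [q1, q2).

    G drops from g_high to g_low at the start point q1, rises back toward
    g_high over the period, and equals g_high outside the period.
    """
    if q1 <= t < q2:
        progress = (t - q1) / (q2 - q1)  # 0.0 at q1, approaching 1.0 at q2
        return g_low + (g_high - g_low) * progress ** exponent
    return g_high


if __name__ == "__main__":
    for t in (0.9, 1.0, 1.5, 2.0):
        print(t, gain_fourth(t, 1.0, 2.0), gain_fourth(t, 1.0, 2.0, exponent=2.0))
```

Compared with the third embodiment, only the direction of the ramp and the position of the jump differ: the jump is at q1 rather than at q2.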

第４実施形態における編集処理Ｓcのうち調整処理Ｓc33以外の動作は第３実施形態と同様である。第４実施形態の調整処理Ｓc33において、音データ処理部５３は、遷移期間Ｑ内において音量が減少するように音データＹ1の音量を調整する。具体的には、第４実施形態の音データ処理部５３は、遷移期間Ｑの始点ｑ1において調整値Ｇを基準値ｇHから最小値ｇLまで減少させ、当該遷移期間Ｑの始点ｑ1から終点ｑ2にかけて調整値Ｇを最小値ｇLから基準値ｇHまで経時的に増加させる。 The operations of the editing process Sc in the fourth embodiment, other than the adjustment process Sc33, are the same as those in the third embodiment. In the adjustment process Sc33 in the fourth embodiment, the sound data processing unit 53 adjusts the volume of the sound data Y1 so that the volume decreases within the transition period Q. Specifically, the sound data processing unit 53 in the fourth embodiment decreases the adjustment value G from the reference value gH to the minimum value gL at the start point q1 of the transition period Q, and increases the adjustment value G over time from the minimum value gL to the reference value gH from the start point q1 to the end point q2 of the transition period Q.

以上に説明した通り、第４実施形態においては、動画区間Ｖmと動画区間Ｖm+1との境界点Ｐmを含む遷移期間Ｑ内において音量が減少するように音データＹ1が処理される。したがって、第３実施形態と同様に、動画データＸ2が表す動画に対する影響を抑制しながら、当該動画の変化と音の変化との間に統一感があるコンテンツＣを生成できる。 As described above, in the fourth embodiment, the sound data Y1 is processed so that the volume decreases within the transition period Q that includes the boundary point Pm between the video section Vm and the video section Vm+1. Therefore, as in the third embodiment, it is possible to generate content C that has a sense of unity between changes in the video and changes in the sound while suppressing the effect on the video represented by the video data X2.

第４実施形態においては特に、遷移期間Ｑの始点ｑ1において音量が減少し、遷移期間Ｑの始点ｑ1から終点ｑ2（境界点Ｐm）にかけて音量が増加する。したがって、音量が経時的に増加しながら動画区間Ｖmから動画区間Ｖm+1に切り替わるコンテンツＣを生成できる。 In particular, in the fourth embodiment, the volume decreases at the start point q1 of the transition period Q, and increases from the start point q1 to the end point q2 (boundary point Pm) of the transition period Q. Therefore, it is possible to generate content C in which the volume increases over time as the video section Vm switches to the video section Vm+1.

Ｅ：第５実施形態 E: Fifth embodiment
図１２は、第５実施形態における端末装置１０の構成を例示するブロック図である。第１実施形態から第４実施形態においては、編集システム２０が素材データＤからコンテンツＣを生成した。第５実施形態においては端末装置１０が素材データＤからコンテンツＣを生成する。第５実施形態においては編集システム２０が省略される。 Fig. 12 is a block diagram illustrating a configuration of a terminal device 10 in a fifth embodiment. In the first to fourth embodiments, the editing system 20 generates the content C from the material data D. In the fifth embodiment, the terminal device 10 generates the content C from the material data D. In the fifth embodiment, the editing system 20 is omitted.

端末装置１０は、制御装置１１と記憶装置１２と再生装置１３とを具備する。なお、端末装置１０は、単体の装置で実現されるほか、相互に別体で構成された複数の装置でも実現される。例えば、再生装置１３は、端末装置１０とは別体で構成され、端末装置１０に有線または無線で接続されてもよい。 The terminal device 10 includes a control device 11, a storage device 12, and a playback device 13. The terminal device 10 may be realized as a single device, or as multiple devices configured separately from each other. For example, the playback device 13 may be configured separately from the terminal device 10 and connected to the terminal device 10 by wire or wirelessly.

制御装置１１は、端末装置１０の各要素を制御する単数または複数のプロセッサである。具体的には、例えばＣＰＵ、ＳＰＵ、ＤＳＰ、ＦＰＧＡ、またはＡＳＩＣ等の１種類以上のプロセッサにより、制御装置１１が構成される。 The control device 11 is a single or multiple processors that control each element of the terminal device 10. Specifically, the control device 11 is configured by one or more types of processors, such as a CPU, SPU, DSP, FPGA, or ASIC.

記憶装置１２は、制御装置１１が実行するプログラムと制御装置１１が使用する各種のデータとを記憶する単数または複数のメモリである。記憶装置１２は、例えば磁気記録媒体もしくは半導体記録媒体等の公知の記録媒体、または、複数種の記録媒体の組合せで構成される。また、端末装置１０に対して着脱される可搬型の記録媒体、または制御装置１１が通信網３０を介して書込または読出を実行可能な記録媒体（例えばクラウドストレージ）を、記憶装置１２として利用してもよい。 The storage device 12 is a single or multiple memories that store the programs executed by the control device 11 and various data used by the control device 11. The storage device 12 is composed of a known recording medium, such as a magnetic recording medium or a semiconductor recording medium, or a combination of multiple types of recording media. In addition, a portable recording medium that is detachable from the terminal device 10, or a recording medium (e.g., cloud storage) that the control device 11 can write to or read from via the communication network 30 may be used as the storage device 12.

再生装置１３は、コンテンツＣを再生する。具体的には、再生装置１３は、コンテンツＣの動画データＸ2が表す動画を表示する表示装置１３１と、当該コンテンツＣの音データＹ2が表す音を放音する放音装置１３２（例えばスピーカまたはヘッドホン）とを具備する。 The playback device 13 plays back the content C. Specifically, the playback device 13 includes a display device 131 that displays the video represented by the video data X2 of the content C, and a sound emitting device 132 (e.g., a speaker or headphones) that emits the sound represented by the sound data Y2 of the content C.

図１３は、端末装置１０の機能的な構成を例示するブロック図である。端末装置１０の制御装置１１は、記憶装置１２に記憶されたプログラムを実行することで、素材データＤからコンテンツＣを生成および提供するための複数の機能（素材データ取得部５１，動画データ処理部５２，音データ処理部５３および再生制御部５５）を実現する。すなわち、端末装置１０の機能は、第１実施形態から第４実施形態における編集システム２０の機能のうちコンテンツ提供部５４を再生制御部５５に置換した関係にある。 FIG. 13 is a block diagram illustrating an example of the functional configuration of the terminal device 10. The control device 11 of the terminal device 10 executes a program stored in the storage device 12 to realize a plurality of functions (a material data acquisition unit 51, a video data processing unit 52, a sound data processing unit 53, and a playback control unit 55) for generating and providing content C from material data D. In other words, the functions of the terminal device 10 correspond to those of the editing system 20 in the first to fourth embodiments with the content providing unit 54 replaced by the playback control unit 55.

素材データ取得部５１は、動画データＸ1と音データＹ1とを含む素材データＤを取得する。具体的には、素材データ取得部５１は、端末装置１０の利用者からの指示に応じて素材データＤを生成または編集する。なお、素材データ取得部５１は、端末装置１０が通信網３０を介して通信可能な外部装置から素材データＤを受信してもよい。 The material data acquisition unit 51 acquires material data D including video data X1 and sound data Y1. Specifically, the material data acquisition unit 51 generates or edits the material data D in response to an instruction from a user of the terminal device 10. Note that the material data acquisition unit 51 may receive the material data D from an external device with which the terminal device 10 can communicate via the communication network 30.

動画データ処理部５２は、第１実施形態と同様に、素材データＤの動画データＸ1から動画データＸ2を生成する。音データ処理部５３は、素材データＤの音データＹ1から音データＹ2を生成する。具体的には、音データ処理部５３は、第１実施形態から第４実施形態の何れかに例示した編集処理Ｓcを音データＹ1に対して実行することで、音データＹ2を生成する。第１実施形態と同様に、動画データ処理部５２が生成した動画データＸ2と音データ処理部５３が生成した音データＹ2とによりコンテンツＣが構成される。 As in the first embodiment, the video data processing unit 52 generates video data X2 from video data X1 of material data D. The sound data processing unit 53 generates sound data Y2 from sound data Y1 of material data D. Specifically, the sound data processing unit 53 generates sound data Y2 by executing the editing process Sc exemplified in any of the first to fourth embodiments on the sound data Y1. As in the first embodiment, content C is composed of the video data X2 generated by the video data processing unit 52 and the sound data Y2 generated by the sound data processing unit 53.

再生制御部５５は、コンテンツＣを再生装置１３に再生させる。具体的には、再生制御部５５は、動画データＸ2の供給により表示装置１３１に動画を表示させ、音データＹ2の供給により放音装置１３２に音を放音させる。したがって、端末装置１０の利用者はコンテンツＣを視聴可能である。第５実施形態においても第１実施形態と同様の効果が実現される。 The playback control unit 55 causes the playback device 13 to play content C. Specifically, the playback control unit 55 causes the display device 131 to display a video by supplying video data X2, and causes the sound emitting device 132 to emit sound by supplying sound data Y2. Thus, the user of the terminal device 10 can view content C. The fifth embodiment also achieves the same effects as the first embodiment.

第１実施形態から第４実施形態に例示した編集システム２０と、第５実施形態に例示した端末装置１０とは、動画データＸ1と音データＹ1とを処理する情報処理システムとして包括的に表現される。 The editing system 20 illustrated in the first to fourth embodiments and the terminal device 10 illustrated in the fifth embodiment are collectively expressed as an information processing system that processes video data X1 and sound data Y1.

Ｆ：変形例 F: Modifications
以上に例示した各態様に付加される具体的な変形の態様を以下に例示する。以下の例示から任意に選択された２以上の態様を、相互に矛盾しない範囲で適宜に併合してもよい。 Specific modifications that may be added to each of the aspects exemplified above are illustrated below. Two or more aspects arbitrarily selected from the following examples may be combined as appropriate, provided they do not contradict each other.

（１）前述の各形態に係る構成は適宜に併合可能である。例えば、第１実施形態に例示した抽出処理Ｓc13または第２実施形態に例示した伸縮処理Ｓc23により各音区間Ａmを各動画区間Ｖmと同等の時間長に調整したうえで、第３実施形態または第４実施形態に例示した調整処理Ｓc33により各遷移期間Ｑ内の音量を調整してもよい。 (1) The configurations according to the above-mentioned embodiments can be combined as appropriate. For example, each sound section Am may first be adjusted to a duration equivalent to that of the corresponding video section Vm by the extraction process Sc13 illustrated in the first embodiment or the stretching process Sc23 illustrated in the second embodiment, and the volume within each transition period Q may then be adjusted by the adjustment process Sc33 illustrated in the third or fourth embodiment.

（２）第３実施形態および第４実施形態においては、境界点Ｐmを終点とする遷移期間Ｑを例示したが、遷移期間Ｑと境界点Ｐmとの関係は以上の例示に限定されない。例えば、境界点Ｐmを始点として遷移期間Ｑを設定する形態、または、境界点Ｐmを中点として遷移期間Ｑを設定する形態も想定される。 (2) In the third and fourth embodiments, the transition period Q is illustrated as having the boundary point Pm as its end point, but the relationship between the transition period Q and the boundary point Pm is not limited to the above example. For example, it is also possible to set the transition period Q with the boundary point Pm as its start point, or to set the transition period Q with the boundary point Pm as its midpoint.
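The three placements mentioned in this modification can be sketched as a small helper. The function name `transition_period` and the `placement` keywords are hypothetical; times are in seconds.

```python
def transition_period(boundary, length, placement="end"):
    """Return (q1, q2) for a transition period of the given length.

    placement="end":    the boundary point is the end point (third and fourth
                        embodiments)
    placement="start":  the boundary point is the start point
    placement="middle": the boundary point is the midpoint
    """
    if placement == "end":
        return (boundary - length, boundary)
    if placement == "start":
        return (boundary, boundary + length)
    if placement == "middle":
        return (boundary - length / 2, boundary + length / 2)
    raise ValueError(f"unknown placement: {placement!r}")


if __name__ == "__main__":
    for placement in ("end", "start", "middle"):
        print(placement, transition_period(10.0, 2.0, placement))
```

In every case the resulting period contains the boundary point, which is the only condition the text places on the transition period Q.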

（３）前述の各形態においては、音データＹ（Ｙ1，Ｙ2）が楽曲の演奏音（楽器音または歌唱音）を表す形態を例示したが、音データＹが表す音は音楽的な音に限定されない。例えば、音楽的な要素を含まない発話音声（言語音）を音データＹが表す形態も想定される。例えば、動画データＸ2が表す動画に並行に再生されるべき発話音声（例えば動画の登場人物による発話音声または当該動画の解説音声）を音データＹが表してもよい。 (3) In each of the above-mentioned embodiments, the sound data Y (Y1, Y2) represents the performance sound of a piece of music (musical instrument sound or singing sound), but the sound represented by the sound data Y is not limited to musical sound. For example, a form in which the sound data Y represents speech (language sound) that does not include musical elements is also envisioned. For example, the sound data Y may represent speech (e.g., speech by a character in the video or commentary on the video) to be played in parallel with the video represented by the video data X2.

（４）前述の各形態においては、素材データＤが音データＹ1を含む構成を例示したが、素材データＤが音データＹ1に代えて文字列データを含む形態も想定される。文字列データは、動画データＸ2の動画に対して並行に再生されるべき音声に対応する文字列を表すデータである。音データ処理部５３は、素材データＤの文字列データを適用した音声合成により音データＹ1を生成し、当該音データＹ1に対する編集処理Ｓcにより音データＹ2を生成する。音声合成には公知の任意の方法が利用される。 (4) In each of the above embodiments, the material data D includes sound data Y1, but a configuration in which the material data D includes character string data instead of the sound data Y1 is also envisioned. The character string data is data representing a character string corresponding to the sound to be played in parallel to the video of the video data X2. The sound data processing unit 53 generates sound data Y1 by voice synthesis using the character string data of the material data D, and generates sound data Y2 by editing processing Sc on the sound data Y1. Any known method can be used for the voice synthesis.

（５）第１実施形態および第２実施形態においては、音データＹ1における各音区間Ａmの時間長を動画区間Ｖmの時間長に調整（削除または伸縮）したが、各音区間Ａmの時間長が動画区間Ｖmの時間長に応じて設定された音データＹ2を、音データ処理部５３が合成処理により生成してもよい。合成処理は、音符の時系列を表す制御データから演奏音を合成する楽音合成、または、文字列を表す制御データから発話音声または歌唱音等の音声を合成する音声合成である。音データ処理部５３は、例えば、各音区間Ａmが動画区間Ｖmと同等の時間長に設定された音データＹ2を、制御データを適用した合成処理により生成する。以上の説明から理解される通り、第１実施形態または第２実施形態において、音データＹ1に対する調整は省略されてもよい。また、第３実施形態または第４実施形態に利用される音データＹ1は、以上に例示した合成処理により生成されてもよい。 (5) In the first and second embodiments, the duration of each sound section Am in the sound data Y1 is adjusted to the duration of the video section Vm (by trimming or by stretching or shortening). However, the sound data processing unit 53 may instead generate, by a synthesis process, sound data Y2 in which the duration of each sound section Am is set according to the duration of the video section Vm. The synthesis process is musical sound synthesis, which synthesizes a performance sound from control data representing a time series of musical notes, or voice synthesis, which synthesizes a voice such as a speech sound or a singing sound from control data representing a character string. For example, the sound data processing unit 53 generates sound data Y2 in which each sound section Am is set to the same duration as the video section Vm by a synthesis process that applies control data. As can be understood from the above description, in the first or second embodiment, the adjustment to the sound data Y1 may be omitted. Also, the sound data Y1 used in the third or fourth embodiment may be generated by the synthesis process exemplified above.

（６）前述の各形態においては、音データＹ（Ｙ1，Ｙ2）がサンプルの時系列で構成される形態を例示したが、音データＹの形式は任意である。例えば、ＭＩＤＩ（Musical Instrument Digital Interface）規格に準拠した形式の音データＹを利用してもよい。 (6) In each of the above embodiments, the sound data Y (Y1, Y2) is configured as a time series of samples, but the format of the sound data Y is arbitrary. For example, sound data Y in a format conforming to the MIDI (Musical Instrument Digital Interface) standard may be used.

（７）前述の各形態においては、動画データＸ（Ｘ1，Ｘ2）が動画を表す形態を例示したが、相互に並行に再生される動画および音の双方を動画データＸが表す形態も想定される。コンテンツＣが再生される状況では、動画データＸ2が表す音と音データＹ2が表す音とが並行に再生される。 (7) In each of the above embodiments, the video data X (X1, X2) represents a video, but a configuration in which the video data X represents both a video and sound that are played in parallel with each other is also envisioned. When content C is being played, the sound represented by the video data X2 and the sound represented by the sound data Y2 are played in parallel.

（８）第１実施形態から第４実施形態における編集システム２０の機能は、前述の通り、制御装置２１を構成する単数または複数のプロセッサと、記憶装置２２に記憶されたプログラムとの協働により実現される。同様に、第５実施形態における端末装置１０の機能は、制御装置１１を構成する単数または複数のプロセッサと、記憶装置１２に記憶されたプログラムとの協働により実現される。 (8) As described above, the functions of the editing system 20 in the first to fourth embodiments are realized by the cooperation of one or more processors constituting the control device 21 and the program stored in the storage device 22. Similarly, the functions of the terminal device 10 in the fifth embodiment are realized by the cooperation of one or more processors constituting the control device 11 and the program stored in the storage device 12.

以上の機能を実現するためのプログラムは、コンピュータが読取可能な記録媒体に格納された形態で提供されてコンピュータにインストールされ得る。記録媒体は、例えば非一過性（non-transitory）の記録媒体であり、ＣＤ-ＲＯＭ等の光学式記録媒体（光ディスク）が好例であるが、半導体記録媒体または磁気記録媒体等の公知の任意の形式の記録媒体も包含される。なお、非一過性の記録媒体とは、一過性の伝搬信号（transitory, propagating signal）を除く任意の記録媒体を含み、揮発性の記録媒体も除外されない。また、配信装置が通信網を介してプログラムを配信する構成では、当該配信装置においてプログラムを記憶する記憶装置が、前述の非一過性の記録媒体に相当する。 The program for realizing the above functions can be provided in a form stored in a computer-readable recording medium and installed in the computer. The recording medium is, for example, a non-transitory recording medium, and a good example is an optical recording medium (optical disk) such as a CD-ROM, but also includes any known type of recording medium such as a semiconductor recording medium or a magnetic recording medium. Note that a non-transitory recording medium includes any recording medium other than a transient, propagating signal, and does not exclude volatile recording media. In addition, in a configuration in which a distribution device distributes a program via a communication network, the storage device that stores the program in the distribution device corresponds to the non-transitory recording medium described above.

Ｇ：付記 G: Supplementary notes
以上に例示した形態から、例えば以下の構成が把握される。 From the above-described exemplary embodiments, the following configurations, for example, can be understood.

本開示のひとつの態様（態様１）に係る情報処理方法は、複数の動画区間を含む動画データと複数の音区間を含む音データとを処理する情報処理方法であって、前記複数の動画区間のうち第１動画区間と当該第１動画区間に後続する第２動画区間との境界点において、前記複数の音区間のうちの第１音区間から当該第１音区間以外の第２音区間に切替わるように、前記音データを処理する。以上の態様においては、第１動画区間と第２動画区間との境界点において第１音区間から第２音区間に遷移するように音データが処理される。したがって、動画データが表す動画に対する影響を抑制しながら、当該動画の変化と音の変化との間に統一感があるコンテンツを生成できる。 An information processing method according to one aspect (aspect 1) of the present disclosure is an information processing method for processing video data including a plurality of video segments and sound data including a plurality of sound segments, and processes the sound data so as to switch from a first sound segment among the plurality of sound segments to a second sound segment other than the first sound segment at a boundary point between a first video segment among the plurality of video segments and a second video segment subsequent to the first video segment. In the above aspect, the sound data is processed so as to transition from the first sound segment to the second sound segment at the boundary point between the first video segment and the second video segment. Therefore, it is possible to generate content that has a sense of unity between changes in the video and changes in sound while suppressing the effect on the video represented by the video data.

なお、「第１動画区間と第２動画区間との境界点において第１音区間から第２音区間に切替わる」とは、第１音区間から第２音区間への切替点が、第１動画区間と第２動画区間との境界点に実質的に一致することを意味する。「実質的に一致する」場合は、切替点が境界点に完全に一致する場合のほか、切替点と境界点とが厳密には一致しないけれども両者が一致すると同視できる場合も包含する。例えば、切替点と境界点とが実際には相違しても、切替点と境界点とが一致しているとコンテンツの視聴者が知覚できる程度に両者が近似する状態は、「実質的に一致する」と解釈できる。 Note that "switching from the first sound section to the second sound section at the boundary point between the first video section and the second video section" means that the switching point from the first sound section to the second sound section substantially coincides with the boundary point between the first video section and the second video section. "Substantially coincident" includes cases where the switching point completely coincides with the boundary point, as well as cases where the switching point and the boundary point do not strictly coincide but can be regarded as coinciding. For example, even if the switching point and the boundary point are actually different, a state in which the switching point and the boundary point are close enough that a viewer of the content can perceive them as coinciding can be interpreted as "substantially coinciding."

態様１の具体例（態様２）において、前記音データの処理においては、前記第１音区間のうち前記境界点に一致する途中の時点から前記第２音区間に切替わるように、前記音データを処理する。以上の態様によれば、第１音区間のうち境界点に一致する途中の時点において第２音区間を開始させる簡便な処理により、第１動画区間と第２動画区間との境界点において第１音区間から第２音区間への切替を発生させることが可能である。 In a specific example (Aspect 2) of Aspect 1, the sound data is processed so as to switch to the second sound section at an intermediate point in the first sound section that coincides with the boundary point. According to the above aspect, it is possible to switch from the first sound section to the second sound section at the boundary point between the first video section and the second video section by a simple process of starting the second sound section at an intermediate point in the first sound section that coincides with the boundary point.
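Aspect 2 can be illustrated by cutting each sound segment at the sample that coincides with the boundary point and starting the next segment there. The sketch below is an assumption-laden illustration, not the procedure of the disclosure: the name `edit_by_truncation` is hypothetical, lengths are in samples, and each sound segment is assumed to be at least as long as its video segment (otherwise the switch points would drift ahead of the boundary points).

```python
def edit_by_truncation(sound_segments, video_lengths):
    """Keep only the leading part of each sound segment that fits its video
    segment, then start the next sound segment at the boundary point."""
    y2 = []
    for segment, target_len in zip(sound_segments, video_lengths):
        y2.extend(segment[:target_len])  # switch at the boundary point
    return y2


if __name__ == "__main__":
    segments = [[1, 1, 1, 1, 1], [2, 2, 2]]  # first and second sound segments
    video_lengths = [3, 2]                   # first and second video segments
    print(edit_by_truncation(segments, video_lengths))
```

Here the first sound segment is cut after three samples, so the second sound segment begins exactly where the second video segment begins.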

態様１の具体例（態様３）において、前記音データの処理においては、前記第１音区間および前記第２音区間の少なくとも一方を時間軸上において短縮または伸長することで、前記第１音区間から前記第２音区間への切替の時点を前記境界点に一致させる。以上の態様によれば、第１音区間が途中で途切れる可能性、または、第１音区間が第１動画区間に対して不足する可能性を低減できる。なお、音区間の伸縮は、例えば再生速度の調整により実現される。すなわち、再生速度を増加させることで音区間は短縮され、再生速度を減少させることで音区間は伸長される。 In a specific example (aspect 3) of aspect 1, in processing the sound data, at least one of the first sound segment and the second sound segment is shortened or extended on the time axis, so that the time point at which the first sound segment switches to the second sound segment coincides with the boundary point. According to the above aspect, it is possible to reduce the possibility that the first sound segment will be cut off midway, or that the first sound segment will be insufficient for the first video segment. Note that a sound segment is shortened or extended by, for example, adjusting the playback speed. That is, the sound segment is shortened by increasing the playback speed, and extended by decreasing the playback speed.

本開示の他の態様（態様４）に係る情報処理方法は、複数の動画区間を含む動画データと音を表す音データとを処理する情報処理方法であって、前記複数の動画区間のうち第１動画区間と当該第１動画区間に後続する第２動画区間との境界点を含む遷移期間内において音量が減少するように、前記音データを処理する。以上の態様においては、第１動画区間と第２動画区間との境界点を含む遷移期間内において音量が減少するように音データが処理される。したがって、動画データが表す動画に対する影響を抑制しながら、当該動画の変化と音の変化との間に統一感があるコンテンツを生成できる。 An information processing method according to another aspect (aspect 4) of the present disclosure is an information processing method for processing video data including a plurality of video segments and sound data representing sound, and processes the sound data so that the volume is reduced within a transition period including a boundary point between a first video segment among the plurality of video segments and a second video segment subsequent to the first video segment. In the above aspect, the sound data is processed so that the volume is reduced within the transition period including the boundary point between the first video segment and the second video segment. Therefore, it is possible to generate content that has a sense of unity between changes in the video and changes in the sound, while suppressing the effect on the video represented by the video data.

態様４の具体例（態様５）において、前記遷移期間は、前記境界点を終点とする期間であり、前記音データの処理においては、前記遷移期間の始点から終点にかけて前記音量が減少し、当該遷移期間の終点において前記音量が増加するように、前記音データを処理する。以上の態様においては、遷移期間の始点から終点（境界点）にかけて音量が減少し、遷移期間の終点（第２動画区間の始点）において音量が増加する。したがって、第１動画区間の終点にかけて音量が経時的に減少し、かつ、第２動画区間の開始とともに充分な音量で音が再生されるコンテンツを生成できる。 In a specific example (aspect 5) of aspect 4, the transition period is a period ending at the boundary point, and in processing the sound data, the sound data is processed so that the volume decreases from the start point to the end point of the transition period and increases at the end point of the transition period. In the above aspect, the volume decreases from the start point to the end point (boundary point) of the transition period and increases at the end point of the transition period (the start point of the second video section). Therefore, it is possible to generate content in which the volume decreases over time towards the end point of the first video section and sound is played at a sufficient volume as the second video section begins.

In another specific example of Aspect 4 (Aspect 6), the transition period is a period that ends at the boundary point, and the sound data is processed so that the volume decreases at the start point of the transition period and increases from the start point to the end point of the transition period. In this aspect, the volume drops at the start point of the transition period and then increases from the start point to the end point (the boundary point). Content can therefore be generated in which the video switches from the first video segment to the second video segment while the volume increases over time.
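Aspect 6 reverses the shape of the envelope: the gain drops at the start point of the transition period and ramps back up so that it reaches full volume at the boundary point. The following is a minimal sketch under the same assumptions as before (linear ramp; `drop_then_fade_in`, `duration`, and `floor` are hypothetical names chosen for this example).

```python
def drop_then_fade_in(t: float, boundary: float, duration: float, floor: float = 0.0) -> float:
    """Gain at time t for a transition period [boundary - duration, boundary)."""
    if duration <= 0:
        raise ValueError("duration must be positive")
    start = boundary - duration
    if t < start or t >= boundary:
        return 1.0  # outside the transition period: volume unchanged
    progress = (t - start) / duration  # 0.0 at the start point, approaching 1.0 at the boundary
    return floor + (1.0 - floor) * progress  # drop to `floor`, then linear increase
```

Because the ramp ends exactly at the boundary point, the second video segment starts at full volume without a further jump.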

An information processing system according to one aspect of the present disclosure processes video data including a plurality of video segments and sound data including a plurality of sound segments. The system includes a sound data processing unit that processes the sound data so that, at the boundary point between a first video segment among the plurality of video segments and a second video segment following the first video segment, the sound switches from a first sound segment among the plurality of sound segments to a second sound segment other than the first sound segment. An information processing system according to another aspect of the present disclosure processes video data including a plurality of video segments and sound data representing sound. The system includes a sound data processing unit that processes the sound data so that the volume decreases within a transition period that includes the boundary point between a first video segment among the plurality of video segments and a second video segment following the first video segment.
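The segment-switching behaviour of the first system above can be illustrated by a lookup that maps a playback time to the sound segment that should be playing. This is only a sketch: the function `active_sound_segment`, the representation of boundary points as a sorted list of times, and the `segment_order` list are assumptions introduced for this example, not structures named in the disclosure.

```python
from bisect import bisect_right


def active_sound_segment(t: float, boundaries: list[float], segment_order: list[int]) -> int:
    """Return the index of the sound segment playing at time t.

    boundaries    -- sorted boundary points (seconds) between consecutive video segments
    segment_order -- one sound-segment index per video segment (hypothetical input)
    """
    if len(segment_order) != len(boundaries) + 1:
        raise ValueError("need exactly one sound segment per video segment")
    if any(a == b for a, b in zip(segment_order, segment_order[1:])):
        raise ValueError("adjacent video segments must use different sound segments")
    # Count the boundary points at or before t to find the current video segment.
    video_index = bisect_right(boundaries, t)
    return segment_order[video_index]
```

For example, with boundaries at 4.0 s and 9.0 s and `segment_order = [0, 2, 1]`, sound segment 0 plays until 4.0 s, segment 2 from 4.0 s, and segment 1 from 9.0 s, so every change of sound segment coincides with a video boundary.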

A program according to one aspect of the present disclosure is a program for processing video data including a plurality of video segments and sound data including a plurality of sound segments. The program causes a computer to function as a sound data processing unit that processes the sound data so that, at the boundary point between a first video segment among the plurality of video segments and a second video segment following the first video segment, the sound switches from a first sound segment among the plurality of sound segments to a second sound segment other than the first sound segment. A program according to another aspect of the present disclosure is a program for processing video data including a plurality of video segments and sound data representing sound. The program causes a computer to function as a sound data processing unit that processes the sound data so that the volume decreases within a transition period that includes the boundary point between a first video segment among the plurality of video segments and a second video segment following the first video segment.

100...information system, 10...terminal device, 11, 21...control device, 12, 22...storage device, 13...playback device, 131...display device, 132...sound emission device, 20...editing system, 23...communication device, 51...material data acquisition unit, 52...video data processing unit, 53...sound data processing unit, 54...content provision unit, 55...playback control unit.

Claims

1. An information processing method for processing video data including a plurality of video segments and sound data representing sound, the method comprising:
processing the sound data so that a volume is reduced within a transition period including a boundary point between a first video segment and a second video segment subsequent to the first video segment among the plurality of video segments, wherein
the transition period is a period ending at the boundary point, and
in processing the sound data, the sound data is processed such that the volume decreases from the start point to the end point of the transition period and increases at the end point of the transition period.

2. An information processing method for processing video data including a plurality of video segments and sound data representing sound, the method comprising:
processing the sound data so that a volume is reduced within a transition period including a boundary point between a first video segment and a second video segment subsequent to the first video segment among the plurality of video segments, wherein
the transition period is a period ending at the boundary point, and
in processing the sound data, the sound data is processed such that the volume decreases at the start point of the transition period and increases from the start point to the end point of the transition period.

3. An information processing system for processing video data including a plurality of video segments and sound data representing sound, comprising:
a sound data processing unit that processes the sound data so that a volume is reduced within a transition period including a boundary point between a first video segment and a second video segment subsequent to the first video segment among the plurality of video segments, wherein
the transition period is a period ending at the boundary point, and
the sound data processing unit processes the sound data such that the volume decreases from the start point to the end point of the transition period and increases at the end point of the transition period.

4. An information processing system for processing video data including a plurality of video segments and sound data representing sound, comprising:
a sound data processing unit that processes the sound data so that a volume is reduced within a transition period including a boundary point between a first video segment and a second video segment subsequent to the first video segment among the plurality of video segments, wherein
the transition period is a period ending at the boundary point, and
the sound data processing unit processes the sound data such that the volume decreases at the start point of the transition period and increases from the start point to the end point of the transition period.

5. A program for processing video data including a plurality of video segments and sound data representing sound, the program causing a computer to function as:
a sound data processing unit that processes the sound data so that a volume is reduced within a transition period including a boundary point between a first video segment and a second video segment subsequent to the first video segment among the plurality of video segments, wherein
the transition period is a period ending at the boundary point, and
the sound data processing unit processes the sound data such that the volume decreases from the start point to the end point of the transition period and increases at the end point of the transition period.

6. A program for processing video data including a plurality of video segments and sound data representing sound, the program causing a computer to function as:
a sound data processing unit that processes the sound data so that a volume is reduced within a transition period including a boundary point between a first video segment and a second video segment subsequent to the first video segment among the plurality of video segments, wherein
the transition period is a period ending at the boundary point, and
the sound data processing unit processes the sound data such that the volume decreases at the start point of the transition period and increases from the start point to the end point of the transition period.