JP7720428B2

JP7720428B2 - Image generation method

Info

Publication number: JP7720428B2
Application number: JP2024003291A
Authority: JP
Inventors: 祐也西尾; 哲和田; 康一田中; 幸徳西山
Original assignee: Fujifilm Corp
Current assignee: Fujifilm Corp
Priority date: 2019-07-26
Filing date: 2024-01-12
Publication date: 2025-08-07
Anticipated expiration: 2040-07-20
Also published as: JPWO2021020197A1; JP2024026741A; WO2021020197A1

Description

The present invention relates to a video generation method, a video generation device, and a video generation program.

When video and audio are recorded, sounds other than the main audio (wind noise, ambient sounds, operation sounds, speaking voices, etc.) may be recorded along with it.

Patent Document 1 describes a video camera equipped with means for displaying the presence or absence, or the strength, of wind noise; means for manually selecting whether wind noise countermeasures are applied and how strongly; and means for selecting whether wind noise countermeasures are applied even after recording.

Patent Document 2 describes a video camera that has a function for automatically reducing wind noise and allows the operation of this function to be set arbitrarily.

Patent Document 1: JP 2010-4339 A
Patent Document 2: JP 2009-124414 A

One embodiment of the technology of the present disclosure provides a video generation method, a video generation device, and a video generation program capable of generating video with audio in which specific audio is emphasized or reduced.

(1) A video generation method comprising: a video recording step of recording a first video captured by an imaging unit; a first audio recording step of recording first audio in synchronization with the first video; a second audio recording step of recording second audio different from the first audio; an audio generation step of processing the first audio using the second audio to generate third audio that includes the second audio in emphasized or reduced form; and a video generation step of generating a second video by associating the first video with the third audio.

(2) The video generation method according to (1), wherein the audio generation step generates the third audio by synthesizing the emphasized or reduced second audio with the first audio.

(3) The video generation method according to (2), further comprising, before the audio generation step, an intensity setting step of setting an intensity of the second audio, wherein the audio generation step synthesizes the second audio with the first audio at the intensity set in the intensity setting step.

(4) The video generation method according to (1), wherein the first audio includes a common component, which is an audio component shared with the second audio, and the audio generation step uses the second audio to perform, on the first audio, processing that emphasizes or reduces the common component, thereby generating the third audio.

(5) The video generation method according to (4), further comprising, before the audio generation step, a processing condition setting step of setting processing conditions for the common component, wherein the audio generation step performs the processing that emphasizes or reduces the common component on the first audio in accordance with the processing conditions set in the processing condition setting step.

(6) The video generation method according to any one of (1) to (5), further comprising a detection step of detecting movement of an imaging device body including the imaging unit, wherein, when movement is detected in the detection step, the audio generation step performs predetermined processing on the first audio or the second audio to generate the third audio.

(7) The video generation method according to any one of (1) to (6), further comprising a first information acquisition step of acquiring imaging information of the first video captured by the imaging unit, and a first display step of displaying the imaging information.

(8) The video generation method according to (7), wherein the imaging information includes at least one of information on movement of the imaging device body including the imaging unit and information on the focal length.

(9) The video generation method according to any one of (1) to (8), further comprising a second information acquisition step of acquiring information on a sound collection unit that collects the first audio and the second audio, and a second display step of displaying the information on the sound collection unit.

(10) The video generation method according to any one of (1) to (9), wherein the second audio recording step records the second audio in synchronization with the first video.

(11) The video generation method according to (10), further comprising a second audio detection step of detecting the timing at which the second audio was recorded, and an association step of associating the information detected in the second audio detection step with the first video.

(12) The video generation method according to any one of (1) to (11), wherein the second audio recording step records the second audio before the video recording step.

(13) The video generation method according to any one of (1) to (12), wherein the first audio recording step records the first audio via a first sound collection unit, and the second audio recording step records the second audio via a second sound collection unit different from the first sound collection unit.

(14) The video generation method according to (13), wherein the second sound collection unit has directional sound collection characteristics, and the first sound collection unit has less directional sound collection characteristics than the second sound collection unit.

(15) The video generation method according to (13) or (14), wherein the second sound collection unit has directional sound collection characteristics, and the second audio generation step detects the position of a sound source of the second audio and directs the second sound collection unit toward the detected sound source.

FIG. 1 is a block diagram showing the schematic configuration of an imaging device having a function of generating video using the video generation method according to the present invention.
FIG. 2 is a block diagram of the main functions implemented by the CPU when recording video and audio.
FIG. 3 is a block diagram of the main functions implemented by the CPU when playing back recorded video.
FIG. 4 is a block diagram of the main functions implemented by the CPU when generating video with audio.
FIG. 5 is a block diagram of the functions of the third audio generation unit.
FIG. 6 is a block diagram of the main functions implemented by the CPU when generating video with audio.
FIG. 7 is a block diagram of the functions of the third audio generation unit.
FIG. 8 is a block diagram showing the schematic configuration of the imaging device of the third embodiment.
FIG. 9 is a block diagram of the main functions implemented by the CPU when recording video and audio.
FIG. 10 is a block diagram of the functions of the third audio generation unit of the third embodiment.
FIG. 11 is a diagram showing a modification of the third audio generation unit of the third embodiment.
FIG. 12 is a block diagram of the functions implemented by the CPU when acquiring and recording imaging information and when displaying the imaging information.
FIG. 13 is a block diagram of the functions implemented by the CPU when acquiring and recording microphone information and when displaying the microphone information.
FIG. 14 is a block diagram of the functions implemented by the CPU when detecting and recording the timing at which the second audio was recorded and when displaying the recorded information.

Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings.

[First Embodiment]
FIG. 1 is a block diagram showing the schematic configuration of an imaging device having a function of generating video using the video generation method according to the present invention.

The imaging device 1 of this embodiment records first audio and second audio in synchronization with imaging. After imaging, it processes the first audio using the second audio to generate third audio that contains the second audio at a predetermined intensity (audio level). It then associates the generated third audio with the video obtained by imaging (first video) to generate video with audio (second video).

As shown in FIG. 1, the imaging device 1 includes an imaging unit 10, a first audio input unit 12, a second audio input unit 14, a display unit 16, a storage unit 18, an audio output unit 20, an operation unit 22, a CPU (Central Processing Unit) 24, a ROM (Read Only Memory) 26, a RAM (Random Access Memory) 28, and the like. The imaging unit 10 captures video. The imaging unit 10 includes an imaging optical system 10A, an imaging element 10B, an image signal processing unit 10C, and the like. The imaging optical system 10A forms an image of a subject on the light receiving surface of the imaging element 10B. The imaging element 10B converts the subject image formed on its light receiving surface by the imaging optical system 10A into an electrical signal. The image signal processing unit 10C performs predetermined signal processing on the signal output from the imaging element 10B to generate a video signal.

The first audio input unit 12 is the input unit for the main audio (first audio). The first audio input unit 12 includes a first microphone 12A and a first audio signal processing unit 12B. The first microphone 12A collects the first audio as the main audio. The first audio does not include the second audio (this includes the case where a small amount of the second audio is present). The first microphone 12A is an example of a first sound collection unit. The first audio signal processing unit 12B performs predetermined signal processing on the signal from the first microphone 12A to generate an audio signal of the first audio.

The second audio input unit 14 is the input unit for specific audio (second audio) to be synthesized with the main audio. The second audio input unit 14 includes a second microphone 14A and a second audio signal processing unit 14B. The second microphone 14A collects the second audio, which is the specific audio. The second audio does not include the first audio (this includes the case where it is deemed not to substantially include the first audio). The second microphone 14A is an example of a second sound collection unit. The second audio signal processing unit 14B performs predetermined signal processing on the signal from the second microphone 14A to generate an audio signal of the second audio.

The display unit 16 displays the video being captured by the imaging unit 10 in real time. It also displays played-back video and, as needed, operation screens, menu screens, messages, and the like. The display unit 16 includes, for example, a display device such as an LCD (Liquid Crystal Display) and its drive circuit.

The storage unit 18 mainly stores captured video and collected audio. It includes, for example, a storage medium such as a nonvolatile memory and its control circuit.

The audio output unit 20 outputs reproduced audio. It also outputs warning sounds and the like as necessary. The audio output unit 20 includes a speaker and a signal processing circuit that processes the audio signal to be output from the speaker.

The operation unit 22 accepts operation inputs from the user. It includes various operation buttons, such as a record button, and circuits that detect their operation.

The CPU 24 functions as the control unit of the entire device by executing a predetermined control program. Based on user operations, the CPU 24 controls the operation of each unit and performs overall control of the device. By executing a predetermined program, the CPU 24 also functions as a video generation device that generates video with audio from recorded video and audio. In this role, the CPU 24 processes recorded video and audio based on user operations to generate video with audio. The ROM 26 stores the various programs executed by the CPU 24 and the data required for control. The RAM 28 provides working memory space for the CPU 24.

FIG. 2 is a block diagram of the main functions implemented by the CPU when recording video and audio. As shown in the figure, the CPU 24 functions as an imaging control unit 101, a video output unit 102, a first video recording unit 103, a first audio recording unit 104, a second audio recording unit 105, and the like.

The imaging control unit 101 controls imaging by the imaging unit 10. Based on the video signal obtained from the imaging unit 10, it controls the imaging unit 10 so that video is captured with proper exposure and so that the main subject is in focus.

The video output unit 102 outputs the video captured by the imaging unit 10 to the display unit 16 in real time, so that a live view is displayed on the display unit 16.

The first video recording unit 103 records the video (first video) captured by the imaging unit 10 in the storage unit 18. It starts and stops recording in response to instructions from the user, which the user issues via the operation unit 22. The first video is recorded in the storage unit 18 in association with the first audio and second audio collected in synchronization with its capture.

The first audio recording unit 104 records the first audio (main audio) input from the first audio input unit 12 in the storage unit 18 in synchronization with the capture of the first video. The first audio is recorded in the storage unit 18 in association with the first video.

The second audio recording unit 105 records the second audio (specific audio) input from the second audio input unit 14 in the storage unit 18 in synchronization with the capture of the first video. The second audio is recorded in the storage unit 18 in association with the first video.
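The association between the first video and its two audio tracks can be pictured as a small record. The sketch below is hypothetical: the class names, fields, and the use of a start offset to express "recorded in synchronization" are illustrative assumptions, not the data format used by the device.

```python
from dataclasses import dataclass, field


@dataclass
class AudioTrack:
    """One recorded audio stream (hypothetical structure)."""
    role: str              # "first" (main audio) or "second" (specific audio)
    sample_rate: int       # samples per second
    start_offset_s: float  # start time relative to the first video, in seconds
    samples: list = field(default_factory=list)


@dataclass
class Recording:
    """A first video together with the audio tracks associated with it."""
    video_path: str
    tracks: dict = field(default_factory=dict)

    def associate(self, track: AudioTrack) -> None:
        # Store the track under its role so later steps can look it up.
        self.tracks[track.role] = track

    def is_synchronized(self, role: str) -> bool:
        # In this sketch, a track recorded in sync with capture starts at offset 0.
        return self.tracks[role].start_offset_s == 0.0
```

For example, `Recording("clip0001.mp4")` (a hypothetical file name) followed by `associate(AudioTrack("first", 48000, 0.0))` and `associate(AudioTrack("second", 48000, 0.0))` would model one capture with both tracks attached.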

FIG. 3 is a block diagram of the main functions implemented by the CPU when playing back recorded video. As shown in the figure, the CPU 24 functions as a video playback unit 111, an audio playback unit 112, and the like.

The video playback unit 111 plays video recorded in the storage unit 18 on the display unit 16 in response to a playback instruction from the user. The user uses the display unit 16 and the operation unit 22 to select the video to be played and to instruct playback. The video playback unit 111 reads the selected video from the storage unit 18 and plays it.

When audio is associated with the video, the audio playback unit 112 plays the audio in synchronization with the video. When both first and second audio are associated with the video, the audio playback unit 112 synthesizes the first and second audio and plays the result. The played audio is output from the audio output unit 20.

FIG. 4 is a block diagram of the main functions implemented by the CPU when generating video with audio. As shown in the figure, the CPU 24 functions as a first video acquisition unit 121, a first audio acquisition unit 122, a second audio acquisition unit 123, a third audio generation unit 124, an intensity setting unit 125, a video generation unit 126, a second video recording unit 127, and the like.

The first video acquisition unit 121 reads and acquires, from the storage unit 18, the video (first video) that the user has selected for processing. The user selects the video to be processed using the display unit 16 and the operation unit 22. The acquired video data is passed to the video generation unit 126.

The first audio acquisition unit 122 reads and acquires, from the storage unit 18, the data of the first audio (main audio) associated with the video selected for processing. The acquired first audio data is passed to the third audio generation unit 124.

The second audio acquisition unit 123 reads and acquires, from the storage unit 18, the second audio (specific audio) associated with the video selected for processing. The acquired second audio data is passed to the third audio generation unit 124. The first audio acquisition unit 122 and the second audio acquisition unit 123 may instead acquire the corresponding audio data directly from the first audio input unit 12 and the second audio input unit 14, without going through the storage unit 18. The imaging device 1 may also record audio data in an external storage unit rather than in the internal storage unit 18, in which case the first audio acquisition unit 122 and the second audio acquisition unit 123 may acquire the audio data from that external storage unit.

The third audio generation unit 124 processes the first audio using the second audio to generate the third audio. The third audio is generated as audio in which the second audio is contained in the first audio at a predetermined intensity (audio level), namely the intensity set by the user. FIG. 5 is a block diagram of the functions of the third audio generation unit. As shown in the figure, the third audio generation unit 124 has the functions of an intensity adjustment unit 124A and a synthesis unit 124B. The intensity adjustment unit 124A adjusts the intensity of the second audio according to the setting of the intensity setting unit 125. The synthesis unit 124B synthesizes the intensity-adjusted second audio with the first audio to generate the third audio, that is, audio in which the second audio is contained in the first audio at the predetermined intensity. Because the first audio is recorded in synchronization with the video, as described above, the generated third audio is also synchronized with the video. The generated third audio data is passed to the video generation unit 126.
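The two functions of the third audio generation unit 124 (intensity adjustment, then synthesis) can be sketched as a gain followed by sample-wise addition. This is a minimal sketch, assuming both tracks are already time-aligned float sample lists of equal length and that "synthesis" means simple additive mixing; the function name is hypothetical and the device's actual mixing is not specified in the text.

```python
def generate_third_audio(first, second, gain):
    """Mix a gain-adjusted second audio into the first audio.

    first, second: equal-length, time-aligned sequences of float samples
    gain: linear factor (>1 emphasizes, <1 attenuates, 0 drops the second audio)
    """
    if len(first) != len(second):
        raise ValueError("first and second audio must be time-aligned and equal length")
    # Intensity adjustment (the role of unit 124A in this sketch).
    adjusted = [gain * s for s in second]
    # Synthesis by sample-wise addition (the role of unit 124B in this sketch).
    return [a + b for a, b in zip(first, adjusted)]
```

A practical implementation would also guard against clipping when the sum exceeds the sample format's range; that step is omitted here for brevity.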

The intensity setting unit 125 sets the intensity (audio level) of the second audio when it is synthesized with the first audio. The intensity setting unit 125 sets the intensity based on operation input from the operation unit 22. By setting the intensity of the second audio via the operation unit 22, the user can emphasize or attenuate the second audio relative to the first audio.
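The text does not state the units of the intensity setting. If, as an assumption, the user enters a level in decibels, the intensity setting unit would need to convert it to the linear amplitude gain used when mixing. The helper names below are hypothetical; the conversion itself is the standard amplitude decibel relation.

```python
import math


def db_to_gain(level_db):
    """Convert a user-set level in decibels to a linear amplitude gain."""
    # Amplitude convention: +20 dB is x10, -20 dB is x0.1, 0 dB is unchanged.
    return 10.0 ** (level_db / 20.0)


def gain_to_db(gain):
    """Inverse conversion, e.g. for displaying the current setting; gain must be positive."""
    if gain <= 0:
        raise ValueError("gain must be positive")
    return 20.0 * math.log10(gain)
```

Under this assumption, a positive setting emphasizes the second audio and a negative setting attenuates it.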

The video generation unit 126 associates the video (first video) acquired by the first video acquisition unit 121 with the third audio generated by the third audio generation unit 124 to generate video with audio (second video). For example, it containerizes the video and audio into a file of a predetermined video format, such as AVI (Audio Video Interleave) or MP4 (MPEG-4 Part 14, ISO/IEC 14496-14:2003, ISO/IEC JTC 1).
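As an illustration of the containerization step outside the camera, the first video and the third audio could be muxed into an MP4 file with the FFmpeg command-line tool. This is a sketch under that assumption: FFmpeg is not mentioned in the source, the file names are hypothetical, and the function only builds the argument list.

```python
def build_mux_command(video_path, audio_path, output_path):
    """Build an ffmpeg argument list that muxes a video stream and a new audio
    stream into one container (format inferred from the output extension)."""
    return [
        "ffmpeg",
        "-i", video_path,   # input 0: first video
        "-i", audio_path,   # input 1: third audio
        "-map", "0:v:0",    # take the video stream from input 0
        "-map", "1:a:0",    # take the audio stream from input 1
        "-c:v", "copy",     # keep the video stream as-is (no re-encode)
        "-c:a", "aac",      # encode the audio as AAC, a common choice for MP4
        "-shortest",        # stop at the end of the shorter stream
        output_path,
    ]
```

The list can be run with `subprocess.run(build_mux_command("first.mp4", "third.wav", "second.mp4"), check=True)`. Copying the video stream avoids re-encoding, so only the audio track changes.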

The second video recording unit 127 stores the video with audio (second video) generated by the video generation unit 126 in the storage unit 18.

Next, the procedure (video generation method) for generating video with audio using the imaging device 1 configured as described above is explained.

First, imaging is performed, and the video, first audio, and second audio are recorded. Specifically, the video (first video) captured by the imaging unit 10 is recorded in the storage unit 18 (video recording step). In synchronization with the imaging, the first audio and the second audio are collected and recorded in the storage unit 18 (first audio recording step and second audio recording step). The main audio is recorded as the first audio, and specific audio is recorded as the second audio. Here, "specific audio" means audio that differs from the main audio and is to be included in it. For example, when capturing video of a person talking in a windy environment, the person's speaking voice can be recorded as the main audio and wind noise (sound produced by wind striking a microphone) as the specific audio. Likewise, when capturing video of a person talking on a beach, the person's speaking voice can be recorded as the main audio and the sound of waves as the specific audio.

Next, the intensity (audio level) of the second audio to be used when synthesizing it with the first audio is set (intensity setting step). The user sets this intensity via the operation unit 22, which allows the user to freely emphasize or attenuate the second audio relative to the first audio.

Next, the first audio is processed using the second audio to generate third audio that contains the second audio at a predetermined intensity within the first audio (audio generation step). The predetermined intensity is the intensity set in the intensity setting step. In this step, the second audio is first adjusted to the intensity set by the user, which yields a second audio emphasized or attenuated relative to the first audio. The intensity-adjusted second audio is then synthesized with the first audio to generate the third audio. For example, if wind noise was recorded as the second audio, the third audio contains the wind noise at the predetermined intensity within the main audio; if the sound of waves was recorded as the second audio, the third audio contains the sound of waves at the predetermined intensity within the main audio.

Next, the audio generated in the audio generation step (third audio) is associated with the video obtained by imaging (first video) to generate video with audio (second video) (video generation step).

Video with audio is generated through the above series of steps. The generated video with audio is recorded in the storage unit 18.

With the video generation method of this embodiment, recording specific audio (second audio) separately from the main audio (first audio) allows the specific audio to be isolated and edited. This makes it possible to generate video with audio that matches the user's intentions.

[Modifications of the First Embodiment]
(1) Modification concerning synthesis of the second audio
The second audio may be synthesized only in a specific section of the video (a section on the time axis). In this case, the section to be synthesized is specified, and the second audio is synthesized with the first audio in that section. The section can be specified, for example, while the first video and the first audio are being played back.
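Section-limited synthesis can be sketched by converting the specified start and end times into sample indices and mixing only inside that range. This is a minimal sketch, assuming time-aligned, equal-length float sample lists and a half-open section [start_s, end_s); the function name and parameters are hypothetical.

```python
def mix_in_section(first, second, gain, start_s, end_s, sample_rate):
    """Add gain * second to first only within the section [start_s, end_s)."""
    if len(first) != len(second):
        raise ValueError("first and second audio must be time-aligned and equal length")
    # Convert section boundaries in seconds to sample indices, clamped to the track.
    start = max(0, int(round(start_s * sample_rate)))
    end = min(len(first), int(round(end_s * sample_rate)))
    out = list(first)  # samples outside the section are left unchanged
    for i in range(start, end):
        out[i] += gain * second[i]
    return out
```

Outside the section, the output is identical to the first audio.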

The intensity setting of the second audio used in synthesis can also be varied partially along the time axis. This makes it possible, for example, to generate video with audio in which the intensity of the specific audio changes from scene to scene.
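One way to vary the intensity along the time axis is a gain envelope defined by keyframes and interpolated linearly between them. The text does not specify how the time-varying setting is represented, so the keyframe format and linear interpolation below are assumptions for illustration.

```python
def gain_at(t, keyframes):
    """Piecewise-linear gain at time t (seconds) from (time_s, gain) keyframes sorted by time."""
    if t <= keyframes[0][0]:
        return keyframes[0][1]
    if t >= keyframes[-1][0]:
        return keyframes[-1][1]
    for (t0, g0), (t1, g1) in zip(keyframes, keyframes[1:]):
        if t0 <= t <= t1:
            if t1 == t0:  # duplicate keyframe times: take the later value
                return g1
            return g0 + (g1 - g0) * (t - t0) / (t1 - t0)
    raise ValueError("keyframes must be sorted by time")


def mix_with_envelope(first, second, keyframes, sample_rate):
    """Mix second into first with a gain that follows the keyframe envelope."""
    if len(first) != len(second):
        raise ValueError("first and second audio must be time-aligned and equal length")
    return [a + gain_at(i / sample_rate, keyframes) * b
            for i, (a, b) in enumerate(zip(first, second))]
```

A scene where the waves should swell, for example, could be described by keyframes rising from 0.0 to 1.0 over its duration.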

(2) Modification concerning the intensity setting
In the above embodiment, the user can freely set the intensity of the second audio when it is synthesized with the first audio. Alternatively, the second audio may be synthesized at an intensity selected from a plurality of predetermined intensity settings (for example, strong reduction, weak reduction, strong emphasis, and weak emphasis).
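The preset variant can be sketched as a lookup table from preset names to levels. The four preset names follow the examples in the text, but the decibel values assigned to them are illustrative assumptions only, as is the function name.

```python
# Hypothetical preset table: names follow the examples in the text,
# the decibel values are illustrative assumptions.
PRESETS_DB = {
    "strong_reduction": -20.0,
    "weak_reduction": -6.0,
    "weak_emphasis": 6.0,
    "strong_emphasis": 12.0,
}


def preset_gain(name):
    """Return the linear amplitude gain for a named intensity preset."""
    try:
        level_db = PRESETS_DB[name]
    except KeyError:
        raise ValueError(f"unknown preset: {name!r}") from None
    return 10.0 ** (level_db / 20.0)
```

Restricting the choice to a few presets trades fine control for a simpler user interface.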

(3) Modification concerning recording of the second audio
The second audio does not necessarily need to be recorded in synchronization with the capture of the first video. For example, when the second audio is synthesized only in a specific section as described above, the second audio may be recorded in advance or afterward.

(4) Modification of the first audio input unit and the second audio input unit
The first audio input unit 12 and the second audio input unit 14 preferably apply filtering to the audio signals of the collected audio as needed. For example, the first audio input unit 12 preferably applies filtering so that the main audio is recorded clearly. Similarly, the second audio input unit 14 preferably applies filtering so that the specific audio is recorded clearly.
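The text does not say which filter is used. As one common example, a first-order high-pass filter attenuates low-frequency content such as rumble, which could help the main audio come through clearly. The sketch below assumes float samples and a hypothetical cutoff chosen by the caller; it is the standard discrete RC high-pass recurrence, not the device's documented filter.

```python
import math


def high_pass(samples, cutoff_hz, sample_rate):
    """First-order (RC) high-pass filter; attenuates content below cutoff_hz."""
    if not samples:
        return []
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)  # time constant for the cutoff
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out = [samples[0]]
    for i in range(1, len(samples)):
        # y[i] = alpha * (y[i-1] + x[i] - x[i-1])
        out.append(alpha * (out[-1] + samples[i] - samples[i - 1]))
    return out
```

A constant (DC) input decays toward zero at the output, while content well above the cutoff passes with little attenuation.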

It is also preferable that the first audio input unit 12 and the second audio input unit 14 use microphones suited to their purposes. For example, when wide-area sound is collected as the main audio, the first microphone 12A is a microphone whose sound collection characteristics are less directional than those of the second microphone 14A (preferably omnidirectional). The second microphone 14A, which collects the specific sound, is a microphone with directional sound collection characteristics (for example, a shotgun microphone). This allows the first audio and the second audio to be recorded accurately.

The first microphone 12A and the second microphone 14A may be built into the main body of the imaging device 1 or may be externally attached.

[Second Embodiment]
As in the first embodiment, an example in which video is generated using an imaging device will be described.

In this embodiment, audio that contains the second audio is recorded as the first audio. Then, using the second audio recorded separately from the first audio, the first audio is processed to emphasize or reduce the second audio, thereby generating the third audio. The basic configuration of the imaging device is the same as in the first embodiment; only the functions realized by the CPU 24 differ.

Figure 6 is a block diagram of the main functions realized by the CPU when generating video with audio. As shown in the figure, the CPU 24 functions as, among others, a first video acquisition unit 121, a first audio acquisition unit 122, a second audio acquisition unit 123, a third audio generation unit 124, a processing condition setting unit 128, a video generation unit 126, and a second video recording unit 127. Except for the third audio generation unit 124 and the processing condition setting unit 128, the functions of each unit are substantially the same as in the first embodiment. Therefore, only the functions of the third audio generation unit 124 and the processing condition setting unit 128 are described here.

Figure 7 is a block diagram of the functions of the third audio generation unit.

As described above, in this embodiment, audio that contains the second audio is recorded as the first audio. The third audio generation unit 124 processes the first audio to emphasize or reduce the audio components it shares with the second audio (common components), thereby generating the third audio. Specifically, the audio components having the same frequencies as the second audio are treated as the common components, and these components are processed under the processing conditions set by the user to generate the third audio. For this purpose, the third audio generation unit 124 has the functions of a frequency detection unit 124C and an audio processing unit 124D.

The frequency detection unit 124C analyzes the data of the second audio and detects its frequencies. The second audio is a specific sound within the first audio that the user wishes to emphasize or reduce. As in the first embodiment, the second audio is collected by the second microphone 14A. The information detected by the frequency detection unit 124C is passed to the audio processing unit 124D.
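The patent does not specify how the frequency detection unit 124C analyzes the second audio. The sketch below assumes that "detecting the frequency" means finding the spectral bins where the second audio is strong. It takes a discrete Fourier transform and keeps every bin whose magnitude reaches a fraction of the peak. The naive O(N^2) DFT keeps the example dependency-free; a practical implementation would more likely use an FFT with windowing over short frames. `detect_frequencies` and `rel_threshold` are hypothetical names.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (O(N^2)), adequate for short illustrative signals."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def detect_frequencies(second, sample_rate, rel_threshold=0.5):
    """Return the frequencies (Hz) whose DFT magnitude is at least rel_threshold * peak.

    Only bins 1..N/2 are examined: DC is skipped, and for real-valued input the
    upper half of the spectrum mirrors the lower half.
    """
    n = len(second)
    mags = [abs(c) for c in dft(second)[1:n // 2 + 1]]
    peak = max(mags, default=0.0)
    if peak == 0.0:
        return []
    return [(k + 1) * sample_rate / n for k, m in enumerate(mags) if m >= rel_threshold * peak]
```

For a 64-sample sine at 8 Hz sampled at 64 Hz, the function returns `[8.0]`.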

The audio processing unit 124D acquires the frequency information of the second audio detected by the frequency detection unit 124C. It then processes the first audio under the processing conditions set by the processing condition setting unit 128 to generate the third audio. In other words, the components of the first audio that have the same frequencies as the second audio (the common components) are processed under the user-set processing conditions to generate the third audio.
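The processing by the audio processing unit 124D can be sketched as a spectral gain. The first audio's DFT bins at the detected frequencies are multiplied by one user-set factor: a factor above 1 emphasizes the common components, a factor between 0 and 1 reduces them, and 0 cancels them. The sketch assumes a single block covering the whole signal and frequencies that fall exactly on DFT bins. `process_common_components` is a hypothetical name, and a real device would more likely work frame by frame with overlap-add.

```python
import cmath
import math


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(spectrum):
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def process_common_components(first, common_hz, sample_rate, gain):
    """Scale the first audio's components at common_hz by gain (>1 emphasize, <1 reduce, 0 cancel)."""
    n = len(first)
    targets = set()
    for f in common_hz:
        k = round(f * n / sample_rate) % n  # nearest DFT bin for this frequency
        targets.update({k, (n - k) % n})    # include the mirror bin so the output stays real
    spectrum = dft(first)
    for k in targets:
        spectrum[k] *= gain
    return idft(spectrum)
```

Scaling bin N - k together with bin k keeps the spectrum conjugate-symmetric, so the inverse transform stays real-valued. Collecting the targets in a set ensures that no bin is scaled twice.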

The processing condition setting unit 128 sets the processing conditions used when processing the first audio. Specifically, it sets the processing conditions (emphasis or reduction) for the common components, that is, the audio components shared with the second audio. The processing condition setting unit 128 sets these conditions based on operation input from the operation unit 22. By setting the processing conditions via the operation unit 22, the user can emphasize, reduce, or cancel the second audio contained in the first audio.

Next, the procedure (video generation method) for generating video with audio using the imaging device configured as described above will be explained.

First, imaging is performed, and the video, the first audio, and the second audio are recorded. Specifically, the video (first video) captured by the imaging unit 10 is recorded in the storage unit 18 (video recording step). In synchronization with the imaging, the first audio and the second audio are collected and recorded in the storage unit 18 (first audio recording step and second audio recording step).

As described above, audio that contains the second audio is recorded as the first audio; that is, the recorded first audio contains the common components shared with the second audio. Meanwhile, a specific sound within the first audio is recorded as the second audio. Here, the "specific sound" is a sound contained in the first audio that the user wishes to emphasize or reduce. For example, when capturing video of a person talking in a windy environment, the wind noise can be recorded as the second audio. Likewise, when capturing video of a person talking on a beach, the sound of the waves can be recorded as the second audio.

Next, the processing conditions for the common components of the first audio are set (processing condition setting step). The user sets these conditions via the operation unit 22. With this setting, the user can freely emphasize, reduce, or cancel the second audio contained in the first audio.

Next, using the second audio, the first audio is processed to emphasize or reduce the common components, thereby generating the third audio (audio generation step). In this step, the frequencies of the second audio are detected first. The first audio is then processed to emphasize or reduce the components at those frequencies according to the processing conditions set in the processing condition setting step.

The result is audio (third audio) in which the second audio contained in the first audio either remains at the intensity the user intends or has been canceled. For example, if the first audio is recorded in a windy environment and the wind noise is recorded as the second audio, audio with the wind noise reduced or canceled can be generated. If necessary, audio with the wind noise emphasized can also be generated.

In the video generation method of this embodiment, a specific sound (second audio) contained in the main audio (first audio) is recorded separately from the main audio. This makes it possible to isolate and edit that sound, so that video with audio can be generated in line with the user's intentions.

[Modification of the Second Embodiment]
(1) Modification of the Generation of the Third Audio
The processing that emphasizes, reduces, or cancels the common components can be performed only in a specific section of the video. In this case, the section is specified and the processing is performed on that section.

In addition, the processing conditions for the common components can be varied partially along the time axis. This makes it possible, for example, to generate video with audio in which the intensity of a specific sound varies from scene to scene.

(2) Modification of the Recording of the Second Audio
The second audio does not necessarily need to be recorded in synchronization with the capture of the first video; it may be recorded beforehand or afterward. For example, the sound of the environment to be edited (such as wind noise, the sound of a waterfall, or construction noise) can be recorded in advance as the second audio. Alternatively, the sound of the environment to be edited may be recorded in advance as sample audio. The second audio can then be created during the video recording step from the components common to the recorded audio and the sample audio.
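Creating the second audio from the components shared by a sample recording and the live recording can be sketched as a spectral intersection. The sketch finds the dominant bins of each signal, keeps the bins present in both, and resynthesizes the recorded audio from only those bins. It assumes both clips have the same length and sample rate so that bin indices line up. The 50% relative threshold is arbitrary, and all function names are hypothetical.

```python
import cmath
import math


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(spectrum):
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def dominant_bins(x, rel_threshold):
    """Positive-frequency bins (1..N/2) whose magnitude reaches rel_threshold * peak."""
    mags = [abs(c) for c in dft(x)[1:len(x) // 2 + 1]]
    peak = max(mags, default=0.0)
    return {k + 1 for k, m in enumerate(mags) if peak > 0.0 and m >= rel_threshold * peak}


def second_audio_from_sample(recorded, sample, rel_threshold=0.5):
    """Keep only the components of `recorded` that are also dominant in `sample`."""
    if len(recorded) != len(sample):
        raise ValueError("recorded and sample audio must have the same length and sample rate")
    n = len(recorded)
    common = dominant_bins(recorded, rel_threshold) & dominant_bins(sample, rel_threshold)
    keep = set()
    for k in common:
        keep.update({k, (n - k) % n})  # mirror bins keep the resynthesized signal real
    spectrum = dft(recorded)
    return idft([c if k in keep else 0j for k, c in enumerate(spectrum)])
```

Consider a recording of speech at 5 Hz plus wind at 12 Hz, and a wind sample at 12 Hz plus another sound at 20 Hz. Only the 12 Hz wind component survives as the second audio.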

Representative second audio can also be stored in the imaging device in advance as preset data. When audio containing a sound stored as preset data is then recorded together with video, audio in which that sound is emphasized or reduced can be generated. For example, if wind noise is stored as preset data, the wind-noise frequency data held in the preset can be used to generate video with audio in which the wind noise in the first audio is emphasized or reduced. The preset data is stored, for example, in the ROM 26 or the storage unit 18. When generating the video, the user selects the audio data to be edited.

(3) Modification of the First Audio Input Unit and the Second Audio Input Unit
As in the first embodiment, the first audio input unit 12 and the second audio input unit 14 preferably apply filtering to the audio signals of the collected audio as needed. They also preferably use microphones suited to their purposes.

Except when the second audio is recorded in synchronization with the imaging, the second audio may also be collected by the first microphone 12A. That is, when the second audio is recorded beforehand or afterward, the first audio input unit 12 can be used to collect and record it. In that case, the device main body does not need the second audio input unit 14.

[Third Embodiment]
In this embodiment, the movement of the imaging device body is detected while video and audio are being recorded, and the third audio is generated taking that movement information into account.

Figure 8 is a block diagram showing the general configuration of the imaging device of this embodiment. As shown in the figure, the imaging device 1 of this embodiment differs from the imaging devices of the first and second embodiments in that it further includes a motion detection unit 30.

The motion detection unit 30 detects the movement of the imaging device body, including the imaging unit 10. It does so in synchronization with imaging by the imaging unit 10: motion detection starts when imaging starts and ends when imaging ends (motion detection step). The motion detection unit 30 consists of, for example, an acceleration sensor. If the imaging device body has an image stabilization function or the like, the sensor used for shake detection can also serve as the motion detection sensor.

Figure 9 is a block diagram of the main functions realized by the CPU when recording video and audio. As shown in the figure, the CPU 24 functions as, among others, an imaging control unit 101, a video output unit 102, a first video recording unit 103, a first audio recording unit 104, a second audio recording unit 105, and a motion recording unit 106. Except for the motion recording unit 106, the functions of each unit are substantially the same as in the first embodiment.

The motion recording unit 106 records the movement information of the imaging device body, as detected by the motion detection unit 30, in the storage unit 18 in synchronization with the capture of the first video. The movement information is recorded in association with the first video. When video with audio (second video) is generated, this stored movement information is used to generate the audio (third audio) to be associated with the video.

The third audio generation unit 124 generates the third audio taking the movement information into account. Here, an example in which the third audio is generated by synthesizing the second audio with the first audio is described. As explained in the first embodiment, the first audio substantially does not contain the second audio, and the second audio substantially does not contain the first audio.

Figure 10 is a block diagram of the functions of the third audio generation unit of this embodiment. As shown in the figure, the third audio generation unit 124 of this embodiment has the functions of a second audio processing unit 124E, an intensity adjustment unit 124A, and a synthesis unit 124B. Except for the second audio processing unit 124E, the functions of each unit are substantially the same as in the first embodiment.

The second audio processing unit 124E acquires the movement information of the imaging device body recorded during video and audio capture, and processes the second audio based on that information. Specifically, it processes the second audio according to predetermined processing conditions that depend on the movement of the body.

As an example, consider generating audio (third audio) in which wind noise (second audio) is added to the main audio (first audio). Assume that the second microphone 14A consists of a left-right pair of microphones integrated into the imaging device body, so that it moves together with the body. When a panning movement of the body is detected, the second audio processing unit 124E changes the intensities of the left and right audio according to that movement. Specifically, it weakens the audio on the side toward which the device is panned. The microphone on that side picks up stronger wind noise, so weakening it according to the movement produces audio that is balanced between left and right. In this way, wind noise that varies between the left and right microphones as the body moves is handled appropriately.
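The pan-dependent balancing can be sketched as a per-sample gain on the channel toward which the body is panning. Several details are assumptions for illustration: the sign convention for `pan_rates` (positive means panning right), the linear law 1 - k*|rate|, and the floor `min_gain`. The patent only says that the audio on the side of the movement is weakened according to the movement.

```python
def balance_for_pan(left, right, pan_rates, k=0.5, min_gain=0.2):
    """Attenuate the channel on the side the camera body is panning toward.

    pan_rates holds one value per sample (hypothetical convention: > 0 = panning right,
    < 0 = panning left, 0 = still). The gain 1 - k*|rate| is floored at min_gain.
    """
    out_left, out_right = [], []
    for l, r, rate in zip(left, right, pan_rates):
        g = max(min_gain, 1.0 - k * abs(rate))
        if rate > 0:    # panning right: the right microphone picks up more wind noise
            out_left.append(l)
            out_right.append(r * g)
        elif rate < 0:  # panning left: the left microphone picks up more wind noise
            out_left.append(l * g)
            out_right.append(r)
        else:           # no panning: leave both channels unchanged
            out_left.append(l)
            out_right.append(r)
    return out_left, out_right
```

The motion sensor output would first need resampling to the audio rate to produce one pan value per sample. That step is omitted here.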

The intensity adjustment unit 124A adjusts the intensity of the processed second audio according to the setting of the intensity setting unit 125. The synthesis unit 124B synthesizes the intensity-adjusted second audio with the first audio to generate the third audio.

Thus, in this embodiment, the second audio is processed automatically according to the movement of the imaging device body to generate the third audio. This makes it possible to remove the effects of that movement automatically.

[Modification of the Third Embodiment]
The following describes the case in which the first audio is processed using the second audio to emphasize or reduce the common components. In this case, the movement information of the imaging device body is also taken into account in generating the third audio.

Figure 11 is a block diagram of the functions of the third audio generation unit in this example. As shown in the figure, the third audio generation unit 124 of this example differs from that of the second embodiment in that the audio processing unit 124D processes the first audio based on the movement information of the imaging device body.

The audio processing unit 124D processes the first audio according to predetermined processing conditions that depend on the movement of the imaging device body. As an example, consider a case in which the first audio, containing wind noise (second audio) in addition to the main audio, is recorded in synchronization with imaging. Audio (third audio) with the wind noise emphasized or reduced is then generated. In this case, the first audio contains both the main audio and the wind noise (second audio), and the second audio contains the wind noise.

The frequency detection unit 124C analyzes the data of the second audio and detects its frequencies.

The user then selects one intensity setting from a plurality of predetermined audio intensity settings. The audio processing unit 124D acquires the frequency information of the second audio detected by the frequency detection unit 124C, together with the movement information of the imaging device body. It processes the first audio by combining the processing conditions set by the processing condition setting unit 128 with the movement information, thereby generating the third audio.

For example, consider recording video and audio while moving, in an environment where wind noise (second audio) is recorded together with the main audio (first audio). The loudness of the wind noise changes with the speed of movement. The audio processing unit 124D therefore corrects the preset intensity setting, or changes the frequencies to be processed (the frequencies of the common components), according to the movement speed of the imaging device body. This allows the target audio (second audio) to be processed appropriately even when video and audio are recorded while moving.

Suppose, for instance, that the user's setting reduces the wind noise (second audio) slightly. In a scene where the imaging device body is determined to be moving fast, a correction is applied so that the wind noise is reduced more strongly than in other scenes, and the result is used as the third audio. This prevents specific sounds (wind, waves, and so on) from becoming too loud in only certain scenes of the third audio.
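The speed-dependent correction can be sketched as follows. The sketch assumes that a per-scene movement speed has already been estimated from the motion detection unit's output (the patent does not say how), and that the user's setting is expressed as a gain for the common components. The reference speed, the coefficient `k`, and the 1/(1 + k * excess) law are hypothetical choices that make the reduction stronger in faster scenes.

```python
def corrected_gain(base_gain, speed_mps, reference_speed_mps=1.0, k=0.3, min_gain=0.0):
    """Lower a user-selected gain for the common components as the camera moves faster.

    base_gain < 1 is a reduction setting. Above the reference speed, the gain is divided
    by 1 + k * (excess speed), so fast scenes get a stronger reduction than slow ones.
    """
    excess = max(0.0, speed_mps - reference_speed_mps)
    return max(min_gain, base_gain / (1.0 + k * excess))


def per_scene_gains(base_gain, scene_speeds_mps, **kwargs):
    """Apply the correction independently to each scene's estimated movement speed."""
    return [corrected_gain(base_gain, s, **kwargs) for s in scene_speeds_mps]
```

Take a slight-reduction setting of 0.8. A scene at 0.5 m/s keeps 0.8, while a scene at 3.0 m/s is corrected to about 0.5. The fast scene is therefore reduced more than the others, as described above.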

[Fourth Embodiment]
In this embodiment, when a video (first video) is captured, imaging information about the video is acquired and recorded in the storage unit 18. The recorded imaging information is used when generating video with audio (second video); specifically, it is displayed on the display unit 16 at that time. The user uses the displayed information to identify the sections (scenes) in which the audio is to be edited. Here, imaging information is information related to the capture of the video, for example, the focal length, the subject distance, and the exposure at the time of capture. If the imaging device can detect the movement of the imaging device body, information on that movement during imaging is also included.

Processing such as acquiring, recording, and displaying the imaging information is performed by the CPU 24. Figure 12 is a block diagram of the functions realized by the CPU when acquiring and recording imaging information and when displaying it. As shown in the figure, the CPU 24 functions as an imaging information acquisition unit 131, an imaging information recording unit 132, and an imaging information display unit 133.

The imaging information acquisition unit 131 acquires imaging information about the first video in synchronization with the capture of the first video by the imaging unit 10. The acquired imaging information includes the subject distance, the focal length, the movement of the imaging device (for example, the output of an acceleration sensor), and the like.

The imaging information recording unit 132 records the imaging information acquired by the imaging information acquisition unit 131 in the storage unit 18, in association with the video (first video).

The imaging information display unit 133 reads the imaging information from the storage unit 18 based on operation input from the operation unit 22 and displays it on the display unit 16, for example in chronological order.

Video with audio can be generated using the imaging device of this embodiment, for example, as follows.

First, the first video is captured by the imaging unit 10 and recorded in the storage unit 18 (video recording step). In synchronization with the imaging, imaging information about the first video is acquired (first information acquisition step) and recorded in the storage unit 18 in association with the first video. Also in synchronization with the imaging, the first audio and the second audio are collected and recorded in the storage unit 18 in association with the first video (first audio recording step and second audio recording step).

Next, video with audio (second video) is generated using the recorded first video, first audio, and second audio. First, the imaging information of the first video is read from the storage unit 18 and displayed on the display unit 16 (first display step). The imaging information is displayed along the time axis of the first video, which makes it easier to identify the portions where audio editing is needed. Via the operation unit 22, the user specifies where the second audio is to be synthesized, or where it is to be edited (emphasized, reduced, or canceled). The user then instructs generation of the video with audio (second video).

When generation of video with audio is instructed, the first audio is processed using the second audio based on the operation input from the operation unit 22, and the third audio is generated (audio generation step). The generated third audio is then associated with the first video to generate the video with audio (second video) (video generation step).

Thus, according to this embodiment, the imaging information of the first video is acquired and recorded. This makes it easy to identify the portions where audio editing is needed when generating video with audio.

[Modification of the Fourth Embodiment]
The uses of the acquired imaging information are not limited to the above example. For example, the imaging information can be used to identify portions requiring audio editing automatically and to process the first audio automatically to generate the third audio. Portions where the imaging device body moves significantly (where the movement is at or above a threshold), or where the subject moves significantly (where the change in subject distance is at or above a threshold), can be identified from the imaging information and processed automatically.

The imaging information can also be used to change the audio editing method automatically. For example, when the second audio is reduced throughout the entire video, the reduction intensity can be varied automatically based on the imaging information, such as by changing the intensity where the imaging device body or the subject moves significantly. Furthermore, the imaging information can be used to detect portions requiring audio editing and to cue playback to them automatically.
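The automatic identification of portions to edit can be sketched in two steps: threshold the per-frame imaging information, then merge consecutive frames into time spans. The sketch assumes one value per video frame, for example the magnitude of the acceleration sensor output or the frame-to-frame change in subject distance. The threshold values and function names are hypothetical.

```python
def segments_above(values, threshold, frame_rate):
    """Return (start_s, end_s) spans where the per-frame value is at or above threshold."""
    spans, start = [], None
    for i, v in enumerate(values):
        if v >= threshold and start is None:
            start = i                      # a span opens at this frame
        elif v < threshold and start is not None:
            spans.append((start / frame_rate, i / frame_rate))
            start = None                   # the span ended at the previous frame
    if start is not None:                  # a span still open at the end of the video
        spans.append((start / frame_rate, len(values) / frame_rate))
    return spans


def subject_distance_changes(distances):
    """Frame-to-frame absolute change in subject distance (the first frame has no change)."""
    if not distances:
        return []
    return [0.0] + [abs(b - a) for a, b in zip(distances, distances[1:])]
```

`segments_above([0, 5, 6, 1, 7], 5, 1)` returns `[(1.0, 3.0), (4.0, 5.0)]`. Spans like these could drive automatic processing or cue points.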

[Fifth Embodiment]
In this embodiment, information about the first microphone 12A, which collects the first audio, and the second microphone 14A, which collects the second audio, is acquired and recorded in the storage unit 18. The recorded microphone information is used when generating video with audio (second video); specifically, it is displayed on the display unit 16 at that time. The user uses the displayed information to set, for example, the intensity of the second audio.

Here, microphone information includes, for example, whether a windshield is fitted, the type of windshield if one is fitted (sponge, fur, cage, and so on), and whether the microphone is directional. The microphone information can also include various performance characteristics of the microphone, for example:
- directional characteristics (how sensitivity changes with the direction from which sound arrives),
- frequency response (how sensitivity changes with the pitch of the sound),
- maximum sound pressure level (the loudest sound level the microphone can pick up),
- equivalent noise level (input-referred noise level),
- output impedance, and
- open-circuit sensitivity.

The user enters the information about the first microphone 12A and the second microphone 14A into the imaging device 1 via the operation unit 22. The CPU 24 records this information in the storage unit 18. When the video with audio is generated, the CPU 24 displays the recorded information on the display unit 16.

Figure 13 is a block diagram of the functions the CPU implements when acquiring and recording microphone information and when displaying it. As shown in the figure, the CPU 24 functions as a microphone information acquisition unit 141, a microphone information recording unit 142, and a microphone information display unit 143.

The microphone information acquisition unit 141 acquires information about the first microphone 12A and the second microphone 14A. As described above, the user enters this information via the operation unit 22. For each microphone, the user enters whether a windshield is fitted, the type of windshield if one is fitted (sponge, fur, cage, etc.), whether the microphone is directional, and so on.

The microphone information recording unit 142 records the information about the first microphone 12A and the second microphone 14A, as acquired by the microphone information acquisition unit 141, in the storage unit 18. The information is recorded in association with the video (first video).
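
The following Python sketch gives a rough idea of the microphone information that units 141 to 143 handle. It stores a subset of the fields listed above and associates them with a first video through a JSON record. The class name `MicrophoneInfo`, the function `microphone_sidecar`, the video ID, and the JSON layout are all hypothetical, because the embodiment does not specify a data format. The `dynamic_range_db` helper uses the common definition of dynamic range: maximum SPL minus equivalent noise level.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class MicrophoneInfo:
    """Hypothetical record of the microphone information described above."""

    has_windshield: bool
    windshield_type: Optional[str] = None  # e.g. "sponge", "fur", "cage"
    directional: bool = False
    max_spl_db: Optional[float] = None  # maximum sound pressure level (dB SPL)
    equivalent_noise_db: Optional[float] = None  # equivalent noise level (dB SPL)

    def dynamic_range_db(self) -> Optional[float]:
        # Dynamic range = maximum SPL minus equivalent noise level.
        if self.max_spl_db is None or self.equivalent_noise_db is None:
            return None
        return self.max_spl_db - self.equivalent_noise_db


def microphone_sidecar(video_id: str, first: MicrophoneInfo, second: MicrophoneInfo) -> str:
    """Serialize both records together with the ID of the first video they belong to."""
    record = {"video_id": video_id, "first_mic": asdict(first), "second_mic": asdict(second)}
    return json.dumps(record, sort_keys=True)
```

For example, a first microphone with a fur windshield, a maximum SPL of 130 dB, and an equivalent noise level of 15 dB has a dynamic range of 115 dB.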

Based on an operation input from the operation unit 22, the microphone information display unit 143 reads the information about the first microphone 12A and the second microphone 14A from the storage unit 18 and displays it on the display unit 16.

Video with audio is generated with the imaging device of this embodiment, for example, as follows.

First, the imaging unit 10 captures the first video, which is recorded in the storage unit 18 (video recording step). In synchronization with the capture, the first audio and the second audio are collected and recorded in the storage unit 18 in association with the first video (first audio recording step and second audio recording step). The information about the first microphone 12A and the second microphone 14A used during capture is also input (second information acquisition step) and recorded in the storage unit 18.

Next, the video with audio (second video) is generated from the recorded first video, first audio, and second audio. First, the information about the first microphone 12A and the second microphone 14A is read from the storage unit 18 and displayed on the display unit 16 (second display step). Based on this information, the user sets the intensity at which the second audio is synthesized. The user also sets the processing conditions for processing the first audio. For example, the intensity is set according to whether a windshield is fitted and, if so, its type.
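
The intensity setting can be illustrated with a minimal mixing sketch. The sketch makes three assumptions: the first and second audio are sample-aligned float tracks at the same sampling rate, the user's intensity is expressed in decibels, and synthesis is plain sample-wise addition with clipping. The function names are hypothetical, and this is not claimed to be the processing performed by the synthesis unit. Here, "emphasizing" or "reducing" simply means adding the second audio at a higher or lower level. Suppressing second-audio components already contained in the first audio would require different processing.

```python
from typing import List, Sequence


def db_to_gain(db: float) -> float:
    # Convert an amplitude setting in decibels to a linear factor.
    return 10.0 ** (db / 20.0)


def mix_second_audio(first: Sequence[float], second: Sequence[float], intensity_db: float) -> List[float]:
    """Add the second audio to the first at the user-set intensity.

    Samples are floats in [-1.0, 1.0]; both tracks are assumed to be
    sample-aligned and recorded at the same sampling rate.
    """
    if len(first) != len(second):
        raise ValueError("tracks must have the same length")
    gain = db_to_gain(intensity_db)
    # Clip each summed sample so the third audio stays in the valid range.
    return [max(-1.0, min(1.0, a + gain * b)) for a, b in zip(first, second)]
```

An intensity of 0 dB adds the second audio unchanged, and -20 dB adds it at one tenth of its amplitude.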

Thus, according to this embodiment, the information about the first microphone 12A and the second microphone 14A is acquired and recorded. This allows the third audio to be generated more appropriately when the video with audio is generated. For example, loss of the main audio can be minimized.

[Modification of the Fifth Embodiment]
The acquired information about the first microphone 12A and the second microphone 14A is not limited to the use described above. For example, the information may be used to process the first audio automatically and generate the third audio. When the third audio is generated by synthesizing the second audio with the first audio, the intensity at which the second audio is synthesized may be set automatically according to the type of the second microphone 14A. Likewise, when the third audio is generated by processing the components of the first audio that share frequencies with the second audio, the frequencies to be processed may be changed automatically according to the type of the second microphone 14A.
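
The automatic settings in this modification could be sketched as a lookup from microphone type to a synthesis intensity and a processing frequency, followed by frequency-dependent processing of the first audio. This sketch assumes the second audio is low-frequency wind noise. On that assumption, "processing the components that share frequencies with the second audio" is approximated by a first-order (RC) high-pass filter whose cutoff depends on the windshield type. The `PRESETS` table, its keys, and its values are hypothetical placeholders, because the document gives no concrete numbers.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical presets keyed by windshield type:
# (intensity in dB for synthesizing the second audio,
#  cutoff in Hz below which components of the first audio are attenuated).
PRESETS: Dict[str, Tuple[float, float]] = {
    "no_windshield": (-12.0, 200.0),
    "sponge": (-6.0, 120.0),
    "fur": (0.0, 80.0),
}


def auto_settings(mic_type: str) -> Tuple[float, float]:
    # Fall back to the most aggressive filtering for unknown types.
    return PRESETS.get(mic_type, PRESETS["no_windshield"])


def high_pass(samples: Sequence[float], rate: int, cutoff_hz: float) -> List[float]:
    """First-order RC high-pass filter: attenuates components below cutoff_hz."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / rate
    alpha = rc / (rc + dt)
    out: List[float] = []
    prev_x = prev_y = 0.0
    for x in samples:
        # y[n] = alpha * (y[n-1] + x[n] - x[n-1])
        y = alpha * (prev_y + x - prev_x)
        out.append(y)
        prev_x, prev_y = x, y
    return out
```

A constant (DC) input decays toward zero at the output, which shows the lowest-frequency content being removed.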

In the above example, the user enters the information about the first microphone 12A and the second microphone 14A into the imaging device 1. The information may instead be collected automatically.

[Sixth Embodiment]
In this embodiment, the timing at which the second audio is recorded is detected and recorded while the first video is being captured. The recorded information is used when generating the video with audio (second video). Specifically, it is displayed on the display unit 16 at that time, and the user refers to it to identify the section (scene) in which the audio is to be edited.

The CPU 24 detects the timing at which the second audio is recorded. Figure 14 is a block diagram of the functions the CPU implements when detecting and recording this timing and when displaying the recorded information. As shown in the figure, the CPU 24 functions as a second audio detection unit 151, a timing information recording unit 152, and a timing information display unit 153.

The second audio detection unit 151 detects the timing at which the second audio was recorded, using the second-audio signal input from the second audio input unit 14. In other words, it detects when a second-audio signal is being input, and this gives the timing at which the second audio was recorded.
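
One plausible way for the second audio detection unit 151 to decide when the second audio is present is frame-wise energy thresholding. The sketch below makes three assumptions: float samples in [-1.0, 1.0], a fixed frame length, and a level threshold in dBFS. The function name and the default values are illustrative assumptions and do not come from the embodiment.

```python
import math
from typing import List, Optional, Sequence, Tuple


def detect_second_audio(samples: Sequence[float], rate: int, frame_ms: float = 20.0,
                        threshold_db: float = -40.0) -> List[Tuple[float, float]]:
    """Return (start_s, end_s) spans where the frame RMS level exceeds threshold_db (dBFS)."""
    frame = max(1, int(rate * frame_ms / 1000))
    spans: List[Tuple[float, float]] = []
    start: Optional[float] = None
    for i in range(0, len(samples), frame):
        chunk = samples[i:i + frame]
        rms = math.sqrt(sum(x * x for x in chunk) / len(chunk))
        level_db = 20 * math.log10(rms) if rms > 0 else float("-inf")
        if level_db > threshold_db and start is None:
            start = i / rate  # second audio begins at this frame
        elif level_db <= threshold_db and start is not None:
            spans.append((start, i / rate))  # second audio ended before this frame
            start = None
    if start is not None:
        spans.append((start, len(samples) / rate))  # still active at the end
    return spans
```

For example, at a 1 kHz sampling rate with 100 ms frames, consider a track that is silent for 0.2 s, active until 0.5 s, and then silent. It yields the single span (0.2, 0.5).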

The timing information recording unit 152 records information on the recording timing of the second audio detected by the second audio detection unit 151 (timing information) in the storage unit 18. The timing information is recorded in association with the video (first video).

Based on an operation input from the operation unit 22, the timing information display unit 153 reads the timing information from the storage unit 18 and displays it on the display unit 16. For example, it shows on a time axis the timing at which the second audio was recorded.
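
At its simplest, the time-axis display could be a text strip with the detected spans marked on it. This hypothetical helper assumes the timing information is a list of (start, end) times in seconds and renders it at a fixed character width. An actual display unit 16 would presumably draw a graphical timeline instead.

```python
import math
from typing import Sequence, Tuple


def render_timeline(spans: Sequence[Tuple[float, float]], duration_s: float, width: int = 40) -> str:
    """Mark each (start, end) span with '#' on a '-' time axis of `width` characters."""
    cells = ["-"] * width
    for start, end in spans:
        first = int(start / duration_s * width)
        # Always mark at least one cell so very short spans remain visible.
        last = max(first + 1, int(math.ceil(end / duration_s * width)))
        for c in range(first, min(last, width)):
            cells[c] = "#"
    return "".join(cells)
```

For a 10 s clip drawn 10 characters wide, a span from 2 s to 4 s is rendered as `--##------`.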

First, the imaging unit 10 captures the first video, which is recorded in the storage unit 18 (video recording step). In synchronization with the capture, timing information for the first video is acquired (first information acquisition step) and recorded in the storage unit 18 in association with the first video. The first audio and the second audio are also collected in synchronization with the capture and recorded in the storage unit 18 in association with the first video (first audio recording step and second audio recording step). Then the timing at which the second audio was recorded is detected (second audio detection step). The detected information is recorded in the storage unit 18 in association with the first video (association step).

Next, the video with audio (second video) is generated from the recorded first video, first audio, and second audio. First, the timing information is read from the storage unit 18 and displayed on the display unit 16. For example, the timing at which the second audio was recorded is shown on a time axis. Using the operation unit 22, the user specifies the section in which the second audio is to be synthesized or edited and instructs the device to generate the video with audio (second video).
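
Once the user has designated a section, the second audio must be combined with the first audio only inside that section. The following sketch assumes sample-aligned float tracks and a linear gain already derived from the intensity setting. The short linear fade at each edge of the section is an added design choice to avoid audible clicks and is not described in the document. All names are hypothetical.

```python
from typing import List, Sequence


def mix_in_section(first: Sequence[float], second: Sequence[float], rate: int,
                   start_s: float, end_s: float, gain: float, fade_s: float = 0.01) -> List[float]:
    """Add gain * second to first only inside [start_s, end_s), with linear fades at both edges."""
    out = list(first)  # samples outside the section are left unchanged
    start = int(start_s * rate)
    end = min(int(end_s * rate), len(first), len(second))
    fade = max(1, int(fade_s * rate))
    for n in range(start, end):
        # Weight rises from 0 to 1 over `fade` samples at the start and falls back at the end.
        w = min(1.0, (n - start + 1) / fade, (end - n) / fade)
        out[n] = first[n] + w * gain * second[n]
    return out
```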

Thus, according to this embodiment, the timing at which the second audio is recorded is detected and recorded. This makes it easy to identify the sections (scenes) that need audio editing when the video with audio is generated.

[Modification of the Sixth Embodiment]
The acquired timing information is not limited to the use described above. For example, the sections that require audio editing may be identified automatically from the timing information.
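
As one example of automatic identification, each detected span could be padded, and spans that lie close together could be merged so that one scene is not split into many short fragments. The padding and merge-gap values below are arbitrary assumptions, and `editing_sections` is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def editing_sections(spans: Sequence[Tuple[float, float]], duration_s: float,
                     pad_s: float = 0.5, merge_gap_s: float = 1.0) -> List[Tuple[float, float]]:
    """Pad each detected span and merge spans separated by less than merge_gap_s."""
    sections: List[Tuple[float, float]] = []
    for start, end in sorted(spans):
        # Extend the span by pad_s on both sides, clamped to the clip length.
        s, e = max(0.0, start - pad_s), min(duration_s, end + pad_s)
        if sections and s - sections[-1][1] < merge_gap_s:
            sections[-1] = (sections[-1][0], max(sections[-1][1], e))
        else:
            sections.append((s, e))
    return sections
```

With the defaults, spans at 1.0-2.0 s and 2.8-3.0 s in a 10 s clip merge into one section from 0.5 s to 3.5 s.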

The timing at which the second audio was recorded may also be detected after capture of the first video has ended. That is, the audio data of the second audio may be analyzed after capture to detect when the second audio was recorded.

[Other Embodiments]
As described above, the first microphone 12A, which collects the first audio (the main audio), is preferably a microphone with omnidirectional pickup characteristics. The second microphone 14A, which collects the second audio, is preferably a microphone with directional pickup characteristics (for example, a shotgun microphone). This improves the accuracy with which the first audio and the second audio are recorded. It also makes it possible, for example, to adjust the sound while preserving the characteristics of a specific voice.

When the second audio is collected with a directional microphone whose directivity can be adjusted, it is more preferable to change the directivity as the position of the second audio's sound source changes. For example, the orientation of the second microphone 14A is changed to follow the position of that sound source within the video. Such changes can be detected, for example, by analyzing the video. The subject that is the sound source of the second audio is identified in the video, and its position is detected by image recognition or similar techniques to locate the sound source.
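
To steer a directional microphone toward a subject found by image recognition, the subject's position in the frame must be converted into an angle. This sketch assumes a pinhole camera model with a known horizontal field of view (HFOV), which can be derived from the focal length and the sensor width. Under that model, the horizontal angle from the optical axis is theta = atan((x - W/2) / f_px), where f_px = (W/2) / tan(HFOV/2), x is the subject's horizontal pixel position, and W is the image width. The function names are hypothetical, and the mechanism that actually turns the second microphone 14A is not modelled.

```python
import math


def hfov_from_focal_length(focal_mm: float, sensor_width_mm: float) -> float:
    """Horizontal field of view in degrees for a pinhole camera."""
    return math.degrees(2 * math.atan(sensor_width_mm / (2 * focal_mm)))


def source_azimuth_deg(bbox_center_x: float, image_width: int, hfov_deg: float) -> float:
    """Horizontal angle of the subject from the optical axis (positive = right of center)."""
    # Focal length expressed in pixels, derived from the field of view.
    f_px = (image_width / 2) / math.tan(math.radians(hfov_deg) / 2)
    return math.degrees(math.atan((bbox_center_x - image_width / 2) / f_px))
```

For a 1920-pixel-wide frame with a 60-degree HFOV, a subject at the right edge lies 30 degrees to the right of the optical axis.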

The above embodiments describe the present invention as implemented in an imaging device, but the devices and systems that implement it are not limited to this. For example, the invention can also be implemented in a portable electronic device with imaging and recording functions (such as a smartphone, tablet computer, or laptop computer). Alternatively, the recorded first video, first audio, and second audio can be imported into a computer (such as a personal computer). The computer then generates the third audio and the video with audio (second video).

The control unit that performs functions such as generating the third audio and generating the second video can be implemented with various processors. These include, for example, a CPU, which is a general-purpose processor that executes software (programs) to implement various functions. They also include a GPU (Graphics Processing Unit), which is a processor specialized for image processing. Programmable logic devices (PLDs) are included as well, such as FPGAs (Field Programmable Gate Arrays), whose circuit configuration can be changed after manufacture. Finally, they include dedicated electrical circuits such as ASICs (Application Specific Integrated Circuits), which are processors whose circuit configuration is designed specifically to perform particular processing.

The control unit may be implemented by a single processor or by multiple processors of the same or different types (for example, multiple FPGAs, a combination of a CPU and an FPGA, or a combination of a CPU and a GPU). Multiple functions may also be implemented by a single processor, and there are two typical examples. In the first, as in computers such as servers, one processor is configured as a combination of one or more CPUs and software, and that processor implements the multiple functions. In the second, as in a system on chip (SoC), a processor implements the functions of the entire system on a single IC (Integrated Circuit) chip. In this way, the various functions are implemented as a hardware structure using one or more of the processors described above. More specifically, the hardware structure of these processors is electrical circuitry that combines circuit elements such as semiconductor devices. Such circuitry may implement the above functions using logical operations such as OR, AND, NOT, and XOR, and combinations of these.

When the above processor or electrical circuitry executes software (programs), processor-readable (computer-readable) code of the software is stored in a non-transitory recording medium such as a ROM, and the processor refers to that software. The software stored in the non-transitory recording medium includes programs for image input, analysis, display control, and the like. The code may also be recorded in a non-transitory recording medium other than a ROM, such as a magneto-optical recording device or semiconductor memory. During processing with the software, a RAM, for example, is used as a temporary storage area. Data stored in, for example, an EEPROM (Electrically Erasable Programmable Read-Only Memory) (not shown) can also be referenced.

1 Imaging device
10 Imaging unit
10A Imaging optical system
10B Imaging element
10C Image signal processing unit
12 First audio input unit
12A First microphone
12B First audio signal processing unit
14 Second audio input unit
14A Second microphone
14B Second audio signal processing unit
16 Display unit
18 Storage unit
20 Audio output unit
22 Operation unit
24 CPU
26 ROM
28 RAM
30 Motion detection unit
101 Imaging control unit
102 Video output unit
103 First video recording unit
104 First audio recording unit
105 Second audio recording unit
106 Motion recording unit
111 Video playback unit
112 Audio playback unit
121 First video acquisition unit
122 First audio acquisition unit
123 Second audio acquisition unit
124 Third audio generation unit
124A Intensity adjustment unit
124B Synthesis unit
124C Frequency detection unit
124D Audio processing unit
124E Second audio processing unit
125 Intensity setting unit
126 Video generation unit
127 Second video recording unit
128 Processing condition setting unit
131 Imaging information acquisition unit
132 Imaging information recording unit
133 Imaging information display unit
141 Microphone information acquisition unit
142 Microphone information recording unit
143 Microphone information display unit
151 Second audio detection unit
152 Timing information recording unit
153 Timing information display unit

Claims

1. An image generation method comprising:
a video recording step of recording a first video captured by an imaging unit;
a first audio recording step of recording a first audio in association with the first video;
a second audio recording step of recording a second audio, different from the first audio, in association with the first video;
a second audio detection step of detecting a timing at which the second audio is recorded in the first video;
a step of displaying, on a display unit, information on the timing at which the second audio was recorded in the first video, the timing being displayed on a time axis;
a step of playing back the first video and the first audio;
a step of receiving, by using the information displayed on the display unit, designation of a section on the time axis that requires editing by combining the second audio with the first video and the first audio;
an intensity setting step of receiving an instruction to set an intensity of the second audio;
an audio generation step of synthesizing the second audio, emphasized or reduced based on the setting instruction, into the designated section of the first audio to generate a third audio; and
a video generation step of generating a second video by associating the first video with the third audio.

2. The image generation method according to claim 1, wherein
the first audio includes a common component, which is an audio component common to the second audio, and
the audio generation step performs, using the second audio, a process of emphasizing or reducing the common component in the first audio to generate the third audio.

3. The image generation method according to claim 2, further comprising
a processing condition setting step of setting processing conditions for the common component before the audio generation step,
wherein the audio generation step processes the first audio to emphasize or reduce the common component in accordance with the processing conditions set in the processing condition setting step.

4. The image generation method according to any one of claims 1 to 3, further comprising
a detection step of detecting motion of an imaging device body including the imaging unit,
wherein, when the motion is detected in the detection step, the audio generation step performs predetermined processing on the first audio or the second audio to generate the third audio.

5. The image generation method according to claim 1, further comprising:
a first information acquisition step of acquiring imaging information on the capture of the first video by the imaging unit; and
a first display step of displaying the imaging information.

6. The image generation method according to claim 5, wherein
the imaging information includes at least one of information on motion of an imaging device body including the imaging unit and information on a focal length.

7. The image generation method according to claim 1, further comprising:
a second information acquisition step of acquiring information about a sound collection unit that collects the first audio and the second audio; and
a second display step of displaying the information about the sound collection unit.

8. The image generation method according to any one of claims 1 to 7, wherein
the first audio recording step includes recording the first audio via a first sound collection unit, and
the second audio recording step includes recording the second audio via a second sound collection unit different from the first sound collection unit.

9. The image generation method according to claim 8, wherein
the second sound collection unit has a directional sound collection characteristic, and
the first sound collection unit has a sound collection characteristic with lower directivity than that of the second sound collection unit.

10. The image generation method according to claim 8 or 9, wherein
the second sound collection unit has a directional sound collection characteristic, and
the second audio recording step includes detecting a position of a sound source of the second audio, directing the second sound collection unit toward the detected sound source, and recording the second audio.