JP7808792B2

JP7808792B2 - Sound field reproduction device, sound field reproduction method, and sound field reproduction system

Info

Publication number: JP7808792B2
Application number: JP2022129434A
Authority: JP
Inventors: 宏正大橋
Original assignee: Panasonic Intellectual Property Management Co Ltd
Current assignee: Panasonic Intellectual Property Management Co Ltd
Priority date: 2022-08-15
Filing date: 2022-08-15
Publication date: 2026-01-30
Anticipated expiration: 2042-08-15
Also published as: JP2024026010A; WO2024038702A1

Description

本開示は、音場再現装置、音場再現方法及び音場再現システムに関する。 This disclosure relates to a sound field reproduction device, a sound field reproduction method, and a sound field reproduction system.

昨今、リアルタイムに音場再現を行うためにシーンベース立体音響再生技術が注目されている。シーンベース立体音響再生技術とは、複数の指向性マイク素子を剛球上又は中空球面上に配置されているアンビソニックスマイクを用いて収録（収音）した多チャンネル信号に対して信号処理を施すことにより、視聴環境（空間）を取り囲むように配置されたスピーカを用いてあたかもリスナー（聴取者）がアンビソニックスマイクの設置箇所に存在しているかのような立体的な音場をリアルタイムに再現する方式である。 Recently, scene-based 3D sound reproduction technology has been attracting attention for its ability to reproduce sound fields in real time. Scene-based 3D sound reproduction technology is a method of reproducing a three-dimensional sound field in real time using speakers arranged to surround the viewing environment (space) by applying signal processing to multi-channel signals recorded (picked up) using Ambisonics microphones, which have multiple directional microphone elements arranged on a rigid or hollow sphere. This makes it appear as if the listener is actually located where the Ambisonics microphones are installed.

音場再現に関する先行技術として、例えば特許文献１が知られている。特許文献１は、収音対象空間において一体となって設置された複数の収音部であって、音源の位置と当該音源から発せられる音を反射する物体の位置とに応じた複数の異なる向きで設置された複数の収音部による収音に基づく複数の収音信号を取得し、この取得された複数の収音信号に基づいて、収音対象空間内の指定された聴取点に対応する音響信号を生成する、信号処理装置を開示している。 Patent Document 1, for example, is known as prior art related to sound field reproduction. Patent Document 1 discloses a signal processing device that acquires multiple collected signals based on sound collected by multiple sound collection units installed together in a target sound collection space, with the multiple sound collection units installed in multiple different orientations depending on the position of the sound source and the position of objects that reflect the sound emitted from the sound source, and generates an acoustic signal corresponding to a specified listening point within the target sound collection space based on the multiple collected signals.

Patent Document 1: Japanese Patent Application Laid-Open No. 2019-192975

特許文献１の構成では、複数の収音部が配置されている収音対象空間内に聴取点が存在していることが前提となっている。このため、特許文献１を用いてシーンベース立体音響のシステムを構築しようとしても、収音部が配置されている収音対象空間内にリスナーが存在しなければならない。つまり、リスナーが収音対象空間とは異なる位置に存在する場合には、収音対象空間内で収音された音響信号をその収音対象空間内で聴取可能となるように音場再現することは困難であるという課題がある。 The configuration of Patent Document 1 assumes that the listening point is located within a target sound collection space where multiple sound collection units are located. Therefore, even if a scene-based 3D sound system is constructed using Patent Document 1, the listener must be located within the target sound collection space where the sound collection units are located. In other words, if the listener is located in a different position from the target sound collection space, it is difficult to reproduce the sound field so that the sound signals collected within the target sound collection space can be heard within that target sound collection space.

On the other hand, it is known that an Ambisonics microphone can synthesize higher-order Ambisonics components (sound field components) as the number of microphone elements arranged on its spherical surface increases, so increasing the number of microphone elements improves directional resolution during recording and reproduction. However, synthesizing higher-order sound field components for real-time live streaming of events such as public viewings requires increasing the number of microphone elements placed in the recording space. Depending on the installation space, there are physical constraints on microphone element placement. In addition, the larger number of transmission channels that comes with more microphone elements increases the load of signal processing, synthesis processing, and the like, which delays the output for sound field reproduction. It is therefore desirable to reproduce the sound field using low-order sound field components without unnecessarily increasing the number of microphone elements. In that case, however, the sound source localization error grows markedly with increasing distance from the center position of the surrounding speaker group, and the expected sound field reproduction cannot be achieved.

本開示は、上述した従来の状況に鑑みて案出され、アンビソニックスマイクを用いて収録した低次の音場成分を利用し、音場再現空間内での音源定位誤差の増大を抑制する音場再現装置、音場再現方法及び音場再現システムを提供することを目的とする。 The present disclosure was devised in light of the above-mentioned conventional situation, and aims to provide a sound field reproduction device, sound field reproduction method, and sound field reproduction system that utilizes low-order sound field components recorded using an Ambisonics microphone to suppress increases in sound source localization errors within a sound field reproduction space.

本開示は、収録デバイスが配置される音場収録空間内の音源抽出方向の指定を受ける音源抽出方向制御部と、前記収録デバイスによる収録信号を用いた符号化処理に基づく低次基底音響信号のうち前記音源抽出方向に対応する音響信号を再符号化して高次基底音響信号を生成する再符号化部と、前記音場収録空間とは異なる音場再現空間内に設けられた複数のスピーカのそれぞれから、前記高次基底音響信号に基づく信号を出力する音場再生部と、を備える、音場再現装置を提供する。 The present disclosure provides a sound field reproduction device comprising: a sound source extraction direction control unit that receives a specified sound source extraction direction within a sound field recording space in which a recording device is placed; a re-encoding unit that generates high-order basis acoustic signals by re-encoding acoustic signals that correspond to the sound source extraction direction among low-order basis acoustic signals based on encoding processing using signals recorded by the recording device; and a sound field reproduction unit that outputs signals based on the high-order basis acoustic signals from each of a plurality of speakers provided in a sound field reproduction space different from the sound field recording space.

また、本開示は、収録デバイスが配置される音場収録空間内の音源抽出方向の指定を受けるステップと、前記収録デバイスによる収録信号を用いた符号化処理に基づく低次基底音響信号のうち前記音源抽出方向に対応する音響信号を再符号化して高次基底音響信号を生成するステップと、前記音場収録空間とは異なる音場再現空間内に設けられた複数のスピーカのそれぞれから、前記高次基底音響信号に基づく信号を出力するステップと、を有する、音場再現方法を提供する。 The present disclosure also provides a sound field reproduction method comprising the steps of: receiving a designation of a sound source extraction direction within a sound field recording space in which a recording device is placed; generating high-order basis acoustic signals by re-encoding acoustic signals corresponding to the sound source extraction direction among low-order basis acoustic signals based on an encoding process using signals recorded by the recording device; and outputting signals based on the high-order basis acoustic signals from each of a plurality of speakers provided in a sound field reproduction space different from the sound field recording space.

また、本開示は、音場収録空間内の音源を収録可能な収録デバイスを有する音場収録装置と、前記収録デバイスにより収録された音響信号を、前記音場収録空間とは異なる音場再現空間内で再現する音場再現装置と、を備え、前記音場再現装置は、前記音場収録空間内の音源抽出方向の指定を受ける音源抽出方向制御部と、前記収録デバイスによる収録信号を用いた符号化処理に基づく低次基底音響信号のうち前記音源抽出方向に対応する音響信号を再符号化して高次基底音響信号を生成する再符号化部と、前記音場再現空間内に設けられた複数のスピーカのそれぞれから、前記高次基底音響信号に基づく信号を出力する音場再生部と、を備える、音場再現システムを提供する。 The present disclosure also provides a sound field reproduction system comprising: a sound field recording apparatus having a recording device capable of recording a sound source within a sound field recording space; and a sound field reproduction device that reproduces the sound signal recorded by the recording device in a sound field reproduction space different from the sound field recording space, wherein the sound field reproduction device comprises: a sound source extraction direction control unit that receives a designation of a sound source extraction direction within the sound field recording space; a re-encoding unit that re-encodes, from low-order basis sound signals based on an encoding process using the signal recorded by the recording device, sound signals that correspond to the sound source extraction direction to generate high-order basis sound signals; and a sound field reproduction unit that outputs signals based on the high-order basis sound signals from each of a plurality of speakers provided in the sound field reproduction space.

なお、これらの包括的または具体的な態様は、システム、装置、方法、集積回路、コンピュータプログラム、または、記録媒体で実現されてもよく、システム、装置、方法、集積回路、コンピュータプログラムおよび記録媒体の任意な組み合わせで実現されてもよい。 Note that these comprehensive or specific aspects may be realized as a system, device, method, integrated circuit, computer program, or recording medium, or as any combination of a system, device, method, integrated circuit, computer program, and recording medium.

本開示によれば、アンビソニックスマイクを用いて収録した低次の音場成分を利用し、音場再現空間内での音源定位誤差の増大を抑制できる。 This disclosure makes it possible to utilize low-order sound field components recorded using an Ambisonics microphone, thereby suppressing increases in sound source localization errors within the sound field reproduction space.

FIG. 1 is a diagram schematically showing the concept, from sound field recording to sound field reproduction, of scene-based stereophonic sound reproduction technology using an Ambisonics microphone.
FIG. 2 is a diagram showing an example of the bases of Ambisonics components based on spherical harmonic expansion for order n and degree m.
FIG. 3 is a block diagram showing an example of the system configuration of a sound field reproduction system according to Embodiment 1.
FIG. 4 is a diagram showing an example of an outline of operations from sound field recording to sound field reproduction according to Embodiment 1.
FIG. 5 is a flowchart showing, in chronological order, an example of the operation procedure for sound field reproduction by the sound field reproduction device according to Embodiment 1.
FIG. 6 is a block diagram showing an example of the system configuration of a sound field reproduction system according to Embodiment 2.
FIG. 7 is a diagram showing an example of an outline of operations from sound field recording to sound field reproduction according to Embodiment 2.
FIG. 8 is a flowchart showing, in chronological order, an example of the operation procedure for sound field reproduction by the sound field reproduction device according to Embodiment 2.

以下、図面を適宜参照して、本開示に係る音場再現装置、音場再現方法及び音場再現システムを具体的に開示した実施の形態について、詳細に説明する。ただし、必要以上に詳細な説明は省略する場合がある。例えば、すでによく知られた事項の詳細説明及び実質的に同一の構成に対する重複説明を省略する場合がある。これは、以下の説明が不必要に冗長になるのを避け、当業者の理解を容易にするためである。なお、添付図面及び以下の説明は、当業者が本開示を十分に理解するために提供されるのであって、これらにより特許請求の記載の主題を限定することは意図されていない。 Hereinafter, with appropriate reference to the drawings, detailed descriptions will be provided of specific embodiments of the sound field reproduction device, sound field reproduction method, and sound field reproduction system disclosed herein. However, more detailed descriptions than necessary may be omitted. For example, detailed descriptions of already well-known matters and redundant descriptions of substantially identical configurations may be omitted. This is to avoid unnecessary redundancy in the following description and to facilitate understanding by those skilled in the art. Note that the accompanying drawings and the following description are provided to enable those skilled in the art to fully understand the present disclosure, and are not intended to limit the subject matter of the claims.

以下の各実施の形態では、音場収録空間（例えばライブ会場）内の音、音楽、人の声等の音源信号を収録する収録デバイスとしてアンビソニックスマイクを用いたシーンベース立体音響再生技術を例示して説明する。アンビソニックスマイクを用いたシーンベース立体音響再生技術では、アンビソニックスマイクを構成する複数のマイク素子で収録した信号（収録信号）或いは点音源を、球面調和関数を用いた中間表現ＩＴＭＲ１（図１参照）或いはＢフォーマット信号として表現する（エンコードする）ことにより、全方位から到来する音場をアンビソニックス信号領域（後述参照）において統一的に取り扱う。更に、この中間表現をデコード（復号化）することによりスピーカ駆動信号を生成し、音場再現空間（例えばサテライト会場）内での所望の音場再現を実現する。 In the following embodiments, we will explain an example of a scene-based stereophonic reproduction technology that uses an Ambisonics microphone as a recording device to record sound source signals such as sounds, music, and human voices within a sound field recording space (e.g., a live venue). In scene-based stereophonic reproduction technology using an Ambisonics microphone, signals (recorded signals) or point sound sources recorded by multiple microphone elements that make up the Ambisonics microphone are represented (encoded) as an intermediate representation ITMR1 (see Figure 1) or B-format signal using spherical harmonic functions, thereby handling the sound field arriving from all directions in a unified manner in the Ambisonics signal domain (see below). Furthermore, speaker drive signals are generated by decoding this intermediate representation, thereby achieving the desired sound field reproduction within a sound field reproduction space (e.g., a satellite venue).

(Embodiment 1)
First, the concept of scene-based stereophonic sound reproduction technology will be described with reference to FIG. 1. FIG. 1 is a diagram schematically showing the concept, from sound field recording to sound field reproduction, of scene-based stereophonic sound reproduction technology using the Ambisonics microphone 11. The Ambisonics microphone 11 is placed in a sound field recording space such as a live venue LV1. At the live venue LV1, a performance or the like involving multiple sound sources takes place (for example, in a band performance by several people, sound sources such as vocals, bass, guitar, and drums), and the sound of the performance is recorded by the Ambisonics microphone 11.

The Ambisonics microphone 11, an example of a recording device, includes four microphone elements Mc1, Mc2, Mc3, and Mc4. With direction Dr1 taken as the front direction, the microphone elements Mc1 to Mc4 are arranged in a hollow (open) configuration so that each faces one of four vertices of the cube CB1 in FIG. 1 from its center, and each is unidirectional toward its vertex. Microphone element Mc1 faces the front-left-up (FLU: Front Left Up) direction of the Ambisonics microphone 11 and records sound from that direction. Microphone element Mc2 faces the front-right-down (FRD: Front Right Down) direction and records sound from that direction. Microphone element Mc3 faces the back-left-down (BLD: Back Left Down) direction and records sound from that direction. Microphone element Mc4 faces the back-right-up (BRU: Back Right Up) direction and records sound from that direction.

The recorded signals for these four directions (i.e., FLU, FRD, BLD, and BRU) are called A-format signals. A-format signals cannot be used as they are, and are converted into B-format signals, which serve as the intermediate representation ITMR1 and have directional characteristics (directivity). The B-format signals include, for example, a B-format signal W for sound from all directions (omnidirectional), a B-format signal X for sound in the front-back direction, a B-format signal Y for sound in the left-right direction, and a B-format signal Z for sound in the up-down direction. A-format signals are converted into B-format signals by the following conversion formulas.

W = FLU + FRD + BLD + BRU
X = FLU + FRD - BLD - BRU
Y = FLU - FRD + BLD - BRU
Z = FLU - FRD - BLD + BRU
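The conversion formulas above can be sketched in code as follows. This is a minimal illustration that applies the four sums and differences sample by sample; the function name `a_to_b_format` and the list-based signal representation are assumptions made for this example, and no gain normalization or capsule equalization is applied.

```python
def a_to_b_format(flu, frd, bld, bru):
    """Convert A-format sample lists (FLU, FRD, BLD, BRU) to B-format (W, X, Y, Z).

    Hypothetical helper: applies the conversion formulas sample by sample,
    without any normalization or equalization.
    """
    w, x, y, z = [], [], [], []
    for a, b, c, d in zip(flu, frd, bld, bru):
        w.append(a + b + c + d)  # omnidirectional component
        x.append(a + b - c - d)  # front minus back
        y.append(a - b + c - d)  # left minus right
        z.append(a - b - c + d)  # up minus down
    return w, x, y, z


# Two samples: sound only on the FLU capsule, then equal sound on all capsules.
w, x, y, z = a_to_b_format([1.0, 0.5], [0.0, 0.5], [0.0, 0.5], [0.0, 0.5])
```

In the second sample all four capsules carry the same value, so only W is non-zero, which matches the reading of W as the omnidirectional component.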

By combining the B-format signals W, X, Y, and Z, sound signals for all directions (front-back, left-right, and up-down) are obtained. By changing the signal level of each of the B-format signals W, X, Y, and Z before combining them, a sound signal with any desired directional characteristic among those directions can be generated. For example, as shown in FIG. 1, a total of eight speakers SPk1, SPk2, SPk3, SPk4, SPk5, SPk6, SPk7, and SPk8 are placed at the vertices of a sound field reproduction space (e.g., satellite venue STL1) modeled as a cube, and a three-dimensional coordinate system like that of the sound field recording space (e.g., live venue LV1) is considered (that is, the front-back, left-right, and up-down directions are parallel or the same).

The position of each of the speakers SPk1 to SPk8 can be specified by a predetermined distance and angles (azimuth angle θ_i and elevation angle φ_i) from a reference position (e.g., center position LSP1) of the sound field reproduction space (e.g., satellite venue STL1). Here, i is a variable identifying a speaker placed in the sound field reproduction space (e.g., satellite venue STL1), and in the example of FIG. 1 it takes an integer value from 1 to 8.
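As an illustration of specifying a speaker position by distance, azimuth, and elevation, the sketch below converts those three values to Cartesian coordinates. The axis convention (x to the front, y to the left, z up, azimuth measured counterclockwise from the front) and the names `speaker_position` and `cube_layout` are assumptions of this example. The text does not state which speaker number sits at which vertex, so the list order here is arbitrary.

```python
import math


def speaker_position(distance, azimuth_deg, elevation_deg):
    """Return (x, y, z) for a speaker at the given distance and angles.

    Assumed convention: x = front, y = left, z = up; azimuth is measured
    counterclockwise from the front, elevation upward from the horizontal plane.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    x = distance * math.cos(el) * math.cos(az)
    y = distance * math.cos(el) * math.sin(az)
    z = distance * math.sin(el)
    return (x, y, z)


# Elevation of a cube vertex seen from the cube center (about 35.26 degrees).
vertex_elevation = math.degrees(math.atan(1.0 / math.sqrt(2.0)))

# Eight (azimuth, elevation) pairs pointing at the vertices of a cube.
cube_layout = [(az, el)
               for el in (vertex_elevation, -vertex_elevation)
               for az in (45.0, 135.0, -135.0, -45.0)]

# A cube with edge length 2 has its vertices at distance sqrt(3) from the center.
positions = [speaker_position(math.sqrt(3.0), az, el) for az, el in cube_layout]
```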

Assume that a listener, who is the user, is at the center position LSP1 of the sound field reproduction space (e.g., satellite venue STL1) and is facing the front direction (Front). In this situation, the sound field in the sound field recording space (e.g., live venue LV1) can be freely reproduced in the sound field reproduction space (e.g., satellite venue STL1) based on the data of the B-format signals W, X, Y, and Z, obtained by encoding the A-format signals recorded in the sound field recording space, and on the direction of each of the speakers SPk1 to SPk8 in the sound field reproduction space. In other words, when a listener is present in the sound field reproduction space (e.g., satellite venue STL1), the listener's front direction is taken as the reference direction, and sound can be reproduced and output in any three-dimensional direction relative to that reference direction (for example, the sound source presentation direction θ_target described later).
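To make the role of the speaker directions concrete, the following sketch decodes one B-format sample to eight speaker feeds with a basic projection decoder. This is a generic textbook-style technique shown only for illustration, not the decoding method of this disclosure. It assumes that a unit plane wave arriving from the unit vector (sx, sy, sz) is encoded as W = 1, X = sx, Y = sy, Z = sz, and the function names are hypothetical.

```python
import math
from itertools import product


def cube_speaker_directions():
    """Unit vectors from the center toward the eight cube vertices.

    Assumed axes: x = front, y = left, z = up.
    """
    scale = 1.0 / math.sqrt(3.0)
    return [(sx * scale, sy * scale, sz * scale)
            for sx, sy, sz in product((1, -1), repeat=3)]


def decode_first_order(w, x, y, z, directions):
    """Projection decode of one B-format sample to one sample per speaker.

    Each feed is the B-format signal weighted by that speaker's direction,
    which behaves like a cardioid pointed at the speaker.
    """
    count = len(directions)
    return [(w + x * ux + y * uy + z * uz) / count for ux, uy, uz in directions]


directions = cube_speaker_directions()
source = directions[0]  # a source located in the direction of the first speaker
drives = decode_first_order(1.0, source[0], source[1], source[2], directions)
```

Under these assumptions the speaker nearest the source direction receives the largest feed and the diametrically opposite speaker receives none.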

Next, the bases of the Ambisonics components based on spherical harmonic expansion for order n and degree m will be described with reference to FIG. 2. FIG. 2 is a diagram showing an example of the bases of Ambisonics components based on spherical harmonic expansion for order n and degree m.

The horizontal axis (m) in FIG. 2 indicates the degree, and the vertical axis (n) indicates the order. The degree m takes values from -n to +n. The spherical harmonics up to order n = N include a total of (N+1)^2 bases. For example, when n = N = 0, one basis is obtained (i.e., the omnidirectional B-format signal W). When n = N = 1, four bases are obtained: the omnidirectional B-format signal W corresponding to (n, m) = (0, 0), the front-back B-format signal X corresponding to (n, m) = (1, -1), the up-down B-format signal Z corresponding to (n, m) = (1, 0), and the left-right B-format signal Y corresponding to (n, m) = (1, 1). The same applies for n = N = 2 and above, so their description is omitted.

Spherical harmonics are known to have the property that their spatial periodicity increases as n and m increase. Combinations of n and m can therefore represent B-format signals with different directional patterns (directional characteristics). If the index for order n and degree m is defined as K = n(n+1) + m based on Ambisonics Channel Numbering (ACN), the spherical harmonics can be expressed in vector form as in equation (1). In equation (1), the superscript T denotes the transpose.
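The index K = n(n+1) + m and the basis count (N+1)^2 can be checked with a short sketch. The function names are hypothetical and chosen for this example only.

```python
import math


def acn_index(n, m):
    """Return the ACN index K = n(n+1) + m for order n and degree m."""
    if n < 0 or abs(m) > n:
        raise ValueError("require n >= 0 and -n <= m <= n")
    return n * (n + 1) + m


def acn_to_order_degree(k):
    """Invert the ACN index: recover (n, m) from K."""
    if k < 0:
        raise ValueError("K must be non-negative")
    n = math.isqrt(k)      # order is the integer square root of K
    m = k - n * (n + 1)    # degree follows from the definition of K
    return n, m


def basis_count(max_order):
    """Number of spherical harmonic bases up to and including order N."""
    return (max_order + 1) ** 2


# All (n, m) pairs up to first order, listed in ACN order.
first_order = [acn_to_order_degree(k) for k in range(basis_count(1))]
```

For N = 1 this yields the four pairs (0, 0), (1, -1), (1, 0), (1, 1), in that order.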

次に、図３及び図４を参照して、実施の形態１に係る音場再現システム１００のシステム構成並びに動作概要について説明する。図３は、実施の形態１に係る音場再現システム１００のシステム構成例を示すブロック図である。図４は、実施の形態１の音場収録から音場再現までの動作概要例を示す図である。 Next, the system configuration and operation overview of the sound field reproduction system 100 according to embodiment 1 will be described with reference to Figures 3 and 4. Figure 3 is a block diagram showing an example of the system configuration of the sound field reproduction system 100 according to embodiment 1. Figure 4 is a diagram showing an example of the operation overview from sound field recording to sound field reproduction according to embodiment 1.

音場再現システム１００は、音場収録装置１と、音場再現装置２とを含む。音場収録装置１と音場再現装置２とはネットワークＮＷ１を介して互いにデータ通信が可能に接続されている。ネットワークＮＷ１は、有線ネットワークでもよいし、無線ネットワークでもよい。有線ネットワークは、例えば有線ＬＡＮ（ＬｏｃａｌＡｒｅａＮｅｔｗｏｒｋ）、有線ＷＡＮ（ＷｉｄｅＡｒｅａＮｅｔｗｏｒｋ）、電力線通信（ＰＬＣ：ＰｏｗｅｒＬｉｎｅＣｏｍｍｕｎｉｃａｔｉｏｎ）のうち少なくとも１つが該当し、他の有線通信可能なネットワーク構成でもよい。一方、無線ネットワークは、Ｗｉ－Ｆｉ（登録商標）等の無線ＬＡＮ、無線ＷＡＮ、Ｂｌｕｅｔｏｏｔｈ（登録商標）等の近距離無線通信、４Ｇ或いは５Ｇ等の移動体携帯通信網のうち少なくとも１つが該当し、他の無線通信可能なネットワーク構成でもよい。 The sound field reproduction system 100 includes a sound field recording device 1 and a sound field reproduction device 2. The sound field recording device 1 and the sound field reproduction device 2 are connected to each other via a network NW1 so that they can communicate data with each other. The network NW1 may be a wired network or a wireless network. The wired network may be, for example, at least one of a wired local area network (LAN), a wired wide area network (WAN), or power line communication (PLC), or may be any other network configuration capable of wired communication. On the other hand, the wireless network may be at least one of a wireless LAN such as Wi-Fi (registered trademark), a wireless WAN, a short-range wireless communication such as Bluetooth (registered trademark), or a mobile communication network such as 4G or 5G, or may be any other network configuration capable of wireless communication.

音場収録装置１は、例えば音場収録空間（例えばライブ会場ＬＶ１）に配置され、アンビソニックスマイク１１と、Ａ／Ｄ変換部１２と、符号化部１３と、マイク素子方向指定部１４とを含む。なお、音場収録装置１は、少なくともアンビソニックスマイク１１を有していればよく、Ａ／Ｄ変換部１２、符号化部１３及びマイク素子方向指定部１４は音場再現装置２に設けられてもよい。言い換えると、アンビソニックスマイク１１は、音場再現装置２の外部に設けられても構わない。 The sound field recording device 1 is placed, for example, in a sound field recording space (e.g., a live performance venue LV1) and includes an Ambisonics microphone 11, an A/D converter 12, an encoder 13, and a microphone element direction designator 14. Note that the sound field recording device 1 only needs to have at least the Ambisonics microphone 11; the A/D converter 12, encoder 13, and microphone element direction designator 14 may be provided in the sound field reproduction device 2. In other words, the Ambisonics microphone 11 may be provided outside the sound field reproduction device 2.

The Ambisonics microphone 11 includes four microphone elements Mc1, Mc2, Mc3, and Mc4. Microphone element Mc1 records sound from the front-left-up direction (see FIG. 1), microphone element Mc2 records sound from the front-right-down direction (see FIG. 1), microphone element Mc3 records sound from the back-left-down direction (see FIG. 1), and microphone element Mc4 records sound from the back-right-up direction (see FIG. 1). The Ambisonics microphone 11 may include more unidirectional microphone elements than the four hollow-arranged microphone elements Mc1, Mc2, Mc3, and Mc4, or may include omnidirectional microphone elements arranged on a rigid sphere. Using an Ambisonics microphone with a large number of microphone elements enables the encoding unit 13 to synthesize Ambisonics signals of second or higher order. The signals (recorded signals) recorded by the microphone elements constituting the Ambisonics microphone 11 are input to the A/D conversion unit 12.

The A/D conversion unit 12, the encoding unit 13, and the microphone element direction designation unit 14 are each configured by, for example, a semiconductor chip on which at least one electronic device such as a CPU (Central Processing Unit), a DSP (Digital Signal Processor), a GPU (Graphics Processing Unit), or an FPGA (Field Programmable Gate Array) is mounted, or by dedicated hardware.

Ａ／Ｄ変換部１２は、アンビソニックスマイク１１を構成する各マイク素子からのアナログ形式の収録信号をディジタル形式の収録信号に変換して符号化部１３に送る。 The A/D conversion unit 12 converts the analog recording signals from each microphone element that makes up the Ambisonics microphone 11 into digital recording signals and sends them to the encoding unit 13.

The encoding unit 13 generates low-order basis acoustic signals (for example, first-order Ambisonics signals) by encoding the recorded signals converted by the A/D conversion unit 12, using those signals and the direction vector θ_m from the microphone element direction designation unit 14. Details of the encoding process by the encoding unit 13 are described below.

ここで、符号化部１３による符号化処理の詳細について説明する。 Here, we will explain the details of the encoding process performed by the encoding unit 13.

In general, it is known that the sound pressure p observed (recorded) at a position of radius r for any angle (θ, φ) on a sphere is, as a solution to the interior problem of the wave equation in the spherical harmonic domain, expanded for wave number k as in equation (4), with the spherical harmonics of equation (2) as the basis. In equation (4), A_n^m is an expansion coefficient and R_n(kr) is the radial function term. The infinite sum over order n is approximated by truncating it at a finite order N, and the accuracy of sound field reproduction varies with this truncation order N. Hereinafter, the truncation order is denoted N.
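As a reading aid, the truncated expansion described above can be sketched in the following general form. The symbol Y_n^m for the spherical harmonic basis and the explicit dependence of the coefficients on k are assumptions of this sketch; the exact notation and normalization of equation (4) may differ.

```latex
p(k, r, \theta, \phi) \approx \sum_{n=0}^{N} \sum_{m=-n}^{n} A_n^m(k)\, R_n(kr)\, Y_n^m(\theta, \phi)
```

The double sum runs over (N+1)^2 terms, one for each basis up to order N.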

In equation (6), i is the imaginary unit, j_n(kr) is the n-th order spherical Bessel function, and j'_n(kr) is its derivative. In the present disclosure, the expansion coefficient vector γ_n^m for this plane wave is treated as the B-format signal (intermediate representation) that is the output of the encoding process by the encoding unit 13. Hereinafter, this expansion coefficient vector may be referred to as an Ambisonics domain signal or simply an Ambisonics signal.

より具体的には、符号化部１３による符号化処理では、Ａ／Ｄ変換部１２による変換後の時間領域信号である収録信号をアンビソニックス信号（例えば１次オーダーアンビソニックス信号）へと変換し、このアンビソニックス信号（例えば１次オーダーアンビソニックス信号）は音場再現装置２の第１復号化部２５及び第２復号化部２６のそれぞれによりデコード処理されてスピーカ駆動信号に変換される。 More specifically, in the encoding process by the encoding unit 13, the recorded signal, which is a time-domain signal after conversion by the A/D conversion unit 12, is converted into an Ambisonics signal (e.g., a first-order Ambisonics signal), and this Ambisonics signal (e.g., a first-order Ambisonics signal) is decoded by each of the first decoding unit 25 and second decoding unit 26 of the sound field reproduction device 2 and converted into a speaker drive signal.

音場再現装置２は、例えば音場再現空間（例えばサテライト会場ＳＴＬ１）に配置され、音源抽出方向制御部２１と、音源提示方向制御部２２と、再符号化部２３と、スピーカ方向指定部２４と、第１復号化部２５と、第２復号化部２６と、信号混合部２７と、音場再生部２８と、スピーカＳＰｋ１、ＳＰｋ２、…、ＳＰｋ８とを含む。なお、以下の説明において、スピーカの配置数は一例として８としているが、２以上の整数であれば８に限定されないことは言うまでもない。 The sound field reproduction device 2 is placed, for example, in a sound field reproduction space (for example, satellite venue STL1) and includes a sound source extraction direction control unit 21, a sound source presentation direction control unit 22, a re-encoding unit 23, a speaker direction designation unit 24, a first decoding unit 25, a second decoding unit 26, a signal mixing unit 27, a sound field reproduction unit 28, and speakers SPk1, SPk2, ..., SPk8. In the following description, the number of speakers arranged is 8 as an example, but it goes without saying that it is not limited to 8 as long as it is an integer greater than or equal to 2.

The signal mixing unit 27 mixes, speaker by speaker, the speaker drive signals corresponding to the high-order basis acoustic signals from the first decoding unit 25 and the speaker drive signals corresponding to the low-order basis acoustic signals from the second decoding unit 26, and sends the mixed signals to the sound field reproduction unit 28. The signal mixing unit 27 may be omitted from the sound field reproduction device 2; in that case, only the signals based on the high-order basis acoustic signals from the first decoding unit 25 are output from each of the speakers SPk1 to SPk8 via the sound field reproduction unit 28.
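The per-speaker mixing can be pictured with the sketch below, which adds the two sets of drive signals channel by channel. The mixing weights `high_gain` and `low_gain` are assumptions introduced for this example (the text here does not specify weights), and the function name is hypothetical.

```python
def mix_speaker_signals(high_order_drive, low_order_drive,
                        high_gain=1.0, low_gain=1.0):
    """Mix two sets of per-speaker drive signals, speaker by speaker.

    Each argument is a list with one sample list per speaker. The gains are
    illustrative parameters; a plain sum is used by default.
    """
    if len(high_order_drive) != len(low_order_drive):
        raise ValueError("both inputs need one signal per speaker")
    mixed = []
    for high, low in zip(high_order_drive, low_order_drive):
        if len(high) != len(low):
            raise ValueError("signals for one speaker differ in length")
        mixed.append([high_gain * h + low_gain * l for h, l in zip(high, low)])
    return mixed


# Two speakers, two samples each.
mixed = mix_speaker_signals([[1.0, 2.0], [0.0, 0.0]],
                            [[0.5, 0.5], [1.0, 1.0]])
```

Setting `low_gain=0.0` corresponds to the case where the mixing stage is omitted and only the signals derived from the high-order basis acoustic signals reach the speakers.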

The sound field reproduction unit 28 converts the digital speaker drive signal for each speaker, after mixing by the signal mixing unit 27, into an analog speaker drive signal, amplifies it, and outputs (plays) it from the corresponding speaker.

Speakers SPk1, SPk2, ..., SPk8 are each placed at a vertex of the sound field reproduction space (e.g., satellite venue STL1), which is modeled as a cube, and reproduce the sound field based on the speaker drive signals from the sound field reproduction unit 28. The number of speakers may be varied depending on the sound field to be reproduced. Fewer than eight speakers may be used when reproduction in a specific direction is not required, or when a commonly known virtual sound image generation method such as a transaural system or VBAP (Vector Based Amplitude Panning) is combined. Conversely, more than eight speakers may be used. Furthermore, the speakers may be installed at positions other than the vertices of the sound field reproduction space as long as they surround the reference position of satellite venue STL1 (e.g., center position LSP1).

Instead of speakers, the sound field reproduction unit 28 may output signals to a binaural playback device, such as headphones or earphones, worn by a listener (user). When supplying signals to such a binaural playback device, the sound field reproduction unit 28 may generate playback signals corresponding to azimuth angles of ±90° through the decoding process described later. Alternatively, it may generate virtual sound images for a plurality of directions surrounding the head and produce the playback signals by applying, to the virtual sound image of each direction, a transfer characteristic for that direction that lets the user perceive a three-dimensional sound image, such as an HRTF (Head Related Transfer Function), either by multiplication in the frequency domain or by convolution in the time domain. As a result, sound field reproduction is not limited to the speakers SPk1, SPk2, ..., SPk8 placed in satellite venue STL1; the sound field can also be reproduced on playback devices (for example, the headphones or earphones mentioned above) worn by listeners (users) in satellite venue STL1.
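The time-domain variant of the HRTF-based rendering described above can be sketched as follows: each virtual sound image signal is convolved with a left and a right head-related impulse response for its direction, and the results are summed per ear. The function names and the data layout are assumptions made for illustration, and real HRIR data would come from a measured set that this text does not specify.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sample lists."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def render_binaural(virtual_signals, hrirs):
    """Sum HRIR-filtered virtual sound images into a left/right pair.

    virtual_signals: one non-empty sample list per virtual direction.
    hrirs: one (left_ir, right_ir) pair per direction (hypothetical layout).
    """
    if len(virtual_signals) != len(hrirs):
        raise ValueError("one HRIR pair is required per virtual direction")
    length = max(
        len(sig) + max(len(left_ir), len(right_ir)) - 1
        for sig, (left_ir, right_ir) in zip(virtual_signals, hrirs)
    )
    left = [0.0] * length
    right = [0.0] * length
    for sig, (left_ir, right_ir) in zip(virtual_signals, hrirs):
        for n, value in enumerate(convolve(sig, left_ir)):
            left[n] += value
        for n, value in enumerate(convolve(sig, right_ir)):
            right[n] += value
    return left, right
```

The frequency-domain alternative mentioned in the text is mathematically equivalent and replaces each convolution with a multiplication of spectra, which is usually cheaper for long impulse responses.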

The re-encoding process by the re-encoding unit 23 and the processing by the first decoding unit 25 and the second decoding unit 26 are now described in detail.

Next, the operational procedure for sound field reproduction by the sound field reproduction device 2 will be described with reference to Figure 5. Figure 5 is a flowchart showing, in chronological order, an example of the operational procedure for sound field reproduction by the sound field reproduction device 2 according to embodiment 1. In the following description, steps St1 and St2 are described as being executed within the sound field recording device 1. However, step St2 may be executed by the sound field reproduction device 2 if the components of the sound field recording device 1 other than the Ambisonics microphone 11 are provided within the sound field reproduction device 2.

Following step St2, the sound field reproduction device 2 executes in parallel the series of processes from step St3 to step St6 (that is, the re-encoding process for generating the high-order basis acoustic signals) and the process of step St7 (that is, the decoding process for generating the low-order basis acoustic signals).

The signal mixing unit 27 of the sound field reproduction device 2 mixes, speaker by speaker, the speaker drive signals corresponding to the high-order basis acoustic signals from the first decoding unit 25 in step St6 (an example of the output of the first decoding process) with the speaker drive signals corresponding to the low-order basis acoustic signals from the second decoding unit 26 in step St7 (an example of the output of the second decoding process) (step St8). The sound field reproduction unit 28 of the sound field reproduction device 2 converts the digital speaker drive signal for each speaker, after mixing by the signal mixing unit 27 in step St8, into an analog speaker drive signal, amplifies it, and outputs (plays) it from the corresponding one of the speakers SPk1 to SPk8 (step St9).

The recording device is also composed of an Ambisonics microphone 11 in which multiple microphone elements Mc1 to Mc4 are arranged three-dimensionally so that each faces a different direction. This allows the sound field recording device 1 to record in three dimensions the ambient sound, such as a performance by multiple sound sources, within the sound field recording space (live venue LV1).

First, the system configuration and an operation overview of the sound field reproduction system 100A according to embodiment 2 will be described with reference to Figures 6 and 7. Figure 6 is a block diagram showing an example system configuration of the sound field reproduction system 100A according to embodiment 2. Figure 7 is a diagram showing an example operation overview from sound field recording to sound field reproduction in embodiment 2. In the description of Figures 6 and 7, content that overlaps with the configuration and operation of the corresponding Figures 3 and 4 is given the same reference numerals and its description is simplified or omitted; only the differences are described.

The sound field reproduction system 100A includes the sound field recording device 1 and a sound field reproduction device 2A. The configuration of the sound field recording device 1 is the same as in embodiment 1, so its description is omitted.

The sound field reproduction device 2A is placed, for example, in a sound field reproduction space (for example, satellite venue STL1) and includes a sound source extraction direction control unit 21, a sound source presentation direction control unit 22, a re-encoding unit 23, a speaker direction designation unit 24, a first decoding unit 25, a sound source acquisition unit 29, a second encoding unit 30, a second signal mixing unit 31, a second decoding unit 32, a signal mixing unit 27, a sound field reproduction unit 28, and speakers SPk1, SPk2, ..., SPk8.

The sound source acquisition unit 29 acquires acoustic signals s1[n], ..., sb[n] of multiple sound sources (e.g., vocals, bass, guitar, drums) to be presented in the sound field reproduction space (e.g., satellite venue STL1) and sends them to the second encoding unit 30. Each of the acoustic signals s1[n], ..., sb[n] can be represented as a point sound source. Here, n denotes discrete time and b denotes the number of sound sources. These sound sources may have been recorded individually in the sound field recording space (live venue LV1), or may be sound sources unrelated to the sound field recording space.

The second signal mixing unit 31 mixes the high-order basis acoustic signals (e.g., Nth-order Ambisonics signals) obtained for each sound source through the encoding process of the second encoding unit 30, and sends the mixed signal to the second decoding unit 32.
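Because Ambisonics encoding is linear, mixing the per-source Ambisonics signals can be sketched as a channel-wise, sample-wise sum. The function name `mix_ambisonics` and the nested-list layout below are assumptions for illustration only.

```python
def mix_ambisonics(per_source_signals):
    """Sum per-source Ambisonics signals channel by channel.

    per_source_signals[b][c][n]: sample n of Ambisonics channel c for source b
    (hypothetical layout). All sources must share the same channel count
    and signal length.
    """
    if not per_source_signals:
        return []
    num_channels = len(per_source_signals[0])
    num_samples = len(per_source_signals[0][0]) if num_channels else 0
    mixed = [[0.0] * num_samples for _ in range(num_channels)]
    for source in per_source_signals:
        if len(source) != num_channels:
            raise ValueError("all sources must have the same number of channels")
        for c, channel in enumerate(source):
            if len(channel) != num_samples:
                raise ValueError("all channels must have the same length")
            for n, value in enumerate(channel):
                mixed[c][n] += value
    return mixed
```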

Next, the operational procedure for sound field reproduction by the sound field reproduction device 2A will be described with reference to Figure 8. Figure 8 is a flowchart showing, in chronological order, an example of the operational procedure for sound field reproduction by the sound field reproduction device 2A according to embodiment 2. In the description of Figure 8, processes that overlap with the description of Figure 5 are given the same step numbers and their description is simplified or omitted; only the differences are described.

In Figure 8, the sound source acquisition unit 29 of the sound field reproduction device 2A acquires acoustic signals s1[n], ..., sb[n] (an example of point sound source signals) of multiple sound sources (e.g., vocals, bass, guitar, drums) to be presented in the sound field reproduction space (e.g., satellite venue STL1) (step St11). The second encoding unit 30 of the sound field reproduction device 2A obtains the direction vectors θ_b of the b point sound sources, either by reading them from a memory (not shown) or based on a designation from a user interface (not shown) (step St12).
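To make the role of the direction vector concrete, the sketch below encodes one point source signal into Ambisonics from its direction. It is deliberately limited to first order (N = 1), expresses the direction as azimuth and elevation in degrees, and assumes ACN channel ordering (W, Y, Z, X) with SN3D normalization. These conventions and the function name `encode_first_order` are assumptions; the text does not state which conventions the second encoding unit 30 uses, and a higher order N would add further spherical harmonic channels.

```python
import math


def encode_first_order(signal, azimuth_deg, elevation_deg):
    """Encode a point source into first-order Ambisonics.

    Assumed conventions (not taken from the source): azimuth is measured
    counterclockwise from the front, elevation upward from the horizontal
    plane, channels are ordered W, Y, Z, X (ACN) with SN3D normalization.
    Returns a list of four channel sample lists.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    gains = (
        1.0,                         # W: omnidirectional
        math.sin(az) * math.cos(el),  # Y: left-right
        math.sin(el),                 # Z: up-down
        math.cos(az) * math.cos(el),  # X: front-back
    )
    return [[g * s for s in signal] for g in gains]
```

A source straight ahead (azimuth 0°, elevation 0°) therefore appears only in the W and X channels, while a source directly to the left (azimuth 90°) appears in W and Y.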

As described above, the sound field reproduction device 2A according to embodiment 2 further includes a second encoding unit 30 that encodes each of multiple sound source signals (e.g., sound signals from vocals, bass, guitar, drums, and other sound sources) to be presented in the sound field reproduction space (satellite venue STL1) to generate second high-order basis acoustic signals (Nth-order Ambisonics signals), and a second signal mixing unit 31 that mixes the second high-order basis acoustic signals of the individual sound source signals. As a result, the sound field reproduction device 2A according to embodiment 2 can output, with the high directional resolution provided by the high-order basis, the ambient sound of sound sources that are to be presented independently in the sound field reproduction space (satellite venue STL1), separately from the sound field recording space (live venue LV1).

Although the embodiments have been described above with reference to the accompanying drawings, the present disclosure is not limited to these examples. It is clear that those skilled in the art can conceive of various changes, modifications, substitutions, additions, deletions, and equivalents within the scope of the claims, and it is understood that these also fall within the technical scope of the present disclosure. Furthermore, the components of the above-described embodiments may be combined in any manner without departing from the spirit of the invention.

The present disclosure is useful as a sound field reproduction device, a sound field reproduction method, and a sound field reproduction system that use low-order sound field components recorded with an Ambisonics microphone and suppress an increase in sound source localization error within the sound field reproduction space.

1 Sound field recording device
2, 2A Sound field reproduction device
11 Ambisonics microphone
12 A/D conversion unit
13 Encoding unit
14 Microphone element direction designation unit
21 Sound source extraction direction control unit
22 Sound source presentation direction control unit
23 Re-encoding unit
24 Speaker direction designation unit
25 First decoding unit
26, 32 Second decoding unit
27 Signal mixing unit
28 Sound field reproduction unit
29 Sound source acquisition unit
30 Second encoding unit
31 Second signal mixing unit
100, 100A Sound field reproduction system
SPk1, SPk2, SPk3, SPk4, SPk5, SPk6, SPk7, SPk8 Speaker

Claims

a sound source extraction direction control unit that receives a designation of a sound source extraction direction within a sound field recording space in which a recording device is placed;
a re-encoding unit that re-encodes an acoustic signal corresponding to the sound source extraction direction among low-order basis acoustic signals based on an encoding process using a signal recorded by the recording device to generate a high-order basis acoustic signal;
a sound field reproduction unit that outputs a signal based on the high-order basis acoustic signal from each of a plurality of speakers provided in a sound field reproduction space different from the sound field recording space,
Sound field reproduction device.

a sound source presentation direction control unit that receives a designation of a sound source presentation direction that is the same as or different from the sound source extraction direction and is a direction of emphasizing sound field reproduction in a sound field reproduction space that is different from the sound field recording space;
the re-encoding unit performs the re-encoding using the acoustic signal and the sound source presentation direction to generate the high-order basis acoustic signal.
The sound field reproduction device according to claim 1.

a first decoding unit that generates a first sound field drive signal having a high-order basis component for each of the speakers by using the high-order basis acoustic signal and arrangement information of each of the plurality of speakers;
The sound field reproduction device according to claim 1.

a second decoding unit that generates a second sound field drive signal having a low-order basis component for each of the speakers by using the low-order basis acoustic signal and arrangement information of each of the plurality of speakers,
The sound field reproduction device according to claim 3.

a signal mixer that mixes the first sound field drive signal and the second sound field drive signal for each speaker;
the sound field reproduction unit outputs the mixed signal by the signal mixer to each of the speakers as a signal based on the high-order basis acoustic signal.
The sound field reproduction device according to claim 4.

The sound source extraction direction is specified as a three-dimensional direction from a reference position in the sound field recording space.
The sound field reproduction device according to claim 1.

The sound source presentation direction is specified as a three-dimensional direction from a reference position in the sound field recording space.
The sound field reproduction device according to claim 2.

a second encoding unit that encodes each of a plurality of sound source signals to be presented in the sound field reproduction space to generate second higher-order basis acoustic signals;
a second signal mixer that mixes the second higher-order basis acoustic signals for each of the sound source signals,
The sound field reproduction device according to claim 3.

a second decoding unit that generates a third sound field drive signal having a high-order basis component for each speaker by using the second high-order basis acoustic signal for each sound source signal after mixing by the second signal mixer and arrangement information of each of the plurality of speakers,
The sound field reproduction device according to claim 8.

a signal mixer that mixes the first sound field drive signal and the third sound field drive signal for each speaker,
the sound field reproduction unit outputs the mixed signal by the signal mixer to each of the speakers as a signal based on the high-order basis acoustic signal.
The sound field reproduction device according to claim 9.

receiving a designation of a sound source extraction direction within a sound field recording space in which a recording device is placed;
generating high-order basis acoustic signals by re-encoding acoustic signals corresponding to the sound source extraction direction among low-order basis acoustic signals based on an encoding process using the signals recorded by the recording device;
outputting a signal based on the high-order basis acoustic signal from each of a plurality of speakers provided in a sound field reproduction space different from the sound field recording space,
Sound field reproduction method.

a sound field recording device having a recording device capable of recording a sound source in a sound field recording space;
a sound field reproduction device that reproduces the acoustic signal recorded by the recording device in a sound field reproduction space that is different from the sound field recording space,
The sound field reproduction device comprises:
a sound source extraction direction control unit that receives a designation of a sound source extraction direction within the sound field recording space;
a re-encoding unit that re-encodes an acoustic signal corresponding to the sound source extraction direction among low-order basis acoustic signals based on an encoding process using a signal recorded by the recording device to generate a high-order basis acoustic signal;
a sound field reproduction unit that outputs a signal based on the high-order basis acoustic signal from each of a plurality of speakers provided in the sound field reproduction space,
Sound field reproduction system.

The recording device is configured by an Ambisonics microphone in which a plurality of microphone elements are arranged three-dimensionally so that each of the microphone elements faces in a different direction.
The sound field reproduction system according to claim 12.