JP7847447B2

JP7847447B2 - Acoustic signal processing device, and program

Info

Publication number: JP7847447B2
Application number: JP2022025810A
Authority: JP
Inventors: 敦郎伊藤; 陽佐々木
Original assignee: Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2022-02-22
Filing date: 2022-02-22
Publication date: 2026-04-17
Anticipated expiration: 2042-02-22
Also published as: JP2023122230A

Description

This invention relates to an acoustic signal processing device and a program, and more particularly to a technique for realizing binaural playback.

Augmented reality (AR) and virtual reality (VR) content is sometimes produced or presented by rendering sound to follow the movements of video objects and of the viewer. Acoustic rendering is used to create an effective sense of presence. AR/VR content is sometimes produced on the assumption that it will be presented through headphones, or through a head-mounted display (HMD) with built-in headphones, worn on the head. Binaural playback may be adopted as a technique for virtually reproducing a three-dimensional sound space over headphones.

Binaural playback is achieved by reproducing the sound pressure that results when sound waves emitted from a sound source placed at an arbitrary position reach the ear-canal entrances of both of the listener's ears. In binaural playback, sound is presented directly through playback transducers close to each ear, such as headphones or earphones. The transfer characteristics of sound from the source to each ear include the effects of reflection, diffraction, attenuation, and so on of sound waves by the listener's own head, pinnae, torso, etc. (hereinafter collectively referred to as the "head, etc."). By presenting sound to which these transfer characteristics for a virtually set source position have been added, binaural playback can give the listener a strong sense of presence.

In binaural playback, the head-related transfer function (HRTF) or the head-related impulse response (HRIR) is used as a feature quantity describing the transfer characteristics from a source position in space to the listening position (listening point). The HRTF is expressed in the frequency domain, whereas the HRIR is its time-domain representation. HRTFs and HRIRs are generally measured in a special acoustic environment such as an anechoic chamber. In the measurement, sound based on a measurement signal is emitted from loudspeakers arranged around the listener and picked up by microphones placed at the entrances of the listener's ear canals. The HRTF or HRIR (hereinafter collectively referred to as the "HRIR, etc.") is obtained from the captured signal and the known measurement signal.
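The text does not specify how the HRIR, etc. is computed from the captured signal and the known measurement signal. One common approach is regularized spectral division, $H(\omega) = Y(\omega)X^*(\omega)/(|X(\omega)|^2 + \varepsilon)$. The sketch below assumes a noise-free synthetic measurement; the function `estimate_hrir` and the regularization constant `eps` are hypothetical and not part of the described device.

```python
import numpy as np


def estimate_hrir(measurement, captured, n_taps, eps=1e-8):
    """Estimate an impulse response by regularized spectral division.

    measurement: known excitation x(t) played through the loudspeaker
    captured:    signal y(t) recorded at the ear-canal entrance
    n_taps:      number of HRIR samples to keep
    eps:         regularization constant (hypothetical value)
    """
    n_fft = len(measurement) + len(captured)   # long enough to avoid circular wrap
    X = np.fft.rfft(measurement, n_fft)
    Y = np.fft.rfft(captured, n_fft)
    H = Y * np.conj(X) / (np.abs(X) ** 2 + eps)   # frequency-domain estimate (HRTF)
    return np.fft.irfft(H, n_fft)[:n_taps]        # time-domain estimate (HRIR)


# Synthetic, noise-free check: a known toy "HRIR" excited by white noise.
rng = np.random.default_rng(0)
x = rng.standard_normal(4096)
true_h = np.array([0.0, 0.9, 0.3, -0.2, 0.05])
y = np.convolve(x, true_h)
h_est = estimate_hrir(x, y, n_taps=len(true_h))
```

With real measurements, noise and loudspeaker/microphone responses make `eps` and the choice of excitation signal matter far more than in this idealized case.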

A binaural signal is a two-channel acoustic signal consisting of a left-ear acoustic signal (hereinafter, the "left-ear signal"), obtained by convolving the input acoustic signal (hereinafter, the "input signal") with the left-ear HRIR, and a right-ear signal, obtained by convolving the input signal with the right-ear HRIR. Each ear's acoustic signal (hereinafter, the "ear signal") y is obtained by convolving the input signal with the HRIR, as illustrated in equation (1). In equation (1), y(t), x(t), and h(t) denote the sample values at time t of the ear signal, the input signal, and the HRIR, respectively.
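As a sketch, assuming discrete time and an HRIR of L samples, equation (1) takes the standard convolution form $y(t) = \sum_{\tau=0}^{L-1} h(\tau)\,x(t-\tau)$ (the exact notation of equation (1) is not reproduced in this section). The code writes that sum out directly and checks it against `numpy.convolve` for both ears; the toy HRIR values are hypothetical.

```python
import numpy as np


def convolve_direct(x, h):
    """Full linear convolution y(t) = sum_tau h(tau) * x(t - tau)."""
    y = np.zeros(len(x) + len(h) - 1)
    for t in range(len(y)):
        for tau in range(len(h)):
            if 0 <= t - tau < len(x):
                y[t] += h[tau] * x[t - tau]
    return y


def render_binaural(x, hrir_left, hrir_right):
    """Return a (2, T) array holding the left-ear and right-ear signals."""
    return np.stack([np.convolve(x, hrir_left), np.convolve(x, hrir_right)])


x = np.array([1.0, 0.5, -0.25, 0.0, 0.0])   # input signal
hrir_l = np.array([0.8, 0.2, 0.0])          # toy left-ear HRIR (hypothetical)
hrir_r = np.array([0.0, 0.0, 0.6])          # toy right-ear HRIR: later and weaker
binaural = render_binaural(x, hrir_l, hrir_r)
```

The later, weaker right-ear response mimics, very crudely, the interaural time and level differences that real HRIR pairs carry.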

The ear signal y(t) can also be obtained, as illustrated in equation (2), by computing the product Y(ω) of the Fourier transform X(ω) of the input signal and the HRTF H(ω), and then applying an inverse Fourier transform to Y(ω). In equation (2), ω denotes frequency.
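A minimal sketch of equation (2), assuming the discrete Fourier transform (via NumPy's FFT) stands in for the Fourier transform: zero-padding both signals to len(x) + len(h) - 1 samples makes the product of spectra equivalent to the linear convolution of equation (1). Without that padding, the FFT computes a circular convolution instead, as the last line shows.

```python
import numpy as np


def render_fft(x, h):
    """Compute y = x * h via Y(omega) = X(omega) H(omega) and an inverse FFT."""
    n = len(x) + len(h) - 1                    # zero-pad to the full linear length
    Y = np.fft.rfft(x, n) * np.fft.rfft(h, n)  # product of spectra, eq. (2)
    return np.fft.irfft(Y, n)                  # back to the time domain


rng = np.random.default_rng(1)
x = rng.standard_normal(1000)   # input signal
h = rng.standard_normal(256)    # stand-in for a measured HRIR
y_fft = render_fft(x, h)
y_direct = np.convolve(x, h)

# No padding: the tail of the convolution wraps around onto the first samples.
y_circ = np.fft.irfft(np.fft.rfft(x) * np.fft.rfft(h, len(x)), len(x))
```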

As described above, binaural playback is achieved by presenting the sound of each ear signal to the corresponding ear through headphones. The sound pressure presented at the ear canals of both ears is then equivalent to the sound pressure produced by sound waves arriving from the source position used when the HRIR, etc. was measured. The listener can therefore perceive a virtual sound image at that source position.

In recent years, representations of sound fields based on spatial Fourier series expansion have attracted attention as a technique for describing acoustics in three-dimensional space. A representative example is the method based on spherical harmonic expansion (e.g., Non-Patent Document 1). This method relies on the fact that, as shown in equation (3), the sound pressure distribution p(r, θ, φ, ω) in three-dimensional space can be expressed as a linear combination of the basis functions of the spherical harmonic expansion. In other words, the spherical harmonic expansion describes the sound pressure distribution in three-dimensional space in a form in which the radial and angular variables are separated. Equation (3) corresponds to the general solution of the three-dimensional wave equation in spherical polar coordinates, obtained by a change of variables from the wave equation in Cartesian coordinates. In equation (3), (r, θ, φ) are three-dimensional coordinates in spherical polar form. $h_n^{(2)}$ is the second-kind spherical Hankel function of order n, which provides the orthogonal basis for the radial (r) component. $Y_n^m$ is the spherical harmonic of order n and degree m, which provides the orthogonal basis in the angular direction. $A_n^m$ is the spherical harmonic spectrum.
The spherical harmonic spectrum corresponds to the weighting coefficients of the basis functions of the spherical harmonic expansion, each basis function being the product of a second-kind spherical Hankel function and a spherical harmonic, and it can represent the spatial distribution of a sound field, such as the directivity of a sound source. Applications of the spherical harmonic spectrum to sound field control have been proposed, such as rotating a virtual sound source to an arbitrary direction and interpolating the sound pressure at an arbitrary position.
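The sketch below evaluates a truncated version of equation (3), assumed here to take the standard form $p(r,\theta,\phi,\omega) = \sum_{n=0}^{N}\sum_{m=-n}^{n} A_n^m(\omega)\,h_n^{(2)}(kr)\,Y_n^m(\theta,\phi)$ with wavenumber $k = \omega/c$. It assumes orthonormal complex spherical harmonics with the Condon-Shortley phase (the text does not state its convention) and builds them from `scipy.special.lpmv`; the radial part uses `scipy.special.spherical_jn` and `spherical_yn`. The dictionary container for $A_n^m$ and the speed of sound are illustrative choices.

```python
import numpy as np
from math import factorial
from scipy.special import lpmv, spherical_jn, spherical_yn


def sph_hankel2(n, x):
    """Second-kind spherical Hankel function h_n^(2)(x) = j_n(x) - i*y_n(x)."""
    return spherical_jn(n, x) - 1j * spherical_yn(n, x)


def sph_harm_nm(n, m, theta, phi):
    """Orthonormal complex spherical harmonic Y_n^m (theta: polar, phi: azimuth).

    scipy.special.lpmv includes the Condon-Shortley phase; negative degrees
    use Y_n^{-m} = (-1)^m * conj(Y_n^m).
    """
    am = abs(m)
    norm = np.sqrt((2 * n + 1) / (4 * np.pi) * factorial(n - am) / factorial(n + am))
    y = norm * lpmv(am, n, np.cos(theta)) * np.exp(1j * am * phi)
    return y if m >= 0 else (-1) ** am * np.conj(y)


def synthesize_pressure(A, k, r, theta, phi):
    """Truncated eq. (3): sum over (n, m) of A_n^m * h_n^(2)(kr) * Y_n^m(theta, phi).

    A maps (n, m) -> complex coefficient A_n^m at one frequency.
    """
    return sum(a * sph_hankel2(n, k * r) * sph_harm_nm(n, m, theta, phi)
               for (n, m), a in A.items())


c = 343.0                      # speed of sound in m/s (assumed)
k = 2 * np.pi * 1000.0 / c     # wavenumber at 1 kHz
A = {(0, 0): 1.0}              # monopole: only A_0^0 is nonzero
p = synthesize_pressure(A, k, r=2.0, theta=0.3, phi=1.0)
# Closed form for comparison: h_0^(2)(x) = i e^{-ix} / x, Y_0^0 = 1 / sqrt(4 pi).
expected = 1j * np.exp(-1j * k * 2.0) / (k * 2.0) / np.sqrt(4 * np.pi)
```

Higher-order coefficients $A_n^m$ add directivity on top of this omnidirectional term; the separation into a radial factor and an angular factor is visible in each summand.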

Ambisonics (e.g., Non-Patent Document 2) is a known technique based on this principle. It has also been adopted and standardized in MPEG-H 3D Audio (Non-Patent Document 3), a next-generation audio coding scheme. In addition, a method of encoding binaural signals by applying Ambisonics with HRTFs measured in three-dimensional space has been proposed (Patent Document 1).

Patent Document 1: Japanese Patent No. 6067934

Non-Patent Document 1: Yoichi Haneda, "Signal Processing in the Wavenumber Domain of Sound," IEICE ESS Fundamentals Review, Vol. 11, No. 4, pp. 243-255, 2017.
Non-Patent Document 2: D. H. Cooper, T. Shiga, "Discrete-matrix multichannel stereo," Journal of the Audio Engineering Society, Vol. 20, No. 5, pp. 346-360, 1972.
Non-Patent Document 3: ISO/IEC 23008-3:2019, "Information technology - High efficiency coding and media delivery in heterogeneous environments - Part 3: 3D audio," Second edition, 2019.
Non-Patent Document 4: A. V. Oppenheim, R. W. Schafer, Digital Signal Processing, Englewood Cliffs, N.J.: Prentice-Hall, 1975.
Non-Patent Document 5: P. A. Martin, Multiple Scattering: Interaction of Time-Harmonic Waves with N Obstacles, Cambridge University Press, 2006.

HRIRs, etc. are defined for each pair of a source position and a listening point in the acoustic space. Binaural playback with a single set of HRIRs, etc. makes a listener assumed to be at that listening point perceive a sound image that is stationary at that source position. To represent movement of the sound image or of the listener, the HRIRs, etc. are generally switched according to changes in the relative positions of the source and the listening point. However, because HRIRs are measured for only a finite number of source-position/listening-point pairs, the positions at which a sound image can be represented are limited even when the source position is fixed. Switching HRIRs, etc. alone therefore makes it difficult to move the listening point along an arbitrary trajectory while having the listener perceive a smoothly moving sound image.

In binaural playback using HRTFs, the source signal is Fourier-transformed buffer by buffer and rendered by multiplying it by the HRTF in the frequency domain, with each buffer using the HRTF for a different source position or listening point. Presenting a sound in which the output signals of at least two buffers are mixed linearly approximates the source position or listening point of the sound image between the buffers. To make the listener perceive a smoothly moving sound image, however, rendering on a per-sample basis is desirable. Furthermore, mixing output signals based on different HRTFs in the time domain may disturb the frequency characteristics of the individual HRTFs, which serve as cues for sound image localization.
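The conventional buffer-wise rendering described above can be sketched as an overlap-add loop in which each buffer is filtered in the frequency domain with its own HRIR. This illustrates the prior approach, not the method of this invention; the per-buffer HRIR array `hrirs` is a hypothetical input (for example, HRIRs selected from the head or source position at each buffer). Each buffer's filtered output is L - 1 samples longer than the buffer, so its tail is added on top of the next buffer's output: this is the time-domain mixing of signals rendered with different HRTFs.

```python
import numpy as np


def blockwise_render(x, hrirs, block):
    """Overlap-add rendering in which buffer i is filtered with hrirs[i].

    x:     source signal
    hrirs: (num_blocks, L) array, one HRIR per buffer (hypothetical input)
    block: buffer length in samples
    """
    num_blocks, L = hrirs.shape
    n_fft = block + L - 1                           # linear-convolution length per buffer
    y = np.zeros(num_blocks * block + L - 1)
    for i in range(num_blocks):
        seg = x[i * block:(i + 1) * block]
        Y = np.fft.rfft(seg, n_fft) * np.fft.rfft(hrirs[i], n_fft)  # per-buffer HRTF product
        y[i * block:i * block + n_fft] += np.fft.irfft(Y, n_fft)     # tails overlap and add
    return y


rng = np.random.default_rng(4)
x = rng.standard_normal(1024)
h = rng.standard_normal(64)
same = blockwise_render(x, np.tile(h, (4, 1)), block=256)                # one HRIR throughout
switched = blockwise_render(x, rng.standard_normal((4, 64)), block=256)  # HRIR changes per buffer
```

When every buffer uses the same HRIR, the result equals ordinary linear convolution, which is a useful sanity check. When the HRIRs differ, the filter can only change at buffer boundaries, which is why per-sample rendering is described as desirable.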

One object of the present invention is to render, in the frequency domain, sound whose source or listening point moves arbitrarily.

[1] One aspect of the present invention is an acoustic signal processing device comprising: a spherical harmonic spectrum calculation unit that calculates conversion coefficients from the basis functions of a spherical harmonic expansion at a sound source position referenced to a listening point to the basis functions at a sound source position referenced to an origin, and that calculates, for each ear, a spherical harmonic spectrum of the sound field based on the spherical harmonic spectrum of a head-related transfer function, the conversion coefficients, and a sound source signal; and a binaural signal generation unit that calculates, for each ear, a sound pressure spectrum by linearly combining the basis functions using the spherical harmonic spectrum of the sound field, and converts the sound pressure spectrum into a time-domain acoustic signal.
According to the configuration of [1], a spherical harmonic spectrum representing the sound field at the listening point is calculated in the frequency domain for each ear without recalculating the spherical harmonic spectrum of the head-related transfer function. By linearly combining the basis functions of the spherical harmonic expansion with the calculated spherical harmonic spectrum, sound whose source or listening point moves arbitrarily can be rendered in the frequency domain. Through this rendering, the frequency characteristics of the sound change continuously as the source position or listening point changes smoothly.

[2] One aspect of the present invention is the acoustic signal processing device described above, in which the listening point may be the head position, and the reference point for acquiring the head-related transfer function may be the head position.
According to the configuration of [2], using the head position as the listening point simplifies the rendering operations that depend on the listening point compared with using the position of each ear. In addition, using the head position as the reference point for acquiring the head-related transfer functions makes it easier to acquire and manage the head-related transfer functions for the left and right ears together.

[3] One aspect of the present invention is the acoustic signal processing device described above, in which the listening point may be the position of each ear, and the reference point for acquiring the head-related transfer function may be the position of each ear.
According to the configuration of [3], because the position of each ear is used as the listening point, that position serves as the expansion center of the spherical harmonic expansion of the head-related transfer function. The estimation accuracy of the calculated sound pressure spectrum can therefore be improved compared with using the head position as the listening point.

[4] One aspect of the present invention may be a program that causes a computer to function as the acoustic signal processing device described above.
According to the configuration of [4], because the position of each ear is used as the listening point, that position serves as the expansion center of the spherical harmonic expansion of the head-related transfer function. The estimation accuracy of the calculated sound pressure spectrum can therefore be improved compared with using the center of the head as the listening point.

According to this invention, sound whose source or listening point moves arbitrarily can be rendered in the frequency domain.

Figure 1 is a schematic block diagram showing an example of the functional configuration of the acoustic signal processing device according to the first embodiment.
Figure 2 is a diagram illustrating the global coordinate system and the local coordinate system according to the first embodiment.
Figure 3 is a flowchart illustrating the acoustic signal processing according to the first embodiment.
Figure 4 is a diagram illustrating the global coordinate system and the local coordinate system according to the second embodiment.
Figure 5 is a flowchart illustrating the acoustic signal processing according to the second embodiment.

<First Embodiment>
Embodiments of the present invention will be described below with reference to the drawings. First, an example of the functional configuration of the acoustic signal processing device 10 according to the first embodiment will be described. Figure 1 is a schematic block diagram showing an example of the functional configuration of the acoustic signal processing device 10 according to this embodiment.
For each of the left and right ears, the coordinates of the listening point and a sound source signal are input to the acoustic signal processing device 10. The acoustic signal processing device 10 calculates conversion coefficients from the basis functions of the spherical harmonic expansion at the source position referenced to the listening point to the basis functions at the source position referenced to the origin. For each ear, it calculates the spherical harmonic spectrum of the sound field based on the spherical harmonic spectrum of the head-related transfer function, the conversion coefficients, and the sound source signal. For each ear, it then calculates a sound pressure spectrum by linearly combining the basis functions using the spherical harmonic spectrum of the sound field, and converts the sound pressure spectrum into a time-domain acoustic signal.

The acoustic signal processing device 10 comprises an input unit 110, a control unit 120, a storage unit 130, and an output unit 140.
The input unit 110 receives listening point information indicating the coordinates of the listening point, together with a sound source signal, and outputs both to the control unit 120. The listening point is, for example, the position of the listener's head in three-dimensional space (hereinafter, the "head position"); the position of the center of the head, for example, is specified as the head position. The time series of listening points over successive times corresponds to a movement trajectory. The input unit 110 may pass listening point information time by time, or may pass at once listening point information indicating the movement trajectory over a certain period. Likewise, it may acquire the sound source signal time by time, or acquire the sound source signal for that period at once. The input unit 110 is, for example, an input interface.

The control unit 120 executes processing to realize the functions of the acoustic signal processing device 10. The control unit 120 includes a spherical harmonic spectrum calculation unit 122, a binaural signal generation unit 124, and a spherical harmonic expansion unit 126.

For each time τ and each frequency, the spherical harmonic spectrum calculation unit 122 calculates conversion coefficients $S_{n,\nu}^{m,\mu}([r_h(\tau)])$ or $S'^{m,\mu}_{n,\nu}([r_h(\tau)])$ from the basis function of order n and degree m of the spherical harmonic expansion at the head position $[r_h(\tau)]$, which serves as the listening point, to the basis functions of order ν and degree μ at the source position r. In this text, [...] denotes a vector; in the equations described later, vectors are shown in bold. The head position $[r_h(\tau)]$ is expressed in three-dimensional spherical coordinates $[r_h(\tau), \theta_h(\tau), \phi_h(\tau)]$, where $r_h(\tau)$, $\theta_h(\tau)$, and $\phi_h(\tau)$ are the radius, polar angle, and azimuth of the head position, respectively.
Using the addition theorem for spherical Bessel functions (equation (7), described later), the conversion coefficients $S_{n,\nu}^{m,\mu}([r_h(\tau)])$ and $S'^{m,\mu}_{n,\nu}([r_h(\tau)])$ correspond to weighting coefficients determined so that two quantities are equal: (a) the basis function of order n and degree m, which is the product of the spherical Bessel function $j_n(kr_h(\tau))$ of order n at the head position (n is an integer from 0 to N, where N is a predetermined non-negative integer, and k is the wavenumber) and the spherical harmonic $Y_n^m(\theta_h(\tau), \phi_h(\tau))$ of order n and degree m (m is an integer from -n to n); and (b) a weighted sum, over orders ν and degrees μ, of the basis functions of order ν and degree μ of the spherical harmonic expansion at the source position r in the global coordinate system covering the entire sound field, each being the product of the second-kind spherical Hankel function $h_\nu^{(2)}(kr)$ of order ν (ν is an integer from 0 to N) and the spherical harmonic $Y_\nu^\mu(\theta, \phi)$ of order ν and degree μ (μ is an integer from -ν to ν), or the product of the spherical Bessel function $j_\nu(kr)$ and $Y_\nu^\mu(\theta, \phi)$. The spherical harmonic spectrum calculation unit 122 calculates $S_{n,\nu}^{m,\mu}$ when $r < r_s(\tau)$ and $S'^{m,\mu}_{n,\nu}$ when $r > r_s(\tau)$, where r is the radius of the source position in the global coordinate system and $r_s$ is the radius of the source position in the local coordinate system referenced to the listening point.
In other words, the conversion coefficients $S_{n,\nu}^{m,\mu}([r_h(\tau)])$ and $S'^{m,\mu}_{n,\nu}([r_h(\tau)])$ indicate how much each basis function contributes when the basis function of order n and degree m of the spherical harmonic expansion at the source position $[r_s]$ referenced to the listening point $[r_h(\tau)]$ is converted into a weighted sum of the basis functions of order ν and degree μ at the source position [r] referenced to the origin.
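Equation (7) and the general coefficients $S_{n,\nu}^{m,\mu}$ are not reproduced in this section. The sketch below only illustrates the simplest case of the addition theorem for spherical wave functions (Gegenbauer's addition theorem for order zero), $h_0^{(2)}(k|\mathbf{r}-\mathbf{d}|) = \sum_{n}(2n+1)\,j_n(kr_<)\,h_n^{(2)}(kr_>)\,P_n(\cos\gamma)$, where $r_<$ and $r_>$ are the smaller and larger of $|\mathbf{r}|$ and $|\mathbf{d}|$ and γ is the angle between them. The way $j_n$ and $h_n^{(2)}$ trade places when the two radii cross is the kind of case split that leads the text to define two coefficient sets, S and S'. Computing the full (n, m) to (ν, μ) coefficients (for example via Gaunt coefficients) is beyond this sketch, and the function name is hypothetical.

```python
import numpy as np
from scipy.special import eval_legendre, spherical_jn, spherical_yn


def sph_hankel2(n, x):
    """h_n^(2)(x) = j_n(x) - i*y_n(x)."""
    return spherical_jn(n, x) - 1j * spherical_yn(n, x)


def translated_monopole(k, r, d, cos_gamma, N):
    """Truncated addition theorem for h_0^(2)(k|r_vec - d_vec|).

    r, d: lengths of the field-point vector and the translation vector,
    cos_gamma: cosine of the angle between them, N: truncation order.
    The Bessel function takes the smaller radius and the Hankel function
    the larger one, so the series changes form when the radii cross.
    """
    r_lt, r_gt = min(r, d), max(r, d)
    n = np.arange(N + 1)
    series = np.sum((2 * n + 1) * spherical_jn(n, k * r_lt)
                    * sph_hankel2(n, k * r_gt) * eval_legendre(n, cos_gamma))
    dist = np.sqrt(r ** 2 + d ** 2 - 2 * r * d * cos_gamma)  # law of cosines
    return series, sph_hankel2(0, k * dist)


k = 2.0
outside = translated_monopole(k, r=1.5, d=0.5, cos_gamma=0.3, N=30)  # |r| > |d|
inside = translated_monopole(k, r=0.5, d=1.5, cos_gamma=0.3, N=30)   # |r| < |d|
```

Each call returns the truncated series and the exact value; the truncation error shrinks roughly like $(r_</r_>)^N$, so the series converges slowly when the two radii are close.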

The spherical harmonic spectrum calculation unit 122 multiplies the sound source signal s(τ) input from the input unit 110 by the calculated conversion coefficient $S_{n,\nu}^{m,\mu}([r_h(\tau)])$ or $S'^{m,\mu}_{n,\nu}([r_h(\tau)])$ to obtain a converted sound source signal z(t) (described later), and Fourier-transforms z(t) into a frequency-domain converted sound source spectrum $Z_{n,\nu}^{m,\mu}(\omega)$.
The spherical harmonic spectrum calculation unit 122 reads the spherical harmonic spectrum $\alpha_n^m(\omega)$ of the HRTF for each of the left and right ears from the storage unit 130, and multiplies the converted sound source spectrum by the read $\alpha_n^m(\omega)$ to calculate the spherical harmonic spectrum $P_\nu^\mu(\omega)$ of the sound field. $P_\nu^\mu(\omega)$ is calculated using equation (13) (described later). The spherical harmonic spectrum calculation unit 122 outputs the sound-field spherical harmonic spectrum $P_\nu^\mu(\omega)$ calculated for each ear to the binaural signal generation unit 124.
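Equation (13) is not reproduced in this section. Reading the description above literally (multiply the HRTF spectrum by the converted source spectrum), one plausible form, assumed here, is $P_\nu^\mu(\omega) = X(\omega)\sum_{n,m}\alpha_n^m(\omega)\,S_{n,\nu}^{m,\mu}(\omega)$, treating the head position as fixed within one frame so that the converted source spectrum reduces to $S(\omega)X(\omega)$. The flattened index $n^2+n+m$, the array shapes, and the function names are illustrative assumptions, and `S` is taken as given rather than computed.

```python
import numpy as np


def sh_index(n, m):
    """Flatten (n, m) with -n <= m <= n into the index n^2 + n + m."""
    return n * n + n + m


def sound_field_spectrum(alpha, S, X):
    """Assumed form of eq. (13) for one ear:
    P_nu^mu(w) = X(w) * sum_{n,m} alpha_n^m(w) * S_{n,nu}^{m,mu}(w).

    alpha: (F, Q) HRTF spherical harmonic spectrum, Q = (N + 1)**2
    S:     (F, Q, Q) conversion coefficients, S[f, q, p] maps (n,m) -> (nu,mu)
    X:     (F,) spectrum of the source signal
    """
    return np.einsum("fq,fqp->fp", alpha, S) * X[:, None]


rng = np.random.default_rng(2)
F, N = 4, 2
Q = (N + 1) ** 2
alpha = rng.standard_normal((F, Q)) + 1j * rng.standard_normal((F, Q))
X = rng.standard_normal(F) + 1j * rng.standard_normal(F)
S = rng.standard_normal((F, Q, Q))   # random stand-in for real coefficients
P = sound_field_spectrum(alpha, S, X)
```

As a sanity check, an identity `S` reduces P to $\alpha_n^m(\omega)X(\omega)$, i.e. plain HRTF filtering expressed in the spherical harmonic domain.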

The binaural signal generation unit 124 uses the sound-field spherical harmonic spectrum $P_\nu^\mu(\omega)$ input from the spherical harmonic spectrum calculation unit 122 as weighting coefficients, and calculates the sound pressure spectrum P(r, ω) as a weighted sum of basis functions, each being either the product of the spherical Bessel function $j_\nu(kr)$ and the spherical harmonic $Y_\nu^\mu(\theta, \phi)$, or the product of the second-kind spherical Hankel function $h_\nu^{(2)}(kr)$ and $Y_\nu^\mu(\theta, \phi)$. The sound pressure spectrum P(r, ω) is calculated using equation (12) (described later). The binaural signal generation unit 124 applies an inverse Fourier transform to the frequency-domain sound pressure spectrum P(r, ω) calculated for each ear to generate a time-domain acoustic signal as that ear's signal, and outputs the binaural signal consisting of the ear signals as the output signal via the output unit 140.
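A sketch of the binaural signal generation step for one ear. Equation (12) is not reproduced in this section; the code assumes it is the weighted sum $P(\mathbf{r},\omega) = \sum_{\nu,\mu} P_\nu^\mu(\omega)\,R_\nu(kr)\,Y_\nu^\mu(\theta,\phi)$ with $R_\nu$ either $j_\nu$ or $h_\nu^{(2)}$, evaluated per FFT bin and followed by an inverse real FFT. The FFT framing (`fs`, `n_fft`), the speed of sound `c`, the flattened index, and leaving the DC bin at zero in the Hankel case (where $h_\nu^{(2)}$ is singular) are all assumptions of this sketch.

```python
import numpy as np
from math import factorial
from scipy.special import lpmv, spherical_jn, spherical_yn


def sph_harm_nm(n, m, theta, phi):
    """Orthonormal complex Y_n^m with Condon-Shortley phase (theta: polar)."""
    am = abs(m)
    norm = np.sqrt((2 * n + 1) / (4 * np.pi) * factorial(n - am) / factorial(n + am))
    y = norm * lpmv(am, n, np.cos(theta)) * np.exp(1j * am * phi)
    return y if m >= 0 else (-1) ** am * np.conj(y)


def radial(n, x, kind):
    """'j' -> j_n(x); 'h2' -> h_n^(2)(x) = j_n(x) - i*y_n(x)."""
    jn = spherical_jn(n, x)
    return jn if kind == "j" else jn - 1j * spherical_yn(n, x)


def ear_signal(P, N, fs, n_fft, r, theta, phi, c=343.0, kind="j"):
    """Weighted sum of basis functions per rfft bin, then inverse FFT.

    P: (n_fft // 2 + 1, (N + 1)**2) sound-field spectrum P_nu^mu(w),
       flattened with index nu^2 + nu + mu.
    (r, theta, phi): point at which the pressure is evaluated.
    """
    k = 2 * np.pi * np.fft.rfftfreq(n_fft, d=1.0 / fs) / c   # wavenumber per bin
    start = 1 if kind == "h2" else 0   # h^(2) is singular at k = 0: leave DC at zero
    spec = np.zeros(len(k), dtype=complex)
    for nu in range(N + 1):
        R = radial(nu, k[start:] * r, kind)
        for mu in range(-nu, nu + 1):
            spec[start:] += P[start:, nu * nu + nu + mu] * R * sph_harm_nm(nu, mu, theta, phi)
    return np.fft.irfft(spec, n_fft)


n_fft, fs, N = 64, 8000, 1
P = np.zeros((n_fft // 2 + 1, (N + 1) ** 2), dtype=complex)
P[:, 0] = np.sqrt(4 * np.pi)       # only P_0^0, cancelling Y_0^0 = 1/sqrt(4*pi)
y = ear_signal(P, N, fs, n_fft, r=0.0, theta=0.0, phi=0.0)
```

With only $P_0^0$ set and the evaluation point at the origin, $j_0(0) = 1$ makes the spectrum flat, so the output is a unit impulse; this checks the bookkeeping rather than any acoustic behavior.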

For each of the left and right ears, the spherical harmonic expansion unit 126 acquires HRTFs for each pair of listening point and source position, performs a spherical harmonic expansion on the acquired HRTFs, and calculates in advance a spherical harmonic spectrum $\alpha_n^m(\omega)$ that is common to all listening-point/source-position pairs. As shown in equation (6) (described later), the spherical harmonic expansion expresses each HRTF as a linear combination of basis functions, each being the product of the second-kind spherical Hankel function of order n and the spherical harmonic of order n and degree m. The spherical harmonic expansion unit 126 can therefore calculate, as the spherical harmonic spectrum $\alpha_n^m(\omega)$, weighting coefficients shared by all listening-point/source-position pairs, with the radial dependence explained by the second-kind spherical Hankel functions of order n and the angular dependence explained by the spherical harmonics of order n and degree m. The spherical harmonic expansion unit 126 stores HRTF data representing the calculated spherical harmonic spectrum $\alpha_n^m(\omega)$ in the storage unit 130.
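Equation (6) is not reproduced in this section, and the text does not say how the expansion is computed. A common approach, assumed in this sketch, is a least-squares fit at each frequency over HRTFs measured at M directions on a sphere of radius r0 around the reference point, using basis functions $h_n^{(2)}(kr_0)\,Y_n^m(\theta_i,\phi_i)$. Synthetic data stand in for measured HRTFs, and the function names, radius, and truncation order are hypothetical.

```python
import numpy as np
from math import factorial
from scipy.special import lpmv, spherical_jn, spherical_yn


def sph_harm_nm(n, m, theta, phi):
    """Orthonormal complex Y_n^m with Condon-Shortley phase (theta: polar)."""
    am = abs(m)
    norm = np.sqrt((2 * n + 1) / (4 * np.pi) * factorial(n - am) / factorial(n + am))
    y = norm * lpmv(am, n, np.cos(theta)) * np.exp(1j * am * phi)
    return y if m >= 0 else (-1) ** am * np.conj(y)


def basis_matrix(N, k, r0, theta, phi):
    """Rows: measurement directions; columns: h_n^(2)(k r0) Y_n^m, index n^2+n+m."""
    cols = []
    for n in range(N + 1):
        h2 = spherical_jn(n, k * r0) - 1j * spherical_yn(n, k * r0)
        for m in range(-n, n + 1):
            cols.append(h2 * sph_harm_nm(n, m, theta, phi))
    return np.stack(cols, axis=1)


def fit_sh_spectrum(hrtf, N, k, r0, theta, phi):
    """Least-squares estimate of alpha_n^m at one frequency from HRTF samples."""
    B = basis_matrix(N, k, r0, theta, phi)
    alpha, *_ = np.linalg.lstsq(B, hrtf, rcond=None)
    return alpha


rng = np.random.default_rng(3)
N, k, r0, M = 3, 2 * np.pi * 1000 / 343.0, 1.5, 200   # order, wavenumber, radius, directions
theta = np.arccos(rng.uniform(-1, 1, M))              # uniformly random directions
phi = rng.uniform(0, 2 * np.pi, M)
alpha_true = rng.standard_normal(16) + 1j * rng.standard_normal(16)
hrtf = basis_matrix(N, k, r0, theta, phi) @ alpha_true   # synthetic "measured" HRTFs
alpha_est = fit_sh_spectrum(hrtf, N, k, r0, theta, phi)
```

A single coefficient vector describes every direction at once, which is what makes $\alpha_n^m(\omega)$ common to all listening-point/source-position pairs. With real HRTFs, the truncation order must be chosen against the number and layout of measurement directions.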

The storage unit 130 stores data used in processing by the control unit 120 and data acquired by the control unit 120. The storage unit 130 includes storage media such as RAM (Random Access Memory) and ROM (Read Only Memory).
The output unit 140 outputs the binaural signal input from the binaural signal generation unit 124 to the outside. The output unit 140 is, for example, an output interface. The output unit 140 may be integrated with the input unit 110 and configured as an input/output interface.

According to the method described above, the spherical harmonic spectrum of the HRTF referenced to the listening point, which is the origin of the local coordinate system, is converted into a spherical harmonic spectrum of the HRTF referenced to the origin of the global coordinate system, separated into radial and angular components. A converted sound source spectrum is then obtained from the sound source signal and the coefficients that convert the basis functions from the local coordinate system to the global coordinate system. This converted sound source spectrum, together with the HRTF spherical harmonic spectrum common to all source positions and listening points, is then converted into a sound pressure spectrum. As a result, the sound source signal convolved with the HRTF is interpolated in the frequency domain for an arbitrary listening point and source position. Because the computation does not disturb the frequency characteristics of the sound (spectral cues) on which sound image localization relies, more reliable localization of the sound image at the source position can be expected. By contrast, in the overlap-add method described in Non-Patent Document 4, binaural signals for different target directions are added in the time domain, and the phase difference between the HRTFs contained in the individual binaural signals causes them to interfere.
The resulting distortion of the frequency characteristics sometimes prevented the sound image from being localized in the target direction. This embodiment can serve as a means of solving this problem.

次に、本実施形態に係る音響信号処理について、より詳細に説明する。本実施形態は、三次元空間におけるＨＲＴＦの球面調和展開に基づく。ここで、再生音源と受聴者を含む音場全体を網羅するグローバル座標系と、受聴者の頭部位置を原点とするローカル座標系のそれぞれについて図２を用いて説明する。グローバル座標系における原点Ｏから任意の位置Ｓを表すベクトルを［ｒ］、任意の軌跡上を移動する頭部の時刻ｔにおける位置（以下、「頭部位置」と呼ぶことがある）Ｈを表すベクトルを［ｒ_ｈ（ｔ）］、頭部位置Ｈを原点とするローカル座標系における任意の位置Ｓを表すベクトルを［ｒ_ｓ（ｔ）］と表す。ベクトル［ｒ］、［ｒ_ｈ（ｔ）］、［ｒ_ｓ（ｔ）］は、それぞれ三次元の球座標［ｒ（ｔ），θ（ｔ），φ（ｔ）］、［ｒ_ｈ（ｔ），θ_ｈ（ｔ），φ_ｈ（ｔ）］、［ｒ_ｓ（ｔ），θ_ｓ（ｔ），φ_ｓ（ｔ）］と表される。 Next, the acoustic signal processing according to this embodiment will be described in more detail. This embodiment is based on the spherical harmonic expansion of the HRTF in three-dimensional space. The global coordinate system, which covers the entire sound field including the playback sound source and the listener, and the local coordinate system, whose origin is the listener's head position, are explained with reference to Figure 2. Let $\mathbf{r}$ denote the vector from the origin O of the global coordinate system to an arbitrary position S, let $\mathbf{r}_h(t)$ denote the position H at time $t$ of the head moving along an arbitrary trajectory (hereinafter sometimes called the "head position"), and let $\mathbf{r}_s(t)$ denote the vector to the position S in the local coordinate system whose origin is the head position H. In spherical coordinates, the vectors $\mathbf{r}$, $\mathbf{r}_h(t)$, and $\mathbf{r}_s(t)$ are written as $(r(t), \theta(t), \phi(t))$, $(r_h(t), \theta_h(t), \phi_h(t))$, and $(r_s(t), \theta_s(t), \phi_s(t))$, respectively.
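As a minimal sketch of these definitions, assume the local frame differs from the global frame by translation only (head rotation is ignored), so that $\mathbf{r}_s(t) = \mathbf{r} - \mathbf{r}_h(t)$, and assume $\theta$ is the polar angle from the $+z$ axis and $\phi$ the azimuth from $+x$; the patent's exact angle conventions are not given here, and the function names are hypothetical.

```python
import math


def cart_to_sph(x, y, z):
    """Return (r, theta, phi): radius, polar angle from +z, azimuth from +x."""
    r = math.sqrt(x * x + y * y + z * z)
    theta = math.acos(z / r) if r > 0.0 else 0.0
    phi = math.atan2(y, x)
    return r, theta, phi


def local_source_vector(r_global, r_head):
    """Vector from the head position H to the point S: r_s(t) = r - r_h(t).

    Translation only (assumption); a rotating head would also need a rotation.
    """
    return tuple(a - b for a, b in zip(r_global, r_head))


source = (2.0, 0.0, 0.0)  # position S in the global frame, metres
head = (0.0, 1.0, 0.0)    # head position H at one time instant, metres
r_s = local_source_vector(source, head)
print(r_s, cart_to_sph(*r_s))
```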

ＨＲＴＦの測定において位置Ｓが測定用のスピーカが設置される音源位置だと仮定すると、音源位置Ｓから頭部位置Ｈまでの音の周波数領域での伝達特性を示す伝達関数はＨＲＴＦ、その伝達特性を時間領域で表現したインパルス応答がＨＲＩＲに相当する。本実施形態では、ＨＲＴＦを球面調和関数で展開して得られる「球面調和スペクトルで表される指向性を有する頭部の移動」として、受聴点としての頭部位置Ｈの移動が記述される。なお、ＨＲＴＦは音源と頭部の相対的な位置関係によって決まるため、受聴点に代え、目標点の移動とみなされてもよい。目標点は、仮想的に音源が設置され、音像定位の目標となる位置を指す。 In HRTF measurement, assuming that position S is the sound source position where the measurement speaker is installed, the transfer function representing the frequency domain transfer characteristics of sound from sound source position S to head position H corresponds to HRTF, and the impulse response representing that transfer characteristic in the time domain corresponds to HRIR. In this embodiment, the movement of the head position H as the listening point is described as "the movement of the head with directivity represented by a spherical harmonic spectrum," obtained by expanding HRTF using spherical harmonics. Note that since HRTF is determined by the relative positional relationship between the sound source and the head, it may be considered as the movement of a target point instead of the listening point. The target point refers to the position where the sound source is virtually installed and which serves as the target for sound image localization.

グローバル座標系において受聴点Ｈが任意の軌道上を移動し、音源位置Ｓから音が放音される場合を仮定する。このとき、音源位置Ｓ（座標［ｒ］）からの時刻τにおける受聴点Ｈ（座標［ｒ_ｈ（τ）］）までのインパルス応答ｇ（［ｒ］‐［ｒ_ｈ（τ）］）は時変インパルス応答となる。式（４）に示すように、時刻ｔにおけるバイノーラル信号ｐ（［ｒ］，ｔ）は、音源信号ｓ（ｔ）にインパルス応答ｇ（［ｒ］‐［ｒ_ｈ（τ）］）を用いて畳み込み演算を行って得られる。周波数領域でのバイノーラル信号のスペクトルは、式（５）に示すように、時間領域でのバイノーラル信号ｐ（［ｒ］，ｔ）をフーリエ変換して得られる。 Assume that, in the global coordinate system, the listening point H moves along an arbitrary trajectory while sound is emitted from the sound source position S. The impulse response $g(\mathbf{r} - \mathbf{r}_h(\tau))$ from the sound source position S (coordinates $\mathbf{r}$) to the listening point H at time $\tau$ (coordinates $\mathbf{r}_h(\tau)$) is then time-varying. As shown in equation (4), the binaural signal $p(\mathbf{r}, t)$ at time $t$ is obtained by convolving the sound source signal $s(t)$ with the impulse response $g(\mathbf{r} - \mathbf{r}_h(\tau))$. As shown in equation (5), the spectrum of the binaural signal in the frequency domain is obtained by Fourier transforming the time-domain binaural signal $p(\mathbf{r}, t)$.
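Equation (4) is not reproduced here, so the following is only one plausible discrete-time reading of it, stated as an assumption: each source sample $s[\tau]$ is filtered by the impulse response that applies to the head position at that sample's emission time $\tau$, and the contributions are summed. `impulse_response_at` is a hypothetical callable standing in for $g(\mathbf{r} - \mathbf{r}_h(\tau))$.

```python
def time_varying_convolve(s, impulse_response_at):
    """p[t] = sum over tau of g_tau[t - tau] * s[tau], with g_tau = impulse_response_at(tau).

    The response may change from sample to sample (moving listening point); with a
    fixed response this reduces to ordinary linear convolution.
    """
    responses = [impulse_response_at(tau) for tau in range(len(s))]
    n_out = len(s) + max(len(g) for g in responses) - 1
    p = [0.0] * n_out
    for tau, (s_tau, g) in enumerate(zip(s, responses)):
        for k, g_k in enumerate(g):
            p[tau + k] += g_k * s_tau
    return p


# Fixed response: ordinary convolution of [1, 2, 3] with [1, 0.5].
print(time_varying_convolve([1.0, 2.0, 3.0], lambda tau: [1.0, 0.5]))
# -> [1.0, 2.5, 4.0, 1.5]

# Time-varying response: sample 0 passes straight through, sample 1 is delayed by one.
print(time_varying_convolve([1.0, 1.0], lambda tau: [1.0] if tau == 0 else [0.0, 1.0]))
# -> [1.0, 0.0, 1.0]
```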

ＨＲＴＦは、頭部に設置されたマイクロホンを用いて測定されるため、頭部位置を原点とするローカル座標系で球面調和展開を行うことで頭部周りの分布として表現される。ローカル座標系における音源位置Ｓ（座標［ｒ_ｓ］）から頭部位置Ｈ（座標［ｒ_ｈ］）での周波数領域における伝達関数、即ち、ＨＲＴＦは、Ｇ（［ｒ_ｓ］‐［ｒ_ｈ（τ）］，ω）と表される。球面調和展開によれば、式（６）に示すように、ＨＲＴＦは、ｎ次第二種球ハンケル関数とｎ次ｍ位球面調和関数の積を基底関数とする線形結合、つまり、次数と位数を跨いだ加重和に変換される。そのため、ＨＲＴＦは、個々の基底関数に対して乗じられる重み係数からなる球面調和スペクトルα_ｎ ^ｍ（ω）で表現される。ＨＲＴＦは、左耳と右耳とで別個に取得されるため、球面調和スペクトルα_ｎ ^ｍ（ω）も左耳と右耳とで異なる。 Because the HRTF is measured with microphones placed on the head, it is represented as a distribution around the head by performing a spherical harmonic expansion in the local coordinate system whose origin is the head position. The frequency-domain transfer function from the sound source position S (coordinates $\mathbf{r}_s$) to the head position H (coordinates $\mathbf{r}_h$) in the local coordinate system, that is, the HRTF, is written $G(\mathbf{r}_s - \mathbf{r}_h(\tau), \omega)$. As shown in equation (6), the spherical harmonic expansion turns the HRTF into a linear combination of basis functions, each of which is the product of the spherical Hankel function of the second kind of order $n$ and the spherical harmonic of order $n$ and degree $m$; that is, a weighted sum over all orders and degrees. The HRTF is therefore represented by the spherical harmonic spectrum $\alpha_n^m(\omega)$, the set of weight coefficients that multiply the individual basis functions. Because HRTFs are acquired separately for the left and right ears, the spherical harmonic spectrum $\alpha_n^m(\omega)$ also differs between the two ears.
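To make the structure of equation (6) concrete, the sketch below evaluates a truncated expansion $G \approx \sum_{n,m} \alpha_n^m(\omega)\, h_n^{(2)}(kr)\, Y_n^m(\theta, \phi)$ at a single frequency. It assumes orthonormal complex spherical harmonics with the Condon-Shortley phase and a dictionary `alpha` keyed by $(n, m)$; both are assumptions, since the normalization used in the patent is not stated here. Plain upward recurrences are used, which are adequate for the low orders of this illustration.

```python
import cmath
import math


def sph_hankel2(n, x):
    """Spherical Hankel function of the second kind, h_n^(2)(x) = j_n(x) - i*y_n(x), x > 0."""
    h_prev = cmath.exp(-1j * x) / x      # h_{-1}^(2)(x)
    h = 1j * cmath.exp(-1j * x) / x      # h_0^(2)(x)
    for order in range(n):               # h_{l+1} = (2l+1)/x * h_l - h_{l-1}
        h_prev, h = h, (2 * order + 1) / x * h - h_prev
    return h


def assoc_legendre(n, m, x):
    """Associated Legendre function P_n^m(x), 0 <= m <= n, with Condon-Shortley phase."""
    p_mm = 1.0
    for k in range(1, m + 1):
        p_mm *= -(2 * k - 1) * math.sqrt(1.0 - x * x)
    if n == m:
        return p_mm
    p_m1 = x * (2 * m + 1) * p_mm
    for l in range(m + 2, n + 1):
        p_mm, p_m1 = p_m1, ((2 * l - 1) * x * p_m1 - (l + m - 1) * p_mm) / (l - m)
    return p_m1


def sph_harm(n, m, theta, phi):
    """Orthonormal complex spherical harmonic Y_n^m; theta = polar, phi = azimuth."""
    if m < 0:
        return (-1) ** (-m) * sph_harm(n, -m, theta, phi).conjugate()
    norm = math.sqrt((2 * n + 1) / (4 * math.pi)
                     * math.factorial(n - m) / math.factorial(n + m))
    return norm * assoc_legendre(n, m, math.cos(theta)) * cmath.exp(1j * m * phi)


def hrtf_from_sh(alpha, k, r, theta, phi):
    """Truncated expansion: sum of alpha[(n, m)] * h_n^(2)(k r) * Y_n^m(theta, phi)."""
    return sum(a * sph_hankel2(n, k * r) * sph_harm(n, m, theta, phi)
               for (n, m), a in alpha.items())
```

In practice $\alpha_n^m(\omega)$ would be fitted per frequency from measured HRTFs; here it is simply supplied as input.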

式（６）に示す球面調和展開によれば、展開中心とする頭部位置Ｈが時間経過に応じて変化する。そのため、直交基底関数である第二種球ハンケル関数と球面調和関数も頭部位置Ｈの移動の度に再計算する必要がある。そこで、本実施形態では、球ベッセル関数の加法定理を用いてグローバル座標系の原点に展開中心をシフトし、頭部位置Ｈに関わらず、固定の音源位置に係る直交基底関数を用いて音場を表す。球ベッセル関数の加法定理は、三次元座標［ｒ_１］（＝［ｒ_１、θ_１、φ_１］）、［ｒ_２］（＝［ｒ_２、θ_２、φ_２］）の間で［ｒ_２］＝［ｒ_１］＋［ｂ］（ｂは、三次元座標［ｒ_１］、［ｒ_２］間の座標）となるとき、式（７）に示す関係が成り立つことを指す。式（７）において、変換係数Ｓ_ｎ ^ｍ _ν ^μ（［ｂ］）、Ｓ’_ｎ ^ｍ _ν ^μ（［ｂ］）は、それぞれ式（８）、（９）により与えられる。 In the spherical harmonic expansion of equation (6), the head position H that serves as the expansion center changes over time. The orthogonal basis functions, namely the spherical Hankel functions of the second kind and the spherical harmonics, would therefore have to be recalculated every time the head position H moves. In this embodiment, the expansion center is instead shifted to the origin of the global coordinate system by means of the addition theorem for spherical Bessel functions, so that the sound field is represented with orthogonal basis functions for the fixed sound source position, regardless of the head position H. The addition theorem for spherical Bessel functions states that, for three-dimensional coordinates $\mathbf{r}_1 = (r_1, \theta_1, \phi_1)$ and $\mathbf{r}_2 = (r_2, \theta_2, \phi_2)$ related by $\mathbf{r}_2 = \mathbf{r}_1 + \mathbf{b}$, where $\mathbf{b}$ is the displacement between $\mathbf{r}_1$ and $\mathbf{r}_2$, the relationship in equation (7) holds. In equation (7), the conversion coefficients $S_n^m{}_\nu^\mu(\mathbf{b})$ and ${S'}_n^m{}_\nu^\mu(\mathbf{b})$ are given by equations (8) and (9), respectively.

式（８）、（９）において、Ｙ^＊ _ｑ ^μ－ｍ（θ_ｂ，φ_ｂ）は、球面調和関数Ｙ_ｑ ^μ－ｍ（θ_ｂ，φ_ｂ）の複素共役を示す。Ｗ_１、Ｗ_２は、式（１０）のウィグナー（Wigner）の３ｊ記号を示す。 In equations (8) and (9), $Y_q^{\mu-m\,*}(\theta_b, \phi_b)$ denotes the complex conjugate of the spherical harmonic $Y_q^{\mu-m}(\theta_b, \phi_b)$, and $W_1$ and $W_2$ denote the Wigner 3j symbols given in equation (10).
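Equation (10) is not reproduced here, so the exact arguments of $W_1$ and $W_2$ are left open. The sketch below only shows how a Wigner 3j symbol with integer arguments can be evaluated with the Racah formula, as would be needed when tabulating the conversion coefficients of equations (8) and (9). The function name is hypothetical; SymPy's `sympy.physics.wigner.wigner_3j` is an existing exact alternative.

```python
from math import factorial, sqrt


def wigner_3j(j1, j2, j3, m1, m2, m3):
    """Wigner 3j symbol (j1 j2 j3; m1 m2 m3) for integer arguments, via the Racah formula."""
    # Selection rules: zero unless these hold.
    if m1 + m2 + m3 != 0 or not (abs(j1 - j2) <= j3 <= j1 + j2):
        return 0.0
    if abs(m1) > j1 or abs(m2) > j2 or abs(m3) > j3:
        return 0.0
    # Triangle coefficient Delta(j1 j2 j3).
    tri = (factorial(j1 + j2 - j3) * factorial(j1 - j2 + j3) * factorial(-j1 + j2 + j3)
           / factorial(j1 + j2 + j3 + 1))
    pref = sqrt(tri * factorial(j1 + m1) * factorial(j1 - m1) * factorial(j2 + m2)
                * factorial(j2 - m2) * factorial(j3 + m3) * factorial(j3 - m3))
    # Sum over all t for which every factorial argument is non-negative.
    t_min = max(0, j2 - j3 - m1, j1 - j3 + m2)
    t_max = min(j1 + j2 - j3, j1 - m1, j2 + m2)
    total = 0.0
    for t in range(t_min, t_max + 1):
        total += (-1) ** t / (factorial(t) * factorial(j3 - j2 + t + m1)
                              * factorial(j3 - j1 + t - m2) * factorial(j1 + j2 - j3 - t)
                              * factorial(j1 - t - m1) * factorial(j2 - t + m2))
    return (-1) ** (j1 - j2 - m3) * pref * total
```

For example, `wigner_3j(1, 1, 0, 0, 0, 0)` gives $-1/\sqrt{3}$, and any call whose $m$ values do not sum to zero returns 0.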

式（６）に示すＨＲＴＦに球ベッセル関数の加法定理を適用し、球面調和関数の展開中心をグローバル座標系の原点Ｏにシフトすることで、ＨＲＴＦはグローバル座標系での音源位置に係る基底関数の線形結合で表される。具体的には、式（７）の三次元座標［ｒ_１］、［ｂ］にそれぞれグローバル座標系における原点Ｏを基準とする音源位置の座標［ｒ］、頭部位置の座標［ｒ_ｈ（τ）］を代入して式（６）に適用することで、ＨＲＴＦは、式（１１）のように変形される。そして、式（１１）で表されるＨＲＴＦを式（５）に代入すると、音圧スペクトルＰ（［ｒ］，ω）が式（１２）に示すように与えられる。 By applying the addition theorem for spherical Bessel functions to the HRTF of equation (6) and shifting the expansion center of the spherical harmonics to the origin O of the global coordinate system, the HRTF is expressed as a linear combination of basis functions for the sound source position in the global coordinate system. Specifically, substituting the coordinates of the sound source position $\mathbf{r}$ and of the head position $\mathbf{r}_h(\tau)$, both referenced to the origin O of the global coordinate system, for $\mathbf{r}_1$ and $\mathbf{b}$ in equation (7), respectively, and applying the result to equation (6) transforms the HRTF into the form of equation (11). Substituting the HRTF of equation (11) into equation (5) then gives the sound pressure spectrum $P(\mathbf{r}, \omega)$ shown in equation (12).

式（１２）において、球ベッセル関数ｊ_ν（ｋｒ）と球面調和関数Ｙ_ν ^μ（θ，φ）との積となる基底関数に対して乗じられる音場の球面調和スペクトルＰ_ν ^μ（ω）は、式（１３）により与えられる。上記の変換音源信号ｚ（τ）は、式（１３）の音源信号ｓ（τ）と変換係数Ｓ_ｎ ^ｍ _ν ^μ（ｒ_ｈ（τ））との積に相当する。時間領域の変換音源信号ｚ（τ）は、周波数領域の変換音源スペクトルＺ_ｎ ^ｍ _ν ^μ（ω）に変換される。式（１３）に示すように変換音源スペクトルＺ_ｎ ^ｍ _ν ^μ（ω）は、さらにＨＲＴＦの球面調和スペクトルα_ｎ ^ｍ（ω）に乗じられる。 In equation (12), the spherical harmonic spectrum of the sound field $P_\nu^\mu(\omega)$, which multiplies the basis function formed by the product of the spherical Bessel function $j_\nu(kr)$ and the spherical harmonic $Y_\nu^\mu(\theta, \phi)$, is given by equation (13). The transformed sound source signal $z(\tau)$ mentioned above corresponds to the product of the sound source signal $s(\tau)$ in equation (13) and the conversion coefficient $S_n^m{}_\nu^\mu(\mathbf{r}_h(\tau))$. The time-domain transformed sound source signal $z(\tau)$ is converted into the frequency-domain transformed sound source spectrum $Z_n^m{}_\nu^\mu(\omega)$, which, as shown in equation (13), is then multiplied by the HRTF spherical harmonic spectrum $\alpha_n^m(\omega)$.
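The chain from $s(\tau)$ to $P_\nu^\mu(\omega)$ can be sketched with NumPy. Two assumptions are made because equation (13) is not reproduced here: the conversion coefficient depends on frequency (through $k = \omega / c$) as well as on the head position $\mathbf{r}_h(\tau)$, so $z(\tau)$ is formed separately for each frequency bin; and $P_\nu^\mu(\omega)$ is taken to be the sum over $(n, m)$ of $\alpha_n^m(\omega)\, Z_n^m{}_\nu^\mu(\omega)$. `coeff` is a hypothetical callable standing in for the coefficient of equation (8), and the direct DFT loop is $O(N^2)$, written for clarity rather than speed.

```python
import numpy as np


def transformed_source_spectrum(s, coeff, fs):
    """Z[k] = sum_t s[t] * coeff(f_k, t) * exp(-2j*pi*k*t/N) for one (n, m, nu, mu) index.

    coeff(f, t) is a hypothetical stand-in for S_n^m_nu^mu evaluated at frequency f
    and at the head position r_h(t) for sample t.
    """
    s = np.asarray(s, dtype=float)
    n_samples = len(s)
    t = np.arange(n_samples)
    freqs = np.fft.fftfreq(n_samples, d=1.0 / fs)
    Z = np.empty(n_samples, dtype=complex)
    for k, f in enumerate(freqs):
        z = s * np.array([coeff(f, ti) for ti in t])  # z(tau) for this frequency bin
        Z[k] = np.sum(z * np.exp(-2j * np.pi * k * t / n_samples))
    return Z


def sound_field_spectrum(alpha, Z):
    """P[b, f] = sum_a alpha[a, f] * Z[a, b, f].

    alpha: HRTF SH spectrum for one ear, shape (A, F), one row per (n, m)
    Z:     transformed source spectra, shape (A, B, F), B indexing (nu, mu)
    """
    return np.einsum("af,abf->bf", alpha, Z)
```

With a coefficient that is constant over time and frequency, `transformed_source_spectrum` reduces to a scaled `numpy.fft.fft` of the source signal, which is a useful check.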

音場の球面調和スペクトルＰ_ν ^μ（ω）は、グローバル座標系の原点Ｏを展開中心として与えられる。よって、球面調和スペクトル算出部１２２は、式（１３）を用いてグローバル座標系で与えられる音源位置の座標［ｒ］と受聴点の座標［ｒ_ｈ（τ）］に基づいて、逐次にＨＲＴＦの球面調和スペクトルα_ｎ ^ｍ（ω）を算出せずに、音場の球面調和スペクトルＰ_ν ^μ（ω）を算出することができる。そして、バイノーラル信号生成部１２４は、式（１２）で与えられるバイノーラル信号のスペクトルを逆フーリエ変換することで出力信号としてバイノーラル信号を生成することができる。 The spherical harmonic spectrum of the sound field $P_\nu^\mu(\omega)$ is defined with the origin O of the global coordinate system as its expansion center. The spherical harmonic spectrum calculation unit 122 can therefore use equation (13) to calculate $P_\nu^\mu(\omega)$ from the sound source position coordinates $\mathbf{r}$ and the listening point coordinates $\mathbf{r}_h(\tau)$, both given in the global coordinate system, without recalculating the HRTF spherical harmonic spectrum $\alpha_n^m(\omega)$ each time the listening point moves. The binaural signal generation unit 124 can then generate the binaural signal as the output signal by inverse Fourier transforming the binaural signal spectrum given by equation (12).

次に、本実施形態に係る音響信号処理の例について説明する。図３は、本実施形態に係る音響信号処理を例示するフローチャートである。
（ステップＳ１０２）入力部１１０には、受聴点の座標と音源信号が入力される。
（ステップＳ１０４）球面調和スペクトル算出部１２２は、各時刻において、受聴点として頭部中心を基準とする音源位置での球面調和展開の基底関数から原点を展開中心として基準とする音源位置での基底関数への変換係数を周波数ごとに算出する。
（ステップＳ１０６）球面調和スペクトル算出部１２２は、各耳について、音源信号に変換係数を乗じて得られる変換音源信号を周波数領域の変換音源スペクトルに変換する。球面調和スペクトル算出部１２２は、式（１３）に従い、変換した変換音源スペクトルにＨＲＴＦの球面調和スペクトルを乗じて音場の球面調和スペクトルを算出する。
Next, an example of acoustic signal processing according to this embodiment will be described. Figure 3 is a flowchart illustrating acoustic signal processing according to this embodiment.
(Step S102) The input unit 110 receives the coordinates of the listening point and the sound source signal.
(Step S104) At each time, the spherical harmonic spectrum calculation unit 122 calculates, for each frequency, the conversion coefficients from the basis functions of the spherical harmonic expansion of the sound source position referenced to the head center (the listening point) to the basis functions of the sound source position referenced to the origin (the expansion center).
(Step S106) For each ear, the spherical harmonic spectrum calculation unit 122 multiplies the sound source signal by the conversion coefficients and converts the resulting transformed sound source signal into a transformed sound source spectrum in the frequency domain. According to equation (13), the spherical harmonic spectrum calculation unit 122 then multiplies the transformed sound source spectrum by the HRTF spherical harmonic spectrum to calculate the spherical harmonic spectrum of the sound field.

（ステップＳ１０８）バイノーラル信号生成部１２４は、各耳について、式（１２）に従い音場の球面調和スペクトルを用いて球面調和展開の基底関数を線形結合して音圧スペクトルを算出する。線形結合として、音場の球面調和スペクトルを重み係数とする球面調和展開の基底関数の加重和が音圧スペクトルとして得られる。
（ステップＳ１１０）バイノーラル信号生成部１２４は、各耳について、周波数領域の音圧スペクトルを時間領域の出力信号に変換し、変換した出力信号を出力部１４０に出力する。その後、図３に示す処理を終了する。
(Step S108) For each ear, the binaural signal generation unit 124 calculates the sound pressure spectrum by linearly combining the basis functions of the spherical harmonic expansion using the spherical harmonic spectrum of the sound field, according to equation (12). That is, the sound pressure spectrum is the weighted sum of the basis functions, with the spherical harmonic spectrum of the sound field as the weight coefficients.
(Step S110) The binaural signal generation unit 124 converts the frequency domain sound pressure spectrum for each ear into a time domain output signal and outputs the converted output signal to the output unit 140. After that, the process shown in Figure 3 is completed.
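For one ear, steps S108 and S110 reduce to a weighted sum over basis functions followed by an inverse transform. The minimal NumPy sketch below assumes that the basis-function values $j_\nu(kr)\,Y_\nu^\mu(\theta, \phi)$ at the evaluation coordinates of equation (12) have already been tabulated per frequency bin, and that spectra are one-sided (the bins of `numpy.fft.rfft`), so `numpy.fft.irfft` returns a real output signal. The function names and array layout are illustrative assumptions.

```python
import numpy as np


def render_ear(P, basis, n_samples):
    """Steps S108-S110 for one ear.

    P:     sound-field SH spectrum, shape (B, F) with F = n_samples // 2 + 1
    basis: j_nu(k_f r) * Y_nu^mu(theta, phi) for each (nu, mu) row and frequency bin f,
           shape (B, F)
    """
    pressure = np.sum(P * basis, axis=0)        # S108: weighted sum of basis functions
    return np.fft.irfft(pressure, n=n_samples)  # S110: back to a real time-domain signal


def render_binaural(P_left, P_right, basis, n_samples):
    """Stack the left- and right-ear output signals into an array of shape (2, n_samples)."""
    return np.stack([render_ear(P_left, basis, n_samples),
                     render_ear(P_right, basis, n_samples)])
```

With a single basis row equal to one and `P` set to the one-sided spectrum of a signal, `render_ear` returns that signal unchanged, which makes a convenient sanity check.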

＜第２の実施形態＞
次に、第２の実施形態について説明する。以下の説明は、第１の実施形態との差異点を主とし、共通点については、第１の実施形態における説明を援用する。
球面調和展開によれば、展開中心に近接している位置ほど、高い精度で音場のスペクトルを推定することができる。通例、バイノーラル再生では、受聴者の頭部位置が受聴点として採用される。しかし、ＨＲＴＦの測定点として、左右各耳の外耳道入口が用いられる。外耳道入口は、頭部位置から７～１０ｃｍ程度離れた位置となる。この測定点の頭部位置からのずれは、音場スペクトルの精度の低下を招く原因になりうる。そこで、本実施形態では、球面調和展開の展開中心を各耳の位置とする。これにより、音場スペクトルの推定精度の向上が期待される。
<Second Embodiment>
Next, a second embodiment will be described. The following description will mainly focus on the differences from the first embodiment, and for similarities, refer to the description in the first embodiment.
In a spherical harmonic expansion, the closer a point is to the expansion center, the more accurately the sound field spectrum can be estimated there. In binaural playback, the listener's head position is typically used as the listening point. However, HRTFs are measured at the entrances to the ear canals of the left and right ears, which lie about 7 to 10 cm from the head position. This offset between the measurement points and the head position can reduce the accuracy of the sound field spectrum. In this embodiment, therefore, the expansion center of the spherical harmonic expansion is placed at the position of each ear, which is expected to improve the accuracy with which the sound field spectrum is estimated.

図４は、本実施形態に係るグローバル座標系とローカル座標系の関係を示す。但し、左耳を例にする。
球面調和スペクトル算出部１２２には、入力部１１０を経由して頭部位置を示す受聴点情報が入力される。球面調和スペクトル算出部１２２は、例えば、頭部中心から所定距離離れ、所定の頭部方向に向かって左方の位置を左耳の位置として定めることができる。
Figure 4 shows the relationship between the global coordinate system and the local coordinate system according to this embodiment, taking the left ear as an example.
The spherical harmonic spectrum calculation unit 122 receives, via the input unit 110, listening point information indicating the head position. It can, for example, set the position of the left ear to a point located a predetermined distance from the head center, on the left side relative to a predetermined head orientation.
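One way to place the ears is sketched below under stated assumptions: the global frame is right-handed with $+z$ up, the head orientation is a yaw angle measured counter-clockwise from $+x$, pitch and roll are ignored, and both ears lie on the interaural axis at a fixed distance from the head center. The default distance of 0.085 m is a hypothetical value chosen inside the 7 to 10 cm range mentioned above, and the function name is illustrative.

```python
import math


def ear_positions(head, yaw, distance=0.085):
    """Return (left, right) ear positions for a head centred at `head` facing azimuth `yaw`.

    Assumes a right-handed, z-up global frame, yaw measured counter-clockwise from +x,
    and ears on the horizontal interaural axis; `distance` (metres) is hypothetical.
    """
    x, y, z = head
    left_x, left_y = -math.sin(yaw), math.cos(yaw)  # unit vector towards the listener's left
    left = (x + distance * left_x, y + distance * left_y, z)
    right = (x - distance * left_x, y - distance * left_y, z)
    return left, right


# Head at (1, 2, 0) facing +x: the left ear lies towards +y.
left, right = ear_positions((1.0, 2.0, 0.0), yaw=0.0, distance=0.1)
r_e = tuple(a - b for a, b in zip(left, (1.0, 2.0, 0.0)))  # left-ear offset relative to H
```

The offset `r_e` corresponds to the head-relative ear vector $\mathbf{r}_e$, and `left` to the global ear position $\mathbf{r}'_h(\tau)$ at that instant.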

ここで、頭部位置Ｈを基準とする左耳Ｅの位置を示すベクトルを［ｒ_ｅ］（＝［ｒ_ｅ，θ_ｅ，φ_ｅ］）、左耳Ｅを基準とする音源位置Ｓのベクトルを［ｒ’_ｓ（τ）］（＝［ｒ’_ｓ（τ），θ’_ｓ（τ），φ’_ｓ（τ）］）、グローバル座標系における原点Ｏを基準とする耳Ｅの位置のベクトルを［ｒ’_ｈ（τ）］（＝［ｒ’_ｈ（τ），θ’_ｈ（τ），φ’_ｈ（τ）］）と表す。球ベッセル関数の加法定理を用い、式（６）で表されるＨＲＴＦの球面調和展開において、展開中心を頭部位置Ｈから左耳Ｅにシフトすると、音源から左耳へのＨＲＴＦＧ_Ｌ（［ｒ’_ｓ］‐［ｒ’_ｈ（τ）］，ω）が式（１４）に示すように与えられる。 Here, let $\mathbf{r}_e = (r_e, \theta_e, \phi_e)$ denote the position of the left ear E relative to the head position H, let $\mathbf{r}'_s(\tau) = (r'_s(\tau), \theta'_s(\tau), \phi'_s(\tau))$ denote the sound source position S relative to the left ear E, and let $\mathbf{r}'_h(\tau) = (r'_h(\tau), \theta'_h(\tau), \phi'_h(\tau))$ denote the position of the ear E relative to the origin O of the global coordinate system. When the addition theorem for spherical Bessel functions is used to shift the expansion center of the HRTF spherical harmonic expansion of equation (6) from the head position H to the left ear E, the HRTF from the sound source to the left ear, $G_L(\mathbf{r}'_s - \mathbf{r}'_h(\tau), \omega)$, is given by equation (14).

式（１４）において、第二種球ハンケル関数ｊ_ν ^（２）（ｋｒ’_ｓ（τ））と球面調和関数Ｙ_ν ^μ（θ’_ｓ（τ），φ’_ｓ（τ））との積となる基底関数に乗じられる左耳の球面調和スペクトルβ_ν ^μ（ω）は、式（１５）により与えられる。
左耳Ｅの座標［ｒ_ｅ］は、頭部中心からの距離と方向により予め定めておいてもよい。その場合、球面調和展開部１２６は、頭部位置Ｈを基準とする左耳Ｅの座標［ｒ_ｅ］に対する変換係数Ｓ_ｎ ^ｍ _ν ^μ（［ｒ_ｅ］）を、式（８）を用いて算出し、算出した変換係数Ｓ_ｎ ^ｍ _ν ^μ（［ｒ_ｅ］）と左耳のＨＲＴＦの球面調和スペクトルα_ｎ ^ｍ（ω）を球面調和スペクトルβ_ｎ ^ｍ（ω）に補正しておいてもよい。球面調和展開部１２６は、右耳についても同様の手法を用いて右耳のＨＲＴＦの球面調和スペクトルβ_ｎ ^ｍ（ω）を補正することができる。球面調和展開部１２６は、各耳について補正した球面調和スペクトルβ_ｎ ^ｍ（ω）を示すＨＲＴＦデータを予め記憶部１３０に記憶しておく。
In equation (14), the spherical harmonic spectrum of the left ear $\beta_\nu^\mu(\omega)$, which multiplies the basis function formed by the product of the spherical Hankel function of the second kind $h_\nu^{(2)}(kr'_s(\tau))$ and the spherical harmonic $Y_\nu^\mu(\theta'_s(\tau), \phi'_s(\tau))$, is given by equation (15).
The coordinates $\mathbf{r}_e$ of the left ear E may be predetermined from a distance and direction relative to the head center. In that case, the spherical harmonic expansion unit 126 may use equation (8) to calculate the conversion coefficients $S_n^m{}_\nu^\mu(\mathbf{r}_e)$ for the left-ear coordinates $\mathbf{r}_e$ relative to the head position H, and use them to correct the spherical harmonic spectrum $\alpha_n^m(\omega)$ of the left-ear HRTF into the spherical harmonic spectrum $\beta_\nu^\mu(\omega)$ in advance. The spherical harmonic expansion unit 126 can obtain the corrected spherical harmonic spectrum $\beta_\nu^\mu(\omega)$ of the right-ear HRTF in the same way, and stores HRTF data representing the corrected spectrum $\beta_\nu^\mu(\omega)$ for each ear in the storage unit 130 beforehand.
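Because $\mathbf{r}_e$ is fixed relative to the head, this correction can be done once, offline. The sketch below assumes that equation (15) has the form $\beta_\nu^\mu(\omega) = \sum_{n,m} S_n^m{}_\nu^\mu(k, \mathbf{r}_e)\,\alpha_n^m(\omega)$, i.e. a matrix applied to the coefficient vector at each frequency; that form, the dictionary layout keyed by `"L"` and `"R"`, and the function names are assumptions for illustration.

```python
import numpy as np


def correct_hrtf_sh(alpha, S_e):
    """beta[b, f] = sum_a S_e[a, b, f] * alpha[a, f], computed once per ear.

    alpha: HRTF SH spectrum referenced to the head centre, shape (A, F), rows (n, m)
    S_e:   conversion coefficients for the fixed ear offset r_e, shape (A, B, F)
    """
    return np.einsum("abf,af->bf", S_e, alpha)


def precompute_ear_data(alpha_by_ear, S_by_ear):
    """Return {'L': beta_L, 'R': beta_R}, the corrected HRTF data to keep in storage."""
    return {ear: correct_hrtf_sh(alpha_by_ear[ear], S_by_ear[ear]) for ear in ("L", "R")}
```

At run time only the stored $\beta_\nu^\mu(\omega)$ is read, so the per-ear correction adds no per-frame cost.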

式（１４）に式（１５）の球面調和スペクトルβ_ν ^μ（ω）を代入すると、左耳に係るＨＲＴＦＧ_Ｌ（［ｒ’_ｓ］‐［ｒ’_ｈ（τ）］，ω）は、式（１６）に示すように変形される。変形されたＨＲＴＦは、式（６）におけるＨＲＴＦの球面調和スペクトルα_ｎ ^ｍ（ω）に代え、補正された球面調和スペクトルβ_ν ^μ（ω）が用いられる点を除き、式（６）に示すＨＲＴＦと同様の形式を有する。この球面調和スペクトルの補正は、ＨＲＴＦの取得に係る基準点の頭部中心から左耳の位置へのシフトとみなすこともできる。 Substituting the spherical harmonic spectrum $\beta_\nu^\mu(\omega)$ of equation (15) into equation (14) transforms the left-ear HRTF $G_L(\mathbf{r}'_s - \mathbf{r}'_h(\tau), \omega)$ into the form of equation (16). The transformed HRTF has the same form as the HRTF of equation (6), except that the corrected spherical harmonic spectrum $\beta_\nu^\mu(\omega)$ is used in place of the HRTF spherical harmonic spectrum $\alpha_n^m(\omega)$. This correction can also be regarded as shifting the reference point for acquiring the HRTF from the head center to the position of the left ear.

次に球面調和関数の加法定理を用い、式（１６）においてＨＲＴＦの球面調和展開における展開中心をグローバル座標系における原点Ｏにシフトする。式（７）の三次元座標［ｒ_１］、［ｂ］にそれぞれグローバル座標系における原点Ｏを基準とする音源位置の座標［ｒ］、左耳の位置の座標［ｒ’_ｈ（τ）］を代入して式（１６）に適用することで、ＨＲＴＦは式（１７）のように変形される。そして、式（１７）に表されるＨＲＴＦを式（５）に代入すると、式（１８）に示すように左耳用信号の音圧スペクトルＰ_Ｌ（［ｒ］，ω）が得られる。 Next, the addition theorem of equation (7) is used to shift the expansion center of the HRTF spherical harmonic expansion in equation (16) to the origin O of the global coordinate system. Substituting the coordinates of the sound source position $\mathbf{r}$ and of the left-ear position $\mathbf{r}'_h(\tau)$, both referenced to the origin O of the global coordinate system, for $\mathbf{r}_1$ and $\mathbf{b}$ in equation (7), respectively, and applying the result to equation (16) transforms the HRTF into the form of equation (17). Substituting the HRTF of equation (17) into equation (5) then yields the sound pressure spectrum of the left-ear signal $P_L(\mathbf{r}, \omega)$, as shown in equation (18).

式（１８）において、η次球ベッセル関数ｊ_η（ｋｒ）とη次ξ位球面調和関数Ｙ_η ^ξ（θ，φ）との積となるη次ξ位基底関数、または、η次第二種球ハンケル関数ｈ_η ^（２）（ｋｒ）とη次ξ位球面調和関数Ｙ_η ^ξ（θ，φ）との積となるη次ξ位基底関数に乗じて得られる音場の球面調和スペクトルＰ_Ｌη ^ξ（ω）は、式（１９）に表される。 In equation (18), the spherical harmonic spectrum of the sound field $P_{L\eta}^{\xi}(\omega)$ multiplies the basis function of order $\eta$ and degree $\xi$, which is either the product of the spherical Bessel function of order $\eta$, $j_\eta(kr)$, and the spherical harmonic $Y_\eta^\xi(\theta, \phi)$, or the product of the spherical Hankel function of the second kind of order $\eta$, $h_\eta^{(2)}(kr)$, and the same spherical harmonic $Y_\eta^\xi(\theta, \phi)$. This spectrum is given by equation (19).

よって、球面調和スペクトル算出部１２２は、グローバル座標系で与えられる音源位置Ｓの座標［ｒ］、受聴点としての左耳Ｅの座標［ｒ’_ｈ（τ）］、および、記憶部１３０から読み出した左耳Ｅの球面調和スペクトルβ_ν ^μ（ω）を用いて、音場の球面調和スペクトルＰ_Ｌη ^ξ（ω）を算出することができる。そして、バイノーラル信号生成部１２４は、式（１８）で与えられる左耳用のスペクトルを逆フーリエ変換することで左耳用信号を生成することができる。 The spherical harmonic spectrum calculation unit 122 can therefore calculate the spherical harmonic spectrum of the sound field $P_{L\eta}^{\xi}(\omega)$ from the coordinates $\mathbf{r}$ of the sound source position S and the coordinates $\mathbf{r}'_h(\tau)$ of the left ear E as the listening point, both given in the global coordinate system, together with the left-ear spherical harmonic spectrum $\beta_\nu^\mu(\omega)$ read from the storage unit 130. The binaural signal generation unit 124 can then generate the left-ear signal by inverse Fourier transforming the left-ear spectrum given by equation (18).

球面調和スペクトル算出部１２２は、右耳についても、左耳と同様の手法を用いて球面調和スペクトルＰ_Ｒη ^ξ（ω）を算出することができる。バイノーラル信号生成部１２４も、右耳についても、左耳と同様な手法を用いて球面調和スペクトルＰ_Ｒη ^ξ（ω）と音源信号ｓ（τ）から右耳について音圧スペクトルを算出し、算出した音圧スペクトルを逆フーリエ変換することで右耳用信号を生成することができる。 The spherical harmonic spectrum calculation unit 122 can calculate the spherical harmonic spectrum $P_{R\eta}^{\xi}(\omega)$ for the right ear in the same way as for the left ear. Likewise, the binaural signal generation unit 124 can calculate the right-ear sound pressure spectrum from $P_{R\eta}^{\xi}(\omega)$ and the sound source signal $s(\tau)$, and generate the right-ear signal by inverse Fourier transforming that spectrum.

次に、本実施形態に係る音響信号処理の例について説明する。図５は、本実施形態に係る音響信号処理を例示するフローチャートである。
（ステップＳ１２２）入力部１１０には、受聴点として頭部中心の座標と音源信号が入力される。
（ステップＳ１２４）球面調和スペクトル算出部１２２は、各時刻において、受聴点として各耳の位置を基準とする音源位置での球面調和展開の基底関数から原点を展開中心として基準とする音源位置での基底関数への変換係数を周波数ごとに算出する。
（ステップＳ１２６）球面調和スペクトル算出部１２２は、各耳について、音源信号に変換係数を乗じて得られる変換音源信号を周波数領域の変換音源スペクトルに変換する。球面調和スペクトル算出部１２２は、式（１９）を用いて変換した変換音源スペクトルに補正された球面調和スペクトルを乗じて音場の球面調和スペクトルを算出する。
Next, an example of acoustic signal processing according to this embodiment will be described. Figure 5 is a flowchart illustrating acoustic signal processing according to this embodiment.
(Step S122) The input unit 110 receives the coordinates of the head center as the listening point, together with the sound source signal.
(Step S124) At each time, the spherical harmonic spectrum calculation unit 122 calculates, for each frequency, the conversion coefficients from the basis functions of the spherical harmonic expansion of the sound source position referenced to each ear position (the listening point) to the basis functions of the sound source position referenced to the origin (the expansion center).
(Step S126) For each ear, the spherical harmonic spectrum calculation unit 122 multiplies the sound source signal by the conversion coefficients and converts the resulting transformed sound source signal into a transformed sound source spectrum in the frequency domain. Using equation (19), the spherical harmonic spectrum calculation unit 122 then multiplies the transformed sound source spectrum by the corrected spherical harmonic spectrum to calculate the spherical harmonic spectrum of the sound field.

（ステップＳ１２８）バイノーラル信号生成部１２４は、各耳について、式（１８）に従い音場の球面調和スペクトルを用いて球面調和展開の基底関数を線形結合して音圧スペクトルを算出する。
（ステップＳ１３０）バイノーラル信号生成部１２４は、各耳について、周波数領域の音圧スペクトルを時間領域の出力信号に変換し、変換した出力信号を出力部１４０に出力する。その後、図５に示す処理を終了する。
(Step S128) For each ear, the binaural signal generation unit 124 calculates the sound pressure spectrum by linearly combining the basis functions of the spherical harmonic expansion using the spherical harmonic spectrum of the sound field, according to equation (18).
(Step S130) The binaural signal generation unit 124 converts the frequency domain sound pressure spectrum for each ear into a time domain output signal and outputs the converted output signal to the output unit 140. After that, the process shown in Figure 5 is completed.

なお、上記の説明では、ＨＲＴＦの測定の基準とする受聴点として頭部位置を用い、個々の頭部中心の位置に左右１組のＨＲＴＦが関連付けられている場合を前提とした。ＨＲＴＦの測定の基準として、各耳について、その位置とその耳で測定されたＨＲＴＦが関連付けられる場合には、球面調和展開部１２６は、各耳について測定された音源位置ごとのＨＲＴＦに対して球面調和展開を行って、その耳に係るＨＲＴＦの球面調和スペクトルα_ｎ ^ｍ（ω）を算出してもよい。その場合、球面調和スペクトルα_ｎ ^ｍ（ω）は、その耳について補正された球面調和スペクトルβ_ν ^μ（ω）に相当する。従って、ステップＳ１２４、Ｓ１２６において、球面調和スペクトル算出部１２２は、算出される球面調和スペクトルα_ｎ ^ｍ（ω）を補正された球面調和スペクトルβ_ν ^μ（ω）に代えて用いればよい。また、バイノーラル信号生成部１２４も、ステップＳ１２８において算出される球面調和スペクトルα_ｎ ^ｍ（ω）を補正された球面調和スペクトルβ_ν ^μ（ω）に代えて用いればよい。 The above description assumes that the head position is used as the listening point serving as the reference for HRTF measurement, and that a pair of left and right HRTFs is associated with each head-center position. If, instead, each ear's position is associated with the HRTFs measured at that ear as the measurement reference, the spherical harmonic expansion unit 126 may perform a spherical harmonic expansion on the HRTFs measured at that ear for each sound source position to calculate the spherical harmonic spectrum $\alpha_n^m(\omega)$ of that ear's HRTF. In that case, $\alpha_n^m(\omega)$ already corresponds to the corrected spherical harmonic spectrum $\beta_\nu^\mu(\omega)$ for that ear. Accordingly, in steps S124 and S126 the spherical harmonic spectrum calculation unit 122 may use the calculated $\alpha_n^m(\omega)$ in place of the corrected $\beta_\nu^\mu(\omega)$, and the binaural signal generation unit 124 may likewise use the calculated $\alpha_n^m(\omega)$ in place of $\beta_\nu^\mu(\omega)$ in step S128.

一般に、ＨＲＴＦは、音源位置と受聴点との相対的な位置関係により定まる。上記の説明では、受聴点が移動し、音源位置が静止している場合を例にしたが、これには限られない。本実施形態は、例えば、受聴点が静止し、音源位置が移動している場合にも適用することができる。 In general, the HRTF is determined by the relative positions of the sound source and the listening point. Although the above description takes as an example a moving listening point and a stationary sound source, the embodiment is not limited to this case; it can also be applied, for example, when the listening point is stationary and the sound source moves.

上記の説明では、球面調和スペクトル算出部１２２とバイノーラル信号生成部１２４が、１個の音源に対して１系統のバイノーラル信号を生成する場合を例にしたが、これには限られない。入力部１１０には、複数の音源のそれぞれについて、受聴点情報と音源信号を関連付けて入力されてもよい。入力部１１０には、さらに各音源の音源位置を示す音源位置情報を関連付けて入力されてもよい。そして、制御部１２０には個々の音源に対し音源位置が設定されてもよい。球面調和スペクトル算出部１２２とバイノーラル信号生成部１２４は、音源ごとに与えられる音源信号と音源位置に対してバイノーラル信号を生成し、生成したバイノーラル信号を音源間でミキシングして得られる音響信号を出力信号として出力部１４０を経由して出力してもよい。 The above description takes as an example the case in which the spherical harmonic spectrum calculation unit 122 and the binaural signal generation unit 124 generate one binaural signal for one sound source, but the embodiment is not limited to this case. For each of multiple sound sources, the input unit 110 may receive listening point information associated with that source's sound source signal, and may additionally receive associated sound source position information indicating the position of each source. A sound source position may then be set in the control unit 120 for each individual source. The spherical harmonic spectrum calculation unit 122 and the binaural signal generation unit 124 may generate a binaural signal for each source from its sound source signal and position, mix the generated binaural signals across the sources, and output the resulting acoustic signal as the output signal via the output unit 140.
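The final mixing step can be as simple as summing the per-source outputs sample by sample. In this sketch each source is assumed to have been rendered separately, as described above, into a (left, right) array of shape (2, n); zero-padding shorter signals at the end is an illustrative choice, and the function name is hypothetical.

```python
import numpy as np


def mix_binaural(renders):
    """Sum per-source binaural signals of shape (2, n_i) into one (2, max n_i) output."""
    length = max(r.shape[1] for r in renders)
    out = np.zeros((2, length))
    for r in renders:
        out[:, : r.shape[1]] += r  # shorter signals are implicitly zero-padded at the end
    return out
```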

制御部１２０は、外部機器から音源信号と受聴点情報を取得する際、入力部１１０を用いることに代え、予め記憶部１３０に記憶された音源信号と受聴点情報を読み出してもよい。また、制御部１２０は、出力信号を外部機器に出力部１４０を用いて出力することに代え、記憶部１３０に記憶してもよいし、音響信号処理装置１０に設置または接続された再生部（スピーカ）に出力し、放音させてもよい。
音響信号処理装置１０では、球面調和展開部１２６が省略されてもよい。記憶部１３０には、外部機器から取得したＨＲＴＦデータが予め記憶されてもよい。
Instead of acquiring the sound source signal and the listening point information from an external device via the input unit 110, the control unit 120 may read a sound source signal and listening point information stored in advance in the storage unit 130. Likewise, instead of outputting the output signal to an external device via the output unit 140, the control unit 120 may store it in the storage unit 130, or output it to a playback unit (speaker) installed in or connected to the acoustic signal processing device 10 so that it is emitted as sound.
In the acoustic signal processing device 10, the spherical harmonic expansion unit 126 may be omitted, and HRTF data acquired from an external device may be stored in the storage unit 130 in advance.

以上に説明したように、本実施形態に係る音響信号処理装置１０は、球面調和スペクトル算出部１２２とバイノーラル信号生成部１２４を備える。球面調和スペクトル算出部１２２は、受聴点を基準とする音源位置での球面調和展開の基底関数から原点を基準とする音源位置での基底関数への変換係数を算出し、各耳について、頭部伝達関数の球面調和スペクトル、変換係数、および、音源信号に基づいて音場の球面調和スペクトルを算出する。バイノーラル信号生成部１２４は、各耳について、音場の球面調和スペクトルを用いて基底関数を線形結合して出力信号の音圧スペクトルを算出し、音圧スペクトルを時間領域の出力信号に変換する。
この構成によれば、各耳について周波数領域において頭部伝達関数の球面調和スペクトルを再計算せずに受聴点における音場を表す球面調和スペクトルが算出される。算出された球面調和スペクトルを用いて球面調和展開における基底関数を線形結合することで、音源または受聴点が任意に移動する音を周波数領域でレンダリングすることができる。レンダリングにより、音源位置または受聴点の滑らかな変動に応じて音の周波数特性が連続的に変動する。また、時間領域における頭部伝達関数または頭部伝達関数を畳み込んだ音源信号の加算を伴わないため、音像定位の手がかりとなる周波数特性を乱さずに受聴点における音場の音圧スペクトルが推定される。そのため、その音源位置への音像定位が阻害されない。
As described above, the acoustic signal processing device 10 according to this embodiment comprises the spherical harmonic spectrum calculation unit 122 and the binaural signal generation unit 124. The spherical harmonic spectrum calculation unit 122 calculates conversion coefficients from the basis functions of the spherical harmonic expansion at the sound source position referenced to the listening point to the basis functions at the sound source position referenced to the origin, and, for each ear, calculates the spherical harmonic spectrum of the sound field based on the spherical harmonic spectrum of the head-related transfer function, the conversion coefficients, and the sound source signal. For each ear, the binaural signal generation unit 124 calculates the sound pressure spectrum of the output signal by linearly combining the basis functions using the spherical harmonic spectrum of the sound field, and converts the sound pressure spectrum into a time-domain output signal.
With this configuration, for each ear, the spherical harmonic spectrum representing the sound field at the listening point is calculated without recalculating the spherical harmonic spectrum of the head-related transfer function in the frequency domain. By linearly combining the basis functions in the spherical harmonic expansion using the calculated spherical harmonic spectrum, it is possible to render sound in the frequency domain with the sound source or listening point moving arbitrarily. Through rendering, the frequency characteristics of the sound continuously change in accordance with the smooth fluctuations of the sound source position or listening point. Furthermore, since it does not involve the addition of the head-related transfer function or the sound source signal convolved with the head-related transfer function in the time domain, the sound pressure spectrum of the sound field at the listening point is estimated without disturbing the frequency characteristics that provide clues for sound image localization. Therefore, sound image localization to that sound source position is not hindered.

また、受聴点と頭部伝達関数の取得に係る基準点は、それぞれ頭部位置であってもよい。
頭部位置が受聴点として用いられることで、各耳の位置が用いられる場合よりも、レンダリングに係る受聴点による操作を簡素にすることができる。また、頭部位置が頭部伝達関数の取得に係る基準点として用いられることで、左右各耳に係る頭部伝達関数の一括した取得および管理が容易になる。
Furthermore, the listening point and the reference point for acquiring the head-related transfer function may both be the head position.
By using the head position as the listening point, the rendering operations related to the listening point can be simplified compared to when the position of each ear is used. Furthermore, by using the head position as the reference point for acquiring head-related transfer functions, it becomes easier to acquire and manage head-related transfer functions for each ear simultaneously.

Furthermore, the listening point and the reference point for acquiring the head-related transfer function may each be the position of the corresponding ear.
When the position of each ear is used as the listening point, that position serves as the expansion center of the spherical harmonic expansion of that ear's head-related transfer function. The calculated sound pressure spectrum can therefore be estimated more accurately than when the center of the head is used as the listening point.
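One way to make the accuracy argument concrete uses a widely cited heuristic that the source itself does not state: a directional function carrying a phase term from an offset d between the ear and the expansion center needs a spherical harmonic truncation order of roughly N ≈ ⌈k·d⌉, with k = 2πf/c. If the expansion is centered on the head, each ear sits roughly one head radius away, so at a fixed truncation order more of the HRTF goes unrepresented at high frequencies; centering the expansion on the ear removes this offset term. The sketch below evaluates the heuristic for the head-centered offset; the head radius (0.0875 m), the speed of sound (343 m/s), and the function name are assumptions.

```python
import math


def rule_of_thumb_order(freq_hz: float, offset_m: float, c: float = 343.0) -> int:
    """Heuristic SH truncation order N ~ ceil(k * d), with k = 2*pi*f/c."""
    k = 2.0 * math.pi * freq_hz / c
    return math.ceil(k * offset_m)


HEAD_RADIUS_M = 0.0875  # assumed ear offset from the head center

for f in (1000.0, 8000.0, 16000.0):
    n = rule_of_thumb_order(f, HEAD_RADIUS_M)
    print(f"{f:>7.0f} Hz: N = {n:2d}, (N + 1)^2 = {(n + 1) ** 2} coefficients")
```

With these assumed values, the head-centered offset alone calls for N = 13 (196 coefficients per ear and frequency bin) at 8 kHz, whereas an expansion centered on the ear has no such offset to represent.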

The acoustic signal processing device 10 may be implemented as a dedicated acoustic signal processing device, or as a device whose primary function is not acoustic signal processing, such as a personal computer or a tablet terminal. The acoustic signal processing device 10 may also be implemented as part of equipment used for the production, editing, and distribution (including broadcasting) of various types of content, for example a mixing console.

Some or all of the acoustic signal processing device 10 described above may be configured using dedicated components (such as integrated circuits) or implemented by a computer. For example, the spherical harmonic spectrum calculation unit 122, the binaural signal generation unit 124, or both may realize their functions by having a general-purpose processor such as a CPU (Central Processing Unit) execute the processing specified by instructions in a predetermined program read from a storage medium such as a ROM (Read Only Memory).

Although one embodiment of this invention has been described in detail above with reference to the drawings, the specific configuration is not limited to the one described, and various design changes can be made without departing from the gist of this invention.

10…Acoustic signal processing device, 110…Input unit, 120…Control unit, 122…Spherical harmonic spectrum calculation unit, 124…Binaural signal generation unit, 126…Spherical harmonic expansion unit, 130…Storage unit, 140…Output unit

Claims

1. An acoustic signal processing device comprising:
a spherical harmonic spectrum calculation unit that calculates conversion coefficients from the basis functions of the spherical harmonic expansion at a sound source position relative to a listening point to the basis functions at the sound source position relative to an origin, and calculates, for each ear, a spherical harmonic spectrum of the sound field based on a spherical harmonic spectrum of a head-related transfer function, the conversion coefficients, and a sound source signal; and
a binaural signal generation unit that calculates, for each ear, a sound pressure spectrum by linearly combining the basis functions using the spherical harmonic spectrum of the sound field, and converts the sound pressure spectrum into an acoustic signal in the time domain.

2. The acoustic signal processing device according to claim 1, wherein
the listening point is the head position, and
the reference point for obtaining the head-related transfer function is the head position.

3. The acoustic signal processing device according to claim 1, wherein
the listening point is the position of each ear, and
the reference point for obtaining the head-related transfer function is the position of each ear.

4. A program for causing a computer to function as the acoustic signal processing device according to any one of claims 1 to 3.