JP7659464B2 - Audio device and audio control method

Info

Publication number: JP7659464B2
Application number: JP2021115945A
Authority: JP
Inventors: 浩二阪本
Original assignee: Denso Ten Ltd
Current assignee: Denso Ten Ltd
Priority date: 2021-07-13
Filing date: 2021-07-13
Publication date: 2025-04-09
Anticipated expiration: 2041-07-13
Also published as: JP2023012347A

Description

The present invention relates to an audio device and an audio control method.

Audio devices that output sound source signals of various sound sources, such as music and voice, from multiple channels are conventionally known (see, for example, Patent Document 1). In the conventional technology, a pseudo-reverberation sound generated from the sound source signal is actively added to the direct sound, and surround playback is performed on multiple channels.

Patent Document 1: Japanese Patent No. 5372142

However, the conventional technology leaves room for improvement in performing surround playback that is appropriate for the sound source signal.

The present invention has been made in view of the above, and aims to provide an audio device and an audio control method that can perform surround playback appropriate for the sound source signal.

To solve the above problem and achieve the object, the present invention provides an audio device comprising a separation unit and an output control unit. The separation unit separates, from a sound source signal, a predetermined sound source signal that does not require processing to add pseudo-reverberation sound, and removes the separated predetermined sound source signal from the sound source signal. The output control unit applies a filter for generating the pseudo-reverberation sound to the sound source signal from which the predetermined sound source signal has been removed by the separation unit, and outputs the result.

According to the present invention, surround playback appropriate for the sound source signal can be performed.

FIG. 1A is a diagram illustrating an overview of the audio control method according to the first embodiment.
FIG. 1B is a diagram illustrating an overview of the audio control method according to the first embodiment.
FIG. 2 is a block diagram showing a configuration example of an audio system including the audio device according to the first embodiment.
FIG. 3 is a diagram illustrating the separation and extraction process performed by the separation unit.
FIG. 4A is a diagram illustrating the sound image and other aspects of the reproduced sound.
FIG. 4B is a diagram illustrating the sound image and other aspects of the reproduced sound.
FIG. 4C is a diagram illustrating the sound image and other aspects of the reproduced sound.
FIG. 4D is a diagram illustrating the sound image and other aspects of the reproduced sound.
FIG. 4E is a diagram illustrating the sound image and other aspects of the reproduced sound.
FIG. 4F is a diagram illustrating the sound image and other aspects of the reproduced sound.
FIG. 4G is a diagram illustrating the sound image and other aspects of the reproduced sound.
FIG. 5 is a flowchart showing the processing procedure executed by the audio device according to the first embodiment.
FIG. 6 is a block diagram showing a configuration example of an audio system including the audio device according to the second embodiment.
FIG. 7 is a diagram illustrating the gain determination process performed by the determination unit.
FIG. 8 is a flowchart showing the processing procedure executed by the audio device according to the second embodiment.

Embodiments of the audio device and audio control method disclosed in this application are described in detail below with reference to the accompanying drawings. The present invention is not limited to the embodiments described below.

(First embodiment)
<Overview of the audio control method performed by the audio device according to the first embodiment>
First, an overview of the audio control method performed by the audio device according to the first embodiment is described with reference to FIGS. 1A and 1B. FIGS. 1A and 1B are diagrams illustrating an overview of the audio control method according to the first embodiment.

The audio control method according to the first embodiment is executed, for example, by the audio device 1 shown in FIG. 1A. FIG. 1A shows a case in which surround playback is performed in an in-vehicle space such as a vehicle cabin: the direct sound of the sound source signal and the actual reverberation sound are output from two speakers FR and FL arranged at the front left and right, and pseudo-reverberation sound is output from two speakers RL and RR arranged at the rear left and right and thereby added to the direct sound.

Here, the sound source signal is a signal of a sound source that has spatial spread (sound image width) obtained, for example, by outputting different sounds from the two speakers FR and FL. In other words, the sound source signal is a stereo signal reproduced in stereo on two channels (speakers FR and FL).

The sound source signal is, for example, a signal of sound in which multiple instrument sounds and voices (vocals) are mixed, as in classical music or opera, that is, a signal of sound containing multiple sound sources, but it is not limited to this. The sound source signal may be a signal of voice only, or a signal of a single instrument (sound source), such as a piano alone or a violin alone.

As shown in FIG. 1A, it is known that a listener L who listens to sound from a sound source S, which is the position in space at which a sound source such as an instrument or voice (vocals) is localized, can perceive two types of spatial impression. One is the sound image width ASW, defined as the "apparent width of the sound source" perceived as fused with the direct sound both temporally and spatially. The other is the sense of envelopment LEV, defined as the "feeling that the listener's surroundings are filled by sound other than the apparent sound source." The sound image width ASW is the width of the sound image A derived from the direct sound and early reflection components, which are the early components of the sound source signal. The sense of envelopment LEV corresponds to the sound image B derived from the reverberation components, which are the late components of the sound source signal.

When designing and evaluating the sound image width ASW and the sense of envelopment LEV, an index based on the so-called "law of the first wavefront" is sometimes used. In this index, as shown in FIG. 1B, two regions R1 and R2 partitioned by two thresholds TH1 and TH2 are defined.

Region R1 is the region that contains, among the components of the sound source signal, the components that mainly include direct sound (early components). For example, when the early components in region R1 are large, the sound image width ASW becomes large; the listener L then perceives the sound as diffuse, and for some sound sources the sound quality is judged to be poor (unclear). Direct sound here means, for example, sound recorded directly from a voice (vocals) or an instrument, not including sound reflected by walls or the like.

Region R2 is the region that contains, among the components of the sound source signal, the components that mainly include reverberation sound (late components). For example, when the reverberation components in region R2 are large, the sense of envelopment LEV becomes large; the listener L perceives the sound as diffuse, and the sense of envelopment is judged to be rich. Reverberation sound here means, for example, recorded sound of a voice or instrument reflected by walls or the like, which arrives later in time than the direct sound.

In a relatively small space such as a vehicle cabin, the direct sound and the reverberation sound are difficult to separate, and the reverberation sound tends to blend into the direct sound. In that case, the reverberation components in region R2 become small and the sense of envelopment LEV decreases. In a small space such as a vehicle cabin, the audio device 1 therefore actively adds pseudo-reverberation sound to the direct sound, which increases the reverberation components in region R2 and secures the sense of envelopment LEV.

In the above description, the direct sound and the like are output from the speakers FR and FL, and the pseudo-reverberation sound is output from the speakers RL and RR, but the configuration is not limited to this. For example, the audio device 1 may output the direct sound and the like from all or some of the four speakers FR, FL, RL, and RR and adjust its level and other properties. This allows the audio device 1 to move the spatial position of the sound image A of the direct sound and the like. Similarly, the audio device 1 may output the pseudo-reverberation sound from all or some of the four speakers FR, FL, RL, and RR and adjust its level and other properties, which allows it to move the spatial position of the sound image B of the pseudo-reverberation sound. In this way, the audio device 1 can localize the sound image A and the sound image B at arbitrary positions.

In the conventional technology, pseudo-reverberation sound was in some cases added by extracting surround components, including early reflection components, from the sound source signal based on, for example, the correlation between the left and right channels of the sound source, and then delaying and outputting the extracted surround components. With this approach, however, the intended surround components could not always be extracted for the various types of sound source signals. In addition, when the surround components are delayed before output as described above, the early reflections that were originally present are lost, and as a result appropriate surround playback may not be achieved.

The audio device 1 according to this embodiment is therefore configured so that it can perform surround playback appropriate for the sound source signal.

The processing of the audio device 1 is described concretely below with reference to FIG. 1A. The audio device 1 first separates, from the sound source signal of the sound source device 50, a predetermined sound source signal that does not require processing to add pseudo-reverberation sound, and removes the separated predetermined sound source signal from the sound source signal (step S1). The detailed method for separating the predetermined sound source signal is described later. The predetermined sound source signal here includes, for example, a voice (vocal) component.

Next, the audio device 1 applies a filter for generating pseudo-reverberation sound to the sound source signal from which the predetermined sound source signal has been removed (hereinafter sometimes referred to as the "removed sound source signal"), generates a reverberation signal representing the pseudo-reverberation sound, and outputs it from the speakers RL and RR (step S2). As the filter, an impulse-response filter such as an FIR (Finite Impulse Response) filter or an IIR (Infinite Impulse Response) filter can be used, for example, but the filter is not limited to these.
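Steps S1 and S2 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the time-domain subtraction in `remove_component` stands in for the separation described later in the text, and the decaying-noise impulse response produced by `make_decaying_fir` (including its `decay` and `seed` parameters) is a hypothetical choice; the source only says that an FIR or IIR filter may be used.

```python
import math
import random


def make_decaying_fir(num_taps, decay, seed=0):
    """Hypothetical impulse response: exponentially decaying random taps."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) * math.exp(-decay * n) for n in range(num_taps)]


def fir_filter(signal, taps):
    """Direct-form FIR convolution, truncated to the input length."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            if k > n:
                break  # no input samples before the start of the signal
            acc += h * signal[n - k]
        out.append(acc)
    return out


def remove_component(source, separated):
    """Step S1 stand-in: subtract the separated signal from the source."""
    return [s - v for s, v in zip(source, separated)]


# Toy mono example: the first sample is "voice", the second is "instrument".
source = [1.0, 0.5, 0.0, 0.0, 0.0, 0.0]
voice = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]

residual = remove_component(source, voice)      # removed sound source signal
taps = make_decaying_fir(num_taps=4, decay=0.5)
reverb = fir_filter(residual, taps)             # step S2: reverberation signal
```

Because the filter only ever sees `residual`, the voice sample contributes nothing to `reverb`, which is the behavior the text describes for the rear speakers RL and RR.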

In this way, in this embodiment, the filter is applied only to the sound source signal from which the predetermined sound source signal (for example, the voice component) that does not require pseudo-reverberation has been removed, and the reverberation signal is generated from it. In other words, the filter is not applied to the predetermined sound source signal contained in the sound source signal, and no reverberation signal is generated for it.

As a result, even when the sound source signal contains both a sound source signal that requires pseudo-reverberation and one that does not (here, the predetermined sound source signal), the reverberation signal is generated only for the signal that requires it. Surround playback appropriate for the sound source signal, more specifically for the content (type) of the sound source signal, can therefore be performed.

The filter described above is not applied to the voice component contained in the predetermined sound source signal, which is reproduced as direct sound. The listener L therefore does not perceive the sense of envelopment LEV for the reproduced predetermined sound source signal (here, the voice). This is intentional: for voice (vocals) and similar sounds, the listener L can hear the reproduced voice more clearly without the sense of envelopment LEV.

The above describes an example in which the filter is applied only to the sound source signal from which the predetermined sound source signal has been removed, but the configuration is not limited to this. For example, the audio device 1 may apply the filter to both the removed sound source signal and the predetermined sound source signal. In that case, a reverberation signal that gives a relatively high reverberation level to the corresponding pseudo-reverberation sound may be generated for the removed sound source signal, while a reverberation signal that gives a relatively low reverberation level may be generated for the predetermined sound source signal. The "reverberation level" here is an index value indicating the degree of reverberation of the pseudo-reverberation sound in the room (here, the vehicle cabin) when the reverberation signal is reproduced from, for example, the speakers RL and RR.
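One way to picture this variant is to filter both signals and weight the two results differently before mixing. The sketch below is only an illustration: the function name `mix_reverb` and the gain values `residual_gain=1.0` and `voice_gain=0.2` are assumptions introduced here, and the source does not specify how the relative reverberation levels are realized.

```python
def convolve(signal, taps):
    """Direct-form FIR convolution, truncated to the input length."""
    return [
        sum(taps[k] * signal[n - k] for k in range(min(n + 1, len(taps))))
        for n in range(len(signal))
    ]


def mix_reverb(residual, voice, taps, residual_gain=1.0, voice_gain=0.2):
    """Filter both signals, then weight the voice reverberation lower.

    The gain values are hypothetical; the text only requires that the
    reverberation level for the predetermined (voice) signal be
    relatively small compared with that of the removed signal.
    """
    wet_residual = convolve(residual, taps)
    wet_voice = convolve(voice, taps)
    return [residual_gain * a + voice_gain * b
            for a, b in zip(wet_residual, wet_voice)]


example_taps = [1.0, 0.5]
rear_mixed = mix_reverb([1.0, 0.0, 0.0], [1.0, 0.0, 0.0], example_taps)
rear_residual_only = mix_reverb([1.0, 0.0, 0.0], [1.0, 0.0, 0.0],
                                example_taps, voice_gain=0.0)
```

Setting `voice_gain` to zero reproduces the earlier case in which the filter is applied only to the removed sound source signal.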

In the above description, the predetermined sound source signal includes a voice component, but it is not limited to this. The predetermined sound source signal may include components of sounds that reverberate excessively and sound unnatural when surround playback increases the sense of envelopment LEV, specifically transient sounds with silence between strikes, such as the sound of drums and other percussion instruments.

<Configuration of the audio system including the audio device according to the first embodiment>
Next, the configuration of an audio system including the audio device 1 according to the first embodiment is described with reference to FIG. 2. FIG. 2 is a block diagram showing a configuration example of the audio system including the audio device 1 according to the first embodiment. FIG. 2 shows, as functional blocks, only the components needed to explain the features of this embodiment, and omits general components.

In other words, the components illustrated in FIG. 2 are functional and conceptual, and need not be physically configured as illustrated. For example, the specific form in which the functional blocks are distributed or integrated is not limited to the one illustrated; all or part of them can be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like.

As shown in FIG. 2, the audio system 100 includes the audio device 1, a sound source device 50, various sensors 60, and multiple speakers FL, FR, RL, and RR. The audio system 100 according to this embodiment is mounted in a vehicle, but is not limited to this.

The sound source device 50 outputs a sound source signal to the audio device 1. The sound source signal is, for example, a stereo signal. When different signals are output via the audio device 1 from the two speakers FL and FR, which are the two channels, the sound source signal forms a sound image with spatial spread.

The various sensors 60 include sensors that detect the state of the vehicle. Examples are sensors that can detect the running state of the vehicle such as the vehicle speed, the open/closed state of the windows, the operating state of the air conditioner, the seating state of the occupants, fader adjustment instructions for the speakers FL, FR, RL, and RR (front/rear balance adjustment instructions from an occupant), and whether automated driving is active. The sensors output information indicating the detected vehicle state to the audio device 1. The running state, the open/closed state of the windows, and the other items listed above are only examples of the vehicle state and are not limiting.

The speakers FL, FR, RL, and RR are connected to the audio device 1 and output, as sound, the signals output from the audio device 1. For example, the speakers FL and FR output the direct sound of the sound source signal, and the speakers RL and RR output the pseudo-reverberation sound generated from the sound source signal, but the configuration is not limited to this.

The audio device 1 includes a control unit 2 and a storage unit 3. The control unit 2 includes an acquisition unit 21, a separation unit 22, and an output control unit 23. The audio device 1 includes, for example, a computer having a CPU (Central Processing Unit), ROM (Read Only Memory), RAM (Random Access Memory), flash memory, and input/output ports, as well as various circuits.

The CPU of the computer functions as the acquisition unit 21, the separation unit 22, and the output control unit 23 of the control unit 2 by, for example, reading and executing a program stored in the ROM.

At least one or all of the acquisition unit 21, the separation unit 22, and the output control unit 23 of the control unit 2 can also be implemented in hardware such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array).

The storage unit 3 corresponds to the RAM and the flash memory, which can store information such as various programs. The audio device 1 may also acquire the above programs and various information from another computer connected via a wired or wireless network, or from a portable recording medium.

The acquisition unit 21 acquires various information and signals. For example, the acquisition unit 21 acquires the sound source signal, which is a stereo signal, from the sound source device 50. Specifically, the acquisition unit 21 acquires the sound source signals to be output from the two speakers FL and FR, which are the two channels. The acquisition unit 21 outputs the acquired sound source signals to the separation unit 22 and the output control unit 23.

In the following, of the stereo sound source signal, the signal output (reproduced) from the speaker FL, which is the left channel, is sometimes referred to as the "Lch sound source signal," and the signal output from the speaker FR, which is the right channel, as the "Rch sound source signal."

The acquisition unit 21 also acquires information indicating the vehicle state (for example, the running state such as the vehicle speed, or the open/closed state of the windows) output from the various sensors 60, and outputs the acquired information to the output control unit 23.

The separation unit 22 separates and extracts various sound source signals from the sound source signal, including the predetermined sound source signal that does not require pseudo-reverberation. For example, in addition to the predetermined sound source signal, the separation unit 22 can separate and extract an L component sound source signal, an R component sound source signal, a reverberation component sound source signal, and the like.

The L component sound source signal is a sound source signal containing the sound component (L component) reproduced on one of the two channels, the speaker FL. The R component sound source signal is a sound source signal containing the sound component (R component) reproduced on the other channel, the speaker FR. The L component sound source signal is an example of a first sound source signal, and the R component sound source signal is an example of a second sound source signal. The reverberation component sound source signal is a sound source signal containing actual reverberation components such as early reflections.

In the above description, the separation unit 22 separates and extracts the predetermined sound source signal, the L component sound source signal, the R component sound source signal, and the reverberation component sound source signal from the sound source signal, but it is not limited to this and may be configured to separate and extract only some of these signals. Because the predetermined sound source signal includes a voice component as described above, it is sometimes referred to below as the "voice sound source signal."

The separation and extraction process performed by the separation unit 22 is now described with reference to FIG. 3. FIG. 3 is a diagram illustrating the separation and extraction process performed by the separation unit 22. As shown in FIG. 3, the separation unit 22 first receives the Lch sound source signal and the Rch sound source signal from the acquisition unit 21.

Next, the separation unit 22 performs time-frequency analysis on the Lch sound source signal and the Rch sound source signal to calculate Lch sound source data and Rch sound source data. For example, the separation unit 22 applies a short-time Fourier transform to the Lch sound source signal to convert it from the time domain to the time-frequency domain and calculate the Lch sound source data. Similarly, the separation unit 22 applies a short-time Fourier transform to the Rch sound source signal to calculate the Rch sound source data.
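The short-time Fourier transform step can be illustrated with a small pure-Python sketch. The frame length, hop size, and Hann window below are assumptions made for the example; the source does not state which analysis parameters the separation unit 22 uses.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def stft(signal, frame_len=8, hop=4):
    """Naive short-time Fourier transform.

    Returns a list of frames; each frame holds frame_len // 2 + 1 complex
    bins (non-negative frequencies only). Parameters are illustrative.
    """
    window = hann(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + i] * window[i] for i in range(frame_len)]
        spectrum = [
            sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len))
            for k in range(frame_len // 2 + 1)
        ]
        frames.append(spectrum)
    return frames


# A cosine sitting exactly on bin 2 of an 8-point frame.
lch_signal = [math.cos(2.0 * math.pi * 2 * n / 8) for n in range(16)]
lch_data = stft(lch_signal)
peak_bin = max(range(len(lch_data[0])), key=lambda k: abs(lch_data[0][k]))
```

Running the same function on the Rch sound source signal would give the Rch sound source data on the same time-frequency grid. A production system would use an FFT rather than this direct summation.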

The Lch sound source data and the Rch sound source data represent the frequency characteristics of the corresponding audio signal over time. More specifically, they are data in which the audio signal is partitioned by predetermined frequency band and by elapsed time, and a relative decibel value (not shown) is set for each partitioned region.

In this way, the separation unit 22 can accurately calculate the Lch sound source data and the Rch sound source data by performing time-frequency analysis on the sound source signal. In this embodiment, using the accurately calculated sound source data makes it possible to perform surround playback appropriate for the sound source signal.

Next, the separation unit 22 calculates difference information between the Lch sound source data and the Rch sound source data. The difference information is, for example, information indicating the L/R channel difference obtained by subtracting the Lch sound source data from the Rch sound source data, and specifically consists of level difference information and phase difference information. The level (amplitude) difference information is the inter-channel level difference (ICLD), and the phase difference information is the inter-channel phase difference (ICPD).
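For a single time-frequency bin, the two quantities can be computed from the complex STFT values as follows. This is a sketch under assumptions: the source only says that the Lch data is subtracted from the Rch data, so expressing the ICLD in decibels as a ratio of magnitudes and the ICPD as a wrapped phase difference in radians is one common reading, and the small constant `EPS` is added here only to avoid taking the logarithm of zero.

```python
import cmath
import math

EPS = 1e-12  # assumption: guards against log10(0) on silent bins


def icld_db(l_bin, r_bin):
    """Inter-channel level difference in dB (Rch level minus Lch level)."""
    return 20.0 * math.log10((abs(r_bin) + EPS) / (abs(l_bin) + EPS))


def icpd_rad(l_bin, r_bin):
    """Inter-channel phase difference in radians (Rch phase minus Lch phase),
    wrapped to the range (-pi, pi]."""
    return cmath.phase(r_bin * l_bin.conjugate())


# Rch is twice as loud and leads Lch by 90 degrees.
level_diff = icld_db(1 + 0j, 2j)
phase_diff = icpd_rad(1 + 0j, 2j)

# Identical bins, as expected for a component present equally in both channels.
same_level = icld_db(1 + 1j, 1 + 1j)
same_phase = icpd_rad(1 + 1j, 1 + 1j)
```

Multiplying by the conjugate keeps the phase difference wrapped without an explicit modulo step.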

In this way, the separation unit 22 calculates difference information indicating the difference between the sound source signals reproduced on the two channels (speakers FL and FR). Using this difference information makes it possible to accurately separate the voice sound source signal and the other signals, as described below.

次いで、分離部２２は、差分情報に基づいて、各種音源信号を分離して抽出するためのマスクを生成する。詳しくは、分離部２２は、差分情報に基づいて、音声音源信号、Ｌ成分音源信号、Ｒ成分音源信号および残響音成分音源信号などを分離・抽出するためのマスクをそれぞれ生成する。なお、マスクとしては、例えば０または１の２値だけをとるバイナリマスクを用いることができるが、これに限定されるものではない。 Then, the separation unit 22 generates masks for separating and extracting various sound source signals based on the difference information. In detail, the separation unit 22 generates masks for separating and extracting the voice sound source signal, the L component sound source signal, the R component sound source signal, and the reverberation sound component sound source signal, based on the difference information. Note that, as the mask, for example, a binary mask that takes only two values, 0 or 1, can be used, but is not limited to this.

また、以下では、音声音源信号を分離・抽出するためのマスクを「音声用マスク」と記載する場合がある。また、Ｌ成分音源信号、Ｒ成分音源信号および残響音成分音源信号を分離・抽出するためのマスクをそれぞれ「Ｌ成分用マスク」、「Ｒ成分用マスク」および「残響音用マスク」と記載する場合がある。 In the following, a mask for separating and extracting an audio source signal may be referred to as an "audio mask." Furthermore, masks for separating and extracting an L component audio source signal, an R component audio source signal, and a reverberation component audio source signal may be referred to as an "L component mask," an "R component mask," and a "reverberation mask," respectively.

先ず、「音声用マスク」の生成について説明する。なお、音声（ボーカル）は通常、モノラル録音されるため、ここでは、Ｌｃｈ用音源信号およびＲｃｈ用音源信号には、同じ音声音源信号が含まれているものとする。 First, generation of the "audio mask" will be described. Because voice (vocals) is usually recorded in mono, it is assumed here that the Lch sound source signal and the Rch sound source signal contain the same audio sound source signal.

分離部２２は、差分情報のうちレベル差情報であるＩＣＬＤ、位相差情報であるＩＣＰＤと、下記の設定条件１とに基づいて音声用マスクの値（正確には各領域における値）を設定し、音声用マスクを生成する。 The separation unit 22 sets the audio mask value (more precisely, the value in each region) based on ICLD, which is level difference information, and ICPD, which is phase difference information, from the difference information, and on the setting condition 1 below, to generate the audio mask.

〔設定条件１〕
マスクの値１：｜ＩＣＬＤ｜＜閾値ａ、かつ、｜ＩＣＰＤ｜＜閾値ｂの領域
マスクの値０：上記以外の領域
[Setting condition 1]
Mask value 1: Region where |ICLD| < threshold a and |ICPD| < threshold b
Mask value 0: Regions other than the above

ここで、閾値ａおよび閾値ｂはともに、比較的小さい値に設定される。詳しくは、閾値ａおよび閾値ｂは、ＩＣＬＤおよびＩＣＰＤがともに０あるいは０付近であることが推定できるような値に設定されるが、これに限定されるものではなく、任意の値に設定可能である。 Here, thresholds a and b are both set to relatively small values. More specifically, thresholds a and b are set to values that allow for estimation that both ICLD and ICPD are 0 or close to 0, but are not limited thereto and can be set to any value.

上記したように、Ｌｃｈ用音源信号およびＲｃｈ用音源信号には同じ音声音源信号が含まれるため、音声用マスクは、設定条件１により、音声音源信号が含まれる領域（言い換えるとＩＣＬＤおよびＩＣＰＤがともに０あるいは０付近である領域）の値が１に、それ以外の領域が０に設定されることとなる。 As described above, the Lch sound source signal and the Rch sound source signal contain the same audio sound source signal, so setting condition 1 sets the audio mask to a value of 1 in the area containing the audio sound source signal (in other words, the area where ICLD and ICPD are both 0 or close to 0) and to 0 in other areas.
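By way of example only, setting condition 1 can be written as a per-region test. The threshold values used here (1 dB and 10 degrees) and the name `audio_mask` are illustrative assumptions. The source only states that thresholds a and b are relatively small.

```python
def audio_mask(icld, icpd, thr_a=1.0, thr_b=10.0):
    """Binary mask per setting condition 1.

    icld, icpd: 2-D lists indexed as [frame][frequency bin].
    thr_a (dB) and thr_b (degrees) are placeholder values.
    """
    return [
        [1 if abs(ld) < thr_a and abs(pd) < thr_b else 0
         for ld, pd in zip(ld_row, pd_row)]
        for ld_row, pd_row in zip(icld, icpd)
    ]
```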

次いで、分離部２２は、生成された音声用マスクを用いて、音声音源信号を分離・抽出する。具体的には、分離部２２は、音声用マスクの値が１に設定された時間周波数領域から、音声音源信号を分離・抽出する。 Next, the separation unit 22 separates and extracts the audio source signal using the generated audio mask. Specifically, the separation unit 22 separates and extracts the audio source signal from the time-frequency domain where the value of the audio mask is set to 1.

例えば、分離部２２は、Ｌｃｈ音源データおよびＲｃｈ音源データに対して、音声用マスクを適用してフィルタリングすることで、Ｌｃｈ音源データおよびＲｃｈ音源データからそれぞれ音声音源信号を分離する。そして、分離部２２は、分離された２つの音声音源信号を平均化し、平均化された信号を音声音源信号として抽出する。 For example, the separation unit 22 applies an audio mask to the Lch sound source data and the Rch sound source data and filters them to separate the audio sound source signal from each of the Lch sound source data and the Rch sound source data. The separation unit 22 then averages the two separated audio sound source signals and extracts the averaged signal as the audio sound source signal.
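The filtering and averaging step above might look like the following sketch, which assumes the sound source data are held as complex values indexed as [frame][frequency bin]. The function name `extract_audio_source` is hypothetical.

```python
def extract_audio_source(l_data, r_data, mask):
    """Apply one mask to both channels and average the two results."""
    extracted = []
    for l_row, r_row, m_row in zip(l_data, r_data, mask):
        extracted.append([0.5 * (m * l + m * r)
                          for l, r, m in zip(l_row, r_row, m_row)])
    return extracted
```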

なお、上記では、音声用マスクがＬｃｈ音源データおよびＲｃｈ音源データの両方に適用される例を示したが、これに限定されるものではなく、例えばＬｃｈ音源データおよびＲｃｈ音源データの一方に適用され、一方のデータから音声音源信号が分離・抽出されてもよい。 In the above, an example was shown in which the audio mask is applied to both the Lch sound source data and the Rch sound source data, but this is not limited to this. For example, the audio mask may be applied to one of the Lch sound source data and the Rch sound source data, and the audio source signal may be separated and extracted from one of the data.

なお、上記では、音声用マスクにおいて、ＩＣＬＤおよびＩＣＰＤが０あるいは０付近である領域以外の領域の値が０に設定されるようにしたが、これに加えてあるいは代えて、例えば音声帯域（一例として３００～３４００Ｈｚ）以外の周波数領域の値が、予め０に設定されてもよい。すなわち、ここでは、音声音源信号の分離・抽出する処理が行われる。このため、上記のように、音声帯域以外の周波数領域、言い換えると音声音源信号が存在し得ない周波数領域の値が予め０に設定されることで、音声音源信号が存在し得ない周波数領域に音声音源信号が存在すると誤検知することがなく、よって音声音源信号を精度良く分離・抽出することが可能になる。 In the above, the audio mask is set to 0 in regions other than those where ICLD and ICPD are 0 or near 0. In addition to or instead of this, values in frequency regions outside the voice band (300 to 3400 Hz, as an example) may be set to 0 in advance. The processing performed here separates and extracts the audio sound source signal. Therefore, by presetting to 0 the frequency regions outside the voice band, in other words the frequency regions where an audio sound source signal cannot exist, erroneous detection of an audio sound source signal in such regions is avoided, and the audio sound source signal can be separated and extracted with high accuracy.

また、音声用マスクにおいて、例えば音声帯域以下の周波数領域の値が、予め１に設定されるようにしてもよい。これにより、モノラル成分を含む楽器の音源信号であって、音声帯域以下の周波数の楽器（例えばバスドラムなど）の音源信号も分離して抽出することが可能になる。 In addition, in the audio mask, the value of the frequency range below the audio band may be preset to 1. This makes it possible to separate and extract the audio source signal of an instrument (such as a bass drum) that contains a mono component and has a frequency below the audio band.
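The two band-related variations above (forcing the mask to 0 outside the voice band, or forcing it to 1 below the voice band) can be sketched as a post-processing step on an existing mask. The mapping from bin index to frequency assumes an FFT-style grid of `fft_size` points, and the names `band_limit_mask` and `pass_low` are hypothetical.

```python
def band_limit_mask(mask, sample_rate, fft_size,
                    low_hz=300.0, high_hz=3400.0, pass_low=False):
    """Override mask values outside the voice band.

    Bins above high_hz are set to 0. Bins below low_hz are set to 1
    when pass_low is True (to also capture, e.g., a bass drum),
    otherwise to 0. Bins inside the band keep their original value.
    """
    limited = []
    for row in mask:
        new_row = []
        for k, m in enumerate(row):
            freq = k * sample_rate / fft_size  # assumed bin-to-Hz mapping
            if freq < low_hz:
                new_row.append(1 if pass_low else 0)
            elif freq > high_hz:
                new_row.append(0)
            else:
                new_row.append(m)
        limited.append(new_row)
    return limited
```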

次に、「Ｌ成分用マスク」の生成について説明する。例えば、ある楽器音（一例としてピアノ）の成分が、Ｒｃｈ用音源信号に比べてＬｃｈ用音源信号に多く含まれる場合がある。Ｌ成分用マスクは、このような楽器音の成分をＬ成分音源信号として分離・抽出するためのマスクである。 Next, we will explain how to generate an "L component mask." For example, there may be cases where the components of a certain instrument sound (a piano as an example) are contained in greater quantities in the Lch sound source signal than in the Rch sound source signal. The L component mask is a mask for separating and extracting such instrument sound components as the L component sound source signal.

具体的には、分離部２２は、レベル差情報であるＩＣＬＤ、位相差情報であるＩＣＰＤと、下記の設定条件２とに基づいてＬ成分用マスクの値（正確には各領域における値）を設定し、Ｌ成分用マスクを生成する。 Specifically, the separation unit 22 sets the value of the L component mask (more precisely, the value in each region) based on the level difference information ICLD, the phase difference information ICPD, and the setting condition 2 below, and generates the L component mask.

〔設定条件２〕
マスクの値１：ＩＣＬＤ＜閾値ｃ、かつ、ＩＣＰＤ＜閾値ｄの領域
マスクの値０：上記以外の領域
[Setting condition 2]
Mask value 1: Region where ICLD < threshold c and ICPD < threshold d
Mask value 0: Regions other than the above

ここで、閾値ｃおよび閾値ｄはともに、０以下の値に設定される。詳しくは、閾値ｃおよび閾値ｄは、ＩＣＬＤおよびＩＣＰＤがともに負値であることが推定できるような値に設定されるが、これに限定されるものではなく、任意の値に設定可能である。 Here, threshold c and threshold d are both set to values equal to or less than 0. In detail, threshold c and threshold d are set to values that allow estimation that ICLD and ICPD are both negative values, but are not limited to this and can be set to any value.

すなわち、上記したようにある楽器音（ここではピアノ）の成分がＬｃｈ用音源信号に多く含まれるため、Ｒｃｈ音源データからＬｃｈ音源データを減算して得た差分情報では、ある楽器音の成分を含む領域のＩＣＬＤおよびＩＣＰＤが負値となる。従って、Ｌ成分用マスクは、設定条件２により、ある楽器音（ここではピアノ）の成分が含まれる領域（言い換えるとＩＣＬＤおよびＩＣＰＤがともに負値である領域）の値が１に、それ以外の領域が０に設定されることとなる。 That is, as described above, the Lch sound source signal contains a large amount of a certain instrument sound (piano in this case), so in the difference information obtained by subtracting the Lch sound source data from the Rch sound source data, ICLD and ICPD in the area containing the certain instrument sound components are negative values. Therefore, according to setting condition 2, the L component mask is set to a value of 1 in the area containing the certain instrument sound (piano in this case) components (in other words, areas where ICLD and ICPD are both negative values), and to 0 in other areas.

次いで、分離部２２は、生成されたＬ成分用マスクを用いて、Ｌ成分音源信号を分離・抽出する。具体的には、分離部２２は、Ｌ成分用マスクの値が１に設定された時間周波数領域から、Ｌ成分音源信号を分離・抽出する。例えば、分離部２２は、Ｌｃｈ音源データに対して、Ｌ成分用マスクを適用してフィルタリングすることで、Ｌｃｈ音源データからＬ成分音源信号を分離する。 Then, the separation unit 22 separates and extracts the L component sound source signal using the generated L component mask. Specifically, the separation unit 22 separates and extracts the L component sound source signal from the time-frequency domain where the value of the L component mask is set to 1. For example, the separation unit 22 applies the L component mask to the Lch sound source data and filters it, thereby separating the L component sound source signal from the Lch sound source data.
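As an illustrative sketch, setting condition 2 and the subsequent filtering of the Lch sound source data could be expressed as follows. The thresholds (-3 dB and -5 degrees) and the function names are assumptions. The R component case would mirror this with the inequalities of setting condition 3 applied to the Rch sound source data.

```python
def l_component_mask(icld, icpd, thr_c=-3.0, thr_d=-5.0):
    """Binary mask per setting condition 2 (placeholder thresholds <= 0)."""
    return [
        [1 if ld < thr_c and pd < thr_d else 0
         for ld, pd in zip(ld_row, pd_row)]
        for ld_row, pd_row in zip(icld, icpd)
    ]

def apply_mask(data, mask):
    """Keep only the regions of one channel where the mask is 1."""
    return [[m * x for x, m in zip(row, m_row)]
            for row, m_row in zip(data, mask)]
```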

次に、「Ｒ成分用マスク」の生成について説明する。例えば、ある楽器音（一例としてギター）の成分が、Ｌｃｈ用音源信号に比べてＲｃｈ用音源信号に多く含まれる場合がある。Ｒ成分用マスクは、このような楽器音の成分をＲ成分音源信号として分離・抽出するためのマスクである。 Next, we will explain how to generate an "R component mask." For example, there may be cases where the Rch sound source signal contains more components of a certain instrument sound (a guitar as an example) than the Lch sound source signal. The R component mask is a mask for separating and extracting such instrument sound components as the R component sound source signal.

具体的には、分離部２２は、レベル差情報であるＩＣＬＤ、位相差情報であるＩＣＰＤと、下記の設定条件３とに基づいてＲ成分用マスクの値（正確には各領域における値）を設定し、Ｒ成分用マスクを生成する。 Specifically, the separation unit 22 sets the value of the R component mask (more precisely, the value in each region) based on the level difference information ICLD, the phase difference information ICPD, and the following setting condition 3, and generates the R component mask.

〔設定条件３〕
マスクの値１：ＩＣＬＤ＞閾値ｅ、かつ、ＩＣＰＤ＞閾値ｆの領域
マスクの値０：上記以外の領域
[Setting condition 3]
Mask value 1: Region where ICLD > threshold e and ICPD > threshold f
Mask value 0: Regions other than the above

ここで、閾値ｅおよび閾値ｆはともに、０以上の値に設定される。詳しくは、閾値ｅおよび閾値ｆは、ＩＣＬＤおよびＩＣＰＤがともに正値であることが推定できるような値に設定されるが、これに限定されるものではなく、任意の値に設定可能である。 Here, threshold e and threshold f are both set to values equal to or greater than 0. In detail, threshold e and threshold f are set to values that allow for estimation that both ICLD and ICPD are positive values, but are not limited thereto and can be set to any values.

すなわち、上記したようにある楽器音（ここではギター）の成分がＲｃｈ用音源信号に多く含まれるため、Ｒｃｈ音源データからＬｃｈ音源データを減算して得た差分情報では、ある楽器音の成分を含む領域のＩＣＬＤおよびＩＣＰＤが正値となる。従って、Ｒ成分用マスクは、設定条件３により、ある楽器音（ここではギター）の成分が含まれる領域（言い換えるとＩＣＬＤおよびＩＣＰＤがともに正値である領域）の値が１に、それ以外の領域が０に設定されることとなる。 That is, as described above, the Rch sound source signal contains a large amount of a certain instrument sound (here, a guitar), so in the difference information obtained by subtracting the Lch sound source data from the Rch sound source data, ICLD and ICPD in the area containing the certain instrument sound component are positive values. Therefore, according to setting condition 3, the R component mask is set to a value of 1 in the area containing the certain instrument sound (here, a guitar) component (in other words, the area where ICLD and ICPD are both positive values), and to 0 in other areas.

次いで、分離部２２は、生成されたＲ成分用マスクを用いて、Ｒ成分音源信号を分離・抽出する。具体的には、分離部２２は、Ｒ成分用マスクの値が１に設定された時間周波数領域から、Ｒ成分音源信号を分離・抽出する。例えば、分離部２２は、Ｒｃｈ音源データに対して、Ｒ成分用マスクを適用してフィルタリングすることで、Ｒｃｈ音源データからＲ成分音源信号を分離する。 Then, the separation unit 22 separates and extracts the R component sound source signal using the generated R component mask. Specifically, the separation unit 22 separates and extracts the R component sound source signal from the time-frequency domain where the value of the R component mask is set to 1. For example, the separation unit 22 applies the R component mask to the Rch sound source data and filters it, thereby separating the R component sound source signal from the Rch sound source data.

次に、「残響音用マスク」について説明する。例えば、Ｌｃｈ用音源信号およびＲｃｈ用音源信号には、上記した音声などのモノラル成分、Ｌ成分やＲ成分の他に、初期反射音などの実際の残響音成分が含まれる場合がある。かかる残響音成分においては、再生時に音像が不明瞭になりやすいものがあり、残響音用マスクは、このような残響音成分を残響音成分音源信号として分離・抽出するためのマスクである。 Next, we will explain the "reverberation mask." For example, the Lch sound source signal and the Rch sound source signal may contain actual reverberation components such as early reflection sounds in addition to the monaural components such as the above-mentioned voice, L components, and R components. Some of these reverberation components tend to make the sound image unclear when played back, and the reverberation mask is a mask for separating and extracting such reverberation components as reverberation component sound source signals.

具体的には、分離部２２は、レベル差情報であるＩＣＬＤ、位相差情報であるＩＣＰＤと、下記の設定条件４とに基づいて残響音用マスクの値（正確には各領域における値）を設定し、残響音用マスクを生成する。 Specifically, the separation unit 22 sets the reverberation mask value (more precisely, the value in each region) based on the level difference information ICLD, the phase difference information ICPD, and the setting condition 4 below, and generates the reverberation mask.

〔設定条件４〕
マスクの値０：｜ＩＣＬＤ｜＜閾値ａ、かつ、｜ＩＣＰＤ｜＜閾値ｂの領域
マスクの値０：ＩＣＬＤ＜閾値ｃ、かつ、ＩＣＰＤ＜閾値ｄの領域
マスクの値０：ＩＣＬＤ＞閾値ｅ、かつ、ＩＣＰＤ＞閾値ｆの領域
マスクの値１：上記以外の領域
[Setting condition 4]
Mask value 0: Region where |ICLD| < threshold a and |ICPD| < threshold b
Mask value 0: Region where ICLD < threshold c and ICPD < threshold d
Mask value 0: Region where ICLD > threshold e and ICPD > threshold f
Mask value 1: Regions other than the above

設定条件４の内容から分かるように、上述した設定条件１～３においてマスクの値が１に設定される領域が、設定条件４では０に設定され、それ以外の領域の値が１に設定される。すなわち、例えばＩＣＬＤとＩＣＰＤとで正負が反転するような領域やＩＣＰＤ≒１８０°となるような領域は、再生時に音像が不明瞭になりやすい残響音を含む領域であるため、当該領域の値が１に設定されることとなる。 As can be seen from setting condition 4, the regions where the mask value is set to 1 under setting conditions 1 to 3 described above are set to 0 under setting condition 4, and all other regions are set to 1. That is, for example, a region where ICLD and ICPD have opposite signs, or a region where ICPD ≈ 180°, contains reverberation that tends to make the sound image unclear during playback, so the value of that region is set to 1.
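Setting condition 4 amounts to taking the regions left over after setting conditions 1 to 3. A minimal sketch, with all six threshold values and the name `reverberation_mask` chosen purely for illustration:

```python
def reverberation_mask(icld, icpd,
                       a=1.0, b=10.0, c=-3.0, d=-5.0, e=3.0, f=5.0):
    """Binary mask per setting condition 4.

    A region is 1 only if it satisfies none of setting conditions 1-3.
    Thresholds a..f are placeholder values (dB for ICLD, degrees for ICPD).
    """
    def residual(ld, pd):
        is_audio = abs(ld) < a and abs(pd) < b   # setting condition 1
        is_left = ld < c and pd < d              # setting condition 2
        is_right = ld > e and pd > f             # setting condition 3
        return 0 if (is_audio or is_left or is_right) else 1

    return [[residual(ld, pd) for ld, pd in zip(ld_row, pd_row)]
            for ld_row, pd_row in zip(icld, icpd)]
```

With these placeholder thresholds, a region with ICLD = -6 dB and ICPD = +20° (opposite signs) and a region with ICPD near 180° both receive the value 1.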

次いで、分離部２２は、生成された残響音用マスクを用いて、残響音成分音源信号を分離・抽出する。具体的には、分離部２２は、残響音用マスクの値が１に設定された時間周波数領域から、残響音成分音源信号を分離・抽出する。 Next, the separation unit 22 separates and extracts the reverberation component sound source signal using the generated reverberation mask. Specifically, the separation unit 22 separates and extracts the reverberation component sound source signal from the time-frequency domain where the value of the reverberation mask is set to 1.

例えば、分離部２２は、Ｌｃｈ音源データおよびＲｃｈ音源データに対して、残響音用マスクを適用してフィルタリングすることで、Ｌｃｈ音源データおよびＲｃｈ音源データからそれぞれ残響音成分音源信号を分離する。そして、分離部２２は、分離された２つの残響音成分音源信号を平均化し、平均化された信号を残響音成分音源信号として抽出する。 For example, the separation unit 22 applies a reverberation mask to the Lch sound source data and the Rch sound source data and filters them to separate the reverberation component sound source signals from the Lch sound source data and the Rch sound source data. The separation unit 22 then averages the two separated reverberation component sound source signals and extracts the averaged signal as the reverberation component sound source signal.

なお、上記では、残響音用マスクがＬｃｈ音源データおよびＲｃｈ音源データの両方に適用される例を示したが、これに限定されるものではなく、例えばＬｃｈ音源データおよびＲｃｈ音源データの一方に適用され、一方のデータから残響音成分音源信号が分離・抽出されてもよい。 In the above, an example was shown in which the reverberation mask is applied to both the Lch sound source data and the Rch sound source data, but this is not limited to the above. For example, the reverberation mask may be applied to either the Lch sound source data or the Rch sound source data, and the reverberation component sound source signal may be separated and extracted from one of the data.

なお、上記では、各マスクの値が１あるいは０に設定されるため、各音源信号の分離処理において分離歪が発生する場合があるが、かかる場合、各マスクにローパスフィルタを組み合わせることで、分離歪の軽減を図るようにしてもよい。また、各マスクの値が０に設定される領域について、例えば０．１など緩和した値が設定されることで、分離歪の軽減を図るようにしてもよい。 In the above, since the value of each mask is set to 1 or 0, separation distortion may occur in the separation process of each sound source signal. In such a case, separation distortion may be reduced by combining a low-pass filter with each mask. Also, for areas where the value of each mask is set to 0, a relaxed value such as 0.1 may be set to reduce separation distortion.
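Both mitigation options can be sketched as simple operations on a binary mask. The source does not specify how the low-pass filter is combined with the mask, so the 3-point moving average along the frequency axis used here is only one possible reading. The function names are hypothetical.

```python
def relax_mask(mask, floor=0.1):
    """Replace hard zeros with a relaxed value such as 0.1."""
    return [[max(float(m), floor) for m in row] for row in mask]

def smooth_mask(mask):
    """Assumed low-pass step: 3-point moving average along frequency."""
    smoothed = []
    for row in mask:
        n = len(row)
        new_row = []
        for k in range(n):
            window = row[max(0, k - 1):min(n, k + 2)]
            new_row.append(sum(window) / len(window))
        smoothed.append(new_row)
    return smoothed
```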

図２の説明を続けると、分離部２２は、分離・抽出された、音声音源信号、Ｌ成分音源信号、Ｒ成分音源信号および残響音成分音源信号を出力制御部２３へ出力する。また、分離部２２は、音声音源信号（所定音源信号）を音源信号から除去して除去音源信号を生成し、生成された除去音源信号を出力制御部２３へ出力する。 Continuing with the explanation of FIG. 2, the separation unit 22 outputs the separated and extracted voice sound source signal, L component sound source signal, R component sound source signal, and reverberation component sound source signal to the output control unit 23. The separation unit 22 also removes the voice sound source signal (predetermined sound source signal) from the sound source signal to generate a removed sound source signal, and outputs the generated removed sound source signal to the output control unit 23.
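One way to picture the removed sound source signal is as the complement of the audio mask applied to the sound source data. This is an assumption about the removal step, which the source does not spell out, and `remove_audio_source` is a hypothetical name.

```python
def remove_audio_source(data, mask):
    """Keep every region NOT selected by the audio mask."""
    return [[(1 - m) * x for x, m in zip(row, m_row)]
            for row, m_row in zip(data, mask)]
```

Under this reading, adding the masked part and the removed part region by region reproduces the original data.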

出力制御部２３は、取得部２１から入力される音源信号、および、分離部２２から入力される音声音源信号、Ｌ成分音源信号、Ｒ成分音源信号、残響音成分音源信号、除去音源信号に所定の処理を施してスピーカＦＬ，ＦＲ，ＲＬ，ＲＲから出力する。 The output control unit 23 performs predetermined processing on the sound source signal input from the acquisition unit 21, and the voice sound source signal, L component sound source signal, R component sound source signal, reverberation sound component sound source signal, and removal sound source signal input from the separation unit 22, and outputs them from the speakers FL, FR, RL, and RR.

例えば、出力制御部２３は、直接音等を含む音源信号をＤ／Ａ変換し、Ｄ／Ａ変換後の音源信号を増幅してスピーカＦＬ，ＦＲから出力（再生）する。ここで、再生される音の音像などについて図４Ａ～図４Ｇを参照して説明する。 For example, the output control unit 23 performs D/A conversion on a sound source signal including a direct sound, etc., amplifies the D/A converted sound source signal, and outputs (plays) it from the speakers FL and FR. Here, the sound image of the played sound will be described with reference to Figures 4A to 4G.

図４Ａ～図４Ｇは、出力制御部２３によって再生される音の音像などを説明する図である。なお、図４Ａは、直接音等を含む音源信号のみが出力されて再生された状態を示し、言い換えると、疑似残響音等が再生（付加）されていない状態を示している。また、図４Ａ～図４Ｆでは、音声（ボーカル）の音像を符号Ａ１で示している。また、音源信号は、上記したようにステレオ信号であり、Ｌｃｈ用音源信号およびＲｃｈ用音源信号を含む。図４Ａ～図４Ｆでは、Ｒｃｈ用音源信号に比べてＬｃｈ用音源信号に多く含まれる楽器音等の音像を符号ＡＬで示し、Ｌｃｈ用音源信号に比べてＲｃｈ用音源信号に多く含まれる楽器音等の音像を符号ＡＲで示している。 Figures 4A to 4G are diagrams for explaining the sound images of the sound reproduced by the output control unit 23. Note that Figure 4A shows a state in which only the sound source signal including the direct sound etc. is output and reproduced, in other words, a state in which the pseudo reverberation sound etc. is not reproduced (added). Also, in Figures 4A to 4F, the sound image of the voice (vocals) is indicated by the symbol A1. Also, as described above, the sound source signal is a stereo signal, and includes an Lch sound source signal and an Rch sound source signal. In Figures 4A to 4F, the sound image of the instrument sound etc. contained more in the Lch sound source signal than in the Rch sound source signal is indicated by the symbol AL, and the sound image of the instrument sound etc. contained more in the Rch sound source signal than in the Lch sound source signal is indicated by the symbol AR.

図４Ａに示すように、出力制御部２３が、直接音等を含む音源信号をスピーカＦＬ，ＦＲから出力すると、音声の音像Ａ１や楽器音の音像ＡＬ，ＡＲがスピーカＦＬ，ＦＲ間において比較的狭い間隔で定位する。 As shown in FIG. 4A, when the output control unit 23 outputs a sound source signal including direct sound etc. from the speakers FL and FR, the sound image A1 of the voice and the sound images AL and AR of the instrument sound are localized at a relatively narrow interval between the speakers FL and FR.

図２の説明を続けると、出力制御部２３は、フィルタ２３ａを有し、かかるフィルタ２３ａを用いて疑似残響音を生成して出力する。フィルタ２３ａは、疑似残響音を生成するためのフィルタである。なお、フィルタとしては、例えばＦＩＲフィルタやＩＩＲフィルタ等を用いることができるが、これに限定されるものではない。 Continuing with the explanation of FIG. 2, the output control unit 23 has a filter 23a, and generates and outputs a pseudo-reverberation sound using the filter 23a. The filter 23a is a filter for generating a pseudo-reverberation sound. Note that, as the filter, for example, an FIR filter or an IIR filter can be used, but is not limited to these.

例えば、出力制御部２３は、所定音源信号が除去された前記音源信号、すなわち除去音源信号に対し、フィルタ２３ａを適用して疑似残響音を示す残響信号を生成する。そして、出力制御部２３は、残響信号をＤ／Ａ変換し、Ｄ／Ａ変換後の残響信号を増幅してスピーカＲＬ，ＲＲから出力する。これにより、疑似残響音が直接音等に付加される。 For example, the output control unit 23 applies a filter 23a to the sound source signal from which a specific sound source signal has been removed, i.e., the removed sound source signal, to generate a reverberation signal that represents a pseudo-reverberation sound. The output control unit 23 then performs D/A conversion on the reverberation signal, amplifies the D/A converted reverberation signal, and outputs it from the speakers RL and RR. This adds the pseudo-reverberation sound to the direct sound, etc.
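As a non-limiting sketch of an FIR-type filter for generating a reverberation signal, a sparse impulse response made of a few decaying echoes can be convolved with the time-domain removed sound source signal. The delay times, the decay factor, and the function names are assumptions. The actual coefficients of the filter 23a are not disclosed in the source.

```python
def make_reverb_fir(sample_rate, delays_ms=(23.0, 41.0, 67.0), decay=0.6):
    """Build a sparse FIR impulse response with decaying echoes.

    There is no tap at delay 0, so the output holds reverberation only.
    """
    length = int(sample_rate * max(delays_ms) / 1000.0) + 1
    taps = [0.0] * length
    gain = decay
    for delay in delays_ms:
        taps[int(sample_rate * delay / 1000.0)] += gain
        gain *= decay
    return taps

def fir_filter(signal, taps):
    """Direct-form convolution of a signal with the FIR taps."""
    out = [0.0] * (len(signal) + len(taps) - 1)
    for n, x in enumerate(signal):
        if x == 0.0:
            continue
        for k, c in enumerate(taps):
            if c != 0.0:
                out[n + k] += x * c
    return out
```

In this sketch, the output of `fir_filter` would then be D/A converted, amplified, and sent to the speakers RL and RR.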

従って、図４Ｂに示すように、例えば出力制御部２３は、スピーカＲＬ，ＲＲから出力される疑似残響音の大きさ等を調整し、疑似残響音の音像Ｂを聴取者Ｌの周辺に定位させることで、聴取者Ｌは、十分な包まれ感ＬＥＶを感じることができる。 Therefore, as shown in FIG. 4B, for example, the output control unit 23 adjusts the volume of the pseudo-reverberation sound output from the speakers RL and RR, and positions the sound image B of the pseudo-reverberation sound around the listener L, so that the listener L can feel a sufficient sense of being enveloped LEV.

このように、本実施形態に係る出力制御部２３は、疑似残響音の付加処理が不要な音声音源信号が除去された音源信号（除去音源信号）にのみフィルタ２３ａを適用し、残響信号を生成して出力する。 In this way, the output control unit 23 according to this embodiment applies the filter 23a only to the sound source signal (removed sound source signal) from which the voice sound source signal that does not require processing to add artificial reverberation has been removed, and generates and outputs a reverberation signal.

これにより、例えば音源信号に、疑似残響音の付加処理が必要な音源信号と、不要な音源信号（ここでは音声音源信号）とが含まれる場合であっても、付加処理が必要な音源信号に対してのみ残響信号が生成されて出力されるため、音源信号に応じた（詳しくは音源信号の内容（種類）に応じた）適切なサラウンド再生を行うことができる。 As a result, even if the sound source signal contains sound source signals that require processing to add artificial reverberation and sound source signals that do not (here, voice sound source signals), a reverberation signal is generated and output only for the sound source signals that require processing, making it possible to perform appropriate surround playback according to the sound source signal (more specifically, according to the content (type) of the sound source signal).

なお、上記において、出力制御部２３は、音声音源信号が除去された音源信号（除去音源信号）にのみフィルタ２３ａを適用したが、これに限定されるものではない。すなわち、例えば出力制御部２３は、音声音源信号（所定音源信号）が除去された音源信号、および、音声音源信号の両方に対してフィルタ２３ａを適用してもよい。このとき、出力制御部２３は、音声音源信号が除去された音源信号（除去音源信号）に対応する疑似残響音の残響レベルが、音声音源信号に対応する疑似残響音の残響レベルより高くなるようにして出力する。逆に言えば、出力制御部２３は、音声音源信号に対応する疑似残響音の残響レベルが、除去音源信号に対応する疑似残響音の残響レベルより低くなるようにして出力する。言い換えると、フィルタ２３ａは、上記した疑似残響音の残響レベルとなる残響信号を生成するように設定される。 In the above, the output control unit 23 applies the filter 23a only to the sound source signal from which the voice sound source signal has been removed (removed sound source signal), but the embodiment is not limited to this. That is, for example, the output control unit 23 may apply the filter 23a to both the sound source signal from which the voice sound source signal (predetermined sound source signal) has been removed and the voice sound source signal itself. In this case, the output control unit 23 produces its output so that the reverberation level of the pseudo-reverberation sound corresponding to the removed sound source signal is higher than the reverberation level of the pseudo-reverberation sound corresponding to the voice sound source signal. Conversely, the output control unit 23 produces its output so that the reverberation level of the pseudo-reverberation sound corresponding to the voice sound source signal is lower than that of the pseudo-reverberation sound corresponding to the removed sound source signal. In other words, the filter 23a is set to generate reverberation signals having the reverberation levels described above.

これにより、例えば音声（ボーカル）の音像や音色が極端に鮮明で手前に浮き出てくるような不自然な印象を聴取者Ｌに与えることを抑制することが可能になる。詳しくは、例えばボーカルと楽器音とが混在する音楽の音源信号に対し、楽器音の音源信号（すなわち除去音源信号）にのみ大きな残響がかかり、ボーカルの音声音源信号に残響が全くかからないようにした場合、ボーカルのみ音像や音色が極端に鮮明で手前に浮き出てくるような不自然な印象を聴取者Ｌに与えることがある。これは、大きな残響がかかって反射音が多くなると音像（ここでは楽器の音像）を遠くに感じる心理特性があり、それに伴って残響が全くかからない音像（ここではボーカルの音像）が手前に近づく印象となることに起因する。そこで、音声音源信号が除去された音源信号、および、音声音源信号の両方に対してフィルタ２３ａを適用し、残響レベルを上記のように異ならせることで、上述した不自然な印象を聴取者Ｌに与えることを抑制することが可能になる。 This makes it possible to suppress the listener L from getting an unnatural impression that the sound image and tone of the voice (vocals) are extremely clear and appear to be in front. In detail, for example, when a music sound source signal containing vocals and instrumental sounds is given a large reverberation only to the instrumental sound source signal (i.e., the removed sound source signal) and no reverberation is given to the vocal sound source signal, the listener L may get an unnatural impression that the sound image and tone of only the vocals are extremely clear and appear to be in front. This is because there is a psychological characteristic that when there is a large reverberation and there are many reflected sounds, the sound image (here, the sound image of the instrument) feels far away, and accordingly, the sound image without any reverberation (here, the vocal sound image) feels closer to the listener. Therefore, by applying the filter 23a to both the sound source signal from which the voice sound source signal has been removed and the voice sound source signal, and by making the reverberation levels different as described above, it is possible to suppress the listener L from getting the unnatural impression described above.
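The level relationship described above can be illustrated with two gains applied to reverberation signals that have already been generated. In the source the level difference is set in the filter 23a itself, so the separate gain stage, the gain values, and the name `mix_reverb` are simplifying assumptions.

```python
def mix_reverb(removed_reverb, voice_reverb,
               removed_gain=1.0, voice_gain=0.3):
    """Sum two reverberation signals with the voice part kept lower."""
    if not voice_gain < removed_gain:
        raise ValueError("voice reverberation level must be the lower one")
    return [removed_gain * a + voice_gain * b
            for a, b in zip(removed_reverb, voice_reverb)]
```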

なお、出力制御部２３は、サラウンド再生される除去音源信号を出力するスピーカを、スピーカＦＬ，ＦＲ，ＲＬ，ＲＲの中から、除去音源信号の内容（種類）に応じて選択してもよい。一例として、出力制御部２３は、直接音と疑似残響音とが聴取者Ｌを挟むようにして再生されるように、音源信号および残響信号が出力されるスピーカを選択することで、包まれ感ＬＥＶを向上させるようにしてもよい。 The output control unit 23 may select the speaker that outputs the removed sound source signal to be played in surround from among the speakers FL, FR, RL, and RR depending on the content (type) of the removed sound source signal. As an example, the output control unit 23 may improve the sense of envelopment LEV by selecting the speaker that outputs the sound source signal and the reverberation signal so that the direct sound and the artificial reverberation sound are played on either side of the listener L.

具体的に説明すると、残響信号（言い換えると、フィルタ２３ａが適用されてサラウンド再生される除去音源信号）には、上記したＬ成分音源信号やＲ成分音源信号が含まれ、Ｌ成分音源信号やＲ成分音源信号は、分離部２２によって分離・抽出されている。従って、出力制御部２３は、残響信号（フィルタ２３ａが適用された除去音源信号）のうち、Ｌ成分音源信号をスピーカＲＲからサラウンド再生（出力）し、Ｒ成分音源信号をスピーカＲＬからサラウンド再生する。 To be more specific, the reverberation signal (in other words, the removed sound source signal to which the filter 23a is applied and which is played in surround) includes the L component sound source signal and the R component sound source signal described above, and the L component sound source signal and the R component sound source signal are separated and extracted by the separation unit 22. Therefore, of the reverberation signal (the removed sound source signal to which the filter 23a is applied), the output control unit 23 plays (outputs) the L component sound source signal in surround from the speaker RR, and plays the R component sound source signal in surround from the speaker RL.

これにより、聴取者Ｌは、直接音のＬ成分が再生されるスピーカＦＬと疑似残響音のＬ成分がサラウンド再生されるスピーカＲＲとの間、および、直接音のＲ成分が再生されるスピーカＦＲと疑似残響音のＲ成分がサラウンド再生されるスピーカＲＬとの間に位置することとなる。そのため、疑似残響音の音像Ｂは聴取者Ｌの周辺に定位しやすくなり、結果として聴取者Ｌにおける包まれ感ＬＥＶを向上させることが可能になる。 As a result, the listener L is positioned between the speaker FL, which reproduces the L component of the direct sound, and the speaker RR, which reproduces the L component of the artificial reverberation sound in surround sound, and between the speaker FR, which reproduces the R component of the direct sound, and the speaker RL, which reproduces the R component of the artificial reverberation sound in surround sound. Therefore, the sound image B of the artificial reverberation sound is more likely to be localized around the listener L, and as a result, it is possible to improve the sense of envelopment LEV for the listener L.
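The diagonal speaker assignment can be summarized as a small routing table. The dictionary layout and the name `route_outputs` are illustrative only. The speaker labels FL, FR, RL and RR follow the description.

```python
def route_outputs(direct_l, direct_r, reverb_l, reverb_r):
    """Map each signal to a speaker so that the direct sound and its
    pseudo-reverberation sit diagonally across the listener."""
    return {
        "FL": direct_l,   # direct sound, L component
        "FR": direct_r,   # direct sound, R component
        "RL": reverb_r,   # pseudo-reverberation of the R component
        "RR": reverb_l,   # pseudo-reverberation of the L component
    }
```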

また、出力制御部２３は、分離された残響音成分音源信号に対し、フィルタ２３ａを適用して残響信号を生成する。そして、出力制御部２３は、残響音成分音源信号から生成された残響信号をスピーカＲＬ，ＲＲから出力してもよい。これにより、残響音成分音源信号による疑似残響音が直接音等に付加される。詳しくは、例えば実際の残響音成分のうち、再生時に音像が不明瞭になりやすい残響音成分の疑似残響音が直接音に付加され、これにより当該残響音成分の音像が明瞭化され、また、包まれ感ＬＥＶをより向上させることが可能になる。 The output control unit 23 also applies a filter 23a to the separated reverberation component sound source signal to generate a reverberation signal. The output control unit 23 may then output the reverberation signal generated from the reverberation component sound source signal from the speakers RL and RR. This causes a pseudo-reverberation sound due to the reverberation component sound source signal to be added to the direct sound, etc. In more detail, for example, a pseudo-reverberation sound of a reverberation component that is likely to cause an unclear sound image during playback among the actual reverberation components is added to the direct sound, thereby clarifying the sound image of the reverberation component and further improving the sense of envelopment LEV.

ところで、例えば車両の車室においては、図４Ｂに想像線で示すように、車室の後方（例えば後部座席）に他の聴取者Ｌｘが着席する場合がある。このとき、上記した疑似残響音の付加処理が行われると、聴取者Ｌｘに対して、疑似残響音を含む残響音が過剰になるおそれがある。 Incidentally, for example, in the passenger compartment of a vehicle, as shown by the imaginary lines in FIG. 4B, another listener Lx may be seated at the rear of the passenger compartment (e.g., in the back seat). In this case, if the above-described process of adding artificial reverberation is performed, there is a risk that the reverberation sound, including the artificial reverberation sound, will become excessive for the listener Lx.

そこで、本実施形態に係る出力制御部２３は、車両の状態に応じて疑似残響音の付加処理を行うことで、聴取者Ｌｘに対して残響音が過剰になることを抑制してもよい。詳しくは、出力制御部２３は、車両の状態に応じて、除去音源信号に対してフィルタ２３ａを適用し、適用された除去音源信号（すなわち残響信号）を、チャンネル（例えばスピーカＲＬ，ＲＲなど）から出力することで、残響音が過剰になることを抑制してもよい。 Therefore, the output control unit 23 according to this embodiment may suppress the reverberation sound from becoming excessive for the listener Lx by performing a process of adding a pseudo reverberation sound according to the state of the vehicle. In detail, the output control unit 23 may suppress the reverberation sound from becoming excessive by applying a filter 23a to the removed sound source signal according to the state of the vehicle and outputting the applied removed sound source signal (i.e., the reverberation signal) from a channel (e.g., speakers RL, RR, etc.).

例えば、出力制御部２３は、図４Ｂに想像線で示すように、後部座席の聴取者Ｌｘより後方にスピーカＲｘが存在するような車両の状態である場合、フィルタ２３ａが適用された除去音源信号（残響信号）をスピーカＲｘから出力する。これにより、図示は省略するが、疑似残響音の音像を聴取者Ｌｘの周辺に定位させることが可能となる。そのため、聴取者Ｌｘにおいても、聴取者Ｌと同様、包まれ感ＬＥＶを確保することができ、残響音が過剰になることを抑制することができる。 For example, as shown by the imaginary lines in FIG. 4B, when the vehicle is in a state in which a speaker Rx is located behind the listener Lx in the rear seat, the output control unit 23 outputs the removed sound source signal to which the filter 23a has been applied (reverberation signal) from the speaker Rx. Although not shown in the figure, this makes it possible to localize the sound image of the pseudo-reverberation sound around the listener Lx. Therefore, the sense of envelopment LEV can be ensured for the listener Lx in the same way as for the listener L, and excessive reverberation can be suppressed.

また、例えば出力制御部２３は、スピーカＦＬ，ＦＲ，ＲＬ，ＲＲに対するフェーダ調整指示（車両の状態の一例）に応じて疑似残響音の付加処理を行ってもよい。例えば、出力制御部２３は、後部座席重視で再生することを示すフェーダ調整指示が車両の状態として各種センサ６０から検出された場合、スピーカＲＬ，ＲＲから出力される残響信号を弱めたり、残響信号の出力を禁止したりしてもよい。これにより、疑似残響音が低下する、あるいは無くなるため、聴取者Ｌｘに対して残響音が過剰になることを抑制することができる。 For example, the output control unit 23 may add pseudo-reverberation in response to a fader adjustment instruction (an example of the vehicle state) for the speakers FL, FR, RL, and RR. For example, when a fader adjustment instruction indicating that playback should be focused on the rear seats is detected from the various sensors 60 as the vehicle state, the output control unit 23 may weaken the reverberation signal output from the speakers RL and RR or prohibit the output of the reverberation signal. This reduces or eliminates the pseudo-reverberation, thereby preventing the reverberation from becoming excessive for the listener Lx.

The output control unit 23 may also perform the pseudo-reverberation addition process according to the seating state of the occupants (an example of the state of the vehicle). For example, when the various sensors 60 detect, as the state of the vehicle, a seating state indicating that no occupant is seated in the rear seats, that is, that no listener Lx is present, the output control unit 23 may output the reverberation signal from the speakers RL and RR. Because no listener Lx is present, the excessive reverberation described above does not arise.

The output control unit 23 may also perform the pseudo-reverberation addition process according to at least one of the following examples of the state of the vehicle: the open/closed state of the windows, the operating state of the air conditioner, and the driving state such as the vehicle speed. For example, when a rear-seat window is open, when the air conditioner is running at a relatively high fan setting, or when the vehicle is traveling at a relatively high speed, relatively loud noise occurs at the rear seats. The rear-seat listener Lx is therefore presumed to be in an environment in which the reverberation described above is hard to perceive. Accordingly, when an open window, high-fan operation of the air conditioner, high-speed travel, or the like is detected by the various sensors 60 as the state of the vehicle, the output control unit 23 may output the reverberation signal from the speakers RL and RR. Because the listener Lx is in an environment in which reverberation is hard to perceive, the reverberation does not become excessive for the listener Lx.
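The vehicle-state rules described above can be summarized as a small decision function. The following is a minimal sketch only: the `VehicleState` record, the rule priority, the speed threshold, and the reduced default gain are all illustrative assumptions and do not appear in the specification.

```python
from dataclasses import dataclass


@dataclass
class VehicleState:
    """Hypothetical snapshot of what the various sensors 60 report."""
    rear_fader_bias: bool = False      # fader set to favour the rear seats
    rear_seat_occupied: bool = True    # a listener Lx is present
    rear_window_open: bool = False
    ac_fan_high: bool = False
    speed_kmh: float = 0.0


def rear_reverb_gain(state, high_speed_kmh=80.0, reduced_gain=0.5):
    """Return a linear gain for the reverberation signal sent to RL/RR.

    Assumed priority: fader instruction, then seat occupancy, then cabin noise.
    """
    if state.rear_fader_bias:
        return 0.0                     # weaken or prohibit (here: prohibit)
    if not state.rear_seat_occupied:
        return 1.0                     # nobody in the rear to be overwhelmed
    noisy = (state.rear_window_open
             or state.ac_fan_high
             or state.speed_kmh >= high_speed_kmh)
    if noisy:
        return 1.0                     # noise masks the reverberation
    return reduced_gain                # assumed default: attenuate for Lx
```

A real device would presumably smooth transitions between gains rather than switch them abruptly; that is left out here for brevity.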

In the above description, the reverberation signal is output from the speakers RL and RR located behind the listener L, but the configuration is not limited to this. The reverberation signal may instead be output from other speakers, such as a speaker (not shown) located above the listener L.

The output control unit 23 can also perform sound image localization control. Specifically, the output control unit 23 can output the voice sound source signal so that the sound image A1 of the voice produced by the separated voice sound source signal is localized at a predetermined position. For example, as shown in FIG. 4C, by emphasizing the separated voice sound source signal and outputting it from the speakers RL and RR, the output control unit 23 can localize the sound image A1 of the voice (vocals) at a position displaced toward the listener L. This makes the sound image A1 of the voice less likely to overlap with the sound images AL and AR of the instrument sounds, so the sounds corresponding to the sound images A1, AL, and AR can each be made clearer. In this specification, "emphasis" means, for example, additionally reproducing a sound corresponding to a given signal or increasing the loudness of the reproduced sound, but is not limited to these.

In the above description, the predetermined position is a position displaced toward the listener L, but it is not limited to this and can be set to any position. For example, although not illustrated, a speaker may be placed above the listener L and the voice sound source signal may be emphasized and output from that speaker, so that the sound image A1 of the voice is localized at a position displaced upward. The sound image A1 is then offset from the sound images AL and AR in the height direction, which can, for example, increase the three-dimensionality of the sound and improve the sense of presence.

The output control unit 23 may also emphasize the separated voice sound source signal and output it from, for example, the speakers FL and FR. The voice of the emphasized voice sound source signal is thereby added to the voice (vocals) contained in the direct sound, making the sound image A1 of the voice clearer.

The output control unit 23 can also perform sound image width control. Specifically, as shown in FIG. 4D, the output control unit 23 may output the separated L-component sound source signal from one speaker (channel) FL and control the sound image width of the sound produced by the L-component sound source signal. For example, the output control unit 23 emphasizes the separated L-component sound source signal and outputs it from the speaker FL. The instrument sound of the emphasized L-component sound source signal is thereby added to the instrument sound (for example, a piano) that is an example of the L component contained in the direct sound, which makes it possible to widen the sound image AL of that instrument sound to the left.

Similarly, the output control unit 23 may output the separated R-component sound source signal from the other speaker (channel) FR and control the sound image width of the sound produced by the R-component sound source signal. For example, the output control unit 23 emphasizes the separated R-component sound source signal and outputs it from the speaker FR. The instrument sound of the emphasized R-component sound source signal is thereby added to the instrument sound (for example, a guitar) that is an example of the R component contained in the direct sound, which makes it possible to widen the sound image AR of that instrument sound to the right.

Also, as shown by the imaginary lines in FIG. 4D, the output control unit 23 may output the separated L-component sound source signal from the speaker RL so that the sound image ALx of the sound produced by the L-component sound source signal (here, an instrument sound) is widened rearward. Similarly, the output control unit 23 may output the separated R-component sound source signal from the speaker RR so that the sound image ARx of the sound produced by the R-component sound source signal (here, an instrument sound) is widened rearward. This makes it possible to localize the sound images ALx and ARx at positions that envelop the listener L.

Noise such as driving noise and air-conditioner noise occurs in the vehicle cabin. In the example of FIG. 4E, the sound image of the noise is denoted by the symbol C. The sound image C of the noise may be localized near the listener L. In that case, the sound image C of the noise comes close to or overlaps the sound image A1 of the voice and the sound images AL and AR of the instrument sounds, so the noise mixes with the voice or music and the sound images A1, AL, and AR may become unclear.

Therefore, for example, the output control unit 23 emphasizes the separated voice sound source signal and outputs it from the speakers FL and FR, or weakens or prohibits the output of the voice sound source signal that had been emphasized and output from the speakers RL and RR, thereby localizing the sound image A1 of the voice at a position displaced in a direction away from the sound image C of the noise (for example, forward) (see arrow D1). This makes the sound image A1 of the voice less likely to overlap with the sound image C of the noise; in other words, noise is less likely to mix with the voice, and the sound image A1 of the voice can be made clearer.

Likewise, for example, the output control unit 23 emphasizes the separated L-component sound source signal and outputs it from the speaker FL, or weakens or prohibits the output of the L-component sound source signal that had been emphasized and output from the speaker RL. The output control unit 23 thereby localizes the sound image AL of the instrument sound of the L-component sound source signal at a position displaced in a direction away from the sound image C of the noise (for example, forward) (see arrow D2a), or widens the sound image AL to the left (see arrow D2b). This makes the sound image AL of the instrument sound less likely to overlap with the sound image C of the noise; in other words, noise is less likely to mix with the instrument sound, and the sound image AL can be made clearer.

Similarly, for example, the output control unit 23 emphasizes the separated R-component sound source signal and outputs it from the speaker FR, or weakens or prohibits the output of the R-component sound source signal that had been emphasized and output from the speaker RR. The output control unit 23 thereby localizes the sound image AR of the instrument sound of the R-component sound source signal at a position displaced in a direction away from the sound image C of the noise (see arrow D3a), or widens the sound image AR to the right (see arrow D3b). This makes the sound image AR of the instrument sound less likely to overlap with the sound image C of the noise; in other words, noise is less likely to mix with the instrument sound, and the sound image AR can be made clearer.

The output control unit 23 may perform the localization control and sound image width control of the sound images A1, AL, and AR according to the position of the sound image C of the noise. The position of the sound image C of the noise changes with, for example, the driving state such as the vehicle speed, the open/closed state of the windows, and the operating state of the air conditioner. Accordingly, the correlation between these states and the position of the sound image C of the noise may be determined in advance through experiments or the like, and the output control unit 23 may perform the localization control and sound image width control of the sound images A1, AL, and AR based on information indicating that correlation. Alternatively, the position of the sound image C of the noise may be detected by analyzing sound collected with a microphone (not shown) or the like.

A vehicle may be configured to travel under automated driving control that does not require driving operations by the occupant (driver). While such automated driving control is being executed, the driving load on the listener L, who is an occupant, is reduced, so an image 31 may be displayed on a window of the vehicle or outside the vehicle, as shown in FIG. 4F. The image 31 may be any type of image; here it is assumed to include an image of an instrument being played that corresponds to an instrument sound contained in the sound source signal, or an image of a singer producing the voice (vocals).

When the vehicle is in such a state, with automated driving control being executed and the image 31 being displayed, the output control unit 23 may perform the localization control and sound image width control of the sound images A1, AL, and AR according to the image 31. For example, the output control unit 23 emphasizes the separated voice sound source signal and outputs it from the speakers FL and FR, or the like, so that the sound image A1 of the voice is localized at a position overlapping the image 31. Also, for example, the output control unit 23 emphasizes the separated L-component sound source signal and outputs it from the speaker FL or the speaker RL, or the like, so that the sound image AL of the instrument sound of the L-component sound source signal is localized at a position overlapping the image 31 or so that the width of the sound image AL is controlled. Similarly, the output control unit 23 emphasizes the separated R-component sound source signal and outputs it from the speaker FR or the speaker RR, or the like, so that the sound image AR of the instrument sound of the R-component sound source signal is localized at a position overlapping the image 31 or so that the width of the sound image AR is controlled. Because the sound images A1, AL, and AR then overlap the image 31 of the singer and the instruments, the listener L can perceive a spread of the sound images A1, AL, and AR that matches the image 31.

As shown in FIG. 4G, the orientation of the seat of the listener L, who is an occupant, may be changed, for example, during automated driving control or while parked. In such a case, the output control unit 23 may perform the localization control and sound image width control of the sound images A1, AL, and AR according to the orientation of the seat (or the orientation of the face of the listener L). For example, the output control unit 23 emphasizes the separated voice sound source signal and outputs it from speakers selected from among the speakers FL, FR, RL, and RR according to the orientation of the seat, or the like, so that the sound image A1 of the voice is localized in front of the listener L. Also, for example, the output control unit 23 emphasizes the separated L-component sound source signal and outputs it from speakers selected according to the orientation of the seat, or the like, so that the sound image AL of the instrument sound of the L-component sound source signal is localized to the left of the listener L or so that the width of the sound image AL is controlled. Similarly, the output control unit 23 emphasizes the separated R-component sound source signal and outputs it from speakers selected according to the orientation of the seat, or the like, so that the sound image AR of the instrument sound of the R-component sound source signal is localized to the right of the listener L or so that the width of the sound image AR is controlled. This makes it possible to localize (place) the sound images A1, AL, and AR at positions that match the state of the vehicle, including the orientation of the seat (in other words, the orientation of the listener L). The orientation of the face of the listener L is detected by, for example, an in-vehicle camera (not shown), but the detection method is not limited to this.
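Selecting speakers according to the seat orientation can be pictured as rotating the roles of the four cabin speakers around the listener. The sketch below is only an illustration under stated assumptions: the specification does not say how speakers are selected, the function name `speaker_roles` is hypothetical, the seat yaw is assumed to be measured clockwise (seen from above) from the forward-facing position, and orientations are snapped to the nearest 90 degrees.

```python
# Cabin speakers in clockwise order when the cabin is viewed from above.
RING = ("FL", "FR", "RR", "RL")

# Roles relative to the listener, in the same clockwise order.
ROLES = ("front_left", "front_right", "rear_right", "rear_left")


def speaker_roles(seat_yaw_deg):
    """Map listener-relative roles to cabin speakers for a given seat yaw.

    seat_yaw_deg: assumed clockwise rotation of the seat from facing forward.
    """
    steps = round(seat_yaw_deg / 90.0) % 4
    rotated = RING[steps:] + RING[:steps]
    return dict(zip(ROLES, rotated))
```

With this mapping, the voice could be emphasized from the two `front_*` speakers, and the L and R components from the listener's left and right speakers respectively, regardless of which way the seat faces. For a rear-facing seat, `speaker_roles(180)` assigns `RR` to the listener's front left and `RL` to the front right.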

In the above description, the localization control and the like of the sound images A1, AL, and AR are performed according to the orientation of the seat or of the face of the listener L, but the configuration is not limited to this. The localization control and the like may be performed, for example, according to an instruction from the listener L (occupant). In the localization control and the like, the sound images A1, AL, and AR can also be made clearer by keeping them from overlapping the sound image C of the noise, as shown in FIG. 4G.

Returning to FIG. 2, the voices contained in the sound source signal may include both vocals and voice guidance from navigation or the like. In that case, the voice guidance overlaps the vocals, and the listener L may have difficulty hearing the voice guidance.

Therefore, for example, the output control unit 23 outputs, as direct sound from the speakers FL and FR, the sound source signal from which the voice sound source signal containing the vocals has been removed (the removed sound source signal) together with the sound source signal containing the voice guidance. Although not illustrated, the sound images AL and AR of the instrument sounds and the sound image of the voice guidance are then localized in the vehicle cabin. Because the voice guidance does not overlap the vocals, the listener L can easily hear the voice guidance. In this case, the output control unit 23 can also provide the voice guidance without lowering the audio volume.

<Control process of the audio device according to the first embodiment>
Next, a specific processing procedure in the audio device 1 will be described with reference to FIG. 5. FIG. 5 is a flowchart showing the processing procedure executed by the audio device 1 according to the first embodiment.

As shown in FIG. 5, the control unit 2 of the audio device 1 acquires a sound source signal from the sound source device 50 (step S10). Next, the control unit 2 separates a voice sound source signal, an L-component sound source signal, an R-component sound source signal, and a reverberation-component sound source signal from the sound source signal (step S11).

Next, the control unit 2 removes the voice sound source signal from the sound source signal, and applies the filter 23a to the sound source signal from which the voice sound source signal has been removed and to the reverberation-component sound source signal, respectively, to generate a reverberation signal (step S12).

Next, the control unit 2 controls the output of each of the sound source signal, the separated voice sound source signal, the L-component sound source signal, the R-component sound source signal, and the generated reverberation signal (step S13).
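Steps S11 to S13 can be sketched as a short processing function. This is a simplified illustration, not the device's implementation: the signal is treated as a mono list of samples, the separation itself is supplied by the caller as a hypothetical `separate` callback, the filter 23a is modelled as a plain FIR filter, and summing the two filtered signals into one reverberation signal is an assumption.

```python
def fir_filter(signal, taps):
    """Direct-form FIR filtering, truncated to the input length."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * signal[n - k]
        out.append(acc)
    return out


def process(source, separate, taps):
    """Run steps S11-S13 on one block of samples.

    separate: hypothetical callback returning a dict with the keys
              "voice", "left", "right" and "reverb" (step S11).
    taps:     FIR coefficients standing in for the filter 23a.
    """
    parts = separate(source)                                   # S11
    removed = [s - v for s, v in zip(source, parts["voice"])]  # S12: remove voice
    wet_removed = fir_filter(removed, taps)
    wet_reverb = fir_filter(parts["reverb"], taps)
    reverb = [a + b for a, b in zip(wet_removed, wet_reverb)]
    return {                                                   # S13: signals to route
        "direct": list(source),
        "voice": parts["voice"],
        "left": parts["left"],
        "right": parts["right"],
        "reverb": reverb,
    }
```

The returned dictionary only gathers the signals that step S13 would route to the speakers; the routing and emphasis decisions themselves are outside this sketch.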

As described above, the audio device 1 according to the first embodiment includes the separation unit 22 and the output control unit 23. The separation unit 22 separates, from the sound source signal, a predetermined sound source signal (the voice sound source signal) that does not require the pseudo-reverberation addition process, and removes the separated predetermined sound source signal from the sound source signal. The output control unit 23 applies the filter 23a for generating pseudo-reverberation to the sound source signal from which the predetermined sound source signal has been removed by the separation unit 22, and outputs the result. This makes it possible to perform appropriate surround playback according to the sound source signal.

In addition, because the separation unit 22 uses time-frequency analysis, the various sound source signals such as the voice sound source signal can be separated with a relatively small amount of computation. Furthermore, in this embodiment, sound image localization control, sound image width control, optimization of the sense of envelopment LEV, and the like are performed for each of the separated sound source signals, so the overall spatial impression of the sound reproduced from the speakers FL, FR, RL, and RR can be improved.

In this embodiment, the process of separating the various sound source signals, such as the voice sound source signal, from the sound source signal is used together with the filter 23a. Compared with using the filter 23a alone, for example, this allows the filter length of the filter 23a to be shortened, which in turn reduces the amount of computation in the control unit 2.

When the sound source signal contains a short-duration instrument sound, such as that of a percussion instrument, its frequency characteristic may contain a dominant component whose level stands out near a certain frequency. When such a dominant component is present, the reverberation, including the pseudo-reverberation, tends to become excessive during surround playback. In addition, the higher the frequency, the more readily the reverberation is separated and the more readily it becomes excessive. The control unit 2 may therefore, for example, perform processing that gives the signal a frequency characteristic that rolls off linearly or smoothly toward higher frequencies. This reduces the influence of the dominant component, so the reverberation can be kept from becoming excessive even for a sound source signal that contains a short-duration percussion sound.
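One way to picture a frequency characteristic that falls off steadily toward high frequencies is a per-bin gain curve applied to a spectrum. The sketch below is an assumption-laden illustration: the specification does not state the slope, the domain in which the roll-off is applied, or which signal it is applied to; the function name and the `db_per_khz` parameter are hypothetical.

```python
def rolloff_gains(n_bins, sample_rate, db_per_khz=3.0):
    """Linear gains for n_bins (>= 2) spectrum bins spanning 0 Hz to Nyquist.

    The attenuation grows linearly in dB with frequency, so the gain is
    1.0 at 0 Hz and decreases monotonically toward the Nyquist frequency.
    """
    nyquist_khz = sample_rate / 2.0 / 1000.0
    gains = []
    for k in range(n_bins):
        freq_khz = (k / (n_bins - 1)) * nyquist_khz
        gains.append(10.0 ** (-db_per_khz * freq_khz / 20.0))
    return gains
```

Multiplying each bin magnitude of the reverberation path by the corresponding gain would attenuate high-frequency content progressively; a smooth curve other than this straight line in dB could be substituted in the same way.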

When outputting the reverberation signal to the speakers FR, FL, RL, and RR (surround channels), the control unit 2 may, for example, insert random delays or give randomness to the time characteristics of the filter 23a, such as its rise, so that the cross-correlation between channels is reduced. This allows the listener L to perceive the pseudo-reverberation as more natural reverberation.
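The random-delay idea can be sketched as follows. This is a minimal illustration, assuming a uniformly random integer delay per channel and zero padding to a common length; the function name, the delay range, and the seeding are hypothetical and not taken from the specification.

```python
import random


def decorrelated_feeds(reverb, channels=("FL", "FR", "RL", "RR"),
                       max_delay=48, seed=None):
    """Return one delayed copy of the reverberation signal per channel.

    Each channel gets its own random delay of 0..max_delay samples, which
    lowers the cross-correlation between the channel feeds.
    """
    rng = random.Random(seed)
    feeds = {}
    for ch in channels:
        delay = rng.randint(0, max_delay)
        # Pad before and after so every feed has the same length.
        feeds[ch] = [0.0] * delay + list(reverb) + [0.0] * (max_delay - delay)
    return feeds
```

Randomizing the filter's time characteristics per channel, the other option mentioned above, would need per-channel filter coefficients and is not shown.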

(Second Embodiment)
<Configuration of the audio device according to the second embodiment>
Next, the configuration of the audio device 1 according to the second embodiment will be described with reference to FIG. 6. FIG. 6 is a block diagram showing a configuration example of an audio system 100 including the audio device 1 according to the second embodiment. In the following, components common to the first embodiment are given the same reference numerals and their description is omitted.

The control unit 2 of the audio device 1 according to the second embodiment includes a filter setting unit 24 that sets the gain of the filter 23a that generates the pseudo-reverberation. In the second embodiment, the filter setting unit 24 changes the gain of the filter 23a for each sound source (instrument or voice), so that more natural pseudo-reverberation can be output throughout the sound source signal of a single song. Specifically, the filter setting unit 24 sets a filter whose gain is optimized for each sound source based on the features of each sound source contained in the sound source signal, thereby achieving high-quality surround playback.

More specifically, the acquisition unit 21 of the control unit 2 acquires sound source information about the sound source signal. The sound source information is, for example, information on the type (genre) of the sound source signal and on the recording environment, but is not limited to these. The type of the sound source signal is, for example, voice, instrument (percussion, wind instrument, and so on), or classical music. The recording environment is information on the place where the recording was made, such as a recording studio or a concert hall.

The acquisition unit 21 acquires the sound source information, for example, from input by a user such as the listener L, or from a server or the like via the Internet. The acquisition unit 21 may also acquire the sound source information by analyzing the acquired sound source signal. The acquisition unit 21 outputs the acquired sound source information to the filter setting unit 24.

The filter setting unit 24 includes a detection unit 24a and a determination unit 24b. Based on the sound source signal, the detection unit 24a detects feature information, which describes the features of the acoustic components contained in the sound source signal.

For example, as feature information on the features of the acoustic components, the detection unit 24a detects channel relationship information on the relationship between the sound source signals reproduced on the two channels. Specifically, the channel relationship information includes information on the LR channel difference and the LR channel correlation.

The LR channel difference is, for example, the inter-channel level difference (ICLD) described above or the inter-channel time difference, but is not limited to these. The LR channel correlation is information on the correlated components of the two sound source signals reproduced in stereo on the two channels. Specifically, the LR channel correlation is the inter-channel cross-correlation (ICC), but is not limited to this.
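As a concrete illustration of these two quantities, the sketch below computes an ICLD and an ICC for one frame of left and right samples. It uses common textbook definitions (energy ratio in dB, and normalized cross-correlation at zero lag); the specification does not give formulas, so these definitions, the function names, and the small `eps` guard are assumptions.

```python
import math


def icld_db(left, right, eps=1e-12):
    """Inter-channel level difference in dB (positive when L is louder)."""
    power_l = sum(x * x for x in left)
    power_r = sum(x * x for x in right)
    return 10.0 * math.log10((power_l + eps) / (power_r + eps))


def icc(left, right, eps=1e-12):
    """Normalized inter-channel cross-correlation at zero lag, in [-1, 1]."""
    num = sum(l * r for l, r in zip(left, right))
    den = math.sqrt(sum(l * l for l in left) * sum(r * r for r in right))
    return num / (den + eps)
```

A component panned hard to one side gives a large ICLD magnitude, while an ICC near 1 indicates strongly correlated (center-like) content and an ICC near 0 indicates diffuse content. In practice these values would typically be computed per time-frequency bin rather than per broadband frame.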

By detecting the channel relationship information as feature information in this way, the detection unit 24a can detect the acoustic components with high accuracy, so optimal pseudo-reverberation can be generated by the filter processing in the output control unit 23.

The detection unit 24a then detects the channel relationship information continuously at regular intervals, arranges it in time series, and calculates a moving average of the time-series channel relationship information; this moving average is detected as the feature information. Abrupt changes in the channel relationship information are thereby smoothed, which in turn smooths abrupt changes in the gain determined by the downstream determination unit 24b. As a result, the pseudo-reverberation generated by the output control unit 23 changes smoothly, and more natural surround playback is achieved.

なお、検出部２４ａは、時系列のチャンネル関係情報のうち、音源信号の音圧レベル（振幅）が所定値未満の区間についてはチャンネル関係情報を最大化した後移動平均する。つまり、検出部２４ａは、時系列のチャンネル関係情報のうち、音源信号の音圧レベルが所定値未満の区間はサラウンドを出さず所定値未満の区間周辺はサラウンドを出さない方向へ制御するよう移動平均を算出する。 Note that, for sections of the time-series channel relationship information in which the sound pressure level (amplitude) of the sound source signal is less than a predetermined value, the detection unit 24a maximizes the channel relationship information before taking the moving average. In other words, the detection unit 24a calculates the moving average so that no surround sound is output in sections where the sound pressure level of the sound source signal is less than the predetermined value, and so that the output is steered toward suppressing surround sound in the vicinity of those sections.
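The smoothing described above can be sketched as follows. This is only an illustration under stated assumptions: the trailing window shape, the parameter names, and the value used as the "maximum" are hypothetical and are not specified in the description.

```python
def smoothed_channel_relation(values, levels, level_threshold, max_value, window):
    """Moving average of channel relationship values over a trailing window.

    Frames whose sound pressure level is below level_threshold are first
    replaced by max_value (assumed here to be the largest possible value).
    """
    adjusted = [max_value if lv < level_threshold else v
                for v, lv in zip(values, levels)]
    smoothed = []
    for i in range(len(adjusted)):
        start = max(0, i - window + 1)      # shorter window at the very beginning
        frame = adjusted[start:i + 1]
        smoothed.append(sum(frame) / len(frame))
    return smoothed
```

Because the replacement happens before averaging, a quiet frame also pulls the averaged value of its neighbouring frames toward the maximum, which is how the vicinity of a quiet section is affected.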

これにより、打楽器のソロ演奏のように打点間に無音が存在するような場合に第一波面の法則が成立する上限が低くなるため、打楽器に対しフィルタゲインを下げて適したサラウンドレベルへ制御される。 As a result, when silence exists between strikes, as in a solo percussion performance, the upper limit at which the first wave front law holds becomes lower, so the filter gain is lowered for the percussion and the surround level is controlled to a suitable level.

また、検出部２４ａは、音響成分の継続時間や周波数に関する情報を特徴情報として検出する。継続時間とは、音源信号が継続する時間である。具体的には、検出部２４ａは、音源信号の包絡線を算出し、算出した包絡線を微分後、再度包絡線を算出し、得られた包絡線の傾きから継続時間を検出する。周波数とは、音源信号の周波数成分に関する情報である。具体的には、検出部２４ａは、音源信号に対して短時間フーリエ変換した音源データにおいて、音源信号の周波数の重心を検出する。 The detection unit 24a also detects information related to the duration and frequency of the acoustic components as feature information. The duration is the time that the sound source signal continues. Specifically, the detection unit 24a calculates the envelope of the sound source signal, differentiates the calculated envelope, and then calculates the envelope again to detect the duration from the slope of the obtained envelope. The frequency is information related to the frequency components of the sound source signal. Specifically, the detection unit 24a detects the center of gravity of the frequency of the sound source signal in the sound source data obtained by performing a short-time Fourier transform on the sound source signal.
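The frequency feature can be illustrated with a short sketch that computes the spectral centroid of each short-time frame. The periodic Hann window, the naive DFT, and the function names are assumptions for illustration only; the description does not state which window or transform size is used, and the duration detection from the envelope is not covered here.

```python
import cmath
import math

def frame_centroid_hz(frame, sample_rate):
    """Magnitude-weighted mean frequency (spectral centroid) of one frame."""
    n = len(frame)
    # Periodic Hann window applied before the transform.
    windowed = [x * (0.5 - 0.5 * math.cos(2.0 * math.pi * i / n))
                for i, x in enumerate(frame)]
    num = 0.0
    den = 0.0
    for k in range(n // 2 + 1):             # non-negative frequency bins only
        bin_value = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                        for i, x in enumerate(windowed))
        magnitude = abs(bin_value)
        num += magnitude * k * sample_rate / n
        den += magnitude
    return num / den if den > 0.0 else 0.0

def stft_centroids(signal, sample_rate, frame_len, hop):
    """Spectral centroid of each short-time frame of the signal."""
    return [frame_centroid_hz(signal[s:s + frame_len], sample_rate)
            for s in range(0, len(signal) - frame_len + 1, hop)]
```

A naive DFT is used only to keep the sketch free of dependencies; an FFT would normally replace it.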

このように、検出部２４ａは、特徴情報として、継続時間や周波数の情報を検出することで、音響成分を高精度に検出することができるため、後段の出力制御部２３におけるフィルタ処理により最適な疑似残響音を生成することができる。 In this way, the detection unit 24a can detect acoustic components with high accuracy by detecting duration and frequency information as feature information, and therefore can generate optimal pseudo-reverberation sounds through filtering in the output control unit 23 at the downstream stage.

決定部２４ｂは、出力制御部２３が用いるフィルタ２３ａのゲインを決定する。具体的には、決定部２４ｂは、検出部２４ａが検出した特徴情報および取得部２１によって取得された音源情報の少なくとも一方に基づいて、フィルタ２３ａのゲインを決定する。 The determination unit 24b determines the gain of the filter 23a used by the output control unit 23. Specifically, the determination unit 24b determines the gain of the filter 23a based on at least one of the feature information detected by the detection unit 24a and the sound source information acquired by the acquisition unit 21.

具体的には、決定部２４ｂは、特徴情報および音源情報の少なくとも一方に基づいて、第１波面の法則から求めた閾値範囲を補正し、補正後の閾値範囲に基づいてゲインを決定する。ここで、第１波面の法則を元にしたゲインの決定処理について図７を用いて説明する。 Specifically, the determination unit 24b corrects the threshold range determined from the first wave front law based on at least one of the feature information and the sound source information, and determines the gain based on the corrected threshold range. Here, the process of determining the gain based on the first wave front law is described with reference to FIG. 7.

図７は、決定部２４ｂによるゲインの決定処理を説明する図である。図７に示すように、決定部２４ｂは、まず、第１波面の法則から求めた閾値範囲を特徴情報（チャンネル関係情報、継続時間および周波数）および音源情報に基づいて補正する。 Figure 7 is a diagram explaining the gain determination process performed by the determination unit 24b. As shown in Figure 7, the determination unit 24b first corrects the threshold range obtained from the first wave front law based on the feature information (channel relationship information, duration, and frequency) and sound source information.

例えば、図７の中段のグラフに示す閾値範囲である領域Ｒ１および領域Ｒ２を基準範囲とする。かかる場合、決定部２４ｂは、特徴情報および音源情報に基づいて、閾値ＴＨ１および閾値ＴＨ２を時間および振幅の２軸で補正することで、領域Ｒ１および領域Ｒ２を補正する。 For example, the threshold ranges shown in the graph in the middle of Figure 7, regions R1 and R2, are set as the reference ranges. In this case, the determination unit 24b corrects regions R1 and R2 by correcting the thresholds TH1 and TH2 on the two axes of time and amplitude based on the feature information and sound source information.

例えば、決定部２４ｂは、チャンネル関係情報が大きい程、閾値ＴＨ１および閾値ＴＨ２を時間および振幅が大きくなる方向に補正する。つまり、決定部２４ｂは、チャンネル関係情報が大きい程、領域Ｒ１および領域Ｒ２が大きくなるように補正する。 For example, the larger the channel relationship information, the more the determination unit 24b corrects the thresholds TH1 and TH2 so that the time and amplitude become larger. In other words, the larger the channel relationship information, the more the determination unit 24b corrects the regions R1 and R2 so that they become larger.

一方、決定部２４ｂは、チャンネル関係情報が小さい程、閾値ＴＨ１および閾値ＴＨ２を時間および振幅が小さくなる方向に補正する。つまり、決定部２４ｂは、チャンネル関係情報が小さい程、領域Ｒ１および領域Ｒ２が小さくなるように補正する。 On the other hand, the smaller the channel relationship information, the more the determination unit 24b corrects the thresholds TH1 and TH2 in the direction of decreasing the time and amplitude. In other words, the smaller the channel relationship information, the more the determination unit 24b corrects the regions R1 and R2 to become smaller.

なお、チャンネル関係情報であるチャンネル間レベル差が大きい程、または、チャンネル間時間差が大きい程、相互相関が低い程（無相関成分が多い程）、チャンネル関係情報が大きくなる。 The channel relationship information becomes larger the greater the inter-channel level difference, or the greater the inter-channel time difference, or the lower the cross-correlation (the more uncorrelated components there are).

また、決定部２４ｂは、音響成分の継続時間が長い程、閾値ＴＨ１および閾値ＴＨ２を時間および振幅が大きくなる方向に補正する。つまり、決定部２４ｂは、音響成分の継続時間が長い程、領域Ｒ１および領域Ｒ２が大きくなるように補正する。 Furthermore, the determination unit 24b corrects the thresholds TH1 and TH2 in the direction of increasing the time and amplitude as the duration of the sound component becomes longer. In other words, the determination unit 24b corrects the regions R1 and R2 so that they become larger as the duration of the sound component becomes longer.

一方、決定部２４ｂは、音響成分の継続時間が短い程、閾値ＴＨ１および閾値ＴＨ２を時間および振幅が小さくなる方向に補正する。つまり、決定部２４ｂは、音響成分の継続時間が短い程、領域Ｒ１および領域Ｒ２が小さくなるように補正する。 On the other hand, the determination unit 24b corrects the thresholds TH1 and TH2 in a direction that reduces the time and amplitude as the duration of the sound component becomes shorter. In other words, the determination unit 24b corrects the regions R1 and R2 so that they become smaller as the duration of the sound component becomes shorter.

また、決定部２４ｂは、音響成分の周波数（重心）が低い程、閾値ＴＨ１および閾値ＴＨ２を時間および振幅が大きくなる方向に補正する。つまり、決定部２４ｂは、音響成分の周波数（重心）が低い程、領域Ｒ１および領域Ｒ２が大きくなるように補正する。 In addition, the determination unit 24b corrects the thresholds TH1 and TH2 in a direction in which the time and amplitude increase as the frequency (center of gravity) of the sound component decreases. In other words, the determination unit 24b corrects the regions R1 and R2 to increase as the frequency (center of gravity) of the sound component decreases.

一方、決定部２４ｂは、音響成分の周波数（重心）が高い程、閾値ＴＨ１および閾値ＴＨ２を時間および振幅が小さくなる方向に補正する。つまり、決定部２４ｂは、音響成分の周波数（重心）が高い程、領域Ｒ１および領域Ｒ２が小さくなるように補正する。 On the other hand, the determination unit 24b corrects the thresholds TH1 and TH2 in the direction of decreasing the time and amplitude as the frequency (center of gravity) of the sound component increases. In other words, the determination unit 24b corrects the regions R1 and R2 to become smaller as the frequency (center of gravity) of the sound component increases.

また、決定部２４ｂは、音源情報から音源信号がクラシックである場合には、閾値ＴＨ１および閾値ＴＨ２を時間および振幅が大きくなる方向に補正する。つまり、決定部２４ｂは、音源信号がクラシックである場合には、領域Ｒ１および領域Ｒ２が大きくなるように補正する。 In addition, when the sound source signal is determined to be classical based on the sound source information, the determination unit 24b corrects the thresholds TH1 and TH2 in the direction of increasing the time and amplitude. In other words, when the sound source signal is classical, the determination unit 24b corrects the regions R1 and R2 to become larger.

一方、決定部２４ｂは、音源情報から音源信号が音声である場合には、閾値ＴＨ１および閾値ＴＨ２を時間および振幅が小さくなる方向に補正する。つまり、決定部２４ｂは、音源信号が音声である場合には、領域Ｒ１および領域Ｒ２が小さくなるように補正する。 On the other hand, when the sound source information indicates that the sound source signal is voice, the determination unit 24b corrects the thresholds TH1 and TH2 in a direction that reduces the time and amplitude. In other words, when the sound source signal is voice, the determination unit 24b corrects the regions R1 and R2 to be smaller.

そして、決定部２４ｂは、補正後の閾値範囲に基づいて、後段の出力制御部２３で用いられるフィルタ２３ａのゲインを決定する。具体的には、決定部２４ｂは、フィルタ２３ａにより生成される疑似残響音が補正後の閾値範囲に収まるようにゲインを決定する。 Then, the determination unit 24b determines the gain of the filter 23a used in the output control unit 23 at the subsequent stage based on the corrected threshold range. Specifically, the determination unit 24b determines the gain so that the pseudo reverberation sound generated by the filter 23a falls within the corrected threshold range.

つまり、決定部２４ｂは、領域Ｒ１および領域Ｒ２が大きくなるように補正された場合には、疑似残響音の時間および振幅が大きくなるようにゲインを決定する。一方、決定部２４ｂは、領域Ｒ１および領域Ｒ２が小さくなるように補正された場合には、疑似残響音の時間および振幅が小さくなるようにゲインを決定する。 In other words, when regions R1 and R2 are corrected to be larger, the determination unit 24b determines the gain so that the time and amplitude of the pseudo-reverberation sound are increased. On the other hand, when regions R1 and R2 are corrected to be smaller, the determination unit 24b determines the gain so that the time and amplitude of the pseudo-reverberation sound are decreased.

このように、決定部２４ｂは、特徴情報および音源情報に基づいて補正した第１波面の法則の閾値範囲からゲインを決定することで、最適な疑似残響音を生成するためのゲインを決定することができる。 In this way, the determination unit 24b can determine the gain for generating the optimal pseudo-reverberation sound by determining the gain from the threshold range of the first wave front law corrected based on the feature information and sound source information.
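The direction of each correction described above can be summarized in a small sketch. Only the monotonic directions come from the description. The multiplicative form, the normalization of the inputs to 0..1, the genre factors, and every name below are hypothetical choices made for illustration.

```python
from dataclasses import dataclass

# Hypothetical factors: the description only says classical enlarges
# the regions and voice shrinks them.
GENRE_FACTOR = {"classical": 1.2, "voice": 0.8}

@dataclass
class Threshold:
    time_ms: float
    amplitude: float

def correction_scale(channel_relation, duration, centroid_norm, genre):
    """Scale factor for TH1/TH2; numeric inputs are assumed normalized to 0..1."""
    scale = 0.5 + channel_relation      # larger channel relationship -> larger regions
    scale *= 0.5 + duration             # longer duration -> larger regions
    scale *= 1.5 - centroid_norm        # lower frequency centroid -> larger regions
    return scale * GENRE_FACTOR.get(genre, 1.0)

def correct_threshold(th, scale):
    """Move a threshold along both the time axis and the amplitude axis."""
    return Threshold(th.time_ms * scale, th.amplitude * scale)

def gain_for(th, unit_gain_peak):
    """Gain at which a reverb peaking at unit_gain_peak (at gain 1.0) meets the threshold."""
    return th.amplitude / unit_gain_peak if unit_gain_peak > 0.0 else 0.0
```

This sketch handles only the amplitude axis when choosing the gain; fitting the reverberation within the corrected time threshold would need a further step that the description does not detail.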

そして、出力制御部２３は、決定部２４ｂによって決定されたゲインが設定されたフィルタ２３ａを用いて、疑似残響音を示す残響信号を生成して出力する。 Then, the output control unit 23 generates and outputs a reverberation signal indicating the pseudo-reverberation sound using the filter 23a to which the gain determined by the determination unit 24b is set.

これにより、第２の実施形態においては、例えば音声や、打楽器、クラシック音楽等のように、音源の残響音の特徴が異なる場合であっても、音源毎に最適な疑似残響音を生成でき、音源の特徴に応じて最適なサラウンド再生が可能となる。すなわち、第２の実施形態にあっては、高音質なサラウンド再生を行うことができる。 As a result, in the second embodiment, even if the characteristics of the reverberation of the sound source are different, such as for voice, percussion, classical music, etc., it is possible to generate an optimal pseudo-reverberation sound for each sound source, and optimal surround playback according to the characteristics of the sound source is possible. In other words, in the second embodiment, high-quality surround playback can be performed.

＜第２の実施形態に係る音響装置の制御処理＞ <Control process of the audio device according to the second embodiment>
次に、第２の実施形態に係る音響装置１における具体的な処理手順について図８を用いて説明する。図８は、第２の実施形態に係る音響装置１が実行する処理手順を示すフローチャートである。 Next, a specific processing procedure in the audio device 1 according to the second embodiment will be described with reference to Fig. 8. Fig. 8 is a flowchart showing the processing procedure executed by the audio device 1 according to the second embodiment.

図８に示すように、音響装置１の制御部２は、ステップＳ１０，Ｓ１１において、第１の実施形態と同様の処理を行う。次いで、制御部２は、音源信号に関する音源情報を取得する（ステップＳ１１ａ）。 As shown in FIG. 8, the control unit 2 of the audio device 1 performs the same processes as in the first embodiment in steps S10 and S11. Next, the control unit 2 acquires sound source information related to the sound source signal (step S11a).

次いで、制御部２は、取得された音源信号および音源情報に基づいて、音源信号に含まれる音響成分の特徴である特徴情報を検出する（ステップＳ１１ｂ）。次いで、制御部２は、検出された特徴情報に基づいて、第１波面の法則における閾値範囲を決定する（ステップＳ１１ｃ）。 Next, the control unit 2 detects feature information that is a feature of the acoustic components contained in the sound source signal based on the acquired sound source signal and sound source information (step S11b). Next, the control unit 2 determines a threshold range in the first wave front law based on the detected feature information (step S11c).

次いで、制御部２は、決定した閾値範囲に基づいて、疑似残響音が閾値範囲に収まるようフィルタ２３ａのゲインを決定する（ステップＳ１１ｄ）。そして、制御部２は、ゲインが決定して設定されたフィルタ２３ａを用いて、ステップＳ１２以降の処理を実行する。 Next, the control unit 2 determines the gain of the filter 23a based on the determined threshold range so that the pseudo reverberation sound falls within the threshold range (step S11d). Then, the control unit 2 executes the processes from step S12 onward using the filter 23a whose gain has been determined and set.

なお、上記では、音源信号から音声音源信号を除去する処理の際、Ｌｃｈ用音源信号およびＲｃｈ用音源信号に対して時間周波数解析が行われるようにしたが、これに限定されるものではない。すなわち、時間周波数解析を行わず、Ｌｃｈ用音源信号とＲｃｈ用音源信号との差や比、ＬＲチャンネル相関などを用いて、音源信号から音声音源信号を除去する処理が行われてもよく、これにより低演算化を図るようにしてもよい。 In the above description, a time-frequency analysis is performed on the Lch sound source signal and the Rch sound source signal when the voice sound source signal is removed from the sound source signal, but the process is not limited to this. That is, the voice sound source signal may be removed from the sound source signal without a time-frequency analysis, using the difference or ratio between the Lch sound source signal and the Rch sound source signal, the LR channel correlation, or the like, thereby reducing the amount of computation.
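One low-computation variant of this kind can be sketched in the time domain. It rests on an assumption not stated in this passage, namely that the voice component appears with equal level and phase in both channels, and the function name is hypothetical.

```python
def split_center(left, right):
    """Split a stereo pair into a center estimate and per-channel residuals.

    The center estimate is the sample-wise mean of the two channels; the
    residuals are what remains in each channel after subtracting it.
    """
    center = [(l + r) / 2.0 for l, r in zip(left, right)]
    left_residual = [l - c for l, c in zip(left, center)]
    right_residual = [r - c for r, c in zip(right, center)]
    return center, left_residual, right_residual
```

The center estimate also captures any non-voice content that is common to both channels, so this is a rough approximation rather than a clean separation.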

さらなる効果や変形例は、当業者によって容易に導き出すことができる。このため、本発明のより広範な態様は、以上のように表しかつ記述した特定の詳細および代表的な実施形態に限定されるものではない。したがって、添付の特許請求の範囲およびその均等物によって定義される総括的な発明の概念の精神または範囲から逸脱することなく、様々な変更が可能である。 Further advantages and modifications may readily occur to those skilled in the art. Therefore, the invention in its broader aspects is not limited to the specific details and representative embodiments shown and described above. Accordingly, various modifications may be made without departing from the spirit or scope of the general inventive concept as defined by the appended claims and equivalents thereof.

Reference Signs List
１音響装置 1 Acoustic device
２制御部 2 Control unit
２２分離部 22 Separation unit
２３出力制御部 23 Output control unit
２４ａ検出部 24a Detection unit
２４ｂ決定部 24b Determination unit

Claims

1. An acoustic device comprising:
a separation unit that separates, from a sound source signal, a predetermined sound source signal that does not require a process of adding a pseudo reverberation sound, and removes the separated predetermined sound source signal from the sound source signal; and
an output control unit that applies a filter for generating the pseudo reverberation sound to the sound source signal from which the predetermined sound source signal has been removed by the separation unit, and outputs the resulting signal,
wherein the output control unit applies the filter to both the sound source signal from which the predetermined sound source signal has been removed and the predetermined sound source signal, and outputs the pseudo reverberation sound such that the reverberation level of the pseudo reverberation sound corresponding to the sound source signal from which the predetermined sound source signal has been removed is higher than the reverberation level of the pseudo reverberation sound corresponding to the predetermined sound source signal.

2. The acoustic device according to claim 1,
wherein the output control unit outputs the predetermined sound source signal separated by the separation unit such that a sound image of the sound produced by the predetermined sound source signal is localized at a predetermined position.

3. The acoustic device according to claim 1 or 2,
wherein the separation unit separates the predetermined sound source signal by performing a time-frequency analysis on the sound source signal.

4. The acoustic device according to any one of claims 1 to 3,
wherein the sound source signal is a stereo signal reproduced in stereo on two channels, and
the separation unit calculates difference information indicating a difference between the sound source signals reproduced on the two channels, and separates the predetermined sound source signal based on the calculated difference information.

5. The acoustic device according to any one of claims 1 to 4,
wherein the sound source signal is a stereo signal reproduced in stereo on two channels,
the separation unit separates, from the sound source signal, a first sound source signal including a component of a sound to be reproduced on one of the two channels and a second sound source signal including a component of a sound to be reproduced on the other of the two channels, and
the output control unit outputs the first sound source signal separated by the separation unit from the one channel so as to control a sound image width of the sound produced by the first sound source signal, and outputs the second sound source signal from the other channel so as to control a sound image width of the sound produced by the second sound source signal.

6. The acoustic device according to any one of claims 1 to 5, further comprising:
a detection unit that detects, based on the sound source signal, feature information indicating features of an acoustic component included in the sound source signal; and
a determination unit that determines a gain of the filter based on the feature information detected by the detection unit,
wherein the output control unit generates and outputs a reverberation signal indicating the pseudo reverberation sound using the filter to which the gain determined by the determination unit is set.

7. The acoustic device according to any one of claims 1 to 6, further comprising
an acquisition unit that acquires a state of a vehicle,
wherein the output control unit applies the filter, in accordance with the state of the vehicle acquired by the acquisition unit, to the sound source signal from which the predetermined sound source signal has been removed, and outputs the resulting signal from a channel disposed at least one of behind and above a listener.

8. The acoustic device according to any one of claims 1 to 7,
wherein the predetermined sound source signal includes a voice component.

9. An acoustic control method comprising:
a separation step of separating, from a sound source signal, a predetermined sound source signal that does not require a process of adding a pseudo reverberation sound, and removing the separated predetermined sound source signal from the sound source signal; and
an output control step of applying a filter for generating the pseudo reverberation sound to the sound source signal from which the predetermined sound source signal has been removed in the separation step, and outputting the resulting signal,
wherein the output control step includes applying the filter to both the sound source signal from which the predetermined sound source signal has been removed and the predetermined sound source signal, and outputting the pseudo reverberation sound such that the reverberation level of the pseudo reverberation sound corresponding to the sound source signal from which the predetermined sound source signal has been removed is higher than the reverberation level of the pseudo reverberation sound corresponding to the predetermined sound source signal.