JP4882757B2

JP4882757B2 - Audio conference system

Info

Publication number: JP4882757B2
Application number: JP2007008682A
Authority: JP
Inventors: 利晃石橋; 田中　　良; 訓史鵜飼
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2007-01-18
Filing date: 2007-01-18
Publication date: 2012-02-22
Anticipated expiration: 2027-01-18
Also published as: JP2008177802A

Description

The present invention relates to an audio conference system that conducts an audio conference by connecting two audio conference devices located at separate sites. It also relates to an audio conference device used in that system.

Conventionally, when an audio conference is held between two separate sites, an audio conference device such as that of Patent Document 1 or Patent Document 2 is placed at each site. The participants then sit around the device to hold the meeting.

In the audio conference devices of Patent Documents 1 and 2, a single speaker is placed at the center of the housing so that it emits sound outward from the top surface. A plurality of microphones are placed at the corners of the side surfaces, each with a different sound collection direction.

In such a conventional audio conference device, each microphone picks up sound arriving from a different direction, and the resulting audio signal is transmitted to the remote audio conference device. Conversely, when the device receives an audio signal picked up by the remote device, it emits that signal from its speaker as is.
Patent Document 1: JP-A-8-298696
Patent Document 2: JP-A-8-204803

In the conventional audio conference system described above, the device on the sound collection (transmitting) side uses all of its microphones to pick up sound from every direction around the housing. It transmits this sound as a single emission audio signal. The device on the sound emission (receiving) side feeds the received emission audio signal to its speaker, which emits it uniformly in all directions.

With this configuration, several participants may be seated around the remote device. Even so, the signal received from the remote side is emitted uniformly in all directions, so listeners cannot tell where the remote participants are positioned relative to their device. As a result, the conference lacks a sense of presence.

Accordingly, an object of the present invention is to provide an audio conference system that delivers a conference with a strong sense of presence, reflecting the positions of the participants seated around each audio conference device. A further object is to provide an audio conference device used in that system.

The audio conference system of the present invention comprises a plurality of audio conference devices and connection means that connects at least two of them. Each audio conference device has a disk-shaped housing, a plurality of unidirectional microphones arranged circumferentially on the housing, and a plurality of speakers arranged circumferentially on the housing.

The transmitting-side audio conference device operates as follows:
- It forms sound collection beam signals, each with a different sound collection direction, from the signals picked up by the unidirectional microphones.
- It selects, by signal strength, the beam signal based on sound produced by a participant.
- From the selected beam signal, it detects the sound collection direction among all directions around the disk-shaped housing. It then generates speaker orientation information, that is, the direction of the person speaking.
- It transmits an emission audio signal based on the selected beam signal, associated with the corresponding speaker orientation information.

The unidirectional microphones are installed on the upper side of the housing, with their sound collection direction pointing toward the center of the housing in plan view.

The receiving-side audio conference device receives the emission audio signal and the corresponding speaker orientation information from the transmitting-side device. It sets a virtual sound source in the direction indicated by the speaker orientation information. It then controls the sound emitted from its speakers so that the sound appears to come from that virtual sound source. The speakers are installed on the lower side of the housing, with their sound emission direction pointing outward from the housing.
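The transmitting-side step of the claim (choose a beam by signal strength, then tag it with a direction) can be sketched as follows. This is an illustrative assumption, not the patent's stated method. It measures "signal strength" as RMS level over one frame. The eight beam directions are taken from the adder table of the embodiment described later (MB1 at 180°, MB2 at 225°, and so on).

```python
import math
from typing import List, Sequence, Tuple

# Look direction (degrees) of beams MB1..MB8, from the embodiment's adder table.
BEAM_DIRECTIONS_DEG = [180.0, 225.0, 270.0, 315.0, 0.0, 45.0, 90.0, 135.0]


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square level of one frame of samples (0.0 for an empty frame)."""
    if not frame:
        return 0.0
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def select_beam(beam_frames: List[Sequence[float]]) -> Tuple[int, Sequence[float], float]:
    """Pick the strongest beam and pair it with its direction.

    Returns (beam_index, frame_to_transmit, direction_deg). The frame and the
    direction are what the transmitting side would send together.
    """
    levels = [rms(frame) for frame in beam_frames]
    best = max(range(len(levels)), key=lambda i: levels[i])
    return best, beam_frames[best], BEAM_DIRECTIONS_DEG[best]
```

A real device would also need to decide when nobody is talking, for example with a level threshold. That decision is not covered by this sketch.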

In this configuration, the transmitting-side audio conference device picks up a participant's speech. It also obtains the participant's direction relative to the housing (the sound collection direction). It then transmits the emission audio signal based on the picked-up speech, associated with the speaker orientation information.

The receiving-side device uses the received speaker orientation information to set the direction of a virtual sound source relative to its own housing. It controls the sound emitted from its speakers so that the sound based on the emission audio signal appears to come from that direction. Participants at the receiving-side device therefore hear the voice as if it came from the same direction as the talker's direction at the transmitting-side device.

In addition, with this configuration, each microphone's sound collection direction is opposite to the emission direction of the nearest speaker. The microphones are therefore less likely to pick up sound leaking around from the speakers. This allows the target talker direction to be detected more accurately among all directions outside the housing.

Further, the receiving-side audio conference device of the present invention controls the amplitude and delay of the sound emitted from each speaker. It bases this control on the positional relationship between the set virtual sound source position and each speaker.

In this configuration, the sound emitted from each speaker is amplitude- and delay-controlled according to the positional relationship between that speaker and the virtual sound source position. More specifically, the farther a speaker is from the virtual sound source position, the more its emitted sound is attenuated and the later its emission timing. As a result, participants at the sound-emitting (receiving) device hear the sound as if it came from the virtual sound source position, regardless of their direction relative to the device.
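The distance rule above can be made concrete with a small geometric sketch. This is one plausible way to derive per-speaker delays and gains, not the patent's stated formula. Several values are hypothetical assumptions: the speaker radius (0.15 m), the virtual source distance (1.0 m), the sample rate (16 kHz), and the 1/r attenuation law. The speakers sit at 0°, 90°, 180° and 270° as in the embodiment. Delays and gains are taken relative to the speaker nearest the virtual source.

```python
import math
from typing import List, Tuple

SPEED_OF_SOUND_M_S = 343.0                      # approximate, room temperature
SPEAKER_ANGLES_DEG = (0.0, 90.0, 180.0, 270.0)  # SP1..SP4, as in the embodiment


def polar_to_xy(radius: float, angle_deg: float) -> Tuple[float, float]:
    a = math.radians(angle_deg)
    return radius * math.cos(a), radius * math.sin(a)


def delay_and_gain(source_angle_deg: float,
                   source_radius_m: float = 1.0,    # hypothetical
                   speaker_radius_m: float = 0.15,  # hypothetical
                   sample_rate_hz: int = 16000) -> List[Tuple[int, float]]:
    """Return [(delay_samples, gain)] for SP1..SP4.

    Each speaker is delayed by its extra path length to the virtual source and
    attenuated with a 1/r law, both relative to the speaker nearest the source.
    """
    sx, sy = polar_to_xy(source_radius_m, source_angle_deg)
    dists = [math.hypot(sx - px, sy - py)
             for px, py in (polar_to_xy(speaker_radius_m, a) for a in SPEAKER_ANGLES_DEG)]
    d_min = min(dists)
    return [(round((d - d_min) / SPEED_OF_SOUND_M_S * sample_rate_hz), d_min / d)
            for d in dists]
```

For a source at 0°, SP1 gets zero delay and unit gain. SP3, on the far side, gets the longest delay. SP2 and SP4 get equal values by symmetry.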

Further, the transmitting-side audio conference device of the present invention calculates the talker direction together with a distance: the distance between the position where the speech was uttered and the side surface of the housing closest to that position. It generates the speaker orientation information from the direction and this distance. The receiving-side audio conference device then sets the virtual sound source based on the direction and distance obtained from the received speaker orientation information.

In this configuration, the virtual sound source position is set using not only the talker direction but also the calculated distance from the housing to the talker. This allows the sound-emitting device to reproduce, even more accurately, where a participant at the sound-collecting device was speaking from.
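A direction-plus-distance payload and the receiving-side placement of the virtual source might look like the following sketch. The class name, its fields, and the housing radius (0.15 m) are hypothetical; the patent does not specify a message format. The sketch relies on one geometric step: for a circular housing, the nearest point on the side surface lies on the radial line toward the talker. The talker therefore sits at the housing radius plus the measured distance.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SpeakerOrientationInfo:
    """Hypothetical payload: talker direction plus distance to the housing."""
    direction_deg: float  # 0 deg = MC1/SP1 direction, counterclockwise positive
    distance_m: float     # from the utterance position to the nearest side surface


def virtual_source_xy(info: SpeakerOrientationInfo,
                      housing_radius_m: float = 0.15) -> Tuple[float, float]:
    """Virtual source position relative to the housing centre, in metres."""
    r = housing_radius_m + info.distance_m  # talker lies on the radial line
    a = math.radians(info.direction_deg)
    return r * math.cos(a), r * math.sin(a)
```

The resulting (x, y) point could then feed a per-speaker delay and gain calculation in place of a fixed-distance virtual source.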

According to this invention, sound is emitted from the position of each participant at the remote device. Listeners can therefore perceive where each participant sits in the unseen remote conference room. Hearing each voice from its position makes it possible to realize an audio conference system with a strong sense of presence.

An audio conference system according to an embodiment of the present invention will be described with reference to the drawings.
FIG. 1 is a configuration diagram of the audio conference system of this embodiment.
FIG. 2 shows external views of the audio conference device used in the audio conference system of this embodiment: (A) is a plan view and (B) is a side view. In FIG. 2, θ denotes the angle about the center of audio conference device 1 in plan view. It is 0° in the direction of microphone MC1 and speaker SP1 and increases counterclockwise.
FIG. 3 is a functional block diagram of the audio conference device shown in FIG. 2.

As shown in FIG. 1, the audio conference system includes audio conference devices 1A and 1B, placed in separate conference rooms 100A and 100B, and a network 500 connecting them. Conference tables 101A and 101B are installed near the centers of conference rooms 100A and 100B, respectively, and audio conference devices 1A and 1B are placed on them. Each device has an input/output I/F through which it connects to the network.

For example, in conference room 100A, participants 201A and 203A sit facing each other across device 1A. Participant 201A is on the speaker SP1 side and participant 203A is on the speaker SP3 side. In conference room 100B, participants 202B and 204B sit facing each other across device 1B. Participant 202B is on the speaker SP2 side and participant 204B is on the speaker SP4 side.

Audio conference devices 1A and 1B have identical specifications, and each has a disk-shaped housing 11. Specifically, housing 11 is circular in plan view. Its top and bottom faces are smaller in area than its horizontal cross-section partway up its height. In side view, the housing narrows from one point partway up its height toward the top face, and also narrows from that point toward the bottom face. In other words, it has inclined surfaces both above and below that point. A recess 12 of predetermined depth, smaller in area than the top face, is formed in the top face of housing 11. The center of recess 12 in plan view coincides with the center of the top face.

Sixteen microphones MC1 to MC16 are installed inside the top of housing 11, along the side wall of recess 12. They are spaced at an equal angular pitch (here, 22.5°) about the center of audio conference device 1 in plan view. Microphone MC1 lies in the θ = 0° direction, and the microphones follow in order as θ increases in 22.5° steps. For example, microphone MC5 lies at θ = 90°, MC9 at θ = 180°, and MC13 at θ = 270°.

Each of microphones MC1 to MC16 is unidirectional. Each is oriented so that its directivity points toward the center of the device in plan view. For example, the directivity of MC1 is centered on θ = 180°, MC5 on θ = 270°, MC9 on θ = 0° (360°), and MC13 on θ = 90°. The number of microphones is not limited to sixteen and may be set as appropriate for the specifications.
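The microphone layout above follows a simple rule. Microphone MCk sits at (k - 1) × 22.5° and points at the opposite bearing, toward the housing center. A short sketch reproduces the positions and directivity centers listed in the text:

```python
from typing import Dict, Tuple

MIC_COUNT = 16
PITCH_DEG = 360.0 / MIC_COUNT  # 22.5 degrees


def mic_layout() -> Dict[int, Tuple[float, float]]:
    """Return {mic_number: (position_deg, directivity_deg)} for MC1..MC16.

    Each unidirectional mic sits at (k - 1) * 22.5 deg and points toward the
    housing centre, i.e. at the opposite bearing (position + 180 deg).
    """
    return {k: ((k - 1) * PITCH_DEG, ((k - 1) * PITCH_DEG + 180.0) % 360.0)
            for k in range(1, MIC_COUNT + 1)}
```

For example, `mic_layout()[5]` gives `(90.0, 270.0)`, matching MC5 in the text.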

Four speakers SP1 to SP4 are installed so that their sound-emitting faces are flush with the lower inclined surface of housing 11. They are spaced at an equal angular pitch (here, 90°) about the center of audio conference device 1 in plan view. Speaker SP1 lies in the same θ = 0° direction as microphone MC1, and SP2 in the same θ = 90° direction as MC5. Likewise, SP3 lies in the same θ = 180° direction as MC9, and SP4 in the same θ = 270° direction as MC13.

Each of speakers SP1 to SP4 has strong directivity toward the front of its sound-emitting face. SP1 emits sound centered on θ = 0°, SP2 on θ = 90°, SP3 on θ = 180°, and SP4 on θ = 270°.

As described above, speakers SP1 to SP4 are placed on the lower side of housing 11 and microphones MC1 to MC16 on the upper side. The microphones' sound collection direction points toward the center of housing 11. This arrangement makes microphones MC1 to MC16 less likely to pick up sound leaking around from speakers SP1 to SP4. The talker position detection described later is therefore less affected by leakage sound, and the talker position can be detected more accurately.
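The claim that each speaker and its nearest microphone face opposite ways can be checked numerically from the stated geometry. In the sketch below, a speaker's emission direction equals its mounting bearing (it points outward). A microphone's pickup direction is its bearing plus 180° (it points inward). The helper names are illustrative, not from the patent.

```python
from typing import Dict, Tuple

MIC_COUNT = 16
SPEAKER_POSITIONS_DEG = {1: 0.0, 2: 90.0, 3: 180.0, 4: 270.0}  # SP1..SP4


def angular_gap(a: float, b: float) -> float:
    """Smallest absolute difference between two bearings, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def nearest_mic(bearing_deg: float) -> int:
    """Microphone number (1..16) mounted closest to the given bearing."""
    pitch = 360.0 / MIC_COUNT
    return min(range(1, MIC_COUNT + 1),
               key=lambda k: angular_gap((k - 1) * pitch, bearing_deg))


def speaker_mic_opposition() -> Dict[int, Tuple[int, float]]:
    """For each speaker: (nearest mic, angle between the speaker's outward
    emission direction and that mic's inward pickup direction)."""
    pitch = 360.0 / MIC_COUNT
    result = {}
    for sp, bearing in SPEAKER_POSITIONS_DEG.items():
        k = nearest_mic(bearing)
        pickup_deg = ((k - 1) * pitch + 180.0) % 360.0
        result[sp] = (k, angular_gap(bearing, pickup_deg))
    return result
```

Every speaker pairs with MC1, MC5, MC9 or MC13 at an angle of 180°, which is the opposed arrangement the text describes.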

The operation unit 29 is installed on the upper inclined surface of housing 11. It includes various operation buttons and a liquid crystal display panel (not shown).
The input/output I/F is installed on the lower inclined surface of housing 11, at a position where none of speakers SP1 to SP4 is mounted. It includes a network connection terminal, a digital audio terminal, an analog audio terminal, and the like (not shown). A network cable plugged into the network connection terminal connects the device to the network 500 described above.

In addition to this physical structure, audio conference device 1 has the functional configuration shown in FIG. 3.
The control unit 20 performs overall control of audio conference device 1, including settings, sound collection, and sound emission. It also controls each part of the device according to operation instructions entered through the operation unit 29.

(1) Sound emission
The communication control unit 21 extracts audio data from the voice communication data received from remote audio conference devices via the input/output I/F. It outputs this data as emission audio signals S1 to S3 on channels CH1 to CH3. The communication control unit 21 also obtains the remote device ID from the voice communication data and assigns one channel CH per remote device ID:
- With one remote device connected, its audio data is assigned to channel CH1 as emission audio signal S1.
- With two remote devices connected, their audio data are assigned separately to channels CH1 and CH2 as emission audio signals S1 and S2.
- With three remote devices connected, their audio data are assigned separately to channels CH1, CH2, and CH3 as emission audio signals S1, S2, and S3.

Channels CH1 to CH3 are connected to the sound emission control unit 22 via the echo cancellation unit 28.
The communication control unit 21 also extracts the speaker orientation data Py (Pm), which is associated with each piece of audio data in the voice communication data and describes the talker direction at the remote device. It passes this data to the sound emission control unit 22 together with the channel information.
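Per-device channel assignment can be sketched as a small allocator. This is a minimal sketch under stated assumptions. Channels are handed out in order of first arrival, and a fourth device raises an error. The text fixes neither behavior, and the device IDs in the usage are hypothetical.

```python
MAX_CHANNELS = 3  # CH1..CH3 in the embodiment


class ChannelAllocator:
    """Assign one output channel per remote device ID (first come, first served).

    The ordering policy and the error on overflow are assumptions.
    """

    def __init__(self, max_channels: int = MAX_CHANNELS) -> None:
        self.max_channels = max_channels
        self.channels = {}  # device_id -> channel number (1-based)

    def channel_for(self, device_id: str) -> int:
        if device_id not in self.channels:
            if len(self.channels) >= self.max_channels:
                raise RuntimeError(f"no free channel for device {device_id!r}")
            self.channels[device_id] = len(self.channels) + 1
        return self.channels[device_id]
```

Usage: `ChannelAllocator().channel_for("1B")` returns 1. Later packets from the same ID keep returning the same channel.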

The sound emission control unit 22 generates speaker output signals SPD1 to SPD4 for speakers SP1 to SP4. It bases these signals on emission audio signals S1 to S3 and speaker orientation information Py.

FIG. 4 is a block diagram showing the main configuration of the sound emission control unit 22 of this embodiment.
FIG. 5(A) shows the distribution of virtual sound sources set by the sound emission specification control unit 220. FIG. 5(B) shows the contents of the sound emission specification table 2281.

As shown in FIG. 4, the sound emission control unit 22 includes:
- individual emission signal generators 221 to 223, corresponding to emission audio signals S1 to S3;
- a sound emission specification control unit 220;
- signal synthesizers 224 to 227, corresponding to speaker output signals SPD1 to SPD4; and
- a memory 228 that stores the sound emission specification table 2281.

The sound emission specification control unit 220 sets a virtual sound source based on the speaker orientation information Py received from the communication control unit 21. As shown in FIG. 5(A), the virtual sound sources lie horizontally outward from housing 11 at a predetermined distance, at 45° intervals about the center of housing 11. More specifically, virtual sound source 901 is set in the θ = 0° direction, the direction from the center of housing 11 toward speaker SP1 and microphone MC1. Virtual sound sources 902 to 908 follow in order at 45° intervals, counterclockwise from 901. The number of virtual sound sources is not limited to eight and may be set as appropriate for the device specifications.
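Mapping a received direction onto one of the eight virtual sound sources can be sketched as follows. In the embodiment, Py already comes from one of eight beam directions. Snapping an arbitrary angle to the nearest 45° source is therefore an assumption added for generality, and exact halfway angles round up to the next source.

```python
import math

VIRTUAL_SOURCE_STEP_DEG = 45.0
FIRST_VIRTUAL_SOURCE = 901  # at theta = 0 deg; 902..908 follow counterclockwise


def virtual_source_for(direction_deg: float) -> int:
    """Map a talker direction (degrees) to the nearest virtual source 901..908."""
    steps = (direction_deg % 360.0) / VIRTUAL_SOURCE_STEP_DEG
    index = int(math.floor(steps + 0.5)) % 8  # round half up, wrap 360 -> 0
    return FIRST_VIRTUAL_SOURCE + index
```

Usage: `virtual_source_for(0.0)` gives 901 and `virtual_source_for(180.0)` gives 905. A bearing of 350° wraps back to 901.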

Based on the set virtual sound source, the sound emission specification control unit 220 reads the corresponding delay adjustment amounts D and gain adjustment amounts G from the sound emission specification table 2281 in memory 228. For example, suppose speaker orientation information Py indicates θ = 0°. The unit then sets virtual sound source 901 and reads:
- delay adjustment amount D11 and gain adjustment amount G11 for SP1;
- D21 and G21 for SP2;
- D31 and G31 for SP3; and
- D41 and G41 for SP4.

The delay adjustment amounts D and gain adjustment amounts G are preset according to the distance between the virtual sound source and each of speakers SP1 to SP4. Alternatively, they may be set by measuring the sound emission and collection environment on site after the audio conference device is installed.
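The sound emission specification table can be pictured as a lookup keyed by (speaker i, virtual source j), holding the pair (Dij, Gij). This follows the D11/D21/D31/D41 naming in the text. How the entries are computed is left to a caller-supplied function, because the text allows either a distance-based preset or on-site measurement. The function names here are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

SPEAKERS = (1, 2, 3, 4)            # SP1..SP4
VIRTUAL_SOURCES = range(901, 909)  # 901..908

Entry = Tuple[float, float]        # (delay adjustment D, gain adjustment G)


def build_table(compute: Callable[[int, int], Entry]) -> Dict[Tuple[int, int], Entry]:
    """Table keyed by (speaker i, source index j), holding (D_ij, G_ij).

    `compute(speaker, virtual_source)` stands in for whatever fills the table:
    a distance model or an on-site measurement.
    """
    return {(sp, vs - 900): compute(sp, vs) for sp in SPEAKERS for vs in VIRTUAL_SOURCES}


def read_adjustments(table: Dict[Tuple[int, int], Entry], virtual_source: int) -> List[Entry]:
    """Return [(D, G)] for SP1..SP4, e.g. source 901 -> (D11, G11) .. (D41, G41)."""
    j = virtual_source - 900
    return [table[(sp, j)] for sp in SPEAKERS]
```

With a dummy filler such as `lambda sp, vs: (sp * 10 + vs - 900, 1.0 / sp)`, reading source 901 returns delay entries 11, 21, 31 and 41, mirroring the subscripts in the text.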

After reading the delay adjustment amounts D and gain adjustment amounts G for speakers SP1 to SP4, the sound emission specification control unit 220 passes them on. The recipient is whichever individual emission signal generator (221 to 223) serves the channel given in the channel information supplied with Py. For example, suppose virtual sound source 901 is set and the channel information indicates CH1. The unit then supplies D11 and G11 (SP1), D21 and G21 (SP2), D31 and G31 (SP3), and D41 and G41 (SP4) to individual emission signal generator 221.

The individual emission signal generators 221 to 223 receive emission audio signals S1 to S3 from the corresponding channels CH1 to CH3. Each generator applies delay processing and gain control to its signal, using the per-speaker delay adjustment amount D and gain adjustment amount G supplied by the sound emission specification control unit 220. It outputs the results to signal synthesizers 224 to 227.

More specifically, individual emission signal generator 221 simultaneously produces four versions of emission audio signal S1:
- one processed with D1* and G1* and output to signal synthesizer 224;
- one processed with D2* and G2* and output to signal synthesizer 225;
- one processed with D3* and G3* and output to signal synthesizer 226; and
- one processed with D4* and G4* and output to signal synthesizer 227.

Here * is 1 to 8, corresponding to virtual sound sources 901 to 908. Individual emission signal generators 222 and 223 process emission audio signals S2 and S3 in the same way and output the results to signal synthesizers 224 to 227.
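The fan-out performed by one individual emission signal generator can be sketched as follows. One input signal is turned into four speaker feeds, each with its own integer-sample delay and gain. The sketch processes a single block without carrying delayed samples over to the next block; a streaming implementation would need a delay line. Integer-sample delays are also a simplifying assumption.

```python
from typing import List, Sequence, Tuple


def delay_and_scale(signal: Sequence[float], delay_samples: int, gain: float) -> List[float]:
    """Delay by an integer number of samples (zero-padded) and apply a gain.

    The output keeps the input length; samples pushed past the end are dropped.
    """
    n = len(signal)
    d = min(delay_samples, n)
    return [0.0] * d + [gain * x for x in signal[: n - d]]


def individual_emission(signal: Sequence[float],
                        adjustments: Sequence[Tuple[int, float]]) -> List[List[float]]:
    """One generator (e.g. 221): fan one channel out to the SP1..SP4 feeds.

    `adjustments` holds (D, G) per speaker, e.g. [(D11, G11), ..., (D41, G41)].
    """
    return [delay_and_scale(signal, d, g) for d, g in adjustments]
```

Usage: `individual_emission([1.0, 2.0, 3.0], [(0, 1.0), (1, 0.5), (2, 2.0), (5, 1.0)])` yields four feeds. The first is unchanged, the second is shifted by one sample and halved, and so on.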

Signal synthesizer 224 combines (adds) the SP1 signals output from individual emission signal generators 221 to 223 and outputs the result as SP1 audio signal SPD1. Likewise, signal synthesizers 225, 226, and 227 add the SP2, SP3, and SP4 signals from generators 221 to 223. They output the results as SP audio signals SPD2, SPD3, and SPD4, respectively.
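The signal synthesizers can be sketched as a sample-wise sum across channels, taken separately for each speaker. The nested-list data layout is an illustrative assumption, and all feeds are assumed to have equal length.

```python
from typing import List, Sequence


def synthesize(per_channel_feeds: Sequence[Sequence[Sequence[float]]]) -> List[List[float]]:
    """Signal synthesizers 224..227: add the SP-k feeds from every channel.

    per_channel_feeds[c][k] is the feed for speaker k from channel c
    (c = 0 corresponds to generator 221). Returns [SPD1, SPD2, SPD3, SPD4].
    """
    n_speakers = len(per_channel_feeds[0])
    return [[sum(samples) for samples in zip(*(ch[k] for ch in per_channel_feeds))]
            for k in range(n_speakers)]
```

Because each channel already carries its own virtual-source delays and gains, adding them lets up to three remote sites be heard from three different directions at once.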

The D/A converter 23 converts SP audio signals SPD1 to SPD4 from digital to analog. The emission amplifier (AMP) 24 amplifies each of them at a fixed gain and supplies them to speakers SP1 to SP4, respectively.

Speakers SP1 to SP4 convert the supplied SP audio signals SPD1 to SPD4 into sound and emit it.

Through this emission processing, the sounds emitted from speakers SP1 to SP4 have a predetermined delay and amplitude relationship. This gives participants the impression that the sound is coming from the set virtual sound source.

(2) Sound Collection
The microphones MC1 to MC16 described above pick up external sound, such as the voices of the conference participants, and generate sound collection signals MS1 to MS16. Each sound collection AMP (amplifier) 25 amplifies the corresponding sound collection signal MS1 to MS16 with a predetermined gain, and the A/D converter 26 converts the amplified signals MS1 to MS16 from analog to digital and outputs them to the sound collection control unit 27.

FIG. 6 is a block diagram showing the main configuration of the sound collection control unit 27. As shown in FIG. 6, the sound collection control unit 27 includes a direction-specific sound collection beam generation unit 271 and an output data determination unit 272.
The direction-specific sound collection beam generation unit 271 forms appropriate combinations of the sound collection signals MS1 to MS16 (digital data) and applies delay/addition processing and the like to each combination, thereby generating sound collection beam signals MB1 to MB8 whose sound collection directions are eight different directions corresponding to the virtual sound sources 901 to 908 described above.

The direction-specific sound collection beam generation unit 271 includes adders 2711 to 2718.
The adder 2711 adds the sound collection signals MS1, MS2, and MS16 to generate a sound collection beam signal MB1 with strong directivity in the θ = 180° direction (corresponding to the virtual sound source 905). The adder 2712 adds MS2, MS3, and MS4 to generate MB2 with strong directivity in the θ = 225° direction (virtual sound source 906). The adder 2713 adds MS4, MS5, and MS6 to generate MB3 with strong directivity in the θ = 270° direction (virtual sound source 907). The adder 2714 adds MS6, MS7, and MS8 to generate MB4 with strong directivity in the θ = 315° direction (virtual sound source 908). The adder 2715 adds MS8, MS9, and MS10 to generate MB5 with strong directivity in the θ = 0° direction (virtual sound source 901). The adder 2716 adds MS10, MS11, and MS12 to generate MB6 with strong directivity in the θ = 45° direction (virtual sound source 902). The adder 2717 adds MS12, MS13, and MS14 to generate MB7 with strong directivity in the θ = 90° direction (virtual sound source 903). The adder 2718 adds MS14, MS15, and MS16 to generate MB8 with strong directivity in the θ = 135° direction (virtual sound source 904).

Thus, in this embodiment, the sound collection beam signals MB1, MB2, MB3, and MB4 correspond to the virtual sound sources 905, 906, 907, and 908, respectively, and the sound collection beam signals MB5, MB6, MB7, and MB8 correspond to the virtual sound sources 901, 902, 903, and 904, respectively. The number of sound collection beam signals to be generated is not limited to eight and can be set as appropriate for the specifications.
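The adder table can be expressed compactly: beam k sums the three microphones centered on MS(2k-1), wrapping from MS16 back to MS1, and points at 180° + 45°(k-1). The sketch below reproduces that mapping and the virtual-source correspondence, and performs plain addition only, as the adders 2711 to 2718 do; the per-microphone delays mentioned for unit 271 in general are omitted. Function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

NUM_MICS = 16
NUM_BEAMS = 8


def beam_layout() -> Dict[int, Tuple[Tuple[int, int, int], float, int]]:
    """Map beam k (1..8) to (microphone numbers, direction in degrees, virtual source).

    Reproduces the adder table: MB1 = MS16 + MS1 + MS2 at 180 deg (source 905),
    MB2 = MS2 + MS3 + MS4 at 225 deg (source 906), ..., MB8 = MS14 + MS15 + MS16
    at 135 deg (source 904).
    """
    layout = {}
    for k in range(1, NUM_BEAMS + 1):
        center = 2 * k - 1                      # MS1, MS3, ..., MS15
        prev_mic = (center - 2) % NUM_MICS + 1  # wraps "MS0" around to MS16
        next_mic = center % NUM_MICS + 1
        angle = (180.0 + 45.0 * (k - 1)) % 360.0
        source = 901 + (k + 3) % NUM_BEAMS
        layout[k] = ((prev_mic, center, next_mic), angle, source)
    return layout


def form_beams(mic_signals: Sequence[Sequence[float]]) -> List[List[float]]:
    """mic_signals[i] is MS(i+1); returns [MB1, ..., MB8] by plain addition."""
    length = len(mic_signals[0])
    beams = []
    for mics, _angle, _source in beam_layout().values():  # insertion order: MB1..MB8
        beams.append([sum(mic_signals[m - 1][n] for m in mics) for n in range(length)])
    return beams
```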

The direction-specific sound collection beam generation unit 271 outputs the generated sound collection beam signals MB1 to MB8 to the output data determination unit 272.

The output data determination unit 272 includes a maximum signal detection unit 2721 and a Select/Mix circuit 2722.

The maximum signal detection unit 2721 compares the signal levels of the sound collection beam signals MB1 to MB8 and selects the one with the highest signal level. It outputs selected beam information MBM identifying the selected beam signal to the Select/Mix circuit 2722, and it outputs the direction corresponding to the selected beam signal to the communication control unit 21 as speaker direction information Pm.

Based on the selected beam information MBM from the maximum signal detection unit 2721, the Select/Mix circuit 2722 selects the sound collection beam signal MB designated by that information and outputs it as the output sound collection beam signal MBS. Instead of outputting only the designated beam signal MB, the Select/Mix circuit 2722 may mix it with the adjacent sound collection beam signals MB and output the mixture as the output sound collection beam signal MBS.
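A minimal sketch of the maximum signal detection and Select/Mix steps follows. The text does not say how signal level is measured or how adjacent beams are weighted when mixing, so the RMS level measure and the fixed neighbor weight of 0.5 are assumptions, as are the function names. The direction list follows the adder table (MB1 at 180° through MB8 at 135°), so adjacent indices are also adjacent 45° directions.

```python
import math
from typing import List, Sequence, Tuple

BEAM_DIRECTIONS_DEG = (180.0, 225.0, 270.0, 315.0, 0.0, 45.0, 90.0, 135.0)  # MB1..MB8


def rms(x: Sequence[float]) -> float:
    """Root-mean-square level of a block (assumed level measure)."""
    return math.sqrt(sum(v * v for v in x) / len(x)) if x else 0.0


def select_beam(
    beams: Sequence[Sequence[float]],
    mix_neighbors: bool = False,
    neighbor_weight: float = 0.5,  # hypothetical mixing weight
) -> Tuple[int, float, List[float]]:
    """Return (selected beam index, speaker direction Pm in degrees, output signal MBS)."""
    levels = [rms(b) for b in beams]
    best = max(range(len(beams)), key=lambda i: levels[i])  # maximum signal detection
    out = list(beams[best])
    if mix_neighbors:  # optional Mix mode: add the two adjacent beams, wrapping around
        n = len(beams)
        for j in ((best - 1) % n, (best + 1) % n):
            out = [o + neighbor_weight * v for o, v in zip(out, beams[j])]
    return best, BEAM_DIRECTIONS_DEG[best], out
```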

The echo cancellation unit 28 consists of an adaptive filter that generates, for the incoming output sound collection beam signal MBS, a pseudo echo (pseudo-regression sound) signal based on the sound emission audio signals S1 to S3, and a post-processor that subtracts the pseudo echo signal from MBS. By subtracting the pseudo echo signal from MBS while successively optimizing the adaptive filter coefficients, the echo cancellation circuit removes the component that leaks from the speakers SP1 to SP4 into the microphones MC1 to MC16 and is contained in MBS. The output sound collection beam signal MBS', from which this leakage component has been removed, is output to the communication control unit 21.
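The text specifies only an adaptive filter whose coefficients are optimized successively; it does not name the adaptation algorithm. The sketch below swaps in normalized LMS (NLMS), a common choice for acoustic echo cancellation, with a single reference signal (the device itself uses S1 to S3) and a simulated echo path. The tap count, step size, echo path, and helper names are all assumptions.

```python
import random
from typing import List, Sequence, Tuple


def nlms_echo_cancel(
    reference: Sequence[float],
    mic: Sequence[float],
    taps: int = 8,
    mu: float = 0.5,
    eps: float = 1e-8,
) -> List[float]:
    """Subtract an adaptively estimated echo of `reference` from `mic` using NLMS."""
    w = [0.0] * taps        # adaptive filter coefficients
    history = [0.0] * taps  # most recent reference samples, newest first
    residual = []
    for x, d in zip(reference, mic):
        history = [x] + history[:-1]
        y = sum(wi * hi for wi, hi in zip(w, history))  # pseudo echo signal
        e = d - y                                        # echo-cancelled output sample
        norm = eps + sum(h * h for h in history)
        w = [wi + mu * e * hi / norm for wi, hi in zip(w, history)]  # NLMS update
        residual.append(e)
    return residual


def simulate_echo(
    num_samples: int = 3000,
    echo_path: Sequence[float] = (0.5, -0.3, 0.1),
    seed: int = 0,
) -> Tuple[List[float], List[float]]:
    """Hypothetical test signals: white-noise reference and its echo through echo_path."""
    rng = random.Random(seed)
    ref = [rng.uniform(-1.0, 1.0) for _ in range(num_samples)]
    mic = [sum(h * ref[n - k] for k, h in enumerate(echo_path) if n - k >= 0)
           for n in range(num_samples)]
    return ref, mic
```

With no near-end talker and no noise in this simulation, the residual falls to a tiny fraction of the echo once the filter has converged; in a real room, double-talk detection and the post-processor would also matter.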

The communication control unit 21 associates the output sound collection beam signal MBS', from which the echo has been removed by the echo cancellation unit 28, with the speaker direction information Pm from the sound collection control unit 27, generates voice communication data, and outputs it to the input/output I/F. The voice communication data generated in this way is transmitted to the remote audio conference device via the input/output I/F and the network 500.
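The text says only that MBS' and Pm are associated into voice communication data; it does not define a wire format. As a purely hypothetical illustration, the sketch below uses Python's standard `struct` module to pack the direction as a little-endian float32 and a sample count, followed by 16-bit PCM samples of MBS'.

```python
import struct
from typing import List, Sequence, Tuple

# Hypothetical header: speaker direction Pm (degrees, float32) and sample count (uint32).
HEADER = struct.Struct("<fI")


def pack_voice_data(samples: Sequence[int], direction_deg: float) -> bytes:
    """Bundle 16-bit PCM samples of MBS' with the speaker direction information Pm."""
    header = HEADER.pack(direction_deg, len(samples))
    body = struct.pack(f"<{len(samples)}h", *samples)
    return header + body


def unpack_voice_data(packet: bytes) -> Tuple[float, List[int]]:
    """Inverse of pack_voice_data: recover (Pm, samples) on the receiving side."""
    direction_deg, count = HEADER.unpack_from(packet, 0)
    samples = list(struct.unpack_from(f"<{count}h", packet, HEADER.size))
    return direction_deg, samples
```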

With this configuration, sound is emitted at the position around the sound-emitting audio conference device that corresponds to the talker's position relative to the sound-collecting audio conference device. Each participant at the sound-emitting device therefore feels as if the talker at the sound-collecting device were present at their own device and speaking there, which makes possible a remote conference with a strong sense of presence. Moreover, regardless of the virtual sound source position, all of the speakers SP1 to SP4 emit sound with controlled delay and amplitude relationships, which achieves more realistic sound source localization than simply emitting sound from the speaker nearest the virtual sound source position. For example, if sound were emitted only from the speaker nearest the virtual sound source, it would be radiated toward the virtual sound source, so a participant on the opposite side of the audio conference device 1 from the virtual sound source would hear only a muffled sound. When all the speakers emit sound, as in this embodiment, at least one speaker radiates sound toward each participant, even if not squarely facing them, so the participant can hear the sound clearly.

Next, a specific usage example will be described with reference to the drawings.
FIG. 7 is a diagram illustrating the sound emission and collection state when the conference participant 201A speaks in the situation shown in FIG. 1.
In FIGS. 1 and 7, in the conference room 100A, the conference participant 201A is seated in the θ = 0° direction of the audio conference device 1A, and the conference participant 203A is seated in the θ = 180° direction of the audio conference device 1A. In the conference room 100B, the conference participant 202B is seated in the θ = 90° direction of the audio conference device 1B, and the conference participant 204B is seated in the θ = 270° direction of the audio conference device 1B.

When the conference participant 201A in the conference room 100A speaks, the voice 301A is picked up by the audio conference device 1A. Because the voice 301A is picked up mainly by the microphones MC8, MC9, and MC10, the sound collection beam signal MB5, formed from the signals of these microphones, becomes equal to or greater than the predetermined threshold. The output sound collection beam signal MBS consisting of MB5 is echo-cancelled and transmitted to the audio conference device 1B as voice communication data, together with speaker direction information Pm indicating θ = 0°.

When the audio conference device 1B in the conference room 100B receives the voice communication data from the audio conference device 1A, it extracts the audio data, assigns it to, for example, channel CH1, and converts it into the sound emission audio signal S1. The audio conference device 1B also extracts the speaker direction information Pm (= Py) from the voice communication data. Because Py indicates θ = 0°, the audio conference device 1B sets the virtual sound source 901 in the θ = 0° direction and reads out the delay adjustment amounts D and gain adjustment amounts G that realize this virtual sound source. Here, the gain adjustment amounts G are set so that the amplitudes of the SP audio signals SPD1 to SPD4 for the speakers SP1 to SP4 satisfy SPD1 > SPD2 = SPD4 > SPD3, and the delay adjustment amounts D are set so that the delay times satisfy SPD1 < SPD2 = SPD4 < SPD3. The audio conference device 1B emits the SP audio signals SPD1 to SPD4 adjusted in this way from the corresponding speakers SP1 to SP4.
As a result, the emitted sound 401A corresponding to the SP1 audio signal SPD1 is louder than the emitted sounds 402A, 403A, and 404A corresponding to the SP audio signals SPD2, SPD3, and SPD4. The emitted sound 401A is emitted first, followed by the emitted sounds 402A and 404A, and finally the emitted sound 403A. The conference participants 202B and 204B therefore hear the sound as if it were emitted from the virtual sound source 901; that is, the participants 202B and 204B at the audio conference device 1B feel as if the participant 201A at the audio conference device 1A were present in the θ = 0° direction and speaking in their room.
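On the receiving side, the received direction Py has to be mapped to one of the virtual sound sources. The adder table earlier in this section places source 901 at 0° and each following source 45° further on (905 at 180°, 908 at 315°). Assuming exactly that spacing, a nearest-source lookup could look like the sketch below; the function name is hypothetical, and exact ties halfway between two sources fall to Python's round-half-to-even rule.

```python
def nearest_virtual_source(direction_deg: float, num_sources: int = 8,
                           first_source: int = 901) -> int:
    """Map a received speaker direction Py (degrees) to the nearest virtual source number.

    Assumes source `first_source` lies at 0 deg and the sources are spaced
    360 / num_sources degrees apart in increasing order.
    """
    step = 360.0 / num_sources
    index = round((direction_deg % 360.0) / step) % num_sources  # wrap 360 deg back to 0
    return first_source + index
```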

The above description uses the case of a single talker as an example, but the system can also be applied when two or more people speak at the same time. In that case, the sound-collecting audio conference device picks up each talker's voice signal and attaches separate speaker direction information to each, and the sound-emitting audio conference device sets a plurality of virtual sound sources based on the received speaker direction information.

The above description covers communication between two audio conference devices, but the virtual sound source setting described above can also be applied when three, four, or more audio conference devices communicate. In that case, a separate channel is assigned to each remote audio conference device, a virtual sound source is set for the sound emission audio signal of each channel, and delay processing and gain control are performed. More specifically, the sound emission audio signal S1 assigned to channel CH1 undergoes delay processing and gain control based on the speaker direction information supplied by the corresponding first audio conference device, producing audio signals for the speakers SP1 to SP4. Similarly, the sound emission audio signal S2 assigned to channel CH2 undergoes delay processing and gain control based on the speaker direction information supplied by the corresponding second audio conference device, producing audio signals for the speakers SP1 to SP4. Furthermore, the sound emission audio signal S3 assigned to channel CH3 undergoes delay processing and gain control based on the speaker direction information supplied by the corresponding third audio conference device, producing audio signals for the speakers SP1 to SP4. The audio signals generated for each of the speakers SP1 to SP4 are then combined to form the SP audio signals SPD1 to SPD4, which are emitted from the speakers SP1 to SP4. As a result, the participants at the sound-emitting audio conference device hear the remarks as if the participants at the first to third audio conference devices were present in the room.

The above description sets the virtual sound source from the direction alone, assuming the same distance from the housing. Alternatively, the virtual sound source may be set using both direction and distance. In that case, the sound-collecting audio conference device calculates the distance between each microphone and the talker position from the delay relationships among the signals picked up by the microphones, and can locate the talker position by using at least three of these distances. It then generates speaker direction information that contains distance information together with the direction, and transmits it to the sound-emitting audio conference device. The sound-emitting audio conference device sets a virtual sound source from the direction and distance in the received speaker direction information and performs the delay processing and gain control that realize this virtual sound source. This reproduces the talker position (the position of the participant who spoke) even more realistically.
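The direction-plus-distance variant needs a talker position. Assuming the microphone-to-talker distances have already been derived from the inter-microphone delays (the text does not detail that step), three distances fix a 2-D position by trilateration: subtracting the circle equation of the first microphone from those of the other two leaves a 2x2 linear system. The sketch then converts the position into a direction and a distance from the side surface of a disk-shaped housing of assumed radius, in the manner of claim 3. Names and geometry are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def locate_talker(mics: Sequence[Point], distances: Sequence[float]) -> Point:
    """Estimate a 2-D talker position from three mic positions and mic-to-talker distances.

    (x - xi)^2 + (y - yi)^2 = ri^2 minus the i = 0 equation gives, for i = 1, 2:
    2(xi - x0) x + 2(yi - y0) y = r0^2 - ri^2 + xi^2 - x0^2 + yi^2 - y0^2,
    which is solved here with Cramer's rule.
    """
    (x0, y0), (x1, y1), (x2, y2) = mics[:3]
    r0, r1, r2 = distances[:3]
    a11, a12 = 2 * (x1 - x0), 2 * (y1 - y0)
    a21, a22 = 2 * (x2 - x0), 2 * (y2 - y0)
    b1 = r0**2 - r1**2 + x1**2 - x0**2 + y1**2 - y0**2
    b2 = r0**2 - r2**2 + x2**2 - x0**2 + y2**2 - y0**2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("microphones are collinear; position is not unique")
    return (b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det


def direction_and_range(point: Point, housing_radius: float) -> Tuple[float, float]:
    """Direction (degrees, 0..360) from the housing center and distance to its side surface."""
    x, y = point
    angle = math.degrees(math.atan2(y, x)) % 360.0
    return angle, max(0.0, math.hypot(x, y) - housing_radius)
```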

FIG. 1 is a configuration diagram of the audio conference system according to the embodiment of the present invention.
FIG. 2 is an external view of the audio conference device used in the audio conference system according to the embodiment of the present invention.
FIG. 3 is a functional block diagram of the audio conference device shown in FIG. 2.
FIG. 4 is a block diagram showing the main configuration of the sound emission control unit 22 according to the embodiment of the present invention.
FIG. 5 shows the distribution of the virtual sound sources set by the sound emission specification control unit 220 and the contents of the sound emission specification table 2281.
FIG. 6 is a block diagram showing the main configuration of the sound collection control unit 27.
FIG. 7 is a diagram illustrating the sound emission and collection state when the conference participant 201A speaks in the situation shown in FIG. 1.

Explanation of symbols

1, 1A, 1B: audio conference device; 11: housing; 12: recess; 21: communication control unit; 22: sound emission control unit; 221 to 223: individual sound emission signal generation units; 224 to 227: signal synthesis units; 23: D/A converter; 24: sound emission amplifier; 25: sound collection amplifier; 26: A/D converter; 27: sound collection control unit; 271: direction-specific sound collection beam generation unit; 272: output data determination unit; 28: echo cancellation unit; 29: operation unit; 100A, 100B: conference room; 101A, 101B: conference table; 201A, 203A, 202B, 204B: conference participants; 301A: voice (collected sound); 401A, 402A, 403A, 404A: voice (emitted sound); 500: network; 901 to 908: virtual sound sources; SP1 to SP4: speakers; MC1 to MC16: microphones

Claims

An audio conference system comprising: a plurality of audio conference devices, each provided with a disk-shaped housing, a plurality of unidirectional microphones arranged circumferentially in the housing, and a plurality of speakers arranged circumferentially in the housing; and connection means for connecting at least two of the plurality of audio conference devices, wherein
the audio conference device on the transmitting side
forms, from the sound collection signals of the plurality of unidirectional microphones, sound collection beam signals having mutually different sound collection directions, and selects, based on signal intensity, a sound collection beam signal based on sound produced by a conference participant,
detects, among all outward directions around the disk-shaped housing, the sound collection direction corresponding to the selected sound collection beam signal and generates speaker direction information,
transmits a sound emission audio signal based on the selected sound collection beam signal together with the corresponding speaker direction information, and
has the plurality of unidirectional microphones installed on the upper surface side of the housing, with the direction toward the center of the housing in plan view as their sound collection direction, and
the audio conference device on the receiving side
receives the sound emission audio signal and the corresponding speaker direction information from the audio conference device on the transmitting side,
sets a virtual sound source in the same direction as that indicated by the speaker direction information, and controls the sound emitted from the plurality of speakers so that it is emitted as if from the virtual sound source, and
has the plurality of speakers installed on the lower surface side of the housing, with the direction outward from the housing as their sound emission direction.

The audio conference system according to claim 1, wherein the audio conference device on the receiving side performs amplitude control and delay control of the sound emitted from each speaker based on the positional relationship between the set virtual sound source position and each speaker.

The audio conference system according to claim 1, wherein
the audio conference device on the transmitting side
calculates, together with the sound collection direction, the distance between the position of the talker and the side surface of the housing closest to that position, and generates the speaker direction information from the sound collection direction and the distance, and
the audio conference device on the receiving side sets the virtual sound source based on the direction and the distance obtained from the received speaker direction information.