JP4838282B2

JP4838282B2 - Hands-free call device and hands-free call method

Info

Publication number: JP4838282B2
Application number: JP2008115651A
Authority: JP
Inventors: 芳夫中台; 潮渋沢
Original assignee: Nippon Telegraph and Telephone Corp; NTT Inc USA
Current assignee: NTT Inc; NTT Inc USA
Priority date: 2008-04-25
Filing date: 2008-04-25
Publication date: 2011-12-14
Anticipated expiration: 2028-04-25
Also published as: JP2009267799A

Description

The present invention relates to a hands-free call device and a hands-free call method that allow a caller to talk with a call partner without holding the device in hand.

In communication devices for videophone calls, hands-free voice communication, in which a speakerphone combining a speaker and a microphone is used instead of a handset, is attracting renewed attention. In the past, the mainstream form of calling was to talk through an attached handset while watching the screen, as with the videophone "PICSEND-R" developed by NTT Corporation (see FIG. 3). Today, however, videophones have appeared in which the speaker and microphone are formed integrally with the screen so that hands-free calls are possible without holding a handset, as seen in the "FLET'S Phone VP1000" developed by the NTT Group (see FIG. 4). Eliminating the handset tidies the appearance of the communication device, frees the caller from holding the handset, and provides a less constrained call environment.

A call device capable of such hands-free calling reproduces the call partner's voice from the speaker while the microphone picks up the caller's own voice. However, because the partner's voice emitted from the speaker is sent back to the partner through the microphone, the partner may hear a delayed copy of his or her own voice. Moreover, when both the call partner and the caller use hands-free calling, a phenomenon called acoustic coupling occurs, and in extreme cases howling (oscillation noise) results. To suppress such feedback, a processing circuit called an echo canceller (EC) or a voice switch (VOX) is generally incorporated between the microphone and the speaker; it prevents the sound emitted from the speaker from returning to the call partner and breaks the acoustic coupling.

Hands-free calling offers the convenience described above, but there is also a factor that hinders its spread: the possibility of eavesdropping by third parties and loss of privacy caused by leakage of the call audio. Because both sides of the conversation, the caller's own voice and the call partner's voice emitted from the speaker, can be heard at once, it is very difficult to protect the privacy of the call when a third party is in the same room or when the sound in the room is recorded.

One drastic way to prevent this problem is to emit a disturbing sound that masks the call audio so that others cannot make it out. In European offices, where there is also a view that an office telephone belongs to an individual, some hands-free users talk while playing music or ambient noise from the speakers of a radio or stereo. Although this can be expected to have some effect against eavesdropping, the radio or stereo operates independently of the hands-free call device, and as a sound source it is equivalent to the caller's own voice. That is, it is not subject to the echo canceller EC or voice switch VOX described above, so the music or ambient noise is transmitted to the call partner as is, making a clear voice call difficult. In addition, if the sound pressure level of the music or ambient noise is high, it becomes a nuisance to innocent third parties around the caller.

To prevent the music or ambient noise from reaching the call partner while still obtaining the masking effect, one conceivable approach is to generate the disturbing sound in the hands-free call device itself and then cancel it. Patent Document 1 discloses a technique that attempts to realize this approach in a mobile phone housing or a handset.

FIG. 1 of Patent Document 1 describes a technique in which a disturbing sound source is placed near the transmitting microphone of a handset, the caller's voice and the disturbing sound are picked up by the microphone, and an adaptive noise canceller (ANC) having the same function as an echo canceller then cancels the disturbing sound so that only the caller's voice is output. However, the disturbing sound that reaches the caller's own ear through the handset housing is not taken into account, so conversation with the call partner becomes very difficult.

FIG. 5 of the same document describes realizing a similar technique in a hands-free call device. However, because the speaker for receiving and the speaker for generating the disturbing sound are arranged separately, the device housing may become large depending on the speaker arrangement.

Furthermore, the same document gives no consideration to the sound pressure level of the disturbing sound, so the device is likely to scatter excessive noise into its surroundings, and the technique can hardly be said to take practical use into account.

Moreover, although the document does not describe a specific circuit configuration for realizing its FIG. 5, inferring from the circuit diagram in its FIG. 4, the configuration is expected to include, independently of each other, an adaptive filter ANC that cancels the masking sound generated by the noise generator when it is picked up by the microphone, and an echo canceller EC or voice switch VOX that realizes hands-free calling with the speaker and microphone. Since the adaptive filter ANC and the echo canceller EC have substantially the same circuit configuration, the overall circuit configuration is redundant.

In addition, the purpose of emitting the disturbing sound in that document is different. The caller's own voice has always leaked, even when a conventional handset was used. On that premise, the first purpose of deterring eavesdropping in hands-free calling is to protect the call partner's voice; preventing eavesdropping on the caller's own voice is the second purpose. The technique disclosed in the document aims at the second purpose and is not specialized for the first.

Patent Document 1: Japanese Patent Laid-Open No. 5-22391

To summarize, conventional hands-free call devices have had the following problems in generating a disturbing sound that deters third parties from eavesdropping on the call partner's voice.

First, when the sound pressure level of the disturbing sound is raised to reliably deter eavesdropping by a third party, the caller also hears the disturbing sound and is forced to talk in a noisy environment, which makes the call very difficult. Second, when noise comparable to the disturbing sound already exists in the surroundings, that noise itself acts as a disturbing sound, so newly generating a disturbing sound has little additional effect. Third, because people nearby are made to hear the disturbing sound, generating more noise than is needed for a given eavesdropping-deterrent effect is unpleasant for them. Fourth, because both a speaker that emits the disturbing sound and a speaker that emits the call partner's voice are required, the device housing becomes larger; in addition, the adaptive filter ANC for cancelling the disturbing sound and the echo canceller EC or voice switch VOX for hands-free calling form a multi-stage configuration, which complicates the processing.

The present invention has been made in view of the above. Its object is to provide a hands-free call device and a hands-free call method whose first object is to generate a disturbing sound with a simple configuration and to appropriately adjust the level of the disturbing sound according to ambient noise, and whose second object is to appropriately adjust the level of the disturbing sound according to the call situation.

The invention according to claim 1 is a hands-free call device that includes a speaker and a microphone and allows a caller to talk with a call partner without holding the device in hand, the device comprising: disturbing signal generating means for generating a disturbing signal that hinders persons other than the caller from hearing the call partner's voice emitted from the speaker; disturbing signal adding means for adding the disturbing signal to the voice signal transmitted from the call partner's call device and causing the speaker to emit a voice based on the call partner's voice signal and a disturbing sound based on the disturbing signal; noise level measuring means for receiving a collected sound signal based on the sound picked up by the microphone and measuring, as a noise level, the magnitude of the collected sound signal in non-speech time intervals that contain no call voice; call sound level measuring means for receiving the collected sound signal and measuring, as a call sound level, the magnitude of the collected sound signal in speech time intervals that contain call voice; difference calculating means for calculating the difference between the noise level and the call sound level; and disturbing signal level control means for decreasing the magnitude of the disturbing signal when the difference is smaller than a predetermined value and increasing the magnitude of the disturbing signal when the difference is larger than the predetermined value.

According to the present invention, a disturbing signal is generated that hinders persons other than the caller from hearing the call partner's voice emitted from the speaker, the disturbing signal is added to the voice signal transmitted from the call partner's call device, and the voice based on the call partner's voice signal and the disturbing sound based on the disturbing signal are emitted from the speaker. A hands-free call device that generates a disturbing sound with a simple configuration can therefore be provided. Specifically, a single speaker serves both to emit the call partner's voice and to emit the disturbing sound, which prevents the housing of the hands-free call device from growing, requires no additional speaker for generating the disturbing sound, and allows the invention to be applied easily even to the housing of a conventional hands-free call device.

Also according to the present invention, a collected sound signal based on the sound picked up by the microphone is received; its magnitude in non-speech time intervals that contain no call voice is measured as a noise level, and its magnitude in speech time intervals that contain call voice is measured as a call sound level. The magnitude of the disturbing signal is decreased when the difference between the noise level and the call sound level is smaller than a predetermined value and increased when the difference is larger than the predetermined value. A hands-free call device that appropriately adjusts the level of the disturbing sound according to ambient noise can therefore be provided. Specifically, the ambient noise is evaluated in time intervals in which neither the caller nor the call partner is speaking; if a sufficient noise level is steadily present relative to the voice level, the sound pressure level of the disturbing sound is lowered, and if the noise level is low, it is raised. The level of the disturbing sound can thus be adjusted appropriately to the ambient noise conditions.
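The level control rule in the preceding paragraph can be sketched as a single update step. This is a minimal illustration, not the patented implementation: the function name, the 15 dB threshold, the 1 dB step, and the gain limits are all hypothetical, and the sign convention (call sound level minus noise level) is an assumption chosen so that a small difference means the ambient noise is already loud relative to the voice.

```python
def adjust_masking_gain_db(gain_db, speech_level_db, noise_level_db,
                           threshold_db=15.0, step_db=1.0,
                           min_gain_db=-40.0, max_gain_db=0.0):
    """Return the disturbing-signal gain (in dB) after one control step.

    All parameter values are illustrative assumptions.
    """
    # Assumed convention: difference = call sound level - noise level.
    difference_db = speech_level_db - noise_level_db
    if difference_db < threshold_db:
        # Ambient noise already masks the voice: lower the disturbing sound.
        gain_db -= step_db
    elif difference_db > threshold_db:
        # Surroundings are quiet relative to the voice: raise the disturbing sound.
        gain_db += step_db
    # Keep the gain inside a fixed range so it cannot drift without bound.
    return max(min_gain_db, min(max_gain_db, gain_db))
```

Calling this once per measurement period moves the disturbing sound up or down in small steps instead of jumping, which is one plausible way to avoid audible level changes.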

The invention according to claim 2 further comprises: removing means for removing, from the collected sound signal, the call partner's voice component and the disturbing sound component that were emitted from the speaker and then picked up by the microphone, using the call partner's voice signal to which the disturbing signal has been added; caller voice change amount measuring means for receiving the collected sound signal from which the call partner's voice component and the disturbing sound component have been removed and measuring, as a caller voice change amount, the temporal change in the magnitude of that signal in speech time intervals containing the caller's voice; and call partner voice change amount measuring means for receiving the voice signal transmitted from the call partner's call device and measuring, as a call partner voice change amount, the temporal change in the magnitude of that signal in speech time intervals containing the call partner's voice, wherein the disturbing signal level control means increases the magnitude of the disturbing signal when the caller voice change amount and/or the call partner voice change amount is larger than a predetermined value and decreases the magnitude of the disturbing signal when the caller voice change amount and/or the call partner voice change amount is smaller than the predetermined value.

According to the present invention, the collected sound signal from which the call partner's voice component and the disturbing sound component have been removed is received, and the temporal change in its magnitude in speech time intervals containing the caller's voice is measured as a caller voice change amount; the voice signal transmitted from the call partner's call device is also received, and the temporal change in its magnitude in speech time intervals containing the call partner's voice is measured as a call partner voice change amount. The magnitude of the disturbing signal is increased when the caller voice change amount and/or the call partner voice change amount is larger than a predetermined value and decreased when it is smaller than the predetermined value. A hands-free call device that appropriately adjusts the level of the disturbing sound according to the call situation can therefore be provided. Specifically, the change in the voice spectrum of the two parties' speech is used as a proxy for the amount of information conveyed; when the change in the voice spectrum declines, it is assumed that less information is passing between the two parties and that the call has become difficult, and the sound pressure level of the disturbing sound is lowered. The loudness of the disturbing sound can thus be adjusted appropriately to the call situation.
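The call-situation control described above can be illustrated with a small sketch. The text only says that a "temporal change in magnitude" is measured for each party; the mean absolute frame-to-frame change of the decibel levels used here is an assumed stand-in for that measure, the "and/or" condition is read as "the larger of the two", and the function names, the 3 dB threshold, and the 1 dB step are hypothetical.

```python
def mean_level_change_db(levels_db):
    """Mean absolute frame-to-frame change of a sequence of dB levels."""
    if len(levels_db) < 2:
        return 0.0
    steps = [abs(b - a) for a, b in zip(levels_db, levels_db[1:])]
    return sum(steps) / len(steps)


def adjust_gain_for_call_activity(gain_db, caller_levels_db, partner_levels_db,
                                  threshold_db=3.0, step_db=1.0):
    """Raise the disturbing-signal gain for lively speech, lower it otherwise."""
    # Assumption: "and/or" is interpreted as taking the larger change amount.
    change = max(mean_level_change_db(caller_levels_db),
                 mean_level_change_db(partner_levels_db))
    if change > threshold_db:
        return gain_db + step_db
    if change < threshold_db:
        return gain_db - step_db
    return gain_db
```

The inputs are the per-frame decibel values of speech frames only, so silence between utterances does not count as a change.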

Also according to the present invention, the call partner's voice component and the disturbing sound component that were emitted from the speaker and then picked up by the microphone are removed from the collected sound signal using the call partner's voice signal to which the disturbing signal has been added, so a hands-free call device with simplified processing can be provided. Specifically, the function that prevents the call partner's voice emitted from the speaker from entering the microphone and being sent back to the call partner (the function corresponding to the echo canceller EC described above) and the function that prevents the disturbing sound emitted from the speaker from entering the microphone and being sent back to the call partner (the function corresponding to the adaptive filter ANC described above) can be shared.

The invention according to claim 3 is a hands-free call method performed by a hands-free call device that includes a speaker and a microphone and allows a caller to talk with a call partner without holding the device in hand, the method comprising the steps of: generating a disturbing signal that hinders persons other than the caller from hearing the call partner's voice emitted from the speaker; adding the disturbing signal to the voice signal transmitted from the call partner's call device and causing the speaker to emit a voice based on the call partner's voice signal and a disturbing sound based on the disturbing signal; receiving a collected sound signal based on the sound picked up by the microphone and measuring, as a noise level, the magnitude of the collected sound signal in non-speech time intervals that contain no call voice; receiving the collected sound signal and measuring, as a call sound level, the magnitude of the collected sound signal in speech time intervals that contain call voice; calculating the difference between the noise level and the call sound level; and decreasing the magnitude of the disturbing signal when the difference is smaller than a predetermined value and increasing the magnitude of the disturbing signal when the difference is larger than the predetermined value.

The invention according to claim 4 further comprises the steps of: removing, from the collected sound signal, the call partner's voice component and the disturbing sound component that were emitted from the speaker and then picked up by the microphone, using the call partner's voice signal to which the disturbing signal has been added; receiving the collected sound signal from which the call partner's voice component and the disturbing sound component have been removed and measuring, as a caller voice change amount, the temporal change in the magnitude of that signal in speech time intervals containing the caller's voice; receiving the voice signal transmitted from the call partner's call device and measuring, as a call partner voice change amount, the temporal change in the magnitude of that signal in speech time intervals containing the call partner's voice; and increasing the magnitude of the disturbing signal when the caller voice change amount and/or the call partner voice change amount is larger than a predetermined value and decreasing the magnitude of the disturbing signal when the caller voice change amount and/or the call partner voice change amount is smaller than the predetermined value.

According to the present invention, it is possible to provide a hands-free call device and a hands-free call method that generate a disturbing sound with a simple configuration and can appropriately adjust the level of the disturbing sound according to ambient noise and the call situation.

Embodiments of the present invention are described below with reference to the drawings.

FIG. 1 is a functional block diagram of the hands-free call device according to the present embodiment. The device includes a speaker 7 and a microphone 8 and allows a caller to talk with a call partner without holding the device in hand. In FIG. 1, the voice input terminal 1 is an audio terminal that receives the voice signal transmitted from the call partner's call device (not shown). The voice output terminal 2 is an audio terminal that outputs, to the call partner's call device, a collected sound signal based on sounds picked up by the microphone 8, such as the caller's voice and noise occurring around the caller. The disturbing signal source 3 generates a disturbing signal, that is, the signal for a disturbing sound that hinders persons other than the caller from hearing the call partner's voice emitted from the speaker 7. Examples of the disturbing sound include white noise and sounds whose spectral structure differs from that of human speech, such as the murmur of a brook. The switch 4 turns on and off the input of the disturbing signal generated by the disturbing signal source 3 to the mixer 5 in the following stage.

The mixer 5 is the point at which the signals from the voice input terminal 1 and the disturbing signal source 3 are mixed. More specifically, it adds the disturbing signal generated by the disturbing signal source 3 to the voice signal transmitted from the call partner's call device via the voice input terminal 1, so that the voice based on the call partner's voice signal and the disturbing sound based on the disturbing signal are emitted from the speaker 7. The amplifier 6 amplifies the call partner's voice signal and the disturbing signal output from the mixer 5. The speaker 7 emits, toward the caller, the call partner's voice and the disturbing sound based on the voice signal and disturbing signal amplified by the amplifier 6. The microphone 8 picks up sounds such as the caller's voice and noise occurring around the caller.
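The roles of the disturbing signal source 3, switch 4, and mixer 5 can be sketched on sample lists. This is only an illustration under stated assumptions: the function names, the uniform white-noise generator, the fixed seed, and the `masking_gain` parameter are hypothetical and do not appear in the source.

```python
import random


def make_white_noise(num_samples, amplitude=0.1, seed=0):
    """Hypothetical disturbing signal source: uniform white noise samples."""
    rng = random.Random(seed)
    return [rng.uniform(-amplitude, amplitude) for _ in range(num_samples)]


def mix_for_speaker(partner_voice, disturbing, masking_gain=1.0, switch_on=True):
    """Add the scaled disturbing signal to the partner's voice, sample by sample.

    With switch_on=False (switch 4 off), the voice passes through unchanged.
    Both inputs are assumed to have the same length.
    """
    if not switch_on:
        return list(partner_voice)
    return [v + masking_gain * d for v, d in zip(partner_voice, disturbing)]
```

The mixed output is what would go both to the amplifier and, as the reference signal, to the echo canceller.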

The echo canceller 9 is a circuit that cancels the call partner's voice and the disturbing sound that are emitted from the speaker 7 and enter the microphone 8. More specifically, it receives the call partner's voice signal with the disturbing signal added, as output from the mixer 5, and removes from the collected sound signal the call partner's voice component and the disturbing sound component that were picked up by the microphone 8 after being emitted from the speaker 7.
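The source does not state which adaptive algorithm the echo canceller 9 uses. As one common choice, a normalized LMS (NLMS) filter can illustrate the key point: because the reference signal is the mixer output (partner voice plus disturbing signal), a single filter removes both components. The function name, tap count, and step size below are assumptions.

```python
def nlms_echo_cancel(reference, mic, num_taps=8, step=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the speaker echo from the mic signal.

    reference: samples sent to the speaker (partner voice + disturbing signal).
    mic:       samples picked up by the microphone.
    Returns the residual signal that would be sent to the call partner.
    """
    weights = [0.0] * num_taps
    history = [0.0] * num_taps  # most recent reference samples, newest first
    output = []
    for x, d in zip(reference, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate  # mic signal with the estimated echo removed
        norm = sum(h * h for h in history) + eps
        # NLMS update: step normalized by the reference power in the window.
        weights = [w + step * error * h / norm
                   for w, h in zip(weights, history)]
        output.append(error)
    return output
```

A practical canceller would also freeze adaptation while the caller is speaking (double-talk); that logic is omitted here for brevity.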

The first voice power calculation unit 10 receives the collected sound signal picked up by the microphone 8, calculates the voice power a_n for each time interval from the waveform of the collected sound signal, and calculates the decibel value p_n from the logarithm of the calculated voice power. The voice power a_n can be obtained, for example, as follows: the waveform of the voice signal is A/D (analog-to-digital) converted at a sampling frequency of 8 kHz with 16-bit linear quantization, the resulting data (a discrete sampled waveform) {w_m} (m = 1, 2, ..., 8000) is divided into frames with a frame length of 1 second and a frame period of 0.5 seconds, and the samples in each frame are squared and summed (Equation (1)).

$$a_n = \sum_{m=1}^{f} w_m^2 \qquad (1)$$

Here, f is the number of samples in one frame (8000 in the example above), and n is an arbitrary frame number.

Next, the decibel value p_n of the voice power a_n can be obtained, for example, by taking the base-10 logarithm of a_n and multiplying it by 10 (Equation (2)).

$$p_n = 10 \log_{10} a_n \qquad (2)$$
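The frame power and decibel calculations described for Equations (1) and (2) can be sketched as follows, using the frame length (8000 samples, 1 second at 8 kHz) and frame period (4000 samples, 0.5 seconds) from the text. The function name and the small floor that avoids taking the logarithm of zero are assumptions added for the sketch.

```python
import math


def frame_levels_db(samples, frame_len=8000, hop=4000):
    """Return the decibel value p_n for each overlapping frame of samples."""
    levels = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        power = sum(w * w for w in frame)  # Equation (1): sum of squares
        # Assumed floor so that an all-zero frame does not raise a math error.
        levels.append(10.0 * math.log10(max(power, 1e-12)))  # Equation (2)
    return levels
```

With a 0.5 second frame period, each new decibel value reuses half of the previous frame's samples, so two values are produced per second of audio.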

The second voice power calculation unit 11 and the third voice power calculation unit 18 have the same function. They differ from the first voice power calculation unit 10 in their inputs: the signal input to the second voice power calculation unit 11 is the collected sound signal from which the echo canceller 9 has removed the call partner's voice component and the disturbing sound component, and the signal input to the third voice power calculation unit 18 is the voice signal transmitted from the call partner's call device.

The non-speech time interval determination unit 12 uses the decibel value p_n of the voice power calculated by the first voice power calculation unit 10 to discriminate whether or not the collected sound signal picked up by the microphone 8 contains speech, identifies non-speech time intervals that contain no speech, and sends only the decibel values p_n corresponding to those intervals, frame by frame, to the average non-speech power calculation unit 14 in the following stage. One way to determine the presence of speech is, for example, to observe the power spectrum of the signal from 100 Hz to 300 Hz, which corresponds to the pitch frequency of speech: if the observed value has at least a certain magnitude relative to the voice power of the full band (determined by the sampling frequency given above; here taken as 100 Hz to 4 kHz), that time is determined to be a speech time interval in which speech was input, and otherwise it is determined to be a non-speech time interval. As a more sophisticated estimation method, cepstrum analysis can be performed on each signal and the high-quefrency region corresponding to the 100 Hz to 300 Hz components observed to distinguish voiced sound (speech time intervals) from unvoiced sound (non-speech time intervals).
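The band-power test for speech can be sketched as below. The 100 Hz to 300 Hz pitch band and the 100 Hz to 4 kHz full band come from the text; the function names and the 0.3 ratio threshold are hypothetical, since the source only requires "a certain magnitude". A direct DFT is used to keep the sketch dependency-free; it is slow for 8000-sample frames, and an FFT would normally replace it.

```python
import cmath
import math


def band_power(frame, sample_rate, low_hz, high_hz):
    """Sum of DFT bin powers for bins whose frequency lies in [low_hz, high_hz]."""
    n = len(frame)
    total = 0.0
    for k in range(n // 2 + 1):
        freq = k * sample_rate / n
        if low_hz <= freq <= high_hz:
            coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                        for i, x in enumerate(frame))
            total += abs(coeff) ** 2
    return total


def is_speech_frame(frame, sample_rate=8000, ratio_threshold=0.3):
    """Classify a frame as speech if the pitch band holds enough of the power."""
    pitch_band = band_power(frame, sample_rate, 100.0, 300.0)
    full_band = band_power(frame, sample_rate, 100.0, 4000.0)
    if full_band <= 0.0:
        return False  # no energy at all: treat as non-speech
    return pitch_band / full_band >= ratio_threshold
```

Frames for which `is_speech_frame` returns `False` would be the ones whose decibel values are passed on for the noise level estimate.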

Meanwhile, the first speech time interval determination unit 13 uses the decibel value of the voice power calculated by the second voice power calculation unit 11 to discriminate whether or not the collected sound signal, from which the echo canceller 9 has removed the call partner's voice component and the disturbing sound component, contains speech. It determines the speech time intervals that contain the caller's voice and sends only the decibel values of the voice power corresponding to those intervals, frame by frame, to the average speech power calculation unit 15 and the first voice dynamic information calculation unit 17 in the subsequent stage. Likewise, the second speech time interval determination unit 19 uses the decibel value of the voice power calculated by the third voice power calculation unit 18 to discriminate whether or not the voice signal transmitted from the call partner's call device contains speech, determines the speech time intervals that contain the call partner's voice, and sends only the corresponding decibel values, frame by frame, to the second voice dynamic information calculation unit 20 in the subsequent stage.

The average non-speech power calculation unit 14 accumulates, for a fixed number of frames, the decibel values of the voice power corresponding to the non-speech time intervals sent from the non-speech time interval determination unit 12, computes their average, and outputs it as the noise level. For example, when the voice power output from the first voice power calculation unit 10 is a value per 0.5-second frame period, as described above, the noise level can be obtained as the average voice power over 15 seconds, that is, over 30 frames. Similarly, the average speech power calculation unit 15 computes the average of the decibel values of the voice power corresponding to the speech time intervals sent from the first speech time interval determination unit 13 and outputs it as the call sound level. Because the non-speech time interval determination unit 12 and the first speech time interval determination unit 13 sort frames into non-speech and speech time intervals respectively, the average non-speech power calculation unit 14 and the average speech power calculation unit 15 may not always accumulate 30 frames of voice power. In that case, the value accumulated and calculated previously may be used instead.
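The accumulate-then-average behaviour, including the fallback to the previously calculated value, might look like the sketch below. The class name `BlockMean` is hypothetical, and averaging over non-overlapping blocks of 30 frames is an assumption; the text does not say whether the window slides.

```python
class BlockMean:
    """Average decibel values over fixed-size blocks of frames.

    Until a new block is complete, the mean of the previous block is
    returned, mirroring the fallback described in the text.
    """

    def __init__(self, window=30):
        self.window = window
        self.buffer = []
        self.last_mean = None  # no level is available before the first full block

    def push(self, db_value):
        self.buffer.append(db_value)
        if len(self.buffer) == self.window:
            self.last_mean = sum(self.buffer) / self.window
            self.buffer.clear()
        return self.last_mean
```

One instance would serve the noise level and another the call sound level, each fed only with the frames that its interval determination unit lets through.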

The difference calculation unit 16 calculates the difference between the noise level output from the average non-speech power calculation unit 14 and the call sound level output from the average speech power calculation unit 15, that is, the difference of the noise level relative to the call sound level, and sends it to the disturbing signal level control unit 21 in the subsequent stage.

The first voice dynamic information calculation unit 17 calculates, as the caller voice change amount, how much the decibel value of the voice power corresponding to the speech time intervals sent from the first speech time interval determination unit 13 varies from frame to frame. As a voice feature representing this change, the regression coefficient of logarithmic voice power known as "delta power" can be used, for example. Delta power takes a small value when the voice changes gradually and a large value when it changes sharply (the formula for delta power is given later). The first voice dynamic information calculation unit 17 squares this value, sums the squares, and sends the result to the disturbing signal level control unit 21 in the subsequent stage. Likewise, the second voice dynamic information calculation unit 20 uses the decibel value of the voice power corresponding to the speech time intervals sent from the second speech time interval determination unit 19 to calculate the call partner voice change amount.

The disturbing signal level control unit 21 controls the disturbing signal level controller 22 on the basis of the outputs of the difference calculation unit 16, the first voice dynamic information calculation unit 17, and the second voice dynamic information calculation unit 20. Specifically, it reduces the disturbing signal when the difference calculated by the difference calculation unit 16 is smaller than a predetermined value and increases it when the difference is larger than that value. It also increases the disturbing signal when the caller voice change amount and/or the call partner voice change amount is larger than a predetermined value, and reduces it when the caller voice change amount and/or the call partner voice change amount is smaller than that value.

The disturbing signal level controller 22 is an attenuator connected between the switch 4 and the mixer 5. Under the control of the disturbing signal level control unit 21, it can change the output gain of the disturbing signal generated by the disturbing signal generation source 3.
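As a rough illustration of what such an attenuator does to the signal, the sketch below converts a gain in decibels to a linear factor and applies it to samples. The function names are hypothetical, and the amplitude convention (factor = 10^(dB/20)) is an assumption, since the text does not state how the gain is realized.

```python
def db_to_linear(gain_db):
    """Convert an amplitude gain in decibels to a linear multiplier."""
    return 10.0 ** (gain_db / 20.0)


def attenuate(samples, gain_db):
    """Scale disturbing-signal samples by a gain of at most 0 dB."""
    if gain_db > 0.0:
        raise ValueError("an attenuator does not amplify; gain_db must be <= 0")
    factor = db_to_linear(gain_db)
    return [factor * x for x in samples]
```

Under this convention a single 3 dB reduction scales the amplitude by roughly 0.71, so repeated steps lower the disturbing sound gradually.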

The relationship between the functional blocks of the hands-free call device described in this embodiment and the means of the hands-free call device recited in the claims is as follows. The disturbing signal generation source 3 corresponds to the disturbing signal generating means recited in the claims, and the mixer 5 corresponds to the disturbing signal adding means. The first voice power calculation unit 10, the non-speech time interval determination unit 12, and the average non-speech power calculation unit 14 correspond to the noise level measuring means. The second voice power calculation unit 11, the first speech time interval determination unit 13, and the average speech power calculation unit 15 correspond to the call sound level measuring means. The second voice power calculation unit 11, the first speech time interval determination unit 13, and the first voice dynamic information calculation unit 17 correspond to the caller voice change amount measuring means. The third voice power calculation unit 18, the second speech time interval determination unit 19, and the second voice dynamic information calculation unit 20 correspond to the call partner voice change amount measuring means. The difference calculation unit 16 corresponds to the difference calculating means, the disturbing signal level control unit 21 corresponds to the disturbing signal level control means, and the echo canceller 9 corresponds to the removing means.

In this embodiment, the second voice power calculation unit 11 and the first speech time interval determination unit 13 are described as being shared between the call sound level measuring means and the caller voice change amount measuring means recited in the claims. It is of course also possible to provide a separate second voice power calculation unit 11 and first speech time interval determination unit 13 for each of these two means.

Next, the operation of the hands-free call device having the above functions is described. FIG. 2 is a flowchart showing the processing flow of the hands-free call device according to this embodiment. It is assumed that the voice input terminal 1 and the voice output terminal 2 are connected to the call partner's call device and that the gain of the disturbing signal level controller 22 has not been reduced (step S100).

The voice signal transmitted from the call partner's call device passes through the mixer 5 and the amplifier 6 and is emitted from the speaker 7 into the caller's environment as the call partner's voice. When the caller turns on the switch 4, the disturbing signal generated by the disturbing signal generation source 3 passes through the disturbing signal level controller 22, the mixer 5, and the amplifier 6 and is emitted from the speaker 7 into the caller's environment as a disturbing sound. If the caller deliberately does not speak, the call partner's voice and the disturbing sound emitted from the speaker 7 are input to the microphone 8 after being affected by the acoustic characteristics of the caller's environment and by ambient noise. Provided that the waveform of the collected sound signal picked up by the microphone 8 is not extremely distorted, the echo canceller 9 can cancel this composite of the call partner's voice and the disturbing sound down to a certain level, using an anti-phase composite sound obtained by convolving the output of the mixer 5 with a digital filter.

The sounds that cannot be cancelled are the caller's own voice, the ambient noise around the caller, the prediction error of the composite sound caused by the limited accuracy of the echo canceller's digital filter, and electrical noise and distortion due to excessive volume that arise when the microphone 8 picks up sound. Here, the electrical noise and distortion components are assumed to be far lower in level than the caller's voice and the ambient noise and are therefore ignored. The output of the echo canceller 9 is then mainly a composite of the caller's voice and the ambient noise. This output is sent as is to the voice output terminal 2. The call partner may hear some feedback of his or her own voice, but mainly hears a comparatively clear sound consisting only of the caller's voice and the caller's ambient noise.

Hereinafter, the collected sound signal before it is input to the echo canceller 9 is referred to as the pre-removal collected sound signal; it contains the call partner's voice component, the noise component, the caller's voice component, and the caller's ambient noise component. The collected sound signal after it has passed through the echo canceller 9 is referred to as the post-removal collected sound signal; the call partner's voice component and the noise component have been removed from it, and it contains the caller's voice component and the caller's ambient noise component.
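The cancellation step, convolving the mixer output with a digital filter and removing the result from the microphone signal, can be sketched as below. This is a simplified illustration: the filter coefficients `fir` are assumed to be known, whereas a practical echo canceller would adapt them continuously, and the function names are hypothetical.

```python
def estimate_echo(reference, fir):
    """Convolve the mixer output (reference) with an FIR model of the
    speaker-to-microphone path to predict the echo at the microphone."""
    echo = []
    for n in range(len(reference)):
        acc = 0.0
        for k, h in enumerate(fir):
            if n - k >= 0:
                acc += h * reference[n - k]
        echo.append(acc)
    return echo


def cancel_echo(mic, reference, fir):
    """Subtract the predicted echo from the microphone signal, which is
    equivalent to adding the anti-phase estimate."""
    echo = estimate_echo(reference, fir)
    return [m - e for m, e in zip(mic, echo)]
```

When the filter matches the acoustic path exactly, the residual equals the near-end sound (caller's voice plus ambient noise); any mismatch shows up as the prediction error mentioned in the text.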

At this time, the first voice power calculation unit 10 uses the pre-removal collected sound signal, before it is input to the echo canceller 9, to calculate the voice power as frame data with, for example, a frame length of 1.0 second and a frame period of 0.5 seconds, and obtains its decibel value (step S101). Here, the decibel value of the voice power at time n, that is, at frame n, calculated by the first voice power calculation unit 10 is defined as p_n.
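The framing in step S101 can be illustrated with the sketch below, which splits a signal into overlapping frames and returns one decibel value per frame. The mean-square definition of power, the floor value, and the function name `frame_power_db` are assumptions made for the example; the specification's own power formula is defined elsewhere in the document.

```python
import math


def frame_power_db(samples, fs, frame_len_s=1.0, frame_period_s=0.5, floor=1e-12):
    """Return the power of each frame in decibels.

    Frames are frame_len_s long and start every frame_period_s, so with the
    default values consecutive frames overlap by half.
    """
    frame_len = int(round(frame_len_s * fs))
    hop = int(round(frame_period_s * fs))
    values = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        power = sum(x * x for x in frame) / frame_len  # mean-square power (assumed)
        values.append(10.0 * math.log10(max(power, floor)))  # floor avoids log10(0)
    return values
```

Each returned value plays the role of p_n for one frame n; the same routine applied to the post-removal signal or to the call partner's signal would give q_n or s_n.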

The non-speech time interval determination unit 12 then selects, from the obtained p_n, only the p_n of frames in non-speech time intervals and sends them to the average non-speech power calculation unit 14 (step S102).

The average non-speech power calculation unit 14 uses the p_n sent from the non-speech time interval determination unit 12 to calculate the average non-speech power P over a fixed length of time for the non-speech time intervals (step S103). That is, if p_n is the logarithmic power obtained every 0.5-second frame period and the average is taken over, for example, 30 frames (15 seconds), the average non-speech power P for the non-speech time intervals can be calculated using Equation (3).

Similarly, the second voice power calculation unit 11 calculates the voice power using the post-removal collected sound signal that has passed through the echo canceller 9 and obtains its decibel value (step S104). Here, the decibel value of the voice power at time n, that is, at frame n, calculated by the second voice power calculation unit 11 is defined as q_n.

The first speech time interval determination unit 13 then selects, from the obtained q_n, only the q_n of frames in speech time intervals and sends them to the average speech power calculation unit 15 and the first voice dynamic information calculation unit 17 (step S105).

The average speech power calculation unit 15 uses the q_n sent from the first speech time interval determination unit 13 to calculate the average speech power Q over a fixed length of time for the speech time intervals (step S106). The average speech power Q can be calculated using Equation (4), in the same manner as Equation (3).

Because the average non-speech power P and the average speech power Q are obtained through the process of separating non-speech time intervals from speech time intervals, they cannot be obtained at the same moment in time. The average non-speech power P calculated by the average non-speech power calculation unit 14 for the non-speech time intervals can be regarded as the noise level produced by the caller's ambient noise and the disturbing sound. The average speech power Q calculated by the average speech power calculation unit 15 for the speech time intervals can be regarded as the call sound level produced by the caller's own voice.

The difference calculation unit 16 then calculates the difference value D using Equation (5) (step S107).

As already explained, the average non-speech power P, the average speech power Q, and the difference value D are all decibel values. In a typical speech recognition device, the recognition rate drops noticeably once the difference between the average speech input level and the average ambient noise level falls to about 10 decibels or less. In a real environment, if the speech input level of the caller's voice at the microphone corresponds to an equivalent noise level of 76 dBA, the ambient noise corresponds to 66 dBA (= 76 dBA − 10 dBA). This is roughly equal to the noise regulation level for a two-lane road, so the noise is considerable. Assuming that this equals the sound pressure level at which eavesdropping becomes difficult, when the difference value D is a small value of 10 decibels or less, eavesdropping by a third party is already difficult, while the disturbing sound is an excessive sound that also adversely affects the surroundings. The disturbing sound is therefore suppressed in that case.
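The worked figures above (76 dBA speech against 66 dBA noise) can be checked with a small sketch. It assumes that D is the call sound level minus the noise level, which matches the later description of D as a signal-to-noise ratio but is not spelled out here because Equation (5) is not reproduced; the function names are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def difference_db(call_levels_db, noise_levels_db):
    """D = Q - P, with Q and P taken as plain means of per-frame decibel values
    (the sign convention is an assumption)."""
    return mean(call_levels_db) - mean(noise_levels_db)


def noise_is_sufficient(d_db, threshold_db=10.0):
    """True when ambient noise alone is judged loud enough to mask the call."""
    return d_db <= threshold_db
```

For 30 speech frames at 76 dB and 30 non-speech frames at 66 dB this gives D = 10 dB, which sits exactly at the example threshold and would lead to the disturbing sound being lowered.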

Next, the first voice dynamic information calculation unit 17 calculates the delta power r_n at frame n using Equation (6).

The delta power r_n indicates the amount of change in the caller's voice: the more pronounced the change, the larger the value. Conversely, as the sound approaches a steady sound, the value tends to be small even when the sound pressure level is high.
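For orientation, the sketch below uses the regression-coefficient form of delta features that is common in speech processing. It is an assumption that Equation (6) takes this form; the half-width `k_max` of the regression window and the function name are hypothetical.

```python
def delta_power(log_power, n, k_max=2):
    """Regression coefficient of log power around frame n.

    Computes sum(k * q[n+k]) / sum(k*k) for k in [-k_max, k_max],
    i.e. the least-squares slope of the log power across the window.
    """
    if n - k_max < 0 or n + k_max >= len(log_power):
        raise IndexError("frame n needs k_max neighbours on both sides")
    num = sum(k * log_power[n + k] for k in range(-k_max, k_max + 1))
    den = sum(k * k for k in range(-k_max, k_max + 1))
    return num / den
```

A level that rises by 1 dB per frame yields a slope of 1.0, while a steady level yields 0.0 regardless of how loud it is, which matches the behaviour described in the text.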

The first voice dynamic information calculation unit 17 also obtains the long-term average R, over a fixed time, of the squared delta power r_n (step S108). If the fixed time is 15 seconds, as above, the long-term average R can be obtained with Equation (7).
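A minimal sketch of this averaging step follows. The text describes the quantity both as an average and as a sum of squares; the version below takes the mean of the squared values over the most recent window, which is an assumption, and the function name is hypothetical.

```python
def long_term_mean_square(deltas, window=30):
    """Mean of squared delta-power values over the most recent frames.

    window=30 corresponds to 15 seconds at a 0.5-second frame period.
    """
    recent = deltas[-window:]
    if not recent:
        raise ValueError("at least one delta-power value is required")
    return sum(d * d for d in recent) / len(recent)
```

Squaring makes rises and falls in level count equally, so the result grows with how lively the voice is rather than with its direction of change.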

Similarly, the third voice power calculation unit 18 calculates the voice power using the call partner's voice signal and obtains its decibel value s_n (step S109).

Likewise, the second speech time interval determination unit 19 selects, from the obtained logarithmic power, only the s_n of frames in speech time intervals and sends them to the second voice dynamic information calculation unit 20 (step S110).

The second voice dynamic information calculation unit 20 then calculates the delta power u_n at frame n using Equation (8).

Furthermore, the second voice dynamic information calculation unit 20 obtains the long-term average U, over a fixed time, of the squared delta power u_n (step S111).

The long-term averages R and U are sent to the disturbing signal level control unit 21 once per 15-second period, the time length over which the averages are calculated. The disturbing signal level control unit 21 then uses the difference value D obtained from the difference calculation unit 16, together with the long-term averages R and U obtained as sums of squared delta power from the first voice dynamic information calculation unit 17 and the second voice dynamic information calculation unit 20, to perform the following control, for example every 15 seconds.

First, the case where the difference value D is 10 decibels or less is described. As already explained, the difference value D corresponds to the decibel value of the signal-to-noise ratio between the voice and the ambient noise, so a small D means that considerable ambient noise is present. Accordingly, when the difference value D is, for example, 10 decibels or less (step S112), the disturbing signal level control unit 21 reduces the gain of the disturbing signal level controller 22 in steps of, for example, −3 decibels and lowers the output of the disturbing signal emitted by the disturbing signal generation source 3 (step S113).

Next, the case where the difference value D is larger than 10 decibels but the sum of the long-term averages R and U is small is described. The sum of the long-term average R and the long-term average U is compared with a threshold X. The threshold X is a positive value derived from analysis conditions such as the analysis frame length and frame period of the voice signal. A small sum is taken to mean that, in the dialogue between the caller and the call partner, the clarity of the voice has dropped, neither party is engaged in lively discussion, or one party is doing all the talking. Reduced clarity may stem from the acoustic characteristics of the caller's voice, but the disturbing sound may also be interfering with the thinking of the caller and the call partner. To rule out this possibility, when the sum of the long-term averages R and U is smaller than the threshold X (step S114), the disturbing signal level control unit 21 reduces the gain of the disturbing signal level controller 22 in steps of, for example, −3 decibels, as in step S113, and lowers the output of the disturbing signal emitted by the disturbing signal generation source 3.

After lowering the output of the disturbing signal in step S113, the disturbing signal level control unit 21 measures the difference value D and the long-term averages R and U again and checks whether D has reached 10 decibels or more, or whether the sum of R and U has exceeded the fixed value X. If no change is seen, step S113 is repeated and the output of the disturbing signal emitted by the disturbing signal generation source 3 is lowered further. If, on the other hand, the sum of R and U exceeds the fixed value X, the disturbing signal level control unit 21 increases the gain of the disturbing signal level controller 22 in steps of, for example, +3 decibels and gradually restores the output of the disturbing signal emitted by the disturbing signal generation source 3 (step S115). The disturbing signal level controller 22 does not, however, raise the gain above a maximum of 0 decibels.

By repeating steps S112 to S115, the hands-free call device can output a disturbing sound at an appropriate volume.
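One pass through steps S112 to S115 might be sketched as follows. This is an interpretation of the prose rather than of the flowchart in FIG. 2, which is not reproduced here: the ordering of the checks, the function name `next_gain_db`, and all argument names are assumptions, and the value of the threshold X is left to the caller because the text gives none.

```python
def next_gain_db(gain_db, d_db, r, u, x,
                 d_threshold_db=10.0, step_db=3.0, max_gain_db=0.0):
    """Return the attenuator gain for the next 15-second control period."""
    if d_db <= d_threshold_db:
        # S112 -> S113: ambient noise already masks the call, so back off.
        return gain_db - step_db
    if r + u < x:
        # S114 -> S113: little voice dynamics, so the disturbing sound may be
        # hampering the conversation; back off.
        return gain_db - step_db
    # S115: restore the disturbing sound, never exceeding 0 dB.
    return min(gain_db + step_db, max_gain_db)
```

Calling the function once per control period with freshly measured D, R, and U steps the gain down in 3 dB decrements while either condition holds and back up towards 0 dB once the conversation becomes lively again.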

Finally, when the caller turns off the switch 4 on ending the hands-free call (step S116), the gain applied by the disturbing signal level controller 22 returns to its original value.

In this embodiment, delta power is used to calculate the dynamic feature of the voice. If computational resources allow, the delta cepstrum, which represents the dynamic features of the voice more accurately than delta power, can be used to capture finer changes in the voice.

Furthermore, although the same speaker emits both the call partner's voice and the disturbing sound in this embodiment, the mixer 5 may be omitted and the call partner's voice and the disturbing sound emitted from separate speakers. This is because the echo canceller 9 can in principle cancel both, whichever of the call partner's voice and the disturbing sound reaches the microphone 8.

In this embodiment, the comparison in step S112 is described for the case where the difference value D is 10 decibels or less. The comparison value for D is not limited to 10 decibels, and other values may be used.

Furthermore, in this embodiment the comparison in step S114 is described using the sum of the long-term averages R and U. Depending on the value of the fixed value X, multiplication, division, or subtraction may be used instead of addition, or some other predetermined formula may be used. Either one of the long-term averages may also be used on its own.

Finally, with regard to the first problem, "adjusting the level of the disturbing sound according to the ambient noise", it is sufficient to be able to distinguish the call voices uttered by the call partner and the caller from the ambient noise. It should therefore be added that this problem can be solved without using the echo canceller 9 to remove the call partner's voice, in other words, without providing the echo canceller 9 at all.

According to this embodiment, a disturbing signal is generated that prevents the call partner's voice emitted from the speaker 7 from being heard by anyone other than the caller, the disturbing signal is added to the voice signal transmitted from the call partner's call device, and both the voice based on the call partner's voice signal and the disturbing sound based on the disturbing signal are emitted from the speaker 7. A hands-free call device that generates a disturbing sound with a simple configuration can therefore be provided. Specifically, because a single speaker 7 emits both the call partner's voice and the disturbing sound, the housing of the hands-free call device does not need to be enlarged and no additional speaker has to be installed to generate the disturbing sound, so the approach can easily be applied to the housing of a conventional hands-free call device.

Also according to this embodiment, a collected sound signal based on the sound picked up by the microphone 8 is input and the magnitude of that signal in non-speech time intervals, which contain no call voice, is measured as the noise level. A collected sound signal based on the sound picked up by the microphone 8 is likewise input and the magnitude of that signal in speech time intervals, which contain call voice, is measured as the call sound level. The disturbing signal is reduced when the difference of the noise level relative to the call sound level is smaller than a predetermined value and increased when the difference is larger than that value. A hands-free call device that appropriately adjusts the level of the disturbing sound according to the ambient noise can therefore be provided. Specifically, the ambient noise is evaluated in time intervals in which neither the caller nor the call partner is speaking. If a sufficient noise level relative to the voice level is constantly present, the sound pressure level of the disturbing sound is lowered; if the noise level is low, the sound pressure level of the disturbing sound is raised. The level of the disturbing sound can thus be adjusted appropriately to the ambient noise conditions.

Furthermore, according to the present embodiment, the collected sound signal from which the call partner's voice component and the interference sound component have been removed is input, and the amount of change over time in the magnitude of that signal during voice time intervals containing the caller's voice is measured as a caller voice change amount. The voice signal transmitted from the call partner's call device is also input, and the amount of change over time in its magnitude during voice time intervals containing the call partner's voice is measured as a call partner voice change amount. When the caller voice change amount and/or the call partner voice change amount is larger than a predetermined value, the magnitude of the interference signal is increased, and when the caller voice change amount and/or the call partner voice change amount is smaller than the predetermined value, the magnitude of the interference signal is reduced. A hands-free call device that appropriately adjusts the level of the interference sound according to the call situation can therefore be provided. Specifically, the amount of change in the voice spectrum of both parties is used as an approximate measure of the amount of information being conveyed. When the change in the voice spectrum declines, it is taken to mean that less information is passing between the two parties and that the call has become difficult, and the sound pressure level of the interference sound is lowered. The loudness of the interference sound can thus be adjusted appropriately to the state of the call.
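One possible reading of the control based on voice change amounts is sketched below in Python. The patent does not give a formula, so several things here are assumptions: the change amount is taken as the mean absolute difference between the levels (in dB) of consecutive voice frames, "and/or" is read as "increase if either amount exceeds the threshold, decrease if both fall below it", and `threshold_db` and `step_db` are hypothetical values.

```python
import math


def frame_level_db(frame):
    """Level of one frame in dB (floor avoids the log of zero)."""
    power = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(max(power, 1e-12))


def change_amount(voice_frames):
    """Mean absolute level change between consecutive voice frames."""
    levels = [frame_level_db(f) for f in voice_frames]
    if len(levels) < 2:
        return 0.0
    steps = [abs(b - a) for a, b in zip(levels, levels[1:])]
    return sum(steps) / len(steps)


def adjust_for_activity(gain_db, caller_frames, partner_frames,
                        threshold_db=3.0, step_db=1.0):
    """Return the updated interference gain in dB.

    caller_frames: echo-cancelled microphone frames judged to be voice.
    partner_frames: received far-end frames judged to be voice.
    """
    caller_change = change_amount(caller_frames)
    partner_change = change_amount(partner_frames)
    if caller_change > threshold_db or partner_change > threshold_db:
        return gain_db + step_db   # lively speech: raise the interference
    if caller_change < threshold_db and partner_change < threshold_db:
        return gain_db - step_db   # flat speech: lower the interference
    return gain_db
```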

Still further, according to the present embodiment, the call partner's voice signal to which the interference signal has been added is used to remove, from the collected sound signal, the call partner's voice component and the interference sound component that were emitted from the speaker 7 and then collected by the microphone 8. A hands-free call device with simplified processing can therefore be provided. Specifically, the function that prevents the call partner's voice emitted from the speaker 7 from leaking into the microphone 8 and being sent back to the call partner, and the function that prevents the interference sound emitted from the speaker 7 from leaking into the microphone 8 and being sent back to the call partner, can be implemented as a single shared function.
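The shared removal function can be illustrated with a small adaptive filter. The patent names an echo canceller but does not state its algorithm; the normalized LMS (NLMS) filter below is a commonly used technique substituted here purely for illustration, and the names `nlms_echo_cancel` and `demo`, the tap count, the step size, and the simulated two-tap echo path are all assumptions.

```python
import math
import random


def nlms_echo_cancel(reference, mic, taps=4, mu=0.5, eps=1e-8):
    """Remove the speaker-to-microphone echo from mic with an NLMS filter.

    reference: samples sent to the speaker, i.e. the call partner's voice
               with the interference signal already added.
    mic:       samples collected by the microphone.
    Returns the residual signal that would be sent to the call partner.
    """
    weights = [0.0] * taps
    history = [0.0] * taps          # most recent reference samples first
    residual = []
    for x, d in zip(reference, mic):
        history = [x] + history[:-1]
        estimate = sum(w * h for w, h in zip(weights, history))
        error = d - estimate
        norm = sum(h * h for h in history) + eps
        weights = [w + mu * error * h / norm
                   for w, h in zip(weights, history)]
        residual.append(error)
    return residual


def demo(n=2000, seed=0):
    """Simulate a two-tap echo path and return the final residual power."""
    rng = random.Random(seed)
    voice = [0.5 * math.sin(0.05 * i) for i in range(n)]
    interference = [0.2 * rng.uniform(-1.0, 1.0) for _ in range(n)]
    reference = [v + m for v, m in zip(voice, interference)]
    # Hypothetical echo path: direct sound plus one delayed reflection.
    mic = [0.6 * reference[i] + (0.3 * reference[i - 1] if i > 0 else 0.0)
           for i in range(n)]
    residual = nlms_echo_cancel(reference, mic)
    tail = residual[-100:]
    return sum(e * e for e in tail) / len(tail)
```

Because the reference already contains both the voice and the interference signal, a single filter pass suppresses both components, which is the simplification the paragraph above describes.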

FIG. 1 is a functional block diagram showing the functional blocks of the hands-free call device according to the present embodiment.
FIG. 2 is a flowchart showing the processing flow of the hands-free call device according to the present embodiment.
FIG. 3 is a perspective view of a conventional videophone.
FIG. 4 is a perspective view of a current videophone.

Explanation of symbols

1 ... Voice input terminal
2 ... Voice output terminal
3 ... Interference signal source
4 ... Switch
5 ... Mixer
6 ... Amplifier
7 ... Speaker
8 ... Microphone
9 ... Echo canceller
10 ... First voice power calculation unit
11 ... Second voice power calculation unit
12 ... Non-voice time interval determination unit
13 ... First voice time interval determination unit
14 ... Average non-voice power calculation unit
15 ... Average voice power calculation unit
16 ... Difference calculation unit
17 ... First voice dynamic information calculation unit
18 ... Third voice power calculation unit
19 ... Second voice time interval determination unit
20 ... Second voice dynamic information calculation unit
21 ... Interference signal level control unit
22 ... Interference signal level controller
S100 to S116 ... Steps

Claims

A hands-free call device that has a speaker and a microphone and enables a caller to talk with a call partner without holding the device in the hand, the hands-free call device comprising:
Interference signal generating means for generating an interference signal that hinders persons other than the caller from hearing the call partner's voice emitted from the speaker;
Interference signal adding means for adding the interference signal to a voice signal transmitted from a call device of the call partner, and causing the speaker to emit voice based on the call partner's voice signal and interference sound based on the interference signal;
Noise level measuring means for receiving a collected sound signal based on sound collected by the microphone and measuring, as a noise level, the magnitude of the collected sound signal corresponding to a non-voice time interval that contains no call voice;
Call sound level measuring means for receiving the collected sound signal and measuring, as a call sound level, the magnitude of the collected sound signal corresponding to a voice time interval that contains call voice;
Difference calculating means for calculating a difference of the noise level relative to the call sound level; and
Interference signal level control means for reducing the magnitude of the interference signal when the difference is smaller than a predetermined value and increasing the magnitude of the interference signal when the difference is larger than the predetermined value.

The hands-free call device according to claim 1, further comprising:
Removal means for removing, from the collected sound signal, the call partner's voice component and the interference sound component that were emitted from the speaker and then collected by the microphone, using the call partner's voice signal to which the interference signal has been added;
Caller voice change amount measuring means for receiving the collected sound signal from which the call partner's voice component and the interference sound component have been removed, and measuring, as a caller voice change amount, the amount of change over time in the magnitude of the collected sound signal corresponding to a voice time interval that contains the caller's voice; and
Call partner voice change amount measuring means for receiving the voice signal transmitted from the call partner's call device, and measuring, as a call partner voice change amount, the amount of change over time in the magnitude of the voice signal corresponding to a voice time interval that contains the call partner's voice,
Wherein the interference signal level control means increases the magnitude of the interference signal when the caller voice change amount and/or the call partner voice change amount is larger than a predetermined value, and reduces the magnitude of the interference signal when the caller voice change amount and/or the call partner voice change amount is smaller than the predetermined value.

A hands-free call method performed by a hands-free call device that has a speaker and a microphone and enables a caller to talk with a call partner without holding the device in the hand, the hands-free call method comprising the steps of:
Generating an interference signal that hinders persons other than the caller from hearing the call partner's voice emitted from the speaker;
Adding the interference signal to a voice signal transmitted from a call device of the call partner, and causing the speaker to emit voice based on the call partner's voice signal and interference sound based on the interference signal;
Receiving a collected sound signal based on sound collected by the microphone, and measuring, as a noise level, the magnitude of the collected sound signal corresponding to a non-voice time interval that contains no call voice;
Receiving the collected sound signal, and measuring, as a call sound level, the magnitude of the collected sound signal corresponding to a voice time interval that contains call voice;
Calculating a difference of the noise level relative to the call sound level; and
Reducing the magnitude of the interference signal when the difference is smaller than a predetermined value, and increasing the magnitude of the interference signal when the difference is larger than the predetermined value.

The hands-free call method according to claim 3, further comprising the steps of:
Removing, from the collected sound signal, the call partner's voice component and the interference sound component that were emitted from the speaker and then collected by the microphone, using the call partner's voice signal to which the interference signal has been added;
Receiving the collected sound signal from which the call partner's voice component and the interference sound component have been removed, and measuring, as a caller voice change amount, the amount of change over time in the magnitude of the collected sound signal corresponding to a voice time interval that contains the caller's voice;
Receiving the voice signal transmitted from the call partner's call device, and measuring, as a call partner voice change amount, the amount of change over time in the magnitude of the voice signal corresponding to a voice time interval that contains the call partner's voice; and
Increasing the magnitude of the interference signal when the caller voice change amount and/or the call partner voice change amount is larger than a predetermined value, and reducing the magnitude of the interference signal when the caller voice change amount and/or the call partner voice change amount is smaller than the predetermined value.