JP4647110B2

JP4647110B2 - Motherese-like sound conversion device and motherese-like sound conversion method

Info

Publication number: JP4647110B2
Application number: JP2001024057A
Authority: JP
Inventors: 哲斉藤
Original assignee: Pigeon Corp
Current assignee: Pigeon Corp
Priority date: 2001-01-31
Filing date: 2001-01-31
Publication date: 2011-03-09
Anticipated expiration: 2021-01-31
Also published as: JP2002229600A

Description

【０００１】
【発明の属する技術分野】
この発明は、例えば、０歳児，特に月齢１０か月程度までの乳児が、特に関心を示す音声としてのマザリーズと類似した音を、入力された音声または音声信号から変換して得るためのマザリーズ類似音の変換方法と変換装置に関するものである。
【０００２】
【従来の技術】
従来、乳児のための音刺激をともなう玩具としては、玩具の笛や太鼓等が広く使用されている。
ところで、近年、乳児が注意をむけたり、聞き入ったりする音を通じてその感性の傾向が研究されてきている。
【０００３】
すなわち、月齢１０か月程度までの乳児は、この時期特有の母親による語りかけの特徴としての連続変化する音刺激に関しては、このような乳児が特に関心を示すものが明らかになりつつあり、このような音刺激の特徴は、その育児者の音声の特徴から、「マザリーズ」と呼ばれている。
【０００４】
【発明が解決しようとする課題】
しかしながら、従来の笛や太鼓等の音を出す玩具では、特に乳児の発達段階に対応して、その関心をひく点については、特に考慮されていない。このため、上述のようなマザリーズに類似した音を生成することにより、乳児に適切な聴覚刺激を与えて、その関心の傾向にあわせた外部からのはたらきかけを行うことができる玩具等はこれまで着想すらされていなかった。
さらに、乳児の言語能力の発達過程と音声刺激との関係において、その発達を促進するように、適切な選択された音声を生成して、音声刺激を与えるような手段も工夫されていない。
【０００５】
本発明は、このような問題を解決するためになされたもので、特に乳児の関心をひくことができるマザリーズに類似した音を、装置に入力される他の音声または音声信号を変換して得ることができるマザリーズ類似音変換装置と、マザリーズ類似音変換方法を提供することを目的としている。
【０００６】
【課題を解決するための手段】
本発明は、音声または音声信号あるいは音声信号に対応した情報の入力手段と、入力された音声または音声信号を所定の基本周波数とその倍音構造を備えた基本音に変換する基本音変換手段と、所定の倍音構造を備えた前記基本音について一定の範囲のイントネーション変化を与えて変化音とする変化音生成手段と前記基本音変換手段が、入力された前記音声または音声信号の有する基本周波数を特定する手段と、前記基本音変換手段が、前記基本周波数を上昇または下降させるようにシフトさせることにより、変換後の基本周波数に基づく倍音構造をもつ基本音を生成する手段と、前記倍音構造を持つ周波数が次第に高まるようにすると共に、高い周波数になるほど音圧が弱くなるように調整する共鳴調整手段とを備え、入力された前記音声または音声信号を変換して、マザリーズ類似音を生成する構成としたことを特徴とする。
【０００７】
上記構成によれば、入力手段は、マザリーズ類似音に変換される元となる音声もしくは音声信号あるいは音声信号に対応した情報を入力するためのものである。上記基本音変換手段は、入力された音声もしくは音声信号あるいは音声信号に対応した情報に関して、その基本周波数を変換もしくは特定することで、マザリーズ類似音の基本周波数に合わせる機能をはたす。また、変化音生成手段は、後述するマザリーズの特性に適合するように、イントネーション変化を付与するものである。
ここで、本発明の「基本音」とは、マザリーズ類似音に適合した基本周波数をもち、その基本周波数の倍音構造を有する音である。また、「変化音」とは、マザリーズに対応した一定範囲のイントネーション変化を有する音である。
「音声信号に対応した情報」とは、音声もしくは音声信号ではないが、例えば、音声の代わりに入力される情報で、例えば、キーボード等を利用して入力される音声に変わる言語を意味する。
また、入力手段により取り込まれた音声もしくは音声信号に関して、例えば、スペクトル分析を行い、含まれている多数の周波数から、その基本周波数を特定することができる。
さらに、基本音変換手段は、基本周波数を上昇または下降させることにより、マザリーズの特性に適合した周波数の基本音を生成することができる。
そして、周波数が次第に高まるようにすると共に、高い周波数になるほど音圧が弱くなるように調整する共鳴調整手段により、マザリーズの特性に適合する倍音構造を実現することができる。
【０００８】
【０００９】
【００１０】
【００１１】
【００１２】
好ましくは、前記基本音変換手段が、入力された音声または音声信号の有する基本周波数をほぼ３００ヘルツないし５００ヘルツの基本周波数に上昇または下降シフトさせる構成としたことを特徴とする。
【００１３】
上記構成によれば、基本音変換手段は、基本周波数をほぼ３００ヘルツないし５００ヘルツの範囲で上昇または下降させることにより、マザリーズの特性により適合した周波数の基本音を生成することができる。
基本周波数が３００ヘルツ未満である場合には、乳児が関心を示しづらいことが確認されており、極端に低いと不快な刺激となる。基本周波数が５００ヘルツより高い場合には、必要以上に乳児に興奮を与えてしまい、極端な場合は、泣きだすこともあることが確認されている。
【００１４】
好ましくは、前記変化音生成手段が、入力された音声または音声信号に関して所定の範囲で上昇及び／または下降するイントネーション変化を付与することを特徴とする。
【００１５】
上記構成によれば、前記変化音生成手段は、所定範囲で上昇または下降、あるいは上昇及び下降するイントネーション変化を付与することで、適切なマザリーズ類似音を生成することができる。
【００１６】
好ましくは、前記イントネーション変化がほぼ１オクターブの範囲に設定されていることを特徴とする。
【００１７】
上記構成によれば、前記変化音生成手段によるイントネーション変化がほぼ１オクターブの範囲を越えると乳児にとって強すぎる刺激となって、このような刺激に対して泣きだしてしまうことがある。
【００１８】
また、本発明は、外部から音声または音声信号あるいは音声信号に対応した情報を入力する段階と、入力された音声または音声信号から処理単位となるサンプル音を特定する段階と、前記サンプル音から基本周波数を特定する段階と、前記基本周波数を含む音に関して、所定の範囲で上昇及び／または下降するイントネーション変化を付与して変化音を生成する段階と、前記変化音に関して、基本周波数を上昇または下降させるようにシフトさせる段階と、次いで、シフト後の基本周波数に所定の倍音構造を調整する段階とを備え、前記倍音構造を調整する段階では周波数が次第に高まるようにすると共に、高い周波数になるほど音圧が弱くなるように調整するマザリーズ類似音変換方法であることを特徴とする。
【００１９】
上記構成によれば、外部から音声または音声信号あるいは音声信号に対応した情報を入力する段階は、変換元の音声もしくは音声信号を取得するのに必要な段階である。また、サンプル音を特定は、変換を行う処理単位の特定等に必要とされる段階である。このサンプル音に対して、基本周波数を特定する。この特定された基本周波数に対しては、マザリーズの特性に適合するようなイントネーション変化を付与する。次いで基本周波数をマザリーズの特性に適合するように周波数が次第に高まるようにすると共に、高い周波数になるほど音圧が弱くなるように調整する。
【００２０】
好ましくは、前記サンプル音の特定において、言語認識に基づく単語単位でサンプル音を特定することを特徴とする。
【００２１】
上記構成によれば、入力された音声または音声信号あるいは音声信号に対応した情報に関して、例えば日本語認識等の言語認識を利用して単語抽出を行うと、比較的簡便に処理単位を定めることができる。
【００２２】
好ましくは、前記変化音を生成する段階で、イントネーション変化がほぼ１オクターブの範囲に設定されることを特徴とする。
【００２３】
【００２４】
【００２５】
また、本発明は、外部から音声または音声信号あるいは音声信号に対応した情報を入力する段階と、入力された音声または音声信号あるいは音声信号に対応した情報から処理単位となるサンプル音を特定する段階と、前記サンプル音から基本周波数を特定する段階と、前記基本周波数を含む音に関して、所定の範囲で上昇及び／または下降するイントネーション変化を付与して変化音を生成する段階と、前記変化音に関して、基本周波数を上昇または下降させるようにシフトさせる段階と、次いで、シフト後の基本周波数に所定の倍音構造を調整する段階とを備え、前記倍音構造を調整する段階では周波数が次第に高まるようにすると共に、高い周波数になるほど音圧が弱くなるように調整するプログラムを格納したコンピュータ読み取り可能な記録媒体であることを特徴とする。
【００２６】
【発明の実施の形態】
以下、この発明の好適な実施形態を添付図面を参照しながら、詳細に説明する。
尚、以下に述べる実施形態は、本発明の好適な具体例であるから、技術的に好ましい種々の限定が付されているが、本発明の範囲は、以下の説明において特に本発明を限定する旨の記載がない限り、これらの態様に限られるものではない。
【００２７】
図１は、本発明の実施形態に係るマザリーズ類似音変換装置の外観の一例を示す斜視図であり、図２はその電気的構成を示すブロック構成図である。
図１において、マザリーズ類似音変換装置１０は、本体１２と、この本体１２に接続される入力手段としての例えばマイクロフォン１１と、本体１２に接続される出力手段としての例えばスピーカ２３を備えている。
【００２８】
このマザリーズ類似音変換装置１０の構成を図２を参照して詳しく説明する。
マザリーズ類似音変換装置１０は、音声または音声信号の入力手段１１と、入力された音声または音声信号を所定の基本周波数とその倍音構造を備えた基本音に変換する基本音変換手段３０と、所定の倍音構造を備えた基本音について一定の範囲のイントネーション変化を与えて変化音とする変化音生成手段４０とを備えている。
【００２９】
本実施形態では、例えば、上記基本音変換手段３０は、図２に示すように、音声認識手段１４、メモリ１５、声紋分析手段１６、ピッチシフト部１８、倍音フィルタを備えている。上記変化音生成手段４０は、上記音声認識手段１４、メモリ１５、イントネーションシフト部１７を備えている。
【００３０】
具体的に説明すると、図において、入力手段１１は、変換元になる音声または音声信号あるいは音声信号に対応した情報を入力するための音声または音声信号の入力手段１１を備えており本体側には、この入力手段１１が接続される処理手段が設けられている。このような本体１２の構成は、例えば、後述する各機能を備えたシンセサイザーとしての電気，電子回路を形成してもよいし、例えば、中央処理装置（ＣＰＵ）を中心として、必要な記憶手段を備えたコンピュータにより構成し、後述する処理手順を備えるソフトウエアプログラムが格納され、実行されることで、ＣＰＵ及び記憶手段により、図２の各処理機能が実現されるように構成してもよい。この場合各機能は全てがソフトウエアにより実現されてもよいし、一部を専用の回路で実現して、他の構成をソフトウエアで実現してもよい。あるいは、パーソナルコンピュータ等の情報処理端末を用いて、後述する各段階の処理手順を有するソフトウエアが格納されることで、この情報処理端末上でマザリーズ類似音変換装置１０が実現されてもよい。この場合、このようなソフトウエアは、フロッピィーディスクやハードディスク，光磁気ディスク等のコンピュータにより読み取り可能な記録媒体に格納して供給することができる。
【００３１】
本体１２には、制御部１３が備えられている。制御部１３は、例えば、ＣＰＵの一部の機能や制御基板により実現され、本体内の各部に接続されることで、各部の動作を制御するようになっている。
本体１２に接続される入力手段１１は種々の態様が考えられる。例えば、マザリーズ類似音変換装置１０が、図１のような形態で可搬性のある装置とした場合や、ハンディーな拡声器等に組み込む場合には、入力手段１１は集音マイクが利用される。また、マザリーズ類似音変換装置１０が、パーソナルコンピュータに格納されるソフトウエアで実現される場合には、コンピュータ本体の入力端子であったり、インターネット等の通信リンクに接続した場合には、このような接続に使用される端子等が利用される。また、マザリーズ類似音変換装置１０が、ビデオデッキ等の他の装置の機能に付属して装置内に格納される態様で実現される場合には、当該装置内における例えばビデオ信号の入力端子もしくはビデオ信号から音声信号を分離した信号の入力端子等が利用される。
【００３２】
入力手段１１には直接もしくは制御手段１３を介して音声認識手段１４が接続されている。音声認識手段１４は、入力された音声または音声信号の処理単位の特定手段である。このため音声認識手段１４は、単純に音声または音声信号を時定数を利用して切り出したり、または、好ましくは、日本語認識等の言語認識機能を利用できるようにＡ／Ｄ（アナログ−デジタル）変換手段を備えていて、入力された音声信号から、言語を認識し、単語単位，文節単位，品詞分解による予め定めた単位決定ルール等にしたがって、変換決定を特定するようになっている。尚、入力手段１１からデジタル信号で音声信号が送られる場合には、Ａ／Ｄ変換手段は使用しない。
【００３３】
メモリ１５は、例えば、装置内蔵の記憶装置やコンピュータのＲＡＭ及び／または内蔵ハードディスクや外部記憶手段の特定の領域等が利用される。メモリ１５は、制御手段１３を介して，あるいは直接音声認識手段１４に接続されており、直接音声認識手段１４により決定した処理単位毎に分離した音声信号データを記憶するようになっており、かつ処理にあたって、制御部１３の指示により、音声信号データを声紋分析手段１６に送るようになっている。
声紋分析手段１６は、制御手段１３を介して，あるいはメモリ１５と接続され、さらに、イントネーションシフト部１７やピッチシフト部１８と接続されている。声紋分析手段１６は、後述するように、メモリ１５から読みだした、あるいは直接音声認識手段１４から送られてくる処理単位毎に音声信号データに関して、声紋分析によりその基本周波数を決定するようになっている。したがって、声紋分析手段１６は、例えば、イコライザや周波数のスペクトル分析機能を果たすソフトウエアを利用することができる。
【００３４】
イントネーションシフト部１７は、制御手段１３を介して、あるいは声紋分析手段１６と直接接続され、さらに、ピッチシフト部１８やスピード調整手段１９と接続されている。イントネーションシフト部１７は、本発明の変化音生成手段４０の中心であり、後述するように、例えば処理単位とされた音声信号に関して、基本周波数が決定された後、その音声信号に関して、後述するマザリーズの特性に適合するように、所定の範囲，例えば、１オクターブの範囲で上昇及び／または下降するイントネーション変化を付与した変化音を生成するようになっている。すなわち、処理単位となる音声に関して、特定部分を１オクターブの範囲で上昇及び／または下降するピッチシフトさせるようになっている。
【００３５】
ピッチシフト部１８は、制御手段１３を介して、あるいはイントネーションシフト部１７と直接接続され、さらに、スピード調整手段１９と接続されている。ピッチシフト部１８は、例えば、時系列に沿って次第に周波数が変化するように加工する機能を備える手段が使用され、これと近似した機能をもつソフトウエアを用いて構成してもよい。このピッチシフト部１８は、基本音変換手段３０の中心機能を担っており、イントネーション変化が付与された変化音について、後述するマザリーズの特性に適合するように、基本周波数が、例えば、３００ヘルツないし５００ヘルツ程度となる比較的高い周波数になるように、所定の周波数シフトを行うようになっている。
【００３６】
具体的には、ピッチシフト部１８は、処理単位となる音声について、例えば、バンドパスフィルタを用いて選択した所定の周波数成分を増幅手段を用いてその音圧を強めることで、周波数シフトすることができるし、この音声を構成する全ての成分について一定の周波数シフトを行うように構成してもよい。
また、周波数シフトを行った後、後述するように、倍音フィルタ２４により倍音構造を付与し、特に、周波数が高くなるに従って、倍音の音圧が次第に低下するように、さらに好ましくは、曲線的に音圧が低下するような倍音構造を生成するようになっている。
【００３７】
スピード調整部１９は、ピッチシフト部１８またはイントネーションシフト部１７と接続されており、イントネーション変化を受けた音声信号に関して、音長の調節を行うようになっている。
【００３８】
スピード調整部１９からは、後述するようなマザリーズ類似音の音声信号が出力されるので、好ましくは、本体１２内には、このスピード調整部１９から出力されるマザリーズ類似音の音声信号が制御部１３の指示により格納されるメモリ２０が設けられている。このメモリ２０は、メモリ１５と同じ記憶手段が利用されてもよい。
【００３９】
図２のマザリーズ類似音変換装置１０では、出力手段２１は、スピーカ２３だけで形成されていて、本体１２側のＤ／Ａコンバータ２２に接続されるようになっている。このＤ／Ａコンバータ２２は本体１２内で、スピード調整部１９またはメモリ２０から出力されるマザリーズ類似音の音声信号をアナログ信号に変換してスピーカ２３に送るようになっている。スピーカ２３は、アナログ変換された音声信号に基づいて、マザリーズ類似音を出力するようになっている。尚、Ｄ／Ａコンバータ２２は、出力手段側に設けられていてもよい。
【００４０】
この実施形態のマザリーズ類似音変換装置１０は以上のように構成されており、この装置により、入力された音声または音声信号をマザリーズ類似音に変換する方法を説明する前に、先ず、マザリーズの特性について説明する。
図３は、乳児に母親等の育児者が話しかける際の音声等に対応したマザリーズを示すオシロ波形であり、具体的には、母親が乳児に対して、「おいしーねー」と語りかけた時のものである。図４は、その基本周波数特性についてプロットしたグラフであり、図５は、図４の音声に関して、その音圧、すなわち音の大きさをプロットしたグラフである。
【００４１】
特に、図４で示されているように、その基本周波数は、ほぼ２００ヘルツ〜５００ヘルツの範囲の属する音声であり、比較的高いピッチ（周波数）で示されている。（基本音）。
【００４２】
そして、図４及び図５にて示されているように、この周波数は０．４秒から１．０秒と比較的短い音長の中で、変化する変化音であり、ほぼ１オクターブ以内の範囲で変化している。
【００４３】
さらに、図６及び図７を参照すると、マザリーズの倍音構造が理解される。すなわち、この音声は、図６に示すように、図４に対応したＳ１の音声だけでなく、同様の変化を示す２倍音Ｓ２，３倍音Ｓ３・・・Ｓ５といった倍音構造となっている。ここで、この音声は倍音構造をもっているが、せいぜい、Ｓ４，Ｓ５程度までであり、響きすぎる音声ではない。尚、図７は、図６の周波数スケールを変えて示したグラフであり、基本音Ｓ１の様子と、その倍音の状態が理解されやすくするための図である。また、図８は、図５のデータを処理してなだらかな変化を示すようにしたグラフである。
【００４４】
また、図９ないし図１２は、人間の音声としてのマザリーズの倍音構造の特徴を説明するための図である。
これらの図において、横軸は右に向かって周波数が高くなっており、縦軸は音圧をｄＢで示している。
図９は、音声の音源波スペクトルを示しており、図示されているように、沢山の倍音を有し、倍音の音圧は、そのピークとピークを結ぶ線（Ｐ−Ｐ線）が曲線を描いて下降している。図９において、最も音圧が大きい音が、所定の高い周波数となるようにシフトさせた音が、本発明の基本音となる。
【００４５】
図１０は「ア」の音を発声した時，すなわち、声道をあまり狭めないようにした状態における人の声道の伝達関数に基づく伝達特性を模式的に示したものであり、開放空間の伝達関数を示す図１１と比較すると、特徴的なのは、一定の間隔を置いた倍音の音圧が強められていることである。この結果、図９の基本音の音声のスペクトルは、図１０の伝達特性によって、図１２に示すようになる。すなわち、その倍音構造は、周波数が高まるにつれて、各倍音の音圧のピークを結んだＰ−Ｐ線は、図９と同様に曲線的な下降を示す合成音となる。このような音を基準音と呼ぶこととし、このような特性は、後述するように付与される。
【００４６】
以上を前提として、本実施形態ではマザリーズ類似音を生成する場合、次のような特徴を備えている音としたものである。
（１）マザリーズは、高いピッチ、すなわち高い基本周波数（基本周波数＝Ｆ０）を備えた音声で、キーが高い音声であり、その周波数は、だいたい３００Ｈｚ以上で５００Ｈｚ以下であり、平均３５０Ｈｚ程度である。
（２）マザリーズは、倍音構造を有しており、したがって、響きのある音声である。このため、本実施形態では、マザリーズ類似音を生成するため、上述の（１）の条件とこの倍音の条件に適合するように、基本音変換手段３０の機能に基づいて、後述するように、基本音を生成する。ただし、この倍音は、あまり高音域が強調されると、響き過ぎる音となって、「やさしい声」でなくなるおそれがある。このため、特に、この基本音を変化させて、高倍音域となるほど、ほぼ曲線的に音圧が弱まるような、図１２に示す特性を有する基準音とするようにしている。
（３）また、マザリーズは、音声の高低差が大きく、基本周波数が変化する音声であり、抑揚（イントネーション）に富んだ音である。本実施形態では、これを「変化音」と呼び、変化音生成手段４０の機能により、後述するように、実現している。ただし、その変化は、１オクターブ以内であることが好ましく、これを越えると、「やさしい声」でなくなるおそれがある。
【００４７】
図１３は、図２の構成に基づいて、マザリーズ類似音変換装置１０により、入力された音声または音声信号を上述のマザリーズの特性を備えたマザリーズ類似音に変換する方法の一例を示すフローチャートである。
図１３を参照して、マザリーズ類似音の変換方法の一例を説明する。
【００４８】
図２の入力手段１１に音声または音声信号が入力される（ＳＴ１）。
例えば、音声が直接入力されるタイプの装置であれば、図１のようなマイク１１やハンディー拡声器等から入力される。ビデオデッキ内蔵の変換装置またはビデオデッキに接続される変換装置である場合には、ビデオ信号から分離された音声信号が入力される。また変換装置１０をネットワーク等に接続した場合には、ネットワークを介して配信される各種コンテンツ等の音声信号が入力される。また、記録したテープレコーダやコンパクトディスク，光磁気ディスク等から再生された再生信号に対応した音声信号が入力されるようにしてもよい。さらに、マザリーズ類似音変換装置がテレビ等に内蔵される場合には、放送局から送られる放送波から分離された音声信号が入力される。
【００４９】
入力手段からの音声信号がアナログ信号である場合には、図２の音声認識手段１４でデジタル信号に変換された後、例えば、日本語認識プログラムにより、単語や文節等といった予め定められた処理単位を識別して、この処理単位毎に、制御手段１３の指示によりメモリ１５に格納される。処理単位に分けられた音声信号（以下、「サンプル信号」と言う。）。図１４は、このような音声信号を入力した状態をオシログラフの波形にして示しており、横軸が時間で、縦軸が振幅である。音声信号ＶＳは、ステップ２において、ｔ１からｔ２の部分が切り出されてサンプル信号（サンプル音）とされる（ＳＴ２）。
【００５０】
次いで、声紋分析手段１６は、サンプル音のピッチ分析、すなわち周波数分析を行って、基本周波数Ｆ０を特定する（ＳＴ３）。基本音は、図１５に示すように、特定された基本周波数Ｆ０と、その倍音構造ＶＳ２，ＶＳ３・・・を持つ音声である。
この実施形態では、図示の都合上、基本周波数Ｆ０に対応した音声ＶＳ１だけを取り出して説明するが、以下で特別に説明する場合を除き、その各倍音構造に関しても、同時に以下の処理が行われる。
【００５１】
次に、変化音を生成する作業を行う。ここでは、図２のイントネーションシフト部１７が、例えばシンセサイザー機能を用いて、音声ＶＳ１の基本周波数の変化に強調を与えるため、所定の範囲，例えば、１オクターブの範囲で、上昇及び／または下降するイントネーションシフトを付与する。例えば、図１６では、０．８オクターブ程度に変化率を上昇させており、ＣＶＳ１としている。
【００５２】
続いて、図１７に示す作業を行う。図１７ではイントネーション変化を加えた音声ＣＶＳ１に関して、図２のピッチシフト部１８がピッチ分析（周波数分析）を行い、平均周波数ＡＦ０を求める（ＳＴ５）。
この平均周波数ＡＦ０に基づいて、ピッチシフト部１８は、さらに、図１８に示すＡＦ’０となるようにピッチシフト（周波数シフト）を行う（ＳＴ６）。
このピッチシフトは、マザリーズに近似した基本周波数をもつようにするためのものであり、その変更幅を、好ましくは、３００Ｈｚから５００Ｈｚの範囲に設定する。この実施形態では、３５０Ｈｚに設定する。このようなピッチシフトは、所定の周波数成分の音圧を強めることで、周波数シフトすることができるし、この音声を構成する全ての成分について一定の周波数シフトを行うようにしてもよい。本実施形態では、ＣＶＳ１の全体を上昇させて（図１８参照）、音声ＳＣＶＳ１としている。
【００５３】
ピッチシフト後，すなわち周波数変更後の音声ＳＣＶＳ１について、次に倍音部の共鳴調整を行う（ＳＴ７）。ここで、基本周波数に対する倍音部ＶＳ２，ＶＳ３・・・については、イントネーション調整（ＳＴ４）、及びピッチシフト（ＳＴ６）に関して、基本周波数Ｆ０に対応した音声ＶＳ１の変換過程において同時に行っている。
ピッチシフト部１８は、好ましくは、共鳴調整機能を備える倍音フィルタ２４と接続されている。これにより、例えば、音声ＳＣＶＳ１に対応したイントネーション変化ピッチシフトされた各倍音に対して、例えば、図１０のような伝達関数等を用いて図１２に示すような倍音構造とする。つまり、周波数が次第に高まる各倍音の音圧のピークを結んだＰ−Ｐ線が、曲線的な下降を示す合成音となった基準音とする。
この場合、伝達関数を直接用いず、高い周波数の倍音ほど音圧が弱くなるようなフィルタを用いて調整してもよい。
【００５４】
最後に、イントネーションシフトした結果、音長もしくはスピード変化を生じた倍音構造を付与された音声ＳＣＶＳ１について、スピード調整部１９によりスピードを調整し、元の音長に戻す（ＳＴ８）。
この場合、マザリーズにおいては、ゆっくり話す傾向があることに対応して、このスピード調整時にこれを加味して、入力された音声よりもゆっくりしたスピードに調整するようにしてもよい。
これにより、マザリーズ類似音に良く適合した音声信号が生成されたので、必要に応じて、メモリ２０に格納し、あるいは、そのままＤ／Ａコンバータ２２によりアナログ信号に変換して、スピーカ２３からマザリーズ類似音が出力される。
【００５５】
以上説明したように、本実施形態のマザリーズ類似音変換装置１０は、入力される他の音声または音声信号を、特に乳児の関心をひくことができるマザリーズに類似した音に変換して出力することができる。このため、乳児あるいは幼児を対象としたきわめて広い範囲の製品に、例えば以下のように応用することができる。
例えば、音声が直接入力されるハンディー拡声器等からマザリーズ類似音で乳幼児に語りかけることができる。
ビデオデッキ内蔵の変換装置とした場合には、記録されたビデオ信号の音声信号をマザリーズ類似音として再生することができる。
また、マザリーズ類似音変換装置１０をネットワーク等に接続した場合には、ネットワークを介して配信される各種コンテンツ等の音声信号がマザリーズ類似音に変換されるので、マザリーズ類似音として再生することができる。
さらに、記録したテープレコーダやコンパクトディスク，光磁気ディスク等から再生された再生信号がマザリーズ類似音とされて再生される。
さらにまた、マザリーズ類似音変換装置１０がテレビ等に内蔵される場合には、放送局から送られる放送波から分離された音声信号をマザリーズ類似音として再生することができる。
さらに、マザリーズを上手に話すことができない人が、録音された自分の声を入力手段に入力することによって、マザリーズ類似音に変換された自分の声を聴くことで、マザリーズによる話し方や発声の練習をすることができる。
また、入力手段として、パソコンやワープロのキーボードその他の言語入力手段を使用することにより、言葉を話すことができない人や障害のある人でも、マザリーズによる語りかけをすることができる。
さらに、電話やインターネット等の通信回線を利用することにより、遠隔地に居る場合でも、離れた場所からマザリーズによる語りかけが可能となる。
【００５６】
この発明は、上述の実施形態に限定されない。例えば、マザリーズ類似音との近似性が劣っても、音声品質が問われない場合には、入力された音声をピッチシフト部に入力して、即時にマザリーズに適合するように周波数変更して、そのまま、あるいは、所定のイントネーション変化を加えて出力する簡便な装置構成もしくは変換手法としてもよい。
また、言語認識手段により、例えば、日本語認識した場合には、単語や文節等の所定の単位毎に、マザリーズ類似音を、例えばテーブルデータ等としてメモリに予め保持しておき、このようなデータを用いて変換する構成としてもよい。
したがって、この言語認識手段により、例えば、日本語認識した場合には、各言葉におけるイントネーションについて、例えば、テーブルデータ等として上記メモリに予め保持しておき、このデータを参照してイントネーション調整を行ってもよい。
さらに、上述の実施形態に各構成は、マザリーズ類似音の品質等との関係で許容される場合等には、その一部を省略してもよいし、記載されない他の構成と組み合わせて実現してもよい。
【００５７】
【発明の効果】
以上述べたように、本発明によれば、特に乳児の関心をひくことができるマザリーズに類似した音を、装置に入力される他の音声または音声信号を変換して得ることができるマザリーズ類似音変換装置と、マザリーズ類似音変換方法を提供することができる。
【図面の簡単な説明】
【図１】マザリーズ類似音変換装置の実施形態の外観の一例を示す概略斜視図である。
【図２】図１のマザリーズ類似音変換装置の構成を示すブロック図である。
【図３】マザリーズの声紋をしめすオシロ波形図である。
【図４】図３の声紋の基本周波数を示すグラフである。
【図５】図３の音声の音圧を示すグラフである。
【図６】図３の音声の倍音を示すグラフである。
【図７】図３の音声の倍音を周波数のスケールを拡大して示すグラフである。
【図８】図７の音声の音圧を示すグラフである。
【図９】人の声に関連して基本音の倍音構造を音源波スペクトルで示す図である。
【図１０】人の声道の伝達関数を示す図である。
【図１１】開放された空間の伝達関数を示す図である。
【図１２】マザリーズ類似音の音声スペクトルを説明するための図である。
【図１３】図２のマザリーズ類似音変換装置による変換方法の一例を示すフローチャートである。
【図１４】入力された音声のオシロ波形図である。
【図１５】入力された音声から処理単位を切り取り、その基本周波数を示すグラフである。
【図１６】処理単位の音声について、イントネーション変化付与する様子を示すグラフである。
【図１７】処理単位の音声について、平均周波数を求める様子を示すグラフである。
【図１８】処理単位の音声について、周波数シフトを行う様子を示すグラフである。
【符号の説明】
１０・・・マザリーズ類似音変換装置、１１・・・入力手段、１２・・・本体、１３・・・制御手段、２１・・・出力手段、３０・・・基本音変換手段、４０・・・変化音生成手段
[0001]
BACKGROUND OF THE INVENTION
  The present invention relates to a motherese-like sound conversion method and conversion device for obtaining, by converting input speech or an audio signal, a sound similar to motherese, the kind of speech in which infants under one year old, particularly those up to about 10 months of age, show particular interest.
[0002]
[Prior art]
  Conventionally, toy whistles, drums, and the like have been widely used as toys that provide sound stimulation for infants.
  In recent years, infants' perceptual tendencies have been studied through the sounds they attend to or listen to intently.
[0003]
  That is, for infants up to about 10 months of age, it is becoming clear which continuously varying sound stimuli, characteristic of the way mothers talk to infants at this age, attract their particular interest. Because these characteristics derive from the caregiver's voice, such sound stimulation is called “motherese”.
[0004]
[Problems to be solved by the invention]
  However, conventional sound-producing toys such as whistles and drums give no particular consideration to attracting an infant's interest in a way that corresponds to the infant's developmental stage. Consequently, toys and the like that generate sounds similar to the motherese described above, thereby giving infants appropriate auditory stimulation and engaging them in line with their interests, had not even been conceived.
  Furthermore, in view of the relationship between the development of an infant's language ability and speech stimulation, no means has been devised that generates appropriately selected speech and presents it as stimulation so as to promote that development.
[0005]
  The present invention has been made to solve these problems. Its object is to provide a motherese-like sound conversion device and a motherese-like sound conversion method that can obtain a sound similar to motherese, which particularly attracts infants' interest, by converting other speech or audio signals input to the device.
[0006]
[Means for Solving the Problems]
  The present invention comprises: input means for speech, an audio signal, or information corresponding to an audio signal; basic sound conversion means for converting the input speech or audio signal into a basic sound having a predetermined fundamental frequency and its harmonic structure; change sound generation means for applying an intonation change within a certain range to the basic sound having the predetermined harmonic structure to produce a change sound; means by which the basic sound conversion means identifies the fundamental frequency of the input speech or audio signal; means by which the basic sound conversion means shifts the fundamental frequency up or down to generate a basic sound having a harmonic structure based on the converted fundamental frequency; and resonance adjustment means for adjusting the harmonic structure so that, as the frequency of the harmonics rises, the sound pressure becomes weaker at higher frequencies. The input speech or audio signal is thereby converted to generate a motherese-like sound.
[0007]
  With the above configuration, the input means is used to input the speech, audio signal, or information corresponding to an audio signal that is to be converted into a motherese-like sound. The basic sound conversion means converts or identifies the fundamental frequency of that input so as to match it to the fundamental frequency of a motherese-like sound. The change sound generation means applies an intonation change that conforms to the characteristics of motherese described later.
  Here, the “basic sound” of the present invention is a sound that has a fundamental frequency suited to a motherese-like sound and the harmonic structure of that fundamental frequency. The “change sound” is a sound having an intonation change within a certain range corresponding to motherese.
  “Information corresponding to an audio signal” means information that is neither speech nor an audio signal but is input in place of speech, for example language entered with a keyboard or the like as a substitute for speech.
  Spectrum analysis, for example, can be performed on the speech or audio signal captured by the input means to identify its fundamental frequency from among the many frequencies it contains.
  Furthermore, by raising or lowering the fundamental frequency, the basic sound conversion means can generate a basic sound whose frequency suits the characteristics of motherese.
  Finally, the resonance adjustment means, which adjusts the harmonics so that the sound pressure becomes weaker at higher frequencies as the frequency rises, can realize a harmonic structure that conforms to the characteristics of motherese.
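The identification of the fundamental frequency described above can be illustrated with a small sketch. The patent mentions spectrum analysis; the sketch below instead uses plain autocorrelation, a different but common pitch-estimation technique, because it fits in a few lines of standard-library Python. The function names, the search range of 60 Hz to 1000 Hz, and the test tone are all assumptions made for illustration and do not come from the patent.

```python
import math


def estimate_f0(samples, sample_rate, f_min=60.0, f_max=1000.0):
    """Estimate the fundamental frequency (Hz) of a voiced frame.

    Assumption: autocorrelation stands in for the spectrum analysis
    mentioned in the text. The lag with the strongest self-similarity
    is taken as the pitch period.
    """
    lag_min = max(1, int(sample_rate / f_max))
    lag_max = min(int(sample_rate / f_min), len(samples) - 1)
    best_lag, best_score = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Unnormalised autocorrelation at this lag.
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


def harmonic_tone(f0, sample_rate, n_samples, n_harmonics=4):
    """Hypothetical test signal: a tone with harmonics of amplitude 1/k."""
    return [
        sum(math.sin(2 * math.pi * k * f0 * n / sample_rate) / k
            for k in range(1, n_harmonics + 1))
        for n in range(n_samples)
    ]


tone = harmonic_tone(200.0, 8000, 800)
f0 = estimate_f0(tone, 8000)
```

Because the estimate is quantised to whole-sample lags, its resolution gets coarser at high pitches; a practical system would probably interpolate around the best lag or use a spectral method as the text suggests.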
[0008]
[0009]
[0010]
[0011]
[0012]
  Preferably, the basic sound conversion means is configured to shift the fundamental frequency of the input speech or audio signal up or down to a fundamental frequency of approximately 300 Hz to 500 Hz.
[0013]
  With the above configuration, the basic sound conversion means can generate a basic sound whose frequency better suits the characteristics of motherese by raising or lowering the fundamental frequency into the range of approximately 300 Hz to 500 Hz.
  It has been confirmed that infants show little interest when the fundamental frequency is below 300 Hz, and an extremely low fundamental frequency becomes an unpleasant stimulus. It has also been confirmed that a fundamental frequency above 500 Hz excites the infant more than necessary and, in extreme cases, may make the infant start crying.
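As a rough illustration of the 300 Hz to 500 Hz target range, the sketch below computes the frequency ratio needed to bring a measured fundamental frequency into that range. Only the range itself comes from the text. The function names, the clamping rule (move to the nearest edge of the range unless an explicit target is given), and the semitone helper are assumptions.

```python
import math


def shift_ratio_to_range(f0_hz, low_hz=300.0, high_hz=500.0, target_hz=None):
    """Return the multiplicative pitch-shift ratio for a measured f0.

    Assumption: with no explicit target, the f0 is moved to the nearest
    point inside [low_hz, high_hz]; an f0 already inside is left alone.
    """
    if f0_hz <= 0:
        raise ValueError("f0 must be positive")
    if target_hz is None:
        target_hz = min(max(f0_hz, low_hz), high_hz)
    elif not low_hz <= target_hz <= high_hz:
        raise ValueError("target must lie inside the allowed range")
    return target_hz / f0_hz


def ratio_to_semitones(ratio):
    """Express a frequency ratio in equal-tempered semitones."""
    return 12.0 * math.log2(ratio)


ratio = shift_ratio_to_range(120.0)            # a low voice is shifted up
semitones = ratio_to_semitones(ratio)
```

A ratio above 1 raises the pitch and a ratio below 1 lowers it, which matches the "up or down" shift described in the text.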
[0014]
  Preferably, the change sound generation means applies, to the input speech or audio signal, an intonation change that rises and/or falls within a predetermined range.
[0015]
  With the above configuration, the change sound generation means can generate an appropriate motherese-like sound by applying an intonation change that rises, falls, or both rises and falls within a predetermined range.
[0016]
  Preferably, the intonation change is set within a range of approximately one octave.
[0017]
  With the above configuration, the intonation change is kept within approximately one octave; if the intonation change applied by the change sound generation means exceeds that range, it becomes too strong a stimulus for an infant, who may start crying in response.
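One way to picture an intonation change bounded by one octave is to exaggerate a fundamental-frequency contour around its average pitch and then cap the resulting range. The sketch below is a minimal illustration under assumptions: the contour is a list of per-frame f0 values in Hz, the exaggeration is a linear gain in the log-frequency domain, and the function name and default gain are hypothetical. The text specifies only the one-octave limit.

```python
import math


def exaggerate_intonation(contour_hz, gain=2.0, max_range_octaves=1.0):
    """Stretch an f0 contour around its mean pitch, capped at a maximum range.

    Assumption: the gain is reduced when necessary so that the ratio of
    the highest to the lowest output frequency never exceeds
    2 ** max_range_octaves.
    """
    if not contour_hz or min(contour_hz) <= 0:
        raise ValueError("contour must contain positive frequencies")
    logs = [math.log2(f) for f in contour_hz]
    center = sum(logs) / len(logs)
    span = max(logs) - min(logs)           # current range in octaves
    if span * gain > max_range_octaves:
        gain = max_range_octaves / span    # shrink gain to respect the cap
    return [2.0 ** (center + gain * (x - center)) for x in logs]


shaped = exaggerate_intonation([200.0, 250.0, 300.0], gain=4.0)
octaves = math.log2(max(shaped) / min(shaped))
```

Working in log2 of frequency keeps the stretch symmetric in musical terms, so a rise and a fall of equal perceived size are treated equally.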
[0018]
  The present invention is also a motherese-like sound conversion method comprising the steps of: inputting speech, an audio signal, or information corresponding to an audio signal from outside; identifying, from the input speech or audio signal, a sample sound that serves as a processing unit; identifying a fundamental frequency from the sample sound; generating a change sound by applying, to the sound containing the fundamental frequency, an intonation change that rises and/or falls within a predetermined range; shifting the fundamental frequency of the change sound up or down; and then adjusting a predetermined harmonic structure on the shifted fundamental frequency, wherein, in the step of adjusting the harmonic structure, the harmonics are adjusted so that the sound pressure becomes weaker at higher frequencies as the frequency rises.
[0019]
  With the above configuration, the step of inputting speech, an audio signal, or information corresponding to an audio signal from outside is needed to acquire the speech or audio signal to be converted. The step of identifying the sample sound is needed to determine the processing unit on which conversion is performed. The fundamental frequency is identified for this sample sound, and an intonation change that suits the characteristics of motherese is applied to it. The fundamental frequency is then adapted to the characteristics of motherese, and the harmonics are adjusted so that the sound pressure becomes weaker at higher frequencies.
[0020]
  Preferably, in identifying the sample sound, the sample sound is identified in units of words based on language recognition.
[0021]
  With the above configuration, extracting words from the input speech, audio signal, or information corresponding to an audio signal by means of language recognition, such as Japanese language recognition, allows the processing unit to be determined relatively easily.
[0022]
  Preferably, in the step of generating the change sound, the intonation change is set in a range of approximately one octave.
[0023]
[0024]
[0025]
  The present invention is also a computer-readable recording medium storing a program that performs the steps of: inputting speech, an audio signal, or information corresponding to an audio signal from outside; identifying, from the input speech, audio signal, or information corresponding to an audio signal, a sample sound that serves as a processing unit; identifying a fundamental frequency from the sample sound; generating a change sound by applying, to the sound containing the fundamental frequency, an intonation change that rises and/or falls within a predetermined range; shifting the fundamental frequency of the change sound up or down; and then adjusting a predetermined harmonic structure on the shifted fundamental frequency, wherein, in the step of adjusting the harmonic structure, the harmonics are adjusted so that the sound pressure becomes weaker at higher frequencies as the frequency rises.
[0026]
DETAILED DESCRIPTION OF THE INVENTION
  Preferred embodiments of the present invention will be described below in detail with reference to the accompanying drawings.
  The embodiments described below are preferred specific examples of the present invention and therefore carry various technically preferable limitations. However, the scope of the present invention is not limited to these embodiments unless the following description specifically states that it limits the invention.
[0027]
  FIG. 1 is a perspective view showing an example of the external appearance of a motherese-like sound conversion device according to an embodiment of the present invention, and FIG. 2 is a block diagram showing its electrical configuration.
  In FIG. 1, the motherese-like sound conversion device 10 includes a main body 12, an input means connected to the main body 12, for example a microphone 11, and an output means connected to the main body 12, for example a speaker 23.
[0028]
  The configuration of the motherese-like sound conversion device 10 will be described in detail with reference to FIG. 2.
  The motherese-like sound conversion device 10 includes input means 11 for speech or an audio signal, basic sound conversion means 30 for converting the input speech or audio signal into a basic sound having a predetermined fundamental frequency and its harmonic structure, and change sound generation means 40 for applying an intonation change within a certain range to the basic sound having the predetermined harmonic structure to produce a change sound.
[0029]
  In the present embodiment, for example, the basic sound conversion means 30 includes, as shown in FIG. 2, speech recognition means 14, a memory 15, voiceprint analysis means 16, a pitch shift unit 18, and a harmonic filter. The change sound generation means 40 includes the speech recognition means 14, the memory 15, and an intonation shift unit 17.
[0030]
  More specifically, in the figure, the input means 11 is used to input the speech or audio signal to be converted, or information corresponding to an audio signal, and the main body is provided with processing means to which the input means 11 is connected. The main body 12 may, for example, be built from electric or electronic circuits forming a synthesizer that has the functions described later. Alternatively, it may be built around a central processing unit (CPU) as a computer with the necessary storage means, in which a software program implementing the processing procedure described later is stored and executed so that the CPU and the storage means realize the processing functions of FIG. 2. In this case, all the functions may be realized in software, or some may be realized by dedicated circuits and the rest in software. As a further alternative, the motherese-like sound conversion device 10 may be realized on an information processing terminal such as a personal computer by storing on it software that has the processing procedure of each step described later. Such software can be supplied on a computer-readable recording medium such as a floppy disk, hard disk, or magneto-optical disk.
[0031]
  The main body 12 is provided with a control unit 13. The control unit 13 is realized by, for example, part of the functions of the CPU or a control board, and is connected to each unit in the main body so as to control its operation.
  The input means 11 connected to the main body 12 can take various forms. For example, when the motherese-like sound conversion device 10 is a portable device of the form shown in FIG. 1, or is built into a handheld loudspeaker or the like, a sound-collecting microphone is used as the input means 11. When the device 10 is realized by software stored on a personal computer, the input means is an input terminal of the computer or, where the computer is connected to a communication link such as the Internet, the terminal used for that connection. When the device 10 is built into another device such as a video deck as an accessory function, the input means is, for example, a video signal input terminal in that device or an input terminal for the audio signal separated from the video signal.
[0032]
  The speech recognition means 14 is connected to the input means 11 directly or via the control means 13. The speech recognition means 14 identifies the processing unit of the input speech or audio signal. To do so, it may simply cut out segments of the speech or audio signal using a time constant. Preferably, it includes A/D (analog-to-digital) conversion means so that a language recognition function such as Japanese language recognition can be used; it then recognizes the language in the input audio signal and identifies the unit of conversion according to a predetermined unit-determination rule, such as word units, phrase units, or part-of-speech decomposition. When the input means 11 delivers the audio signal as a digital signal, the A/D conversion means is not used.
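The simpler of the two segmentation options, cutting out segments with a time constant, might look something like the sketch below. This is one possible reading, not the patent's stated procedure: it assumes that the time constant is a minimum pause length and that a pause is a run of samples whose absolute amplitude stays below a threshold. The function name, the threshold, and the default pause length are hypothetical.

```python
def split_on_pauses(samples, sample_rate, threshold=0.02, min_pause_s=0.2):
    """Return (start, end) sample index pairs for segments between pauses.

    Assumption: a pause is at least min_pause_s of consecutive samples
    with absolute amplitude below threshold. 'end' is exclusive.
    """
    min_pause = max(1, int(min_pause_s * sample_rate))
    segments = []
    start = None        # index where the current segment began
    quiet_run = 0       # consecutive quiet samples inside a segment
    for i, x in enumerate(samples):
        if abs(x) >= threshold:
            if start is None:
                start = i
            quiet_run = 0
        elif start is not None:
            quiet_run += 1
            if quiet_run >= min_pause:
                # Close the segment just after its last loud sample.
                segments.append((start, i - quiet_run + 1))
                start = None
                quiet_run = 0
    if start is not None:
        segments.append((start, len(samples) - quiet_run))
    return segments


signal = [0.0] * 5 + [1.0] * 10 + [0.0] * 8 + [1.0] * 4 + [0.0] * 2
units = split_on_pauses(signal, sample_rate=10, min_pause_s=0.5)
```

Word-level or phrase-level segmentation by language recognition, which the text prefers, would need a speech recognizer and is outside the scope of this sketch.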
[0033]
  The memory 15 is, for example, a storage device built into the device, the RAM and/or built-in hard disk of a computer, or a specific area of an external storage means. The memory 15 is connected to the speech recognition means 14 either via the control means 13 or directly. It stores the audio signal data separated into the processing units determined by the speech recognition means 14 and, during processing, sends the audio signal data to the voiceprint analysis means 16 on instruction from the control unit 13.
  The voiceprint analysis means 16 is connected to the memory 15, either via the control means 13 or directly, and is further connected to the intonation shift unit 17 and the pitch shift unit 18. As described later, the voiceprint analysis means 16 determines, by voiceprint analysis, the fundamental frequency of the audio signal data for each processing unit, whether read from the memory 15 or sent directly from the speech recognition means 14. The voiceprint analysis means 16 can therefore use, for example, an equalizer or software that performs frequency spectrum analysis.
[0034]
  The intonation shift unit 17 is connected to the voiceprint analysis means 16, either via the control means 13 or directly, and is further connected to the pitch shift unit 18 and the speed adjustment means 19. The intonation shift unit 17 is the core of the change sound generation means 40 of the present invention. As described later, once the fundamental frequency of an audio signal taken as a processing unit has been determined, the unit generates a change sound by applying to that signal an intonation change that rises and/or falls within a predetermined range, for example one octave, so as to conform to the characteristics of motherese described later. In other words, it pitch-shifts specific portions of the speech in the processing unit up and/or down within a range of one octave.
[0035]
  The pitch shift unit 18 is connected to the intonation shift unit 17, either via the control means 13 or directly, and is further connected to the speed adjustment means 19. The pitch shift unit 18 may be, for example, a means that processes a signal so that its frequency changes gradually over time, or it may be built with software having a similar function. The pitch shift unit 18 performs the central function of the basic sound conversion means 30: it applies a predetermined frequency shift to the change sound to which the intonation change has been applied so that the fundamental frequency becomes relatively high, for example about 300 Hz to 500 Hz, in conformity with the characteristics of motherese described later.
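A very simple way to shift every frequency component by the same ratio is to resample the signal. The sketch below does this with linear interpolation; it is offered as an assumption about how a software pitch shifter could be approximated, not as the method used in the patent. Resampling also changes the duration of the sound, which is why a separate speed (duration) adjustment step is needed afterward. The function name is hypothetical.

```python
def resample_pitch_shift(samples, ratio):
    """Shift pitch by 'ratio' by reading the input at a different rate.

    Assumption: linear interpolation between neighbouring samples.
    A ratio of 2.0 raises every component by one octave and halves
    the duration; a ratio of 0.5 lowers pitch and doubles the duration.
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    n_out = int(len(samples) / ratio)
    out = []
    for j in range(n_out):
        pos = j * ratio                      # fractional read position
        i = min(int(pos), len(samples) - 1)
        frac = pos - i
        nxt = samples[i + 1] if i + 1 < len(samples) else samples[i]
        out.append(samples[i] * (1.0 - frac) + nxt * frac)
    return out


ramp = [float(n) for n in range(800)]
shifted = resample_pitch_shift(ramp, 2.0)    # every second sample survives
```

Production pitch shifters usually avoid the duration change by using a phase vocoder or an overlap-add method, which are considerably more involved than this sketch.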
[0036]
  Specifically, the pitch shift unit 18 can shift the frequency of the speech in a processing unit by, for example, using amplifying means to increase the sound pressure of a predetermined frequency component selected with a bandpass filter, or it may be configured to apply a uniform frequency shift to all the components of the speech.
  After the frequency shift, as described later, the harmonic filter 24 imparts a harmonic structure in which the sound pressure of the harmonics gradually decreases as the frequency rises and, more preferably, decreases along a curve.
[0037]
  The speed adjustment unit 19 is connected to the pitch shift unit 18 or the intonation shift unit 17 and adjusts the sound length of the audio signal subjected to the intonation change.
[0038]
  Since the speed adjusting unit 19 outputs the audio signal of the motherese-like sound as described later, the main body 12 is preferably provided with a memory 20 that stores this audio signal in accordance with instructions from the control unit 13. The memory 20 may use the same storage means as the memory 15.
[0039]
  In FIG. 2, the output means 21 consists only of the speaker 23 and is connected to the D/A converter 22 on the main body 12 side. The D/A converter 22, located in the main body 12, converts the audio signal of the motherese-like sound output from the speed adjusting unit 19 or the memory 20 into an analog signal and sends it to the speaker 23. The speaker 23 outputs the motherese-like sound based on the analog-converted audio signal. The D/A converter 22 may instead be provided on the output means side.
[0040]
  The motherese-like sound conversion device 10 of this embodiment is configured as described above. Before describing a method for converting an input voice or audio signal into a motherese-like sound using this device, the characteristics of motherese are described first.
  FIG. 3 is an oscilloscope waveform of motherese, that is, of the voice produced when a mother or other caregiver speaks to an infant; specifically, it shows the case where the mother says "Oishii ne" to the infant. FIG. 4 is a graph in which the fundamental frequency characteristics are plotted, and FIG. 5 is a graph in which the sound pressure, that is, the loudness of the sound in FIG. 4, is plotted.
[0041]
  In particular, as shown in FIG. 4, the fundamental frequency lies in a range of approximately 200 Hz to 500 Hz, so the voice has a relatively high pitch (frequency) (basic sound).
[0042]
  As shown in FIGS. 4 and 5, this frequency changes within a relatively short sound length of 0.4 to 1.0 seconds (change sound), and the change stays within a range of approximately one octave.
[0043]
  Further, the harmonic structure of motherese can be understood from FIGS. 6 and 7. That is, as shown in FIG. 6, this sound has not only the component S1 corresponding to FIG. 4 but also a harmonic structure consisting of the second overtone S2, the third overtone S3, and so on. Although this sound has a harmonic structure, it extends at most to about S4 or S5, so it is not a sound that resonates excessively. FIG. 7 is a graph in which the frequency scale of FIG. 6 is changed to make the basic sound S1 and its harmonics easier to see. Further, FIG. 8 is a graph showing a gentle change obtained by processing the data of FIG.
[0044]
  FIGS. 9 to 12 are diagrams for explaining the characteristics of the harmonic structure of motherese as human speech.
  In these figures, the horizontal axis indicates frequency, increasing toward the right, and the vertical axis indicates sound pressure in dB.
  FIG. 9 shows a sound source wave spectrum of speech. As shown in the figure, there are many overtones, and the curve connecting the peaks of their sound pressures (the PP line) slopes downward. In FIG. 9, the sound shifted so that the component with the highest sound pressure has a predetermined high frequency is the basic sound of the present invention.
[0045]
  FIG. 10 schematically shows the transfer characteristics based on the transfer function of the human vocal tract when the sound "a" is uttered, that is, in a state where the vocal tract is not much narrowed. Compared with FIG. 11, which shows the transfer function of an open space, the characteristic feature is that the sound pressure of overtones at constant intervals is increased. As a result, the spectrum of the basic sound shown in FIG. 9 becomes as shown in FIG. 12 due to the transfer characteristics shown in FIG. That is, in the harmonic structure, the PP line connecting the sound pressure peaks of the respective harmonics descends along a curve as the frequency increases, as in FIG. Such a synthesized sound is referred to as a reference sound, and these characteristics are imparted as described later.
[0046]
  On this basis, in the present embodiment, the motherese-like sound is generated so as to have the following characteristics.
  (1) Motherese is a high-pitched voice, that is, a voice with a high fundamental frequency (F0): about 300 Hz to 500 Hz, with an average of about 350 Hz.
  (2) Motherese has a harmonic structure and is therefore a resonant sound. For this reason, in the present embodiment, to generate the motherese-like sound, a basic sound is generated, as described later, by the function of the basic sound converting means 30 so as to satisfy both condition (1) above and this harmonic condition. However, if the high-frequency range of the harmonics is emphasized too much, the sound may resonate excessively and no longer be a "friendly voice". For this reason, in particular, the basic sound is modified into a reference sound having the characteristics shown in FIG.
  (3) In addition, motherese is rich in intonation, with a large difference between high and low pitch, that is, a changing fundamental frequency. In the present embodiment, such a sound is called a "change sound" and is realized by the function of the change sound generation means 40 as described later. However, the change is preferably within one octave; beyond this, the voice may no longer be "friendly".
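  As an illustration only, characteristics (1) and (3) can be expressed as a simple check on a fundamental-frequency contour. The following Python sketch is not part of the disclosed embodiment: the function name, the list-of-hertz input format, and the use of the arithmetic mean are assumptions made for this example.

```python
import math

def meets_listed_conditions(f0_hz, low_hz=300.0, high_hz=500.0, max_octaves=1.0):
    """Check an F0 contour (list of Hz values) against conditions (1) and (3).

    Assumed interpretation: the mean F0 lies in [low_hz, high_hz] and the
    total range of the contour does not exceed max_octaves.
    """
    mean_f0 = sum(f0_hz) / len(f0_hz)
    span_octaves = math.log2(max(f0_hz) / min(f0_hz))
    return low_hz <= mean_f0 <= high_hz and span_octaves <= max_octaves

# A contour averaging 350 Hz with a range of about half an octave passes.
print(meets_listed_conditions([300.0, 350.0, 420.0, 330.0]))  # True
# A low-pitched contour around 125 Hz does not.
print(meets_listed_conditions([120.0, 130.0, 125.0]))  # False
```

  Condition (2), the harmonic structure, is not covered by this check because it concerns the spectrum rather than the F0 contour.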
[0047]
  FIG. 13 is a flowchart showing an example of a method by which the motherese-like sound conversion device 10, based on the configuration of FIG. 2, converts an input voice or audio signal into a motherese-like sound having the motherese characteristics described above.
  An example of the conversion method is described below with reference to FIG. 13.
[0048]
  A voice or an audio signal is input to the input means 11 of FIG. 2 (ST1).
  For example, in the case of a device to which sound is input directly, the sound is input from a microphone 11 or a handy loudspeaker as shown in FIG. In the case of a conversion device built into a video deck or connected to one, an audio signal separated from the video signal is input. When the conversion device 10 is connected to a network or the like, audio signals of various contents distributed via the network are input. An audio signal corresponding to a playback signal from a tape recorder, compact disc, magneto-optical disk, or the like may also be input. Further, when the motherese-like sound conversion device is built into a television or the like, an audio signal separated from a broadcast wave sent from a broadcasting station is input.
[0049]
  When the audio signal from the input means is an analog signal, the voice recognition means 14 in FIG. first converts it into a digital signal. A predetermined processing unit such as a word or a phrase is then identified by, for example, a Japanese language recognition program, and each processing unit is stored in the memory 15 in accordance with an instruction from the control unit 13. The audio signal divided into processing units is hereinafter referred to as a "sample signal". FIG. 14 shows such an input audio signal as an oscillograph waveform, in which the horizontal axis represents time and the vertical axis represents amplitude. In step 2, the portion of the audio signal VS from t1 to t2 is cut out as a sample signal (sample sound) (ST2).
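  The cutting step (ST2) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the embodiment's implementation: the function name `cut_sample` is hypothetical, the signal is assumed to be a list of digital samples, and the boundaries t1 and t2 are taken as given rather than found by language recognition.

```python
def cut_sample(signal, sample_rate_hz, t1_s, t2_s):
    """Return the samples of `signal` between times t1_s and t2_s (seconds)."""
    start = max(0, int(round(t1_s * sample_rate_hz)))
    end = min(len(signal), int(round(t2_s * sample_rate_hz)))
    return signal[start:end]

# One second of placeholder samples at 8 kHz; cut out 0.25 s to 0.75 s.
vs = list(range(8000))
sample = cut_sample(vs, 8000, 0.25, 0.75)
print(len(sample), sample[0], sample[-1])  # 4000 2000 5999
```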
[0050]
  Next, the voiceprint analysis unit 16 performs pitch analysis, that is, frequency analysis, of the sample sound and specifies the fundamental frequency F0 (ST3). As shown in FIG. 15, the basic sound is a sound having the specified fundamental frequency F0 and its harmonic structure VS2, VS3, and so on.
  In this embodiment, for ease of illustration, only the voice VS1 corresponding to the fundamental frequency F0 is extracted and described. Unless otherwise noted below, however, the following processing is applied simultaneously to each component of the harmonic structure as well.
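  The text does not specify how the fundamental frequency is found in ST3 beyond "frequency analysis". One common technique, shown here only as an assumed stand-in, is to pick the lag that maximizes the autocorrelation of the sample sound. The function name and the search range of 80 Hz to 600 Hz are choices made for this sketch.

```python
import math

def estimate_f0(samples, sample_rate_hz, f_min_hz=80.0, f_max_hz=600.0):
    """Estimate F0 as sample_rate / lag for the lag with maximum autocorrelation."""
    min_lag = int(sample_rate_hz / f_max_hz)
    max_lag = int(sample_rate_hz / f_min_hz)
    best_lag, best_value = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        value = sum(samples[n] * samples[n + lag]
                    for n in range(len(samples) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate_hz / best_lag

# Synthetic sample sound: 200 Hz fundamental plus a weaker second harmonic.
rate = 8000
sound = [math.sin(2 * math.pi * 200 * n / rate)
         + 0.5 * math.sin(2 * math.pi * 400 * n / rate)
         for n in range(400)]
print(estimate_f0(sound, rate))  # 200.0
```

  The resolution of this estimate is limited to whole-sample lags, so a practical analyzer would refine it, for example by interpolation or by a spectrum-based method.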
[0051]
  Next, an operation for generating a change sound is performed (ST4). Here, the intonation shift unit 17 shown in FIG. 2 uses, for example, a synthesizer function to apply an intonation shift that rises and/or falls within a predetermined range, for example one octave, in order to emphasize the change in the fundamental frequency of the voice VS1. For example, in FIG. 16, the range of change is widened to about 0.8 octave, giving the voice CVS1.
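  One possible way to realize such an intonation shift on an F0 contour is to scale the contour's deviations from its center on a logarithmic (octave) scale. The Python sketch below assumes this interpretation; the function name, the choice of the log-domain mean as the center, and the cap at one octave are assumptions for illustration, and the 0.8 octave default follows the example of FIG. 16.

```python
import math

def expand_intonation(f0_hz, target_octaves=0.8, max_octaves=1.0):
    """Rescale an F0 contour so its total range equals target_octaves.

    The range is never allowed to exceed max_octaves. A flat contour is
    returned unchanged because it has no intonation to emphasize.
    """
    target = min(target_octaves, max_octaves)
    logs = [math.log2(f) for f in f0_hz]
    center = sum(logs) / len(logs)
    span = max(logs) - min(logs)
    if span == 0:
        return list(f0_hz)
    gain = target / span
    return [2 ** (center + gain * (v - center)) for v in logs]

vs1 = [180.0, 200.0, 220.0, 200.0]   # original contour, about 0.29 octave
cvs1 = expand_intonation(vs1)        # emphasized contour
print(round(math.log2(max(cvs1) / min(cvs1)), 3))  # 0.8
```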
[0052]
  Subsequently, the operation shown in FIG. 17 is performed. In FIG. 17, the pitch shift unit 18 of FIG. 2 performs pitch analysis (frequency analysis) on the voice CVS1 to which the intonation change has been added, and obtains the average frequency AF0 (ST5).
  Based on this average frequency AF0, the pitch shift unit 18 then performs a pitch shift (frequency shift) so that the average becomes AF′0 shown in FIG. 18 (ST6).
  The purpose of this pitch shift is to obtain a fundamental frequency that approximates that of motherese, and the target is preferably set in the range of 300 Hz to 500 Hz. In this embodiment, it is set to 350 Hz. The pitch shift can be achieved by increasing the sound pressure of a predetermined frequency component, or a constant frequency shift may be applied to all components constituting the sound. In the present embodiment, the whole of CVS1 is raised (see FIG. 18) to obtain the voice SCVS1.
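  Steps ST5 and ST6 can be illustrated on an F0 contour as follows. This sketch assumes that the shift is applied as a constant ratio to every value, which keeps harmonics at integer multiples of the fundamental; the text itself only says that a constant frequency shift may be applied to all components. The function name and the use of the arithmetic mean for AF0 are likewise assumptions.

```python
def shift_to_target(f0_hz, target_mean_hz=350.0):
    """Scale an F0 contour so that its mean (AF0) becomes target_mean_hz (AF'0).

    Returns the shifted contour and the ratio that was applied.
    """
    af0 = sum(f0_hz) / len(f0_hz)      # ST5: average frequency
    ratio = target_mean_hz / af0       # ST6: shift ratio toward the target
    return [f * ratio for f in f0_hz], ratio

cvs1 = [150.0, 200.0, 250.0]
scvs1, ratio = shift_to_target(cvs1)
print(scvs1, ratio)  # [262.5, 350.0, 437.5] 1.75
```

  Because every value is multiplied by the same ratio, the range of the contour in octaves is unchanged by this step.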
[0053]
  Next, resonance adjustment of the harmonic part is performed on the voice SCVS1 after the pitch shift, that is, after the frequency change (ST7). The harmonic components VS2, VS3, and so on of the fundamental frequency have already undergone the intonation adjustment (ST4) and the pitch shift (ST6) simultaneously with the conversion of the voice VS1 corresponding to the fundamental frequency F0.
  The pitch shift unit 18 is preferably connected to a harmonic filter 24 having a resonance adjustment function. With this filter, for each overtone that has received the intonation change and the pitch shift corresponding to the voice SCVS1, a harmonic structure as shown in FIG. 12 is formed using, for example, a transfer function as shown in FIG. That is, the result is a reference sound: a synthesized sound in which the PP line connecting the sound pressure peaks of the overtones descends along a curve as the frequency increases.
  In this case, instead of using the transfer function directly, the adjustment may be made using a filter that lowers the sound pressure more for higher harmonics.
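  The simpler alternative, a filter that lowers the sound pressure more for higher harmonics, can be sketched as a table of harmonic frequencies and relative levels. The roll-off of 12 dB per octave and the limit of five harmonics are illustrative values assumed for this example (the limit loosely follows the remark that the harmonic structure extends to about S4 or S5); the text does not give numeric values.

```python
import math

def harmonic_levels(f0_hz, n_harmonics=5, rolloff_db_per_octave=12.0):
    """Return (frequency in Hz, level in dB relative to the fundamental) pairs.

    Harmonic k sits at k * f0_hz and is attenuated in proportion to its
    distance from the fundamental in octaves, so the line through the
    peaks descends as frequency rises.
    """
    levels = []
    for k in range(1, n_harmonics + 1):
        level_db = 0.0 - rolloff_db_per_octave * math.log2(k)
        levels.append((k * f0_hz, level_db))
    return levels

for freq, level in harmonic_levels(350.0):
    print(f"{freq:7.1f} Hz  {level:6.1f} dB")
```

  A linear amplitude for synthesis can be obtained from each level as 10 ** (level_db / 20).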
[0054]
  Lastly, the speed adjusting unit 19 adjusts the speed of the voice SCVS1, whose sound length or speed has changed as a result of the intonation shift, and returns it to the original sound length (ST8).
  Because motherese tends to be spoken slowly, the speed adjustment may take this into account and make the output slower than the input voice.
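  The length adjustment of ST8 can be illustrated on a sequence of per-frame values such as an F0 contour. The sketch below resamples the sequence to a chosen number of frames by linear interpolation. It is an assumption-laden simplification: the function name is hypothetical, and it operates on frame values rather than on the waveform, because naively resampling a waveform would also change its pitch.

```python
def stretch_frames(frames, n_out):
    """Resample a list of per-frame values to n_out frames (linear interpolation)."""
    if n_out < 2 or len(frames) < 2:
        return list(frames[:1]) * max(n_out, 0)
    out = []
    for i in range(n_out):
        pos = i * (len(frames) - 1) / (n_out - 1)   # position in the input
        lo = int(pos)
        hi = min(lo + 1, len(frames) - 1)
        frac = pos - lo
        out.append(frames[lo] * (1 - frac) + frames[hi] * frac)
    return out

# Slow a three-frame contour down to five frames.
print(stretch_frames([300.0, 400.0, 300.0], 5))
# [300.0, 350.0, 400.0, 350.0, 300.0]
```

  Choosing n_out equal to the original frame count restores the original length, while a larger n_out gives the slower delivery mentioned above.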
  As a result, an audio signal that closely matches a motherese-like sound is generated. This audio signal is stored in the memory 20 as necessary, or is converted directly into an analog signal by the D/A converter 22, and the motherese-like sound is output from the speaker 23.
[0055]
  As described above, the motherese-like sound conversion device 10 according to the present embodiment can convert other voices or audio signals input to it into a sound similar to motherese, which particularly attracts the attention of infants, and output that sound. Therefore, for use with infants or young children, the present invention can be applied to a very wide range of products, for example as follows.
  For example, a caregiver can talk to an infant with a motherese-like sound from a handy loudspeaker or the like to which the voice is input directly.
  In the case of a conversion device built into a video deck, the audio signal of a recorded video signal can be reproduced as a motherese-like sound.
  In addition, when the motherese-like sound conversion device 10 is connected to a network or the like, audio signals of various contents distributed via the network can be converted and reproduced as motherese-like sounds.
  Further, a playback signal from a tape recorder, compact disc, magneto-optical disk, or the like can be reproduced as a motherese-like sound.
  Furthermore, when the motherese-like sound conversion device 10 is built into a television or the like, an audio signal separated from a broadcast wave sent from a broadcasting station can be reproduced as a motherese-like sound.
  Furthermore, a person who cannot speak motherese well can input his or her recorded voice into the input means, listen to that voice converted into a motherese-like sound, and practice speaking in motherese.
  Further, by using a language input means such as the keyboard of a personal computer or word processor as the input means, even a person who cannot speak or a person with a disability can speak in motherese.
  Furthermore, by using a communication line such as a telephone or the Internet, it is possible to talk in motherese from a remote place.
[0056]
  The present invention is not limited to the above-described embodiment. For example, where a poorer approximation to motherese is acceptable and sound quality is not a concern, a simpler device configuration or conversion method may be used in which the input audio is fed to the pitch shift unit, its frequency is changed so as to match motherese directly, and it is output as is or after a predetermined intonation change is added.
  For example, when Japanese is recognized by the language recognition means, motherese-like sounds may be stored in advance in a memory as table data or the like for each predetermined unit such as a word or phrase, and the conversion may be performed using such data.
  Likewise, when Japanese is recognized by the language recognition means, the intonation of each word may be stored in advance in the memory as, for example, table data, and the intonation adjustment may be performed with reference to this data.
  Further, each component of the above-described embodiment may be omitted, or combined with other components not described here, to the extent permitted in relation to the quality of the motherese-like sound.
[0057]
【Effect of the Invention】
  As described above, the present invention can provide a motherese-like sound conversion device and a motherese-like sound conversion method that obtain a sound similar to motherese, which particularly attracts the attention of infants, by converting other voices or audio signals input to the device.
[Brief description of the drawings]
FIG. 1 is a schematic perspective view showing an example of the appearance of an embodiment of a motherese-like sound conversion device.
FIG. 2 is a block diagram showing the configuration of the motherese-like sound conversion device of FIG. 1.
FIG. 3 is an oscilloscope waveform chart showing a voiceprint of motherese.
FIG. 4 is a graph showing the fundamental frequency of the voiceprint of FIG. 3.
FIG. 5 is a graph showing the sound pressure of the voice in FIG.
FIG. 6 is a graph showing overtones of the voice of FIG. 3.
FIG. 7 is a graph showing overtones of the voice of FIG. 3 with an enlarged frequency scale.
FIG. 8 is a graph showing the sound pressure of the voice in FIG.
FIG. 9 is a diagram showing the harmonic structure of a basic sound, in relation to the human voice, as a sound source wave spectrum.
FIG. 10 is a diagram showing a transfer function of human vocalization.
FIG. 11 is a diagram showing a transfer function of an open space.
FIG. 12 is a diagram for explaining the voice spectrum of a motherese-like sound.
FIG. 13 is a flowchart showing an example of a conversion method performed by the motherese-like sound conversion device of FIG. 2.
FIG. 14 is an oscilloscope waveform diagram of an input voice.
FIG. 15 is a graph showing the fundamental frequency of a processing unit cut out from an input voice.
FIG. 16 is a graph showing a state where an intonation change is applied to the voice of a processing unit.
FIG. 17 is a graph showing how an average frequency is obtained for the voice of a processing unit.
FIG. 18 is a graph showing how a frequency shift is performed on the voice of a processing unit.
[Explanation of symbols]
  10 ... Motherese-like sound conversion device, 11 ... Input means, 12 ... Main body, 13 ... Control means, 21 ... Output means, 30 ... Basic sound conversion means, 40 ... Change sound generation means

Claims

1. A motherese-like sound conversion device comprising:
input means for a voice, an audio signal, or information corresponding to an audio signal;
basic sound converting means for converting the input voice or audio signal into a basic sound having a predetermined fundamental frequency and its harmonic structure;
change sound generating means for applying an intonation change within a certain range to the basic sound having the predetermined harmonic structure to produce a change sound;
means by which the basic sound converting means specifies the fundamental frequency of the input voice or audio signal;
means by which the basic sound converting means generates a basic sound having a harmonic structure based on the converted fundamental frequency, by shifting the fundamental frequency upward or downward; and
resonance adjustment means for making adjustments so that the frequencies having the harmonic structure become progressively higher and the sound pressure becomes weaker as the frequency becomes higher,
wherein the input voice or audio signal is converted to generate a motherese-like sound.

2. The motherese-like sound conversion device according to claim 1, wherein the basic sound converting means shifts the fundamental frequency of the input voice or audio signal to a fundamental frequency of approximately 300 to 500 Hz.

3. The motherese-like sound conversion device according to claim 1 or 2, wherein the change sound generating means applies to the input voice or audio signal an intonation change that rises and/or falls within a predetermined range.

4. The motherese-like sound conversion device according to claim 1, wherein the intonation change is set within a range of approximately one octave.

5. A method of converting to a motherese-like sound, comprising the steps of:
inputting a voice, an audio signal, or information corresponding to an audio signal from outside;
identifying a sample sound as a processing unit from the input voice, audio signal, or information corresponding to the audio signal;
identifying a fundamental frequency from the sample sound;
generating a change sound by applying, to the sound including the fundamental frequency, an intonation change that rises and/or falls within a predetermined range;
shifting the fundamental frequency of the change sound upward or downward; and
adjusting a predetermined harmonic structure for the shifted fundamental frequency,
wherein, in the step of adjusting the harmonic structure, the adjustment is made so that the frequency becomes progressively higher and the sound pressure decreases as the frequency becomes higher.

6. The method of converting to a motherese-like sound according to claim 5, wherein the sample sound is identified in units of words based on language recognition.

7. The method of converting to a motherese-like sound according to claim 5 or 6, wherein, in the step of generating the change sound, the intonation change is set within a range of approximately one octave.

8. A computer-readable recording medium storing a program comprising the steps of:
inputting a voice, an audio signal, or information corresponding to an audio signal from outside;
identifying a sample sound as a processing unit from the input voice, audio signal, or information corresponding to the audio signal;
identifying a fundamental frequency from the sample sound;
generating a change sound by applying, to the sound including the fundamental frequency, an intonation change that rises and/or falls within a predetermined range;
shifting the fundamental frequency of the change sound upward or downward; and
then adjusting a predetermined harmonic structure for the shifted fundamental frequency,
wherein the program adjusts the harmonic structure so that the frequency becomes progressively higher and the sound pressure decreases as the frequency becomes higher.