JP3546779B2

JP3546779B2 - Acoustic signal analysis method

Info

Publication number: JP3546779B2
Application number: JP31509999A
Authority: JP
Inventors: 正尋柿下
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 1999-11-05
Filing date: 1999-11-05
Publication date: 2004-07-28
Anticipated expiration: 2019-11-05
Also published as: JP2001134269A

Description

【０００１】
【発明の属する技術分野】
本発明は、特定の成分音を含む音響信号と前記特定の成分音を含まない音響信号とから、前記特定の成分音を抽出することのできる音響信号の分析方法に関する。
【０００２】
【従来の技術】
ピアノの音は、主に鍵盤操作（タッチ）でコントロールされるが、ダンパーペダル（サステインペダル）の操作による音色制御（共鳴制御）も演奏表現に重大な影響力を有している。
電子ピアノなどの電子楽器において、このようなピアノの音を再現しようとする場合、ダンパーペダルＯＮ時、ＯＦＦ時それぞれの楽音波形をサンプリングしておき、それらをペダル操作量に応じて重み付け混合（クロスフェード）することが行われている。しかしながら、この方法では、自然な音色変化を得るのが難しいという問題点がある。
そこで、通常押鍵音にダンパーペダル操作量に応じて音量やその変化態様（特に、リリースレート）が制御された共鳴音成分を加える方法が知られている。この方法による電子楽器においては、ノートオンが来るとノーマル音と共鳴音を同時にスタートし、共鳴音の音量、リリースレートをダンパーペダル（サステインペダル）の踏み込み量に応じて制御するようにしている。これによって、パー不良行きの微妙なペダルのコントロールに応じて響き具合を変化させたり、打鍵した後からペダルが踏まれた場合の共鳴効果を再現することができ、より自然な音色変化を得ることができる。
【０００３】
【発明が解決しようとする課題】
そこで、残る問題は、ダンパーペダルをＯＮとしたときの共鳴音成分をピアノのサンプリング音からどうやって抽出するかという点になる。
単純には、ダンパーペダルＯＮ時、ＯＦＦ時それぞれにおけるピアノの楽音をサンプリングし、両者の差をとれば共鳴音成分が抽出されるはずであるが、両者の倍音構造、位相関係の違いなどから、通常押鍵音にこの単純差分による共鳴音成分を加算しても、自然な共鳴感は得られず、本物のピアノらしい、うなり感、にごり感を再現することができないというのが現状である。
【０００４】
そこで、本発明は、共鳴音成分などの特定の成分音を含まない音響信号に加算したときに、不自然さがない前記特定の成分音を抽出することのできる音響信号分析方法を提供することを目的としている。
【０００５】
【課題を解決するための手段】
上記目的を達成するために、本発明の音響信号分析方法は、特定の成分音を含む第１の音響信号と、該第１の音響信号と同一音高であって前記特定の成分音を含まない第２の音響信号とから、前記特定の成分音を抽出する音響信号分析方法であって、前記第１の音響信号波形をスペクトル分析し、その時間軸に対する振幅情報と時間軸に対する位相情報とを得るステップと、前記第２の音響信号波形をスペクトル分析し、その時間軸に対する振幅情報と時間軸に対する位相情報とを得るステップと、前記第１の音響信号の振幅情報と前記第２の音響信号の位相情報とにより、第３の音響信号波形を生成するステップと、該第３の音響信号波形と前記第２の音響信号波形とから前記特定の成分音波形の倍音成分を求めるステップとを有するものである。
【０００６】
また、本発明の他の音響信号分析方法は、特定の成分音を含む第１の音響信号と、該第１の音響信号と同一音高であって前記特定の成分音を含まない第２の音響信号とから、前記特定の成分音を抽出する音響信号分析方法であって、前記第１の音響信号波形をスペクトル分析し、その時間軸に対する振幅情報と時間軸に対する位相情報とを得るステップと、前記第２の音響信号波形をスペクトル分析し、その時間軸に対する振幅情報と時間軸に対する位相情報とを得るステップと、前記第１の音響信号の振幅情報と位相情報および前記第２の音響信号の振幅情報と位相情報から、それぞれ、前記第１の音響信号波形と前記第２の音響信号波形に共通して含まれている倍音成分を抜き出した結果に基づいて、第３の音響信号波形を生成するステップと、前記第３の音響信号波形と前記第２の音響信号波形とから前記特定の成分音波形の倍音成分を求めるステップとを有するものである。
【０００７】
さらに、本発明のさらに他の音響信号分析方法は、特定の成分音を含む第１の音響信号と、該第１の音響信号と同一音高であって前記特定の成分音を含まない第２の音響信号とから、前記特定の成分音を抽出する音響信号分析方法であって、前記第１の音響信号波形をスペクトル分析し、その時間軸に対する振幅情報と時間軸に対する位相情報とを得る第１のステップと、前記第２の音響信号波形をスペクトル分析し、その時間軸に対する振幅情報と時間軸に対する位相情報とを得る第２のステップと、前記第１の音響信号波形から、前記第１の音響信号波形と前記第２の音響信号波形に共通して含まれている倍音成分を抜き出した結果に基づいて、前記特定の成分音を含む第３の音響信号波形を生成する第３のステップと、前記第２の音響信号波形から、前記第１の音響信号波形と前記第２の音響信号波形に共通して含まれている倍音成分を抜きだした結果に基づいて、前記特定の成分音を含まない第４の音響信号波形を生成する第４のステップと、前記第３の音響信号波形のうちの非倍音成分波形の複数の帯域毎の成分を求める第５のステップと、前記第４の音響信号波形のうちの非倍音成分波形の複数の帯域毎の成分を求める第６のステップと、前記第５のステップと前記第６のステップにより求められた前記各帯域毎の成分から前記特定の成分音の非倍音成分波形を求めるステップとを有するものである。
【０００８】
【発明の実施の形態】
図１は、本発明の音響信号分析方法が実行される楽音分析合成装置のハードウエア構成の一例を示すブロック図である。
この図において、１はこの楽音分析合成装置全体の制御を行うＣＰＵ、２はＣＰＵ１が実行する各種制御プログラム、楽音分析プログラムおよび楽音合成プログラムなどの各種プログラムを記憶するプログラムメモリ、３は各種制御情報、後述する各種のデータの記憶および一時記憶領域（バッファ）やワークエリアとして使用されるデータメモリ、４は表示装置、５はキーボードおよびポインティングデバイスなどの入力装置、６は鍵盤などの演奏操作子、７は楽音を合成する楽音合成部（シンセサイズユニット）、８は楽音波形サンプルをアナログ信号に変換し、図示しないサウンドシステムに出力するデジタルアナログ変換器（ＤＡＣ）である。また、９は電話回線、インターネット、ＬＡＮなどの通信ネットワーク１１と接続するためのネットワークインターフェース回路、１０はシステムバスである。
なお、図示していないが、ＣＤ−ＲＯＭ、ＤＶＤ、ＭＯ、ＦＤなどの外部記憶媒体の駆動装置を接続してもよいことは当然である。
【０００９】
前記楽音分析プログラムおよび楽音合成プログラムは、いわゆる分析・再合成（Ａｎａｌｙｓｉｓ＆Ｒｅｓｙｎｔｈｅｓｉｓ）方式の楽音合成を行なうものである。すなわち、楽音波形をスペクトル解析してその楽音に含まれている基音周波数およびその倍音周波数に対応する線スペクトル成分を抽出する。具体的には、分析対象となる音響信号波形サンプルに時間窓を掛けてフーリエ分析を行ない、振幅、位相、周波数のデータを得（短時間フーリエ分析（ＳＴＦ：ＳｈｏｒｔＴｉｍｅＦｏｕｒｉｅｒａｎａｌｙｓｉｓ））、該フーリエ分析出力の振幅データからピークを成す全ての周波数位置を検出する処理を、前記時間窓を移動しながら行い、各フレームにおけるピークのうち軌跡をなすものを追跡する（以下、特に断りがない限り、この対象音の短時間フーリエ分析からピーク追跡に至る処理をここではＳＴＦ分析処理と呼ぶことにする）。そして、このようにして得られた軌跡のうち、所望のデータを選択し、それらを正弦波合成することにより、元の楽音波形のうちの決定論的（確定的）成分（Ｄｅｔｅｒｍｉｎｉｓｔｉｃ成分）波形を合成する。そして、前記元の楽音波形から、このＤｅｔｅｒｍｉｎｉｓｔｉｃ成分波形を減算することにより、残余（確率的、非決定論的）成分（Ｒｅｓｉｄｕａｌ成分）波形を得る。
合成時には、前記軌跡のデータをモディファイして再合成することにより、Ｄｅｔｅｒｍｉｎｉｓｔｉｃ波形をモディファイし、前記Ｒｅｓｉｄｕａｌ波形については、イコライザなどによる信号処理によりモディファイして、両者を加算することにより所望の波形を得ることができる。
【００１０】
次に、本発明の音響信号分析方法について、図２〜５のフローチャートを参照して説明する。
本発明の音響信号分析方法は、共鳴音などの特定の成分音を、該特定の成分音を含む音響信号波形データと含まない音響信号波形データから算出するものであり、その概要は、まず、前記特定の成分音を含む音響信号波形データと含まない音響信号波形データの両者を前述した短時間フーリエ分析によりＤｅｔｅｒｍｉｎｉｓｔｉｃ成分とＲｅｓｉｄｕａｌ成分とに分離し、Ｄｅｔｅｒｍｉｎｉｓｔｉｃ成分については、特定の成分音を含む波形の位相を特定の成分音を含まない波形の位相に揃えて引き算し、Ｒｅｓｉｄｕａｌ成分については、両者のＲｅｓｉｄｕａｌ成分を帯域フィルタ群により帯域分割し、各帯域のエンベロープの差を使って引き算し、前記Ｄｅｔｅｒｍｉｎｉｓｔｉｃ成分の引き算結果と前記Ｒｅｓｉｄｕａｌ成分の引き算結果を加算することにより、前記特定の成分音のみの音響信号波形を生成するものである。
【００１１】
以下、特定の成分音としてピアノのダンパーペダルをオンとしたときの共鳴音成分を含む楽音波形サンプルデータ（Ｐ＿ｋｃ＿ｔ＿Ｄ．ｗａｖｅ）とそのような共鳴音成分を含まない通常押鍵時の楽音波形サンプルデータ（Ｐ＿ＫＣ＿ｔ＿ＮＤ．ｗａｖｅ）を入力として、前記ダンパーペダルをオンとしたときの共鳴音成分波形を抽出する場合を例にとって説明する。ここで、前記データの名称中の、Ｐは楽器名（ピアノ）を示し、ｋｃはキーコード（音高）、ｔはタッチ、Ｄはダンパーペダルがオンであること、ＮＤはダンパーペダルがオフであることを示している。
図６は、このような入力データの一例を示す図であり、（ａ）はダンパーペダルをオンとしたときのピアノのＧ３音（Ｐ＿３Ｇ＿ｔ＿Ｄ．ｗａｖｅ）、（ｂ）はダンパーペダルが踏み込まれていないときのピアノのＧ３音（Ｐ＿３Ｇ＿ｔ＿ＮＤ．ｗａｖｅ）を示している。
【００１２】
図２のステップＳ１において、まず、前記両入力データの波形の頭出し処理、すなわち、両入力波形データの時間軸を合わせる処理を行ない、両入力データの不要区間を除いた波形データを生成する。この処理は、例えば、入力データのうちの一方を基準とし、両者の相関が最大となるように、他方の波形データのタイミングを調節することにより行われる。ここでは、共鳴音成分を含まない波形データ（Ｐ＿ＫＣ＿ｔ＿ＮＤ．ｗａｖｅ）を基準に共鳴音成分を含む波形データ（Ｐ＿ｋｃ＿ｔ＿Ｄ．ｗａｖｅ）のタイミングを調節するものとする。したがって、このステップＳ１の時間軸合せ＆波形切り出し処理の結果、頭出しされた共鳴音成分を含む波形データ（Ｐ＿ｋｃ＿ｔ＿Ｄ＿ＡＳＹＮＣ．ｗａｖｅ）が生成される。
【００１３】
次に、ステップＳ２において、前記頭出しされた共鳴音成分を含む波形データ（Ｐ＿ｋｃ＿ｔ＿Ｄ＿ＡＳＹＮＣ．ｗａｖｅ）と前記共鳴音成分を含まない波形データ（Ｐ＿ＫＣ＿ｔ＿ＮＤ．ｗａｖｅ）それぞれについて、先に述べた短時間フーリエ分析に基づくＳＴＦ分析処理が実行される。これにより、前述のように、前記共鳴音成分を含む波形データ（Ｐ＿ｋｃ＿ｔ＿Ｄ＿ＡＳＹＮＣ．ｗａｖｅ）をＳＴＦ分析したデータ（ＳＴＦデータ）（Ｐ＿ｋｃ＿ｔ＿Ｄ＿ＡＳＹＮＣ．ｓｔｆ）が生成され、その軌跡をなすデータからＤｅｔｅｒｍｉｎｉｓｔｉｃ波形データ（Ｐ＿ｋｃ＿ｔ＿Ｄ＿ＡＳＹＮＣ．ｄｅｔ）が合成され、さらに、元の波形データから前記Ｄｅｔｅｒｍｉｎｉｓｔｉｃ波形データを減算することによりＲｅｓｉｄｕａｌ波形データ（Ｐ＿ｋｃ＿ｔ＿Ｄ＿ＡＳＹＮＣ．ｒｅｓ）が生成される。
また、同様に、前記共鳴音成分を含まない波形データ（Ｐ＿ＫＣ＿ｔ＿ＮＤ．ｗａｖｅ）から、ＳＴＦデータ（Ｐ＿ＫＣ＿ｔ＿ＮＤ．ｓｔｆ）、Ｄｅｔｅｒｍｉｎｉｓｔｉｃ波形データ（Ｐ＿ＫＣ＿ｔ＿ＮＤ．ｄｅｔ）、および、Ｒｅｓｉｄｕａｌ波形データ（Ｐ＿ＫＣ＿ｔ＿ＮＤ．ｒｅｓ）が生成される。
【００１４】
図７の（ａ）は前記図６の（ｂ）に示した共鳴音成分を含まない楽音波形（Ｐ＿３Ｇ＿ｔ＿ＮＤ．ｗａｖｅ）のＳＴＦデータ（Ｐ＿３Ｇ＿ｔ＿ＮＤ．ｓｔｆ）を示し、図７の（ｂ）は、前記図６の（ａ）に示した共鳴音成分を含む楽音波形（Ｐ＿３Ｇ＿ｔ＿Ｄ．ｗａｖｅ）を頭出しした楽音波形のＳＴＦデータ（Ｐ＿ｋｃ＿ｔ＿Ｄ＿ＡＳＹＮＣ．ｓｔｆ）を示している。この図７の（ａ）および（ｂ）に示すように、ＳＴＦ分析処理により、各倍音成分ごとに、その振幅情報（すなわち、エンベロープの時間変化）およびその位相情報（すなわち、ピッチの時間変動）を求めることができる。
【００１５】
次に、このように生成された各データにもとづいて、共鳴音成分のＤｅｔｅｒｍｉｎｉｓｔｉｃ波形およびＲｅｓｉｄｕａｌ波形の抽出が行われる。
まず、共鳴音成分のＤｅｔｅｒｍｉｎｉｓｔｉｃ波形の抽出処理が行われる。共鳴音成分のＤｅｔｅｒｍｉｎｉｓｔｉｃ波形を抽出するためには、共鳴音成分を含む波形データのＤｅｔｅｒｍｉｎｉｓｔｉｃ波形から共鳴音成分を含まない波形データのＤｅｔｅｒｍｉｎｉｓｔｉｃ波形を減算すればよいのであるが、前記図７の（ａ）および（ｂ）に示すように、両波形の倍音成分は必ずしも一致していない。すなわち、図７の（ａ）に示した共鳴音成分を含まない波形においては、第１３倍音成分が含まれていないのに対し、図７の（ｂ）に示した共鳴音成分を含む波形においては、第１４倍音成分が含まれていない。したがって、両波形のＤｅｔｅｒｍｉｎｉｓｔｉｃ波形成分同士の減算を行なうためには、両者に含まれている倍音成分を一致させることが必要となる。
【００１６】
そこで、図３のステップＳ３において、前記各ＳＴＦデータ（Ｐ＿ｋｃ＿ｔ＿Ｄ＿ＡＳＹＮＣ．ｓｔｆおよびＰ＿ｋｃ＿ｔ＿ＮＤ．ｓｔｆ）からそれらに共通に含まれている倍音成分のみを抜き出して新たなＳＴＦデータ（Ｐ＿ｋｃ＿ｔ＿Ｄ＿ＡＳＹＮＣ＿ＣＯＭ．ｓｔｆおよびＰ＿ｋｃ＿ｔ＿ＮＤ＿ＣＯＭ．ｓｔｆ）を作成する。このとき、Ｄｅｔｅｒｍｉｎｉｓｔｉｃ波形データＰ＿ｋｃ＿ｔ＿Ｄ＿ＡＳＹＮＣ．ｄｅｔ２およびＰ＿ｋｃ＿ｔ＿ＮＤ＿．ｄｅｔ２、ならびに、Ｒｅｓｉｄｕａｌ波形データＰ＿ｋｃ＿ｔ＿Ｄ＿ＡＳＹＮＣ．ｒｅｓ２およびＰ＿ｋｃ＿ｔ＿ＮＤ．ｒｅｓ２も作り直される。すなわち、Ｄｅｔｅｒｍｉｎｉｓｔｉｃ波形データは、それぞれ第１２倍音までのデータを含むものとなり、残された倍音成分（第１３倍音成分あるいは第１４倍音成分）は、それぞれ、Ｒｅｓｉｄｕａｌ波形に移行される。具体的には、それぞれ、前記共通の倍音成分を有する新たなＳＴＦデータに基づいて、それぞれ、新たなＤｅｔｅｒｍｉｎｉｓｔｉｃ波形データを合成する。そして、この再合成された波形データをそれぞれの元波形データから減算してそれぞれ対応する新たなＲｅｓｉｄｕａｌ波形データを求める。
【００１７】
前記図７に示した例においては、各ＳＴＦデータから第１２倍音までを含む新たなＳＴＦデータ（Ｐ＿３Ｇ＿ｔ＿Ｄ＿ＡＳＹＮＣ＿ＣＯＭ．ｓｔｆおよびＰ＿３Ｇ＿ｔ＿ＮＤ＿ＣＯＭ．ｓｔｆ）、Ｄｅｔｅｒｍｉｎｉｓｔｉｃ波形データ（Ｐ＿３Ｇ＿ｔ＿Ｄ＿ＡＳＹＮＣ．ｄｅｔ２およびＰ＿３Ｇ＿ｔ＿ＮＤ．ｄｅｔ２）、および、Ｒｅｓｉｄｕａｌ波形データ（Ｐ＿３Ｇ＿ｔ＿Ｄ＿ＡＳＹＮＣ．ｒｅｓ２およびＰ＿３Ｇ＿ｔ＿ＮＤ．ｒｅｓ２）を生成する。このとき、新たなＳＴＦデータに含まれなかった第１３倍音あるいは第１４倍音のＤｅｔｅｒｍｉｎｉｓｔｉｃ成分は、それぞれＲｅｓｉｄｕａｌ成分の中に移行されることとなる。図８の（ａ）および（ｂ）は、このようにして生成された新たなＳＴＦデータ（Ｐ＿３Ｇ＿ｔ＿Ｄ＿ＡＳＹＮＣ＿ＣＯＭ．ｓｔｆおよびＰ＿３Ｇ＿ｔ＿ＮＤ＿ＣＯＭ．ｓｔｆ）を示している。この図８の（ａ）と（ｂ）とを比較すると明らかなように、新たに生成されたＳＴＦデータは、ともに第１２倍音までのデータとなっている。
【００１８】
このように共通の倍音成分のみを含む新たなＳＴＦデータを生成した後、ステップＳ４（図３）に進み、共鳴音を含む楽音の振幅情報と共鳴音を含まない楽音の位相情報を持つ波形データを生成する。すなわち、前記Ｐ＿ｋｃ＿ｔ＿Ｄ＿ＣＯＭ．ｓｔｆの振幅情報と前記Ｐ＿ｋｃ＿ｔ＿ＮＤ＿ＣＯＭ．ｓｔｆの位相情報（周波数情報）を用いて、再合成を行ない、共鳴音を含む波形データＰ＿ｋｃ＿Ｔ＿Ｄ．ｄｅｔ＃を生成する。この処理により、得られた共鳴音を含む波形データＰ＿ｋｃ＿Ｔ＿Ｄ．ｄｅｔ＃から前記作り直した共鳴音を含まないＤｅｔｅｒｍｉｎｉｓｔｉｃ波形データ（Ｐ＿ｋｃ＿ｔ＿ＮＤ＿．ｄｅｔ２）を減算して得られる波形データは、位相を揃えて減算しているため、前記元の共鳴音を含む波形データ（Ｐ＿ｋｃ＿ｔ＿Ｄ＿ＡＳＹＮＣ．ｄｅｔ２）から前記作り直した共鳴音を含まないＤｅｔｅｒｍｉｎｉｓｔｉｃ波形データ（Ｐ＿ｋｃ＿ｔ＿ＮＤ＿．ｄｅｔ２）を減算して得られたデータと比較して、振幅の差のみを含むものとなっている。
【００１９】
このようにして求めた波形データは、これを前記共鳴音を含まないＤｅｔｅｒｍｉｎｉｓｔｉｃ波形データ（Ｐ＿ｋｃ＿ｔ＿ＮＤ＿．ｄｅｔ２）に加算することにより、前記共鳴音を含む波形データＰ＿ｋｃ＿Ｔ＿Ｄ．ｄｅｔ＃となるのであるから、この波形をそのまま共鳴音のＤｅｔｅｒｍｉｎｉｓｔｉｃ波形として使用することもできる。しかしながら、この場合には、この波形を前記元の共鳴音を含まない波形データ（Ｐ＿ｋｃ＿ｔ＿ＮＤ．ｄｅｔ２）に加算したときに、元の音のアタック部にも波形データが加算されてしまうこととなるので、アタック部分でフェードインするように変更する。図３のステップＳ５において、このアタック部に含まれる成分音の処理が行われる。
【００２０】
図５は、このアタック部に含まれる成分音の処理の内容を示すフローチャートである。図５のステップＳ１１において、まず、前記共鳴音を含まないＲｅｓｉｄｕａｌ波形（Ｐ＿ｋｃ＿ｔ＿ＮＤ．ｒｅｓ２）を調べる。図９は、この例における共鳴音を含まないＲｅｓｉｄｕａｌ波形データＰ＿３Ｇ＿ｔ＿ＮＤ．ｒｅｓ２を示す図である。この例では、図９に示すように、立上がりから約２００ｍｓの期間には、鍵盤を押したときに鍵盤下部が鍵盤を保持する棚板に当たり発生する震動音（棚板音）が含まれていることがわかる。周知のように、ピアノでは、概略、押鍵操作開始→棚板音（下死点到達）→共鳴音（上死点到達）→ハンマー打撃（弦震動音）というシーケンスとなる。したがって、この棚板音の部分については、共鳴音を含まないようにして、元の音を保持するようにしている。
【００２１】
このステップＳ１１において、アタック部分のチェックをした結果、棚板音などの重畳成分音が含まれていないときには、ステップＳ１２の判断結果がＮＯとなり、ステップＳ１４に進む。
一方、前記図９に示した例のように、重畳成分音が含まれているときには、ステップＳ１３に進み、その重畳成分音（棚板音）の含まれている区間時間Ｔｎを計測する。図９に示した例では、この区間時間Ｔｎは２００ｍｓとなる。
そして、ステップＳ１４に進み、前記区間時間Ｔｎ経過後の時刻から所定時間区間Ｔｘ（例えば、１００ｍｓ）で、前記Ｐ＿ｋｃ＿ｔ＿ＮＤ＿ＣＯＭ．ｓｔｆと前記Ｐ＿ｋｃ＿ｔ＿Ｄ＿ＣＯＭ．ｓｔｆとをクロスフェード処理して、新たなＳＴＦデータＰ＿ｋｃ＿ｔ＿ＸＦ．ｓｔｆ２を生成する。図示した例では、アタックの開始時点からＴｎ（２００ｍｓ）までは、前記共鳴音を含まないＳＴＦデータＰ＿ｋｃ＿ｔ＿ＮＤ＿ＣＯＭ．ｓｔｆとし、ＴｎからＴｘの時間、すなわち、２００ｍｓから３００ｍｓまでの時間で、前記共鳴音を含まないＳＴＦデータＰ＿３Ｇ＿ｔ＿ＮＤ＿ＣＯＭ．ｓｔｆから前記共鳴音を含むＳＴＦデータＰ＿３Ｇ＿ｔ＿Ｄ＿ＣＯＭ．ｓｔｆにフェードインを行ない、３００ｍｓ以降の区間では共鳴音を含むＳＴＦデータＰ＿３Ｇ＿ｔ＿Ｄ＿ＣＯＭ．ｓｔｆとなるような、新たなＳＴＦデータＰ＿３Ｇ＿ｔ＿ＸＦ．ｓｔｆ２を生成する。
【００２２】
このようにして、このステップＳ５（図３）において、棚板音の区間は共鳴音を含まず、その後共鳴音を含む楽音のＳＴＦデータＰ＿３Ｇ＿ｔ＿ＸＦ．ｓｔｆ２が生成される。
そして、ステップＳ６に進み、共鳴音のＤｅｔｅｒｍｉｎｉｓｔｉｃ波形データＰ＿ｋｃ＿ｔ＿ＤＭＰ．ｄｅｔを生成する。この処理は、前記ステップＳ５において生成したＳＴＦデータＰ＿３Ｇ＿ｔ＿ＸＦ．ｓｔｆ２からＤｅｔｅｒｍｉｎｉｓｔｉｃ波形データを再合成し、この再合成したＤｅｔｅｒｍｉｎｉｓｔｉｃ波形データを前記共鳴音を含まないＤｅｔｅｒｍｉｎｉｓｔｉｃ波形データＰ＿ｋｃ＿ｔ＿ＮＤ．ｄｅｔ２から減算することにより、共鳴音の倍音成分Ｄｅｔｅｒｍｉｎｉｓｔｉｃ波形データＰ＿ｋｃ＿ｔ＿ＤＭＰ．ｄｅｔを生成する処理である。
図１０は、このようにして生成されたピアノのＧ３音のＤｅｔｅｒｍｉｎｉｓｔｉｃ波形データＰ＿３Ｇ＿ｔ＿ＤＭＰ．ｄｅｔを示している。
以上により、共鳴音成分のＤｅｔｅｒｍｉｎｉｓｔｉｃ波形データを得ることができた。
【００２３】
次に、共鳴音成分のＲｅｓｉｄｕａｌ波形データの抽出について説明する。
共鳴音（全弦開放状態での響き）は、多数の異なる音高の弦の微妙な振動の重なりや干渉に基づくものであり、共鳴音を含む楽音のＲｅｓｉｄｕａｌ波形と共鳴音を含まない楽音のＲｅｓｉｄｕａｌ波形の差を単純にとるだけでは、自然な共鳴感のある楽音を得ることができない。そこで、本発明においては、次に説明するように、帯域分割を行ない、Ｒｅｓｉｄｕａｌ波形データを抽出するようにしている。
Ｒｅｓｉｄｕａｌ波形データを抽出するために、本発明では、共鳴音を含む楽音のＲｅｓｉｄｕａｌ波形データと共鳴音を含まない楽音のＲｅｓｉｄｕａｌ波形データそれぞれについて帯域フィルタ群（帯域フィルタバンク）を用いて各帯域毎のエンベロープを求め、該エンベロープ同士の引き算を行ない、その結果を前記共鳴音を含む楽音のＲｅｓｉｄｕａｌ波形に乗算するという処理を行なう。
【００２４】
図４のステップＳ７において、前記共鳴音を含む楽音のＲｅｓｉｄｕａｌ波形データ（Ｐ＿ｋｃ＿ｔ＿Ｄ＿ＡＳＹＮＣ．ｒｅｓ２）と前記共鳴音を含まない楽音のＲｅｓｉｄｕａｌ波形データ（Ｐ＿ｋｃ＿ｔ＿ＮＤ．ｒｅｓ２）それぞれを帯域分割し、各帯域毎のＲｅｓｉｄｕａｌ成分波形データを求め、該各帯域毎のＲｅｓｉｄｕａｌ成分波形データからそれぞれ振幅エンベロープを生成する。
図１１は、この帯域分割に用いられる帯域フィルタバンクの特性の一例を示す図であり、ここでは、この図１１に示すような、対数間隔で設定された周波数間隔を有する中心周波数の異なる複数個のバンドパスフィルタからなる帯域フィルタバンクを用いて、（ａ）〜（ｅ）に５つの帯域に分割するものとして説明する。なお、分割帯域数は、これに限らず、任意の数とすることができる。
【００２５】
前記共鳴音を含む楽音のＲｅｓｉｄｕａｌ波形データ（Ｐ＿ｋｃ＿ｔ＿Ｄ＿ＡＳＹＮＣ．ｒｅｓ２）を帯域フィルタバンクに通すことにより、各帯域毎のＲｅｓｉｄｕａｌ波形データ（Ｐ＿ｋｃ＿ｔ＿Ｄ＿ｒｅｓ２＿Ｂ１．ｒｅｓ、…、Ｐ＿ｋｃ＿ｔ＿Ｄ＿ｒｅｓ２＿Ｂｎ．ｒｅｓ）が得られる。そして、該得られた各帯域毎のＲｅｓｉｄｕａｌ波形データから、それぞれ、その振幅エンベロープ（Ｐ＿ｋｃ＿ｔ＿Ｄ＿ｒｅｓ２＿Ｂ１．ｅｎｖ、…、Ｐ＿ｋｃ＿ｔ＿Ｄ＿ｒｅｓ２＿Ｂｎ．ｅｎｖ）を算出する。
また、前記共鳴音を含まない楽音のＲｅｓｉｄｕａｌ波形データ（Ｐ＿ｋｃ＿ｔ＿ＮＤ．ｒｅｓ２）を前記帯域フィルタバンクに通すことにより、各帯域毎のＲｅｓｉｄｕａｌ波形データ（Ｐ＿ｋｃ＿ｔ＿ＮＤ＿ｒｅｓ２＿Ｂ１．ｒｅｓ、…、Ｐ＿ｋｃ＿ｔ＿ＮＤ＿ｒｅｓ２＿Ｂｎ．ｒｅｓ）を求める。そして、該各帯域毎のＲｅｓｉｄｕａｌ波形データから、それぞれの振幅エンベロープ（Ｐ＿ｋｃ＿ｔ＿ＮＤ＿ｒｅｓ２＿Ｂ１．ｅｎｖ、…、Ｐ＿ｋｃ＿ｔ＿ＮＤ＿ｒｅｓ２＿Ｂｎ．ｅｎｖ）を算出する。
【００２６】
図１２の（ａ）〜（ｅ）は、共鳴音を含む前記Ｐ＿３Ｇ＿ｔ＿Ｄ＿ＡＳＹＮＣ．ｒｅｓ２を前記図１１に示した帯域フィルタバンクに入力したときの各帯域毎の波形データ、Ｐ＿３Ｇ＿ｔ＿Ｄ＿ｒｅｓ２＿Ｂ１．ｒｅｓ〜Ｐ＿３Ｇ＿ｔ＿Ｄ＿ｒｅｓ２＿Ｂ５．ｒｅｓを示している。
また、図１３の（ａ）〜（ｅ）は、前記図１２に示した各帯域毎のＲｅｓｉｄｕａｌ波形データの振幅エンベロープＰ＿３Ｇ＿ｔ＿Ｄ＿ｒｅｓ２＿Ｂ１．ｅｎｖ〜Ｐ＿３Ｇ＿ｔ＿Ｄ＿ｒｅｓ２＿Ｂ５．ｅｎｖを示している。なお、この振幅エンベロープは、それぞれ、その最大値を１００とするように正規化して示している。
さらに、図１４の（ａ）〜（ｅ）は、共鳴音を含まない前記Ｐ＿３Ｇ＿ｔ＿ＮＤ．ｒｅｓ２を前記図１１に示した帯域フィルタバンクに入力したときの各帯域毎のＲｅｓｉｄｕａｌ波形データＰ＿３Ｇ＿ｔ＿ＮＤ＿ｒｅｓ２＿Ｂ１．ｒｅｓ〜Ｐ＿３Ｇ＿ｔ＿ＮＤ＿ｒｅｓ２＿Ｂ５．ｒｅｓを示しており、図１５の（ａ）〜（ｅ）は、該各帯域毎のＲｅｓｉｄｕａｌ波形データＰ＿３Ｇ＿ｔ＿ＮＤ＿ｒｅｓ２＿Ｂ１．ｒｅｓ〜Ｐ＿３Ｇ＿ｔ＿ＮＤ＿ｒｅｓ２＿Ｂ５．ｒｅｓからそれぞれ算出した各帯域毎の振幅エンベロープＰ＿３Ｇ＿ｔ＿ＮＤ＿ｒｅｓ２＿Ｂ１．ｅｎｖ〜Ｐ＿３Ｇ＿ｔ＿ＮＤ＿ｒｅｓ２＿Ｂ５．ｅｎｖを示している。
【００２７】
このようにして求めた共鳴音を含む楽音の各帯域毎のＲｅｓｉｄｕａｌ波形の振幅エンベロープ（図１３）をターゲットとし、共鳴音を含まない楽音の各帯域毎の振幅エンベロープ（図１５）をリファレンスとして、同じ帯域同士の減算をすればよいのであるが、このまま減算を行なった結果を用いると、この結果（ターゲット）−（リファレンス）をリファレンスに加算したときに、リファレンスのアタック部を劣化させてしまう可能性がある。また、響きの楽音の引き算を行なうという意味からも、ここでは、アタック部については引き算は行なわず、そのまま保存することとしている。そのため、各エンベロープの立上がりをマスクしたのち、引き算を行なうようにしている。
【００２８】
図４のステップＳ８は、このマスク処理を行なうものであり、各エンベロープＰ＿３Ｇ＿ｔ＿Ｄ＿ｒｅｓ２＿Ｂ１．ｅｎｖ〜Ｐ＿３Ｇ＿ｔ＿Ｄ＿ｒｅｓ２＿Ｂ５．ｅｎｖ、Ｐ＿３Ｇ＿ｔ＿ＮＤ＿ｒｅｓ２＿Ｂ１．ｅｎｖ〜Ｐ＿３Ｇ＿ｔ＿ＮＤ＿ｒｅｓ２＿Ｂ５．ｅｎｖそれぞれについて、時刻０〜最大レベル到達時間までの区間（アタック区間）のエンベロープを最大値（１００）に変更する処理を行なう。
図１６は、このステップＳ８によりマスク処理が行われた共鳴音を含むＲｅｓｉｄｕａｌ波形（ターゲット）の各帯域毎の振幅エンベロープ（Ｐ＿３Ｇ＿ｔ＿Ｄ＿ｒｅｓ２＿Ｂ１．ｅｎｖ＃〜Ｐ＿３Ｇ＿ｔ＿Ｄ＿ｒｅｓ２＿Ｂ５．ｅｎｖ＃）を示す図であり、図１７は、同じくマスク処理が行われた共鳴音を含まないＲｅｓｉｄｕａｌ波形（リファレンス）の各帯域毎の振幅エンベロープＰ＿３Ｇ＿ｔ＿ＮＤ＿ｒｅｓ２＿Ｂ１．ｅｎｖ＃〜Ｐ＿３Ｇ＿ｔ＿ＮＤ＿ｒｅｓ２＿Ｂ５．ｅｎｖ＃を示す図である。
【００２９】
次に、ステップＳ９に進み、前記ステップＳ８においてマスク処理された各帯域毎の振幅エンベロープから各帯域毎の共鳴音のエンベロープ（Ｐ＿ｋｃ＿ｔ＿ＲＳ＿Ｂ１．ｅｎｖ〜Ｐ＿ｋｃ＿ｔ＿ＲＳ＿Ｂ５．ｅｎｖ）を算出する。この計算は、次の式（１）に基づいて実行される。
【数１】

この式（１）により得られるエンベロープＰ＿ｋｃ＿ｔ＿ＲＳ＿Ｂ１．ｅｎｖ〜Ｐ＿ｋｃ＿ｔ＿ＲＳ＿Ｂ５．ｅｎｖは、各帯域における共鳴音の差によるエンベロープであると考えられる。
【００３０】
次に、ステップＳ１０に進み、前記ステップＳ９において算出した各帯域毎の共鳴音のエンベロープを対応する各帯域毎のターゲットの波形データ、すなわち、共鳴音を含む楽音の波形データＰ＿ｋｃ＿ｔ＿Ｄ＿ｒｅｓ２＿Ｂｘ．ｒｅｓ（ｘ＝１，２，．．．，ｎ）と乗算して、各帯域毎の差の波形データＰ＿ｋｃ＿ｔ＿ＲＳ＿Ｂｘ．ｗａｖｅ（ｘ＝１，２，．．．，ｎ）を求める。
【数２】
Ｐ＿ｋｃ＿ｔ＿ＲＳ＿Ｂｘ．ｗａｖｅ＝Ｐ＿ｋｃ＿ｔ＿ＲＳ＿Ｂｘ．ｅｎｖ×Ｐ＿ｋｃ＿ｔ＿Ｄ＿ｒｅｓ２＿Ｂｘ．ｒｅｓ　（ｘ＝１，２，．．．，ｎ）
図１８の（ａ）〜（ｅ）は、このようにして算出した、前述した例における各帯域毎の共鳴音成分のＲｅｓｉｄｕａｌ波形データ（Ｐ＿ｋｃ＿ｔ＿ＲＳ＿Ｂ１．ｗａｖｅ〜Ｐ＿ｋｃ＿ｔ＿ＲＳ＿Ｂ５．ｗａｖｅ）を示している。
【００３１】
そして、このようにして算出した各帯域毎の共鳴音成分のＲｅｓｉｄｕａｌ波形データを加算した後、係数Ｐｒｓを乗算して、共鳴音成分のＲｅｓｉｄｕａｌ波形データ（Ｐ＿ｋｃ＿ｔ＿ＤＭＰ．ｒｅｓ）を算出する。
すなわち、
【数３】
Ｐ＿ｋｃ＿ｔ＿ＤＭＰ．ｒｅｓ＝Ｐｒｓ×（Ｐ＿ｋｃ＿ｔ＿ＲＳ＿Ｂ１．ｗａｖｅ＋…＋Ｐ＿ｋｃ＿ｔ＿ＲＳ＿Ｂｎ．ｗａｖｅ）
ここで、係数Ｐｒｓは、共鳴音を含む楽音のＲｅｓｉｄｕａｌ波形（ターゲット）と共鳴音を含まない楽音のＲｅｓｉｄｕａｌ波形（リファレンス）の差のパワーと、これまでのステップで算出した共鳴音のＲｅｓｉｄｕａｌ波形のパワーとを一致させるために乗ぜられる係数である。
図１９は、このようにして算出されたピアノのＧ３音の共鳴音成分のＲｅｓｉｄｕａｌ波形データ（Ｐ＿３Ｇ＿ｔ＿ＤＭＰ．ｒｅｓ）を示す図である。
【００３２】
このようにして、共鳴音成分のＲｅｓｉｄｕａｌ波形データを得ることができた。
次に、前記ステップＳ６で求めた共鳴音成分のＤｅｔｅｒｍｉｎｉｓｔｉｃ波形データ（図１０）と前記共鳴音成分のＲｅｓｉｄｕａｌ波形データ（図１９）とを加算して、共鳴音成分Ｐ＿ｋｃ＿ｔ＿ＤＭＰ．ｗａｖｅを算出する。
すなわち、
【数４】
Ｐ＿ｋｃ＿ｔ＿ＤＭＰ．ｗａｖｅ＝Ｐ＿ｋｃ＿ｔ＿ＤＭＰ．ｄｅｔ＋Ｐ＿ｋｃ＿ｔ＿ＤＭＰ．ｒｅｓ
図２０は、このようにして算出したピアノのＧ３音の共鳴音成分（Ｐ＿３Ｇ＿ｔ＿ＤＭＰ．ｗａｖｅ）を示す図である。
このようにして、共鳴音成分を抽出することができた。
【００３３】
図２１は、このようにして抽出した共鳴音成分を楽音生成に使用する場合の例を示す概念図である。
この図において、２１は前述した共鳴音成分を含まない楽音波形データ（Ｐ＿ｋｃ＿ｔ＿ＮＤ．ｗａｖｅ）、２２は前述のようにして抽出した共鳴音成分の波形データ（Ｐ＿ｋｃ＿ｔ＿ＤＭＰ．ｗａｖｅ）、２３は利得調節器であり、図示するように、ダンパーペダルの踏み込み量によりゲインが制御されるようになされている。また、２４は前記共鳴音成分を含まない楽音波形データ（Ｐ＿ｋｃ＿ｔ＿ＮＤ．ｗａｖｅ）２１と前記共鳴音成分の波形データ（Ｐ＿ｋｃ＿ｔ＿ＤＭＰ．ｗａｖｅ）とを加算して出力する加算器である。このように、ダンパーペダルの踏み込み量に応じた重みを付けられた共鳴音成分を共鳴音成分を含まない楽音波形に加算することにより、自然な共鳴音を有する楽音波形を生成することができる。特に、微妙なペダルのコントロールに応じて音の響き具合を変化させたり、打鍵した後からペダルが踏まれた場合の共鳴効果を再現することが可能となる。
【００３４】
なお、以上においては、特定の成分音として、共鳴音成分、特に、ダンパーペダルを操作したときの共鳴音成分を例にとって説明したが、本発明の楽音分析方法は、これに限られることはなく、タッチの異なる楽音、例えば、ｆｆ（フォルテッシモ）とｍｆ（メゾフォルテ）の楽音とを分析対象として、ｆｆ−ｍｆの楽音波形を作成して、タッチの違いによる成分音を抽出するようにするなど、各種の場合に適用することができる。
また、楽音に限らず、２つあるいは複数の音響信号を対象として、一方には含まれず、他方には含まれているような特定の成分音を抽出する場合に同様に適用することができる。
さらに、上述においては、楽音分析合成装置において実行するものとして説明したが、本発明の音響信号分析方法は、汎用のコンピュータ上のソフトウエアとして実現することもでき、あるいは、専用のハードウエア構成として実現することも可能である。
【００３５】
【発明の効果】
以上説明したように、本発明の音響信号分析方法によれば、特定の成分音を含む音響信号と特定の成分音を含まない音響信号とから、前記特定の成分音を含まない音響信号と加算しても不自然さの無い特定の成分音を抽出することが可能となる。
【図面の簡単な説明】
【図１】本発明の音響信号分析方法が適用される装置の構成例を示すブロック図である。
【図２】本発明の音響信号分析方法を説明するためのフローチャートその１である。
【図３】本発明の音響信号分析方法を説明するためのフローチャートその２である。
【図４】本発明の音響信号分析方法を説明するためのフローチャートその３である。
【図５】アタック部に含まれる成分音の処理を説明するための図である。
【図６】処理対象となる楽音波形の例を示す図であり、（ａ）は共鳴音成分を含む楽音波形、（ｂ）は共鳴音成分を含まない楽音波形である。
【図７】処理対象となる楽音波形の短時間フーリエ分析結果を示す図であり、（ａ）は共鳴音成分を含まない楽音波形の分析結果、（ｂ）は共鳴音成分を含む楽音波形の分析結果である。
【図８】図７の結果から、共通の倍音成分を抜きだした結果を示す図であり、（ａ）は共鳴音成分を含まない楽音、（ｂ）は共鳴音成分を含む楽音である。
【図９】共鳴音を含まないＲｅｓｉｄｕａｌ波形データを示す図である。
【図１０】共鳴音成分のＤｅｔｅｒｍｉｎｉｓｔｉｃ波形データを示す図である。
【図１１】帯域フィルタバンクの特性の例を示す図である。
【図１２】共鳴音成分を含む楽音のＲｅｓｉｄｕａｌ成分を帯域フィルタバンクに通したときの各帯域の出力波形を示す図である。
【図１３】図１２の出力の振幅エンベロープを示す図である。
【図１４】共鳴音成分を含まない楽音のＲｅｓｉｄｕａｌ成分を帯域フィルタバンクに通したときの各帯域の出力波形を示す図である。
【図１５】図１４の出力波形の振幅エンベロープを示す図である。
【図１６】図１３の振幅エンベロープにマスク処理を施した結果を示す図である。
【図１７】図１５の振幅エンベロープのマスク処理を施した結果を示す図である。
【図１８】共鳴音成分のＲｅｓｉｄｕａｌ波形の各帯域毎の成分を示す図である。
【図１９】算出された共鳴音成分のＲｅｓｉｄｕａｌ波形を示す図である。
【図２０】抽出された共鳴音成分の波形を示す図である。
【図２１】本発明の音響信号分析方法により抽出された共鳴音成分を使用するときの概念図である。
【符号の説明】
１ＣＰＵ、２プログラムメモリ、３データメモリ、４表示装置、５入力装置、６演奏操作子、７楽音合成部、８ＤＡＣ、９ネットワークインターフェース、１０システムバス、１１通信ネットワーク

[0001]
[Technical field of the invention]
The present invention relates to an acoustic signal analysis method capable of extracting a specific component sound from an acoustic signal that includes the specific component sound and an acoustic signal that does not include it.
[0002]
[Prior art]
The sound of a piano is controlled mainly through the keys (touch), but timbre control (resonance control) through the damper pedal (sustain pedal) also has a major influence on musical expression.
When an electronic musical instrument such as an electronic piano is to reproduce this sound, a common approach is to sample the tone waveforms with the damper pedal ON and with it OFF, and to mix them with weights that depend on the pedal operation amount (crossfade). With this method, however, it is difficult to obtain a natural change in timbre.
A known alternative is to add, to the normal key-press tone, a resonance component whose volume and manner of change (in particular, its release rate) are controlled according to the damper pedal operation amount. In an electronic musical instrument using this method, the normal tone and the resonance tone are started together when a note-on arrives, and the volume and release rate of the resonance tone are controlled according to how far the damper pedal (sustain pedal) is depressed. This makes it possible to vary the amount of resonance in response to subtle pedal control and to reproduce the resonance effect that occurs when the pedal is depressed after a key has been struck, so that a more natural change in timbre is obtained.
[0003]
[Problems to be solved by the invention]
The remaining problem is how to extract, from sampled piano sounds, the resonance component that arises when the damper pedal is ON.
In principle, sampling the piano tone with the damper pedal ON and with it OFF and taking the difference between the two should yield the resonance component. In practice, however, because the two tones differ in harmonic structure and phase relationship, adding a resonance component obtained by this simple difference to the normal key-press tone does not give a natural sense of resonance, and the beating and slight muddiness characteristic of a real piano cannot be reproduced.
[0004]
Accordingly, an object of the present invention is to provide an acoustic signal analysis method capable of extracting a specific component sound, such as a resonance component, that does not sound unnatural when added to an acoustic signal that does not include it.
[0005]
[Means for Solving the Problems]
To achieve the above object, an acoustic signal analysis method of the present invention extracts a specific component sound from a first acoustic signal that includes the specific component sound and a second acoustic signal that has the same pitch as the first acoustic signal and does not include the specific component sound, the method comprising: a step of spectrally analyzing the first acoustic signal waveform to obtain its amplitude information along the time axis and its phase information along the time axis; a step of spectrally analyzing the second acoustic signal waveform to obtain its amplitude information along the time axis and its phase information along the time axis; a step of generating a third acoustic signal waveform from the amplitude information of the first acoustic signal and the phase information of the second acoustic signal; and a step of obtaining the harmonic components of the specific component sound waveform from the third acoustic signal waveform and the second acoustic signal waveform.
[0006]
Another acoustic signal analysis method of the present invention extracts a specific component sound from a first acoustic signal that includes the specific component sound and a second acoustic signal that has the same pitch as the first acoustic signal and does not include the specific component sound, the method comprising: a step of spectrally analyzing the first acoustic signal waveform to obtain its amplitude information along the time axis and its phase information along the time axis; a step of spectrally analyzing the second acoustic signal waveform to obtain its amplitude information along the time axis and its phase information along the time axis; a step of generating a third acoustic signal waveform based on the result of extracting, from the amplitude and phase information of the first acoustic signal and from the amplitude and phase information of the second acoustic signal respectively, the harmonic components that are common to the first and second acoustic signal waveforms; and a step of obtaining the harmonic components of the specific component sound waveform from the third acoustic signal waveform and the second acoustic signal waveform.
[0007]
Still another acoustic signal analysis method of the present invention extracts a specific component sound from a first acoustic signal that includes the specific component sound and a second acoustic signal that has the same pitch as the first acoustic signal and does not include the specific component sound, the method comprising: a first step of spectrally analyzing the first acoustic signal waveform to obtain its amplitude information along the time axis and its phase information along the time axis; a second step of spectrally analyzing the second acoustic signal waveform to obtain its amplitude information along the time axis and its phase information along the time axis; a third step of generating a third acoustic signal waveform, which includes the specific component sound, based on the result of extracting from the first acoustic signal waveform the harmonic components common to the first and second acoustic signal waveforms; a fourth step of generating a fourth acoustic signal waveform, which does not include the specific component sound, based on the result of extracting from the second acoustic signal waveform the harmonic components common to the first and second acoustic signal waveforms; a fifth step of obtaining, for each of a plurality of bands, the components of the non-harmonic component waveform of the third acoustic signal waveform; a sixth step of obtaining, for each of a plurality of bands, the components of the non-harmonic component waveform of the fourth acoustic signal waveform; and a step of obtaining the non-harmonic component waveform of the specific component sound from the per-band components obtained in the fifth and sixth steps.
[0008]
[Embodiments of the invention]
FIG. 1 is a block diagram showing an example of the hardware configuration of a tone analysis and synthesis apparatus on which the acoustic signal analysis method of the present invention is executed.
In this figure, 1 is a CPU that controls the entire tone analysis and synthesis apparatus; 2 is a program memory that stores the programs executed by the CPU 1, including various control programs, a tone analysis program, and a tone synthesis program; 3 is a data memory used to store various control information and the data described later, and as a temporary storage area (buffer) and work area; 4 is a display device; 5 is an input device such as a keyboard and pointing device; 6 is a performance operator such as a musical keyboard; 7 is a tone synthesis unit (synthesizer unit) that synthesizes tones; and 8 is a digital-to-analog converter (DAC) that converts tone waveform samples into an analog signal and outputs it to a sound system (not shown). Further, 9 is a network interface circuit for connecting to a communication network 11 such as a telephone line, the Internet, or a LAN, and 10 is a system bus.
Although not shown, a drive for an external storage medium such as a CD-ROM, DVD, MO, or FD may of course be connected.
[0009]
The tone analysis program and the tone synthesis program perform tone synthesis by the so-called analysis and resynthesis (Analysis & Resynthesis) method. That is, the tone waveform is spectrally analyzed to extract the line spectral components corresponding to the fundamental frequency of the tone and its harmonic frequencies. Specifically, a time window is applied to the samples of the acoustic signal waveform to be analyzed and Fourier analysis is performed to obtain amplitude, phase, and frequency data (short-time Fourier analysis, STF). All frequency positions that form peaks in the amplitude data of the Fourier analysis output are detected, this is repeated while the time window is moved, and those peaks that form a trajectory across frames are tracked. (Hereinafter, unless otherwise noted, the processing from the short-time Fourier analysis of the target sound through peak tracking is referred to as STF analysis processing.) The desired data are then selected from the trajectories obtained in this way and synthesized as a sum of sine waves, which yields the deterministic component (Deterministic component) waveform of the original tone waveform. Subtracting this Deterministic component waveform from the original tone waveform gives the residual (stochastic, non-deterministic) component (Residual component) waveform.
At synthesis time, the Deterministic waveform is modified by modifying the trajectory data before resynthesis, the Residual waveform is modified by signal processing such as an equalizer, and the two are added to obtain the desired waveform.
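As a rough illustration of the Deterministic/Residual split described above, the following Python sketch sums sinusoids from per-sample amplitude and frequency trajectories and obtains the residual by subtraction. It is a minimal sketch under stated assumptions, not the implementation described in this document: peak detection and peak tracking are omitted, the trajectories are invented toy data, and the names `synthesize_partials` and `residual` are hypothetical.

```python
import math

def synthesize_partials(tracks, sample_rate, num_samples):
    """Sum sinusoids given per-sample amplitude and frequency trajectories.

    tracks: list of (amplitudes, frequencies_hz) pairs, each a list of
    length num_samples. Phase is obtained by accumulating frequency.
    """
    out = [0.0] * num_samples
    for amplitudes, frequencies in tracks:
        phase = 0.0
        for n in range(num_samples):
            out[n] += amplitudes[n] * math.cos(phase)
            phase += 2.0 * math.pi * frequencies[n] / sample_rate
    return out

def residual(original, deterministic):
    """Residual component = original waveform minus deterministic waveform."""
    return [o - d for o, d in zip(original, deterministic)]

# Toy data (assumption): two decaying partials plus a short non-harmonic thump.
sample_rate = 8000
num_samples = 400
decay = [math.exp(-n / 200.0) for n in range(num_samples)]
tracks = [
    (decay, [196.0] * num_samples),                      # fundamental
    ([0.5 * a for a in decay], [392.0] * num_samples),   # second harmonic
]
deterministic = synthesize_partials(tracks, sample_rate, num_samples)
thump = [0.1 if n < 20 else 0.0 for n in range(num_samples)]
original = [d + t for d, t in zip(deterministic, thump)]

res = residual(original, deterministic)  # recovers the thump
```

In this toy case the residual is exactly the part of the signal that the sinusoidal trajectories do not explain, which is the role the Residual component plays in the text.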
[0010]
Next, the acoustic signal analysis method of the present invention is described with reference to the flowcharts of FIGS. 2 to 5.
The acoustic signal analysis method of the present invention calculates a specific component sound, such as a resonance sound, from acoustic signal waveform data that include the specific component sound and acoustic signal waveform data that do not. In outline, both sets of waveform data are first separated into a Deterministic component and a Residual component by the short-time Fourier analysis described above. For the Deterministic components, the phase of the waveform that includes the specific component sound is aligned with the phase of the waveform that does not, and the two are subtracted. For the Residual components, both are divided into bands by a bank of band filters, and the subtraction is carried out using the difference between the envelopes of each band. Adding the Deterministic subtraction result and the Residual subtraction result produces an acoustic signal waveform containing only the specific component sound.
[0011]
The following description takes as an example the case where the inputs are tone waveform sample data (P_kc_t_D.wave) that include, as the specific component sound, the resonance component produced when the piano damper pedal is on, and tone waveform sample data (P_KC_t_ND.wave) for a normal key press that do not include such a resonance component, and where the resonance component waveform for the damper pedal being on is extracted. In these data names, P denotes the instrument (piano), kc the key code (pitch), t the touch, D that the damper pedal is on, and ND that the damper pedal is off.
FIG. 6 shows an example of such input data: (a) is a piano G3 tone with the damper pedal on (P_3G_t_D.wave), and (b) is a piano G3 tone with the damper pedal not depressed (P_3G_t_ND.wave).
[0012]
In step S1 of FIG. 2, the two input waveforms are first cued, that is, their time axes are aligned, and waveform data with the unneeded sections of both inputs removed are generated. This is done, for example, by taking one input as the reference and adjusting the timing of the other so that the correlation between the two is maximized. Here, the timing of the waveform data that include the resonance component (P_kc_t_D.wave) is adjusted relative to the waveform data that do not (P_KC_t_ND.wave). As a result of the time-axis alignment and waveform cut-out in step S1, cued waveform data including the resonance component (P_kc_t_D_ASYNC.wave) are generated.
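The correlation-based alignment of step S1 can be sketched as follows. This is a minimal sketch, assuming a brute-force search over a small lag range and plain (unnormalized) cross-correlation; the text does not specify either, and the names `best_lag` and `align` as well as the sample values are hypothetical.

```python
def best_lag(reference, other, max_lag):
    """Return the lag (in samples) that maximizes the cross-correlation.

    A positive result means `other` lags behind `reference` by that many
    samples, so `other[n + lag]` lines up with `reference[n]`.
    """
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, r in enumerate(reference):
            m = n + lag
            if 0 <= m < len(other):
                score += r * other[m]
        if score > best_score:
            best, best_score = lag, score
    return best

def align(other, lag, length):
    """Shift `other` by `lag` samples and cut it to `length`, zero-padding."""
    return [other[n + lag] if 0 <= n + lag < len(other) else 0.0
            for n in range(length)]

# Toy data (assumption): the pedal-on take starts 3 samples late.
nd_wave = [0.0, 1.0, 0.5, -0.25, 0.0, 0.0, 0.0, 0.0]   # reference (pedal off)
d_wave = [0.0, 0.0, 0.0] + nd_wave[:5]                 # same shape, delayed

lag = best_lag(nd_wave, d_wave, max_lag=4)
d_aligned = align(d_wave, lag, len(nd_wave))
```

Real recordings would normally use a longer search range and an FFT-based correlation for speed; the brute-force loop is used here only to keep the idea visible.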
[0013]
Next, in step S2, the STF analysis processing based on the short-time Fourier analysis described above is performed on the cued waveform data including the resonance component (P_kc_t_D_ASYNC.wave) and on the waveform data not including it (P_KC_t_ND.wave). As described above, this generates the STF-analyzed data (STF data) (P_kc_t_D_ASYNC.stf) of the waveform data including the resonance component (P_kc_t_D_ASYNC.wave); the Deterministic waveform data (P_kc_t_D_ASYNC.det) are synthesized from the data that form trajectories; and the Residual waveform data (P_kc_t_D_ASYNC.res) are generated by subtracting the Deterministic waveform data from the original waveform data.
Similarly, STF data (P_KC_t_ND.stf), Deterministic waveform data (P_KC_t_ND.det), and Residual waveform data (P_KC_t_ND.res) are generated from the waveform data that do not include the resonance component (P_KC_t_ND.wave).
[0014]
FIG. 7(a) shows the STF data (P_3G_t_ND.stf) of the tone waveform without the resonance component (P_3G_t_ND.wave) shown in FIG. 6(b), and FIG. 7(b) shows the STF data (P_kc_t_D_ASYNC.stf) of the cued version of the tone waveform with the resonance component (P_3G_t_D.wave) shown in FIG. 6(a). As FIGS. 7(a) and 7(b) show, the STF analysis processing yields, for each harmonic component, its amplitude information (that is, the change of its envelope over time) and its phase information (that is, the fluctuation of its pitch over time).
[0015]
Next, the Deterministic waveform and the Residual waveform of the resonance component are extracted from the data generated in this way.
The Deterministic waveform of the resonance component is extracted first. In principle, it suffices to subtract the Deterministic waveform of the waveform data without the resonance component from that of the waveform data with the resonance component; however, as FIGS. 7(a) and 7(b) show, the harmonic components of the two waveforms do not necessarily match. The waveform without the resonance component in FIG. 7(a) lacks the 13th harmonic, whereas the waveform with the resonance component in FIG. 7(b) lacks the 14th harmonic. To subtract one Deterministic waveform from the other, the harmonic components contained in the two must therefore be made to match.
[0016]
Therefore, in step S3 of FIG. 3, only the harmonic components common to the two sets of STF data (P_kc_t_D_ASYNC.stf and P_kc_t_ND.stf) are extracted to create new STF data (P_kc_t_D_ASYNC_COM.stf and P_kc_t_ND_COM.stf). At this point the Deterministic waveform data P_kc_t_D_ASYNC.det2 and P_kc_t_ND_.det2 and the Residual waveform data P_kc_t_D_ASYNC.res2 and P_kc_t_ND.res2 are also regenerated. That is, each set of Deterministic waveform data now contains data up to the 12th harmonic, and the remaining harmonic component (the 13th or the 14th) is moved into the corresponding Residual waveform. Specifically, new Deterministic waveform data are synthesized from each set of new STF data with the common harmonic components, and each resynthesized waveform is subtracted from its original waveform data to obtain the corresponding new Residual waveform data.
[0017]
In the example of FIG. 7, new STF data containing up to the 12th harmonic (P_3G_t_D_ASYNC_COM.stf and P_3G_t_ND_COM.stf), Deterministic waveform data (P_3G_t_D_ASYNC.det2 and P_3G_t_ND.det2), and Residual waveform data (P_3G_t_D_ASYNC.res2 and P_3G_t_ND.res2) are generated from the respective STF data. The Deterministic component of the 13th or 14th harmonic, which is not included in the new STF data, is moved into the respective Residual component. FIGS. 8(a) and 8(b) show the new STF data generated in this way (P_3G_t_D_ASYNC_COM.stf and P_3G_t_ND_COM.stf). Comparing FIGS. 8(a) and 8(b) makes it clear that both sets of newly generated STF data contain data up to the 12th harmonic.
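The selection of common harmonics in step S3 amounts to a set intersection over harmonic numbers. The sketch below assumes, purely for illustration, that each set of STF data is held as a mapping from harmonic number to a trajectory object (here just a placeholder string); the function name `keep_common_harmonics` and this data layout are hypothetical and are not taken from the document.

```python
def keep_common_harmonics(stf_d, stf_nd):
    """Split two harmonic-number -> trajectory mappings.

    Returns the two mappings restricted to the harmonics present in both,
    plus the harmonic numbers left over on each side (these are the ones
    that end up in the Residual after resynthesis and subtraction).
    """
    common = set(stf_d) & set(stf_nd)
    d_com = {h: stf_d[h] for h in sorted(common)}
    nd_com = {h: stf_nd[h] for h in sorted(common)}
    d_leftover = sorted(set(stf_d) - common)
    nd_leftover = sorted(set(stf_nd) - common)
    return d_com, nd_com, d_leftover, nd_leftover

# Toy layout mirroring the FIG. 7 example (assumption): the pedal-on data
# have harmonics 1..13, the pedal-off data have 1..12 and 14.
stf_d = {h: "track D%d" % h for h in range(1, 14)}
stf_nd = {h: "track ND%d" % h for h in list(range(1, 13)) + [14]}

d_com, nd_com, d_leftover, nd_leftover = keep_common_harmonics(stf_d, stf_nd)
```

With these inputs both restricted mappings cover harmonics 1 to 12, and harmonics 13 and 14 are the leftovers on the pedal-on and pedal-off side respectively.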
[0018]
After the new STF data containing only the common harmonic components have been generated, the process proceeds to step S4 (FIG. 3), which generates waveform data having the amplitude information of the tone with resonance and the phase information of the tone without resonance. That is, resynthesis is performed using the amplitude information of P_kc_t_D_COM.stf and the phase information (frequency information) of P_kc_t_ND_COM.stf to generate waveform data P_kc_T_D.det# that include the resonance. Because the phases are aligned before subtraction, the waveform obtained by subtracting the regenerated Deterministic waveform data without resonance (P_kc_t_ND_.det2) from P_kc_T_D.det# contains only the amplitude difference, unlike the data obtained by subtracting P_kc_t_ND_.det2 from the original waveform data with resonance (P_kc_t_D_ASYNC.det2).
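The effect of step S4 can be seen on a single partial. The sketch below is a toy illustration under stated assumptions: one harmonic only, constant amplitudes, per-sample phase trajectories, and invented values for detuning and phase offset. The function name `synthesize` and the dictionary layout are hypothetical.

```python
import math

def synthesize(amplitude_tracks, phase_tracks):
    """Sum cosine partials; both arguments map harmonic -> per-sample list."""
    length = len(next(iter(phase_tracks.values())))
    out = [0.0] * length
    for h, phases in phase_tracks.items():
        amps = amplitude_tracks[h]
        for n in range(length):
            out[n] += amps[n] * math.cos(phases[n])
    return out

sample_rate = 8000
n_samples = 64

# Pedal-off (ND) and pedal-on (D) partial: slightly detuned, offset in phase.
phase_nd = {1: [2 * math.pi * 196.0 * n / sample_rate for n in range(n_samples)]}
phase_d = {1: [2 * math.pi * 197.0 * n / sample_rate + 0.7 for n in range(n_samples)]}
amp_nd = {1: [1.0] * n_samples}
amp_d = {1: [1.3] * n_samples}

det_nd = synthesize(amp_nd, phase_nd)
det_hybrid = synthesize(amp_d, phase_nd)   # amplitude from D, phase from ND

# Phase-aligned difference: only the amplitude difference (0.3) remains.
aligned_diff = [a - b for a, b in zip(det_hybrid, det_nd)]
# Naive difference: the phase mismatch dominates the result.
naive_diff = [a - b for a, b in zip(synthesize(amp_d, phase_d), det_nd)]
```

Here `aligned_diff` is 0.3 times the pedal-off partial, so adding it back to the pedal-off waveform changes only that partial's level, whereas `naive_diff` has a much larger peak that comes from the phase mismatch rather than from any real resonance energy.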
[0019]
Adding the waveform data thus obtained to the Deterministic waveform data that does not include the resonance (P_kc_t_ND_.det2) yields the waveform data P_kc_T_D.det#, so this waveform could be used as is as the Deterministic waveform of the resonance sound. In that case, however, adding it to the original waveform data that does not include the resonance sound (P_kc_t_ND.det2) would also add it to the attack portion of the original sound. The waveform is therefore changed so that it fades in at the attack portion. In step S5 of FIG. 3, the component sound included in the attack portion is processed.
[0020]
FIG. 5 is a flowchart showing the processing of the component sound included in the attack portion. In step S11 of FIG. 5, the Residual waveform that does not include the resonance sound (P_kc_t_ND.res2) is first examined. FIG. 9 shows the Residual waveform data P_3G_t_ND.res2. In this example, as shown in FIG. 9, a vibration sound (shelf sound), generated when the underside of a key strikes the shelf supporting the keyboard as the key is pressed, can be seen during a period of about 200 ms from the onset. As is well known, in a piano the sequence is roughly as follows: key depression operation start → shelf sound (at bottom dead center) → resonance sound (at top dead center) → hammer strike (string vibration sound). Therefore, the shelf-sound portion is treated as not including the resonance sound, and the original sound is retained there.
[0021]
In step S11, if the result of checking the attack portion indicates that no superimposed component sound such as a shelf sound is included, the determination result in step S12 is NO, and the process proceeds to step S14.
On the other hand, when a superimposed component sound is included, as in the example shown in FIG. 9, the process proceeds to step S13, and the section time Tn containing the superimposed component sound (shelf sound) is measured. In the example shown in FIG. 9, the section time Tn is 200 ms.
Then, the process proceeds to step S14, where P_kc_t_ND_COM.stf and P_kc_t_D_COM.stf are cross-faded over a predetermined time interval Tx (for example, 100 ms) starting when the section time Tn has elapsed, generating new STF data P_kc_t_XF.stf2. In the illustrated example, the new STF data P_3G_t_XF.stf2 uses the STF data that does not include the resonance sound, P_3G_t_ND_COM.stf, from the start of the attack to Tn (200 ms); changes from P_3G_t_ND_COM.stf to P_3G_t_D_COM.stf during the interval Tx following Tn, that is, from 200 ms to 300 ms; and uses the STF data P_3G_t_D_COM.stf in the section after 300 ms.
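The time-dependent combination of the two STF data sets in step S14 can be sketched as below. The file name P_kc_t_XF suggests a cross-fade, but the text does not specify the curve; the linear ramp, the function names, and the 50 ms frame spacing used here are assumptions for illustration only.

```python
def crossfade_weight(t_ms, tn_ms=200.0, tx_ms=100.0):
    """Weight of the with-resonance data at time t_ms (0 = none, 1 = full).

    Assumes a linear ramp over the interval [tn_ms, tn_ms + tx_ms].
    """
    if t_ms <= tn_ms:
        return 0.0
    if t_ms >= tn_ms + tx_ms:
        return 1.0
    return (t_ms - tn_ms) / tx_ms


def crossfade_tracks(nd_track, d_track, frame_ms, tn_ms=200.0, tx_ms=100.0):
    """Blend one amplitude track of the two STF data sets frame by frame."""
    out = []
    for i, (nd, d) in enumerate(zip(nd_track, d_track)):
        w = crossfade_weight(i * frame_ms, tn_ms, tx_ms)
        out.append((1.0 - w) * nd + w * d)
    return out


# Toy tracks (assumption): constant amplitudes sampled every 50 ms.
nd_track = [1.0] * 8   # stands in for a track of the data without resonance
d_track = [2.0] * 8    # stands in for a track of the data with resonance
xf = crossfade_tracks(nd_track, d_track, frame_ms=50.0)
```

With Tn = 200 ms and Tx = 100 ms, the frames up to 200 ms come entirely from the data without resonance, the frame at 250 ms is an equal blend, and the frames from 300 ms onward come entirely from the data with resonance.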
[0022]
Thus, step S5 (FIG. 3) generates the STF data P_3G_t_XF.stf2, in which the shelf-sound section does not include the resonance sound and the resonance sound is included thereafter.
Then, the process proceeds to step S6, where the Deterministic waveform data P_kc_t_DMP.det is generated. In this process, Deterministic waveform data is re-synthesized from the STF data P_3G_t_XF.stf2, and the Deterministic waveform data P_kc_t_ND.det2 is subtracted from the re-synthesized Deterministic waveform data to obtain the Deterministic waveform data of the harmonic components, P_kc_t_DMP.det.
FIG. 10 shows the Deterministic waveform data P_3G_t_DMP.det.
The Deterministic waveform data of the resonance component is thus obtained.
[0023]
Next, extraction of Residual waveform data of a resonance component will be described.
The resonance sound (the sound with all strings open) arises from the overlapping and interference of subtle vibrations of a large number of strings with different pitches. Simply taking the difference between the Residual waveform of a musical tone including the resonance sound and the Residual waveform of a musical tone not including the resonance sound therefore cannot provide a musical tone with natural resonance. For this reason, in the present invention, band division is performed to extract the Residual waveform data, as described below.
To extract the Residual waveform data, a band filter group (band filter bank) is applied to each of the Residual waveform data of the musical tone including the resonance sound and the Residual waveform data of the musical tone not including the resonance sound, an envelope is obtained for each band, the envelopes are subtracted from each other, and the result is multiplied by the Residual waveform of the musical tone including the resonance sound.
[0024]
In step S7 of FIG. 4, the Residual waveform data of the tone including the resonance (P_kc_t_D_ASYNC.res2) and the Residual waveform data of the tone not including the resonance (P_kc_t_ND.res2) are each divided into bands to obtain Residual component waveform data for each band, and an amplitude envelope is generated from the Residual component waveform data for each band.
FIG. 11 is a diagram showing an example of the characteristics of the band filter bank used for this band division. The following description assumes that, as shown in FIG. 11, the signal is divided into five bands (a) to (e) using a band filter bank made up of a plurality of band-pass filters whose center frequencies are spaced at logarithmic intervals. The number of bands is not limited to this and may be any number.
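For illustration, center frequencies spaced at logarithmic intervals can be computed as in the following sketch. The text does not give the actual frequencies of FIG. 11, so the 100 Hz and 1600 Hz band edges, like the function name `log_spaced_centers`, are purely hypothetical values chosen to make the spacing easy to read.

```python
def log_spaced_centers(f_low, f_high, n_bands):
    """Return n_bands center frequencies with a constant ratio between neighbors."""
    if n_bands < 2:
        raise ValueError("n_bands must be at least 2")
    ratio = (f_high / f_low) ** (1.0 / (n_bands - 1))
    return [f_low * ratio ** k for k in range(n_bands)]


# Five bands, as in bands (a) to (e); the frequency range is an assumption.
centers = log_spaced_centers(100.0, 1600.0, 5)
# Each center is about twice the previous one: roughly 100, 200, 400, 800, 1600 Hz.
```

A constant ratio between adjacent centers is what makes the bands equally wide on a logarithmic frequency axis.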
[0025]
By passing Residual waveform data (P_kc_t_D_ASYNC.res2) of the musical tone including the resonance tone through a band filter bank, Residual waveform data (P_kc_t_D_res2_B1.res,..., P_kc_t_D_res2_Bn.res) for each band is obtained. Then, the amplitude envelopes (P_kc_t_D_res2_B1.env,..., P_kc_t_D_res2_Bn.env) are calculated from the obtained Residual waveform data for each band.
Further, by passing the Residual waveform data of the musical tone not including the resonance tone (P_kc_t_ND.res2) through the band filter bank, Residual waveform data for each band (P_kc_t_ND_res2_B1.res, ..., P_kc_t_ND_res2_Bn.res) is obtained. Then, the respective amplitude envelopes (P_kc_t_ND_res2_B1.env, ..., P_kc_t_ND_res2_Bn.env) are calculated from the Residual waveform data for each band.
[0026]
FIGS. 12A to 12E show the waveform data for each band, P_3G_t_D_res2_B1.res to P_3G_t_D_res2_B5.res, obtained by inputting P_3G_t_D_ASYNC.res2 to the band-pass filter bank shown in FIG. 11.
FIGS. 13A to 13E show the amplitude envelopes P_3G_t_D_res2_B1.env to P_3G_t_D_res2_B5.env of the Residual waveform data for each band shown in FIG. 12. The amplitude envelopes are normalized so that the maximum value is 100.
Further, FIGS. 14A to 14E show the Residual waveform data for each band, P_3G_t_ND_res2_B1.res to P_3G_t_ND_res2_B5.res, obtained by inputting P_3G_t_ND.res2 to the band-pass filter bank shown in FIG. 11, and FIGS. 15A to 15E show the amplitude envelopes for each band, P_3G_t_ND_res2_B1.env to P_3G_t_ND_res2_B5.env, calculated from that Residual waveform data.
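One simple way to obtain an amplitude envelope normalized to a maximum of 100 is sketched below. The text does not state how the envelope is computed, so the moving maximum of the rectified signal, the window length, the per-envelope normalization, and the function names are all assumptions for illustration.

```python
def amplitude_envelope(wave, window):
    """Moving maximum of |x| over the last `window` samples (assumed follower)."""
    rect = [abs(x) for x in wave]
    env = []
    for i in range(len(rect)):
        start = max(0, i - window + 1)
        env.append(max(rect[start:i + 1]))
    return env


def normalize_to_100(env):
    """Scale an envelope so that its maximum value is 100."""
    peak = max(env)
    if peak == 0:
        return [0.0] * len(env)
    return [100.0 * e / peak for e in env]


# Toy band waveform (assumption): a short decaying burst.
band_wave = [0.0, 0.5, -1.0, 0.75, -0.5, 0.25, -0.125, 0.0]
env = normalize_to_100(amplitude_envelope(band_wave, window=2))
```

In practice a longer window or a low-pass smoothing stage would likely be used; the short window here only keeps the toy example easy to follow.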
[0027]
The amplitude envelope of the Residual waveform for each band of the musical tone including the resonance obtained in this way (FIG. 13) is taken as the target, and the amplitude envelope for each band of the musical tone not including the resonance (FIG. 15) is taken as the reference. The envelopes of the same band could simply be subtracted from each other; however, if the subtraction result were used as is, adding the result (target) − (reference) to the reference might degrade the attack portion of the reference. In addition, since the aim here is to take the difference in the resonance of the sound, the attack portion is not subtracted but kept as is. Therefore, the subtraction is performed after masking the rising portion of each envelope.
[0028]
Step S8 in FIG. 4 carries out this masking process. For each of the envelopes P_3G_t_D_res2_B1.env to P_3G_t_D_res2_B5.env and P_3G_t_ND_res2_B1.env to P_3G_t_ND_res2_B5.env, the envelope in the section from time 0 to the time at which the maximum level is reached (the attack section) is changed to the maximum value (100).
FIG. 16 shows the amplitude envelopes (P_3G_t_D_res2_B1.env# to P_3G_t_D_res2_B5.env#) of each band of the Residual waveform including the resonance sound (target) after the mask processing in step S8, and FIG. 17 shows the amplitude envelopes (P_3G_t_ND_res2_B1.env# to P_3G_t_ND_res2_B5.env#) of each band of the Residual waveform not including the resonance sound (reference) after the same mask processing.
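The mask processing of step S8 can be sketched as follows, assuming the envelope is a list of values already normalized to a maximum of 100. The function name `mask_attack` and the toy envelope are illustrative, and taking the first occurrence of the maximum as the "maximum level arrival time" is an assumption.

```python
def mask_attack(env, max_value=100.0):
    """Set the section from time 0 to the (first) peak to the maximum value."""
    peak_index = env.index(max(env))
    return [max_value] * (peak_index + 1) + list(env[peak_index + 1:])


# Toy normalized envelope (assumption): rises to 100, then decays.
masked = mask_attack([0.0, 50.0, 100.0, 100.0, 75.0, 50.0])
```

Because the attack sections of both the target and the reference are set to the same value, a later per-band subtraction leaves the attack portion untouched.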
[0029]
Next, the process proceeds to step S9, and the envelopes (P_kc_t_RS_B1.env to P_kc_t_RS_B5.env) of the resonance sound for each band are calculated from the amplitude envelope for each band masked in step S8. This calculation is performed based on the following equation (1).
(Equation 1)

The envelopes P_kc_t_RS_B1.env to P_kc_t_RS_B5.env are considered to be the envelopes attributable to the difference in resonance sound in each band.
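Equation (1) itself is not reproduced in this text. The following sketch assumes, based only on the description of subtracting the reference from the target after masking, that the envelope of the resonance sound for a band is the sample-by-sample difference of the two masked envelopes; the actual equation may include further scaling. The function name and the toy values are hypothetical.

```python
def resonance_envelope(target_env, reference_env):
    """Assumed form of Equation (1): masked target minus masked reference."""
    return [t - r for t, r in zip(target_env, reference_env)]


# Toy masked envelopes (assumption): both held at 100 during the attack.
target_masked = [100.0, 100.0, 100.0, 80.0, 60.0]     # tone with resonance
reference_masked = [100.0, 100.0, 70.0, 40.0, 20.0]   # tone without resonance
rs_env = resonance_envelope(target_masked, reference_masked)
```

Under this assumption the difference is zero wherever both envelopes are masked, which is consistent with keeping the attack portion as is.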
[0030]
Next, the process proceeds to step S10, where the envelope of the resonance sound for each band calculated in step S9 is multiplied by the waveform data of the target for each band, that is, the waveform data P_kc_t_D_res2_Bx.res (x = 1, 2, ..., n), to obtain the difference waveform data P_kc_t_RS_Bx.wave (x = 1, 2, ..., n).
(Equation 2)
P_kc_t_RS_Bx.wave = P_kc_t_RS_Bx.env × P_kc_t_D_res2_Bx.res (x = 1, 2, ..., n)
FIGS. 18A to 18E show Residual waveform data (P_kc_t_RS_B1.wave to P_kc_t_RS_B5.wave) of the resonance component of each band in the above-described example, which are calculated as described above.
[0031]
Then, the Residual waveform data of the resonance component calculated in this way for each band is summed over the bands, and the sum is multiplied by a coefficient Prs to calculate the Residual waveform data of the resonance component (P_kc_t_DMP.res).
That is,
(Equation 3)
P_kc_t_DMP.res = Prs × (P_kc_t_RS_B1.wave + P_kc_t_RS_B2.wave + ... + P_kc_t_RS_Bn.wave)
Here, Prs is a coefficient applied so that the power of the Residual waveform of the resonance calculated in the preceding steps matches the power of the difference between the Residual waveform of the musical tone including the resonance (target) and the Residual waveform of the musical tone not including the resonance (reference).
FIG. 19 is a diagram illustrating the Residual waveform data (P_3G_t_DMP.res) of the resonance component of the G3 sound of the piano calculated in this manner.
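The per-band multiplication, the summation over bands, and the power matching described above can be sketched together as follows. The exact definition of the coefficient is not given in this text, so the sketch assumes an amplitude scale equal to the square root of the ratio of mean powers; the function names and toy data are hypothetical.

```python
import math


def power(wave):
    """Mean power of a waveform."""
    return sum(x * x for x in wave) / len(wave)


def resonance_residual(target_bands, rs_envs, target_res, reference_res):
    """Sketch of steps S10 onward under the stated assumptions.

    target_bands: per-band Residual waveforms of the tone with resonance.
    rs_envs: per-band resonance envelopes.
    target_res, reference_res: full-band Residual waveforms.
    """
    # Envelope times target band waveform, band by band.
    band_waves = [[e * x for e, x in zip(env, band)]
                  for env, band in zip(rs_envs, target_bands)]
    # Sum over bands, sample by sample.
    summed = [sum(samples) for samples in zip(*band_waves)]
    # Assumed coefficient: match the power of (target - reference).
    diff = [t - r for t, r in zip(target_res, reference_res)]
    p_sum = power(summed)
    coeff = math.sqrt(power(diff) / p_sum) if p_sum > 0 else 0.0
    return [coeff * s for s in summed], coeff


# Toy data (assumption): two bands, four samples each.
target_bands = [[1.0, -1.0, 1.0, -1.0], [0.5, 0.5, -0.5, -0.5]]
rs_envs = [[0.0, 0.0, 10.0, 10.0], [0.0, 0.0, 20.0, 20.0]]
target_res = [1.5, -0.5, 0.5, -1.5]
reference_res = [1.5, -0.5, 0.0, -1.0]

dmp_res, coeff = resonance_residual(target_bands, rs_envs, target_res, reference_res)
```

Scaling by the square root of the power ratio is one natural reading of "a coefficient multiplied to match power", since power scales with the square of amplitude.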
[0032]
In this way, the Residual waveform data of the resonance component is obtained.
Next, the Deterministic waveform data of the resonance component obtained in step S6 (FIG. 10) and the Residual waveform data of the resonance component (FIG. 19) are added to calculate the resonance component P_kc_t_DMP.wave.
That is,
(Equation 4)
P_kc_t_DMP.wave = P_kc_t_DMP.det + P_kc_t_DMP.res
FIG. 20 is a diagram illustrating the resonance component (P_G3_t_DMP.wave) of the G3 sound of the piano calculated in this manner.
The resonance component is thus extracted.
[0033]
FIG. 21 is a conceptual diagram showing an example of a case where the thus-extracted resonance component is used for musical tone generation.
In this figure, reference numeral 21 denotes musical tone waveform data containing no resonance component (P_kc_t_ND.wave), 22 denotes the waveform data of the resonance component extracted as described above (P_kc_t_DMP.wave), and 23 denotes a gain controller whose gain, as shown, is controlled by the amount of depression of the damper pedal. Reference numeral 24 denotes an adder that adds the musical tone waveform data 21 not including the resonance component and the gain-controlled waveform data of the resonance component and outputs the sum. By thus adding the resonance component, weighted according to the amount of depression of the damper pedal, to the tone waveform that does not include the resonance component, a tone waveform with natural resonance can be generated. In particular, the degree of resonance can be varied in response to subtle pedal control, and the resonance effect that occurs when the pedal is depressed after the key is struck can be reproduced.
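The arrangement of FIG. 21 can be sketched in a few lines. The mapping from pedal depression to gain is not specified in the text, so the linear gain clamped to the range 0 to 1, the function name `mix_with_pedal`, and the toy sample values are assumptions for illustration.

```python
def mix_with_pedal(dry, resonance, pedal):
    """Add the gain-controlled resonance component to the dry tone.

    pedal: depression amount in [0, 1], 0 = released, 1 = fully depressed
    (assumed linear gain; values outside the range are clamped).
    """
    gain = min(1.0, max(0.0, pedal))
    return [d + gain * r for d, r in zip(dry, resonance)]


# Toy samples (assumption).
dry_wave = [1.0, 2.0]          # stands in for the tone without resonance
resonance_wave = [0.5, 0.5]    # stands in for the extracted resonance component
half_pedal = mix_with_pedal(dry_wave, resonance_wave, pedal=0.5)
no_pedal = mix_with_pedal(dry_wave, resonance_wave, pedal=0.0)
```

Because the gain is applied per call, the same resonance waveform can be faded in after the key has been struck simply by raising `pedal` over time.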
[0034]
In the above, the resonance component, in particular the resonance component produced when the damper pedal is operated, has been described as an example of the specific component sound, but the acoustic signal analysis method of the present invention is not limited to this. For example, by analyzing musical tones played with different touches, such as ff (fortissimo) and mf (mezzo forte), an ff − mf tone waveform can be created and the component sound due to the difference in touch can be extracted. The method can thus be applied to various cases.
In addition, the present invention is not limited to musical tones and can be applied in the same way to extracting, from two or more acoustic signals, a specific component sound that is included in one signal and not in the other.
Further, the above description assumes that the method is executed on an acoustic signal analysis and synthesis apparatus. However, the acoustic signal analysis method of the present invention can also be realized as software on a general-purpose computer or as a dedicated hardware configuration.
[0035]
[Effect of the Invention]
As described above, according to the acoustic signal analysis method of the present invention, it is possible to extract, from an acoustic signal including a specific component sound and an acoustic signal not including the specific component sound, a specific component sound that does not sound unnatural even when added to the acoustic signal not including the specific component sound.
[Brief description of the drawings]
FIG. 1 is a block diagram illustrating a configuration example of an apparatus to which an acoustic signal analysis method according to the present invention is applied.
FIG. 2 is a first flowchart illustrating an acoustic signal analysis method according to the present invention.
FIG. 3 is a second flowchart illustrating the acoustic signal analysis method of the present invention.
FIG. 4 is a third flowchart illustrating the acoustic signal analysis method of the present invention.
FIG. 5 is a diagram for explaining processing of a component sound included in an attack portion.
FIGS. 6A and 6B are diagrams showing examples of musical sound waveforms to be processed, wherein FIG. 6A shows a musical sound waveform including a resonance component, and FIG. 6B shows a musical sound waveform not including a resonance component.
FIGS. 7A and 7B are diagrams showing the results of short-time Fourier analysis of the musical sound waveforms to be processed, wherein FIG. 7A shows the analysis result of a musical sound waveform not including a resonance component, and FIG. 7B shows the analysis result of a musical sound waveform including a resonance component.
FIGS. 8A and 8B are diagrams showing the results of extracting the common harmonic components from the results of FIG. 7, wherein FIG. 8A is for a musical tone not including a resonance component, and FIG. 8B is for a musical tone including a resonance component.
FIG. 9 is a diagram showing Residual waveform data that does not include a resonance sound.
FIG. 10 is a view showing Deterministic waveform data of a resonance component.
FIG. 11 is a diagram illustrating an example of characteristics of a bandpass filter bank.
FIG. 12 is a diagram showing output waveforms of each band when a Residual component of a musical tone including a resonance component is passed through a band filter bank.
FIG. 13 is a diagram showing an amplitude envelope of the output of FIG. 12.
FIG. 14 is a diagram showing output waveforms of each band when a Residual component of a musical tone not including a resonance component is passed through a band filter bank.
FIG. 15 is a diagram illustrating an amplitude envelope of the output waveform of FIG. 14.
FIG. 16 is a diagram illustrating a result of performing a mask process on the amplitude envelope of FIG. 13.
FIG. 17 is a diagram illustrating a result of performing a mask process on the amplitude envelope of FIG. 15.
FIG. 18 is a diagram showing a component of a Residual waveform of a resonance component for each band.
FIG. 19 is a diagram showing a calculated Residual waveform of a resonance component.
FIG. 20 is a diagram showing a waveform of an extracted resonance component.
FIG. 21 is a conceptual diagram when using a resonance component extracted by the acoustic signal analysis method of the present invention.
[Explanation of symbols]
1 CPU, 2 program memory, 3 data memory, 4 display device, 5 input device, 6 performance operator, 7 tone synthesizer, 8 DAC, 9 network interface, 10 system bus, 11 communication network

Claims

An acoustic signal analysis method for extracting a specific component sound from a first acoustic signal including the specific component sound and a second acoustic signal having the same pitch as the first acoustic signal and not including the specific component sound, the method comprising:
a step of spectrally analyzing the waveform of the first acoustic signal to obtain amplitude information along the time axis and phase information along the time axis;
a step of spectrally analyzing the waveform of the second acoustic signal to obtain amplitude information along the time axis and phase information along the time axis;
a step of generating a third acoustic signal waveform from the amplitude information of the first acoustic signal and the phase information of the second acoustic signal; and
a step of obtaining a harmonic component of the waveform of the specific component sound from the third acoustic signal waveform and the second acoustic signal waveform.

An acoustic signal analysis method for extracting a specific component sound from a first acoustic signal including the specific component sound and a second acoustic signal having the same pitch as the first acoustic signal and not including the specific component sound, the method comprising:
a step of spectrally analyzing the waveform of the first acoustic signal to obtain amplitude information along the time axis and phase information along the time axis;
a step of spectrally analyzing the waveform of the second acoustic signal to obtain amplitude information along the time axis and phase information along the time axis;
a step of generating a third acoustic signal waveform based on a result of extracting, from the amplitude information and phase information of the first acoustic signal and from the amplitude information and phase information of the second acoustic signal, respectively, the harmonic components commonly included in the first acoustic signal waveform and the second acoustic signal waveform; and
a step of obtaining a harmonic component of the waveform of the specific component sound from the third acoustic signal waveform and the second acoustic signal waveform.

An acoustic signal analysis method for extracting a specific component sound from a first acoustic signal including the specific component sound and a second acoustic signal having the same pitch as the first acoustic signal and not including the specific component sound, the method comprising:
a first step of spectrally analyzing the waveform of the first acoustic signal to obtain amplitude information along the time axis and phase information along the time axis;
a second step of spectrally analyzing the waveform of the second acoustic signal to obtain amplitude information along the time axis and phase information along the time axis;
a third step of generating a third acoustic signal waveform including the specific component sound, based on a result of extracting, from the first acoustic signal waveform, the harmonic components commonly included in the first acoustic signal waveform and the second acoustic signal waveform;
a fourth step of generating a fourth acoustic signal waveform not including the specific component sound, based on a result of extracting, from the second acoustic signal waveform, the harmonic components commonly included in the first acoustic signal waveform and the second acoustic signal waveform;
a fifth step of obtaining a component for each of a plurality of bands of the non-harmonic component waveform of the third acoustic signal waveform;
a sixth step of obtaining a component for each of a plurality of bands of the non-harmonic component waveform of the fourth acoustic signal waveform; and
a step of obtaining a non-harmonic component waveform of the specific component sound from the components for each band obtained in the fifth step and the sixth step.