JP4655811B2

JP4655811B2 - Acoustic signal processing apparatus and program

Info

Publication number: JP4655811B2
Application number: JP2005228664A
Authority: JP
Inventors: 崇野口; 徹北山
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2005-08-05
Filing date: 2005-08-05
Publication date: 2011-03-23
Anticipated expiration: 2025-08-05
Also published as: JP2007047215A

Description

The present invention relates to an acoustic signal processing apparatus having an effect function that mixes an input signal with a signal obtained by processing that input signal and outputs the result, and to a program for causing a computer to function as such an acoustic signal processing apparatus.

Effectors that apply what is called a doubling effect have long been known. Such an effector mixes an input signal with a signal, called a voice signal, obtained by processing the input signal, and outputs the mix.
In such effectors, the voice signal has been generated by delaying the input signal with a delay, or by a pitch shift that simply offsets the pitch of the input signal by a fixed amount.
The output signal is formed by superimposing the input signal and the voice signal, or by panning each of these signals to the left or right.

Such effectors are described in, for example, Patent Documents 1 to 5 below.
Japanese Patent No. 3183117; U.S. Patent No. 5,301,259; U.S. Patent No. 5,231,671; U.S. Patent No. 5,567,901; U.S. Patent No. 6,046,395

In conventional effectors, however, the voice signal is generated on the same basis regardless of the pitch of the input signal. As a result, when the pitch of the input signal deviates from a scale note, the effect may emphasize that deviation, or the pitch of the voice signal may end up far from the pitch of any scale note, so that applying the doubling effect can actually make the result sound unnatural.
An object of the present invention is to solve this problem and to realize an effect that always yields a natural-sounding output signal.

To achieve the above object, the acoustic signal processing apparatus of the present invention comprises: processed-signal generating means for generating a processed signal by applying pitch conversion processing to an input signal; mixing means for mixing the input signal and the processed signal and outputting the result; pitch storage means for storing the pitches of all scale notes in a predetermined range of a given scale; and estimating means for estimating the scale note corresponding to the input signal. The processed-signal generating means generates the processed signal by performing the pitch conversion processing so as to bring the pitch of the input signal closer to the pitch of the estimated scale note, and does so only when the estimating means has been able to estimate a scale note corresponding to the input signal. The estimating means estimates that a scale note is the one corresponding to the input signal when the difference between the pitch of the input signal and the pitch of that scale note stored in the pitch storage means is within a predetermined error range. The predetermined error range is set, for all scale notes whose pitches are stored in the pitch storage means, such that the error range for any one scale note does not overlap the error range for any other scale note.

The present invention can be configured and implemented not only as an apparatus but also in the form of a method, a program for a processor such as a computer or a digital signal processor, a storage medium storing such a program, and the like.

With the acoustic signal processing apparatus of the present invention described above, an effect that always yields a natural-sounding output signal can be realized.
With the program of the present invention, a computer can be made to function as the acoustic signal processing apparatus, and the same advantage can be obtained.

The best mode for carrying out the present invention is described in detail below with reference to the drawings.
[First Embodiment: FIGS. 1 to 16]
First, the configuration of an electronic musical instrument, which is a first embodiment of the acoustic signal processing apparatus of the present invention, is described with reference to FIG. 1. FIG. 1 is a block diagram showing the configuration of the electronic musical instrument.
As shown in FIG. 1, the electronic musical instrument 10 includes a CPU 11, a ROM 12, a RAM 13, a detection circuit 14, a display circuit 15, an audio signal interface (I/F) 16, a communication I/F 17, a sound source unit 18, and a signal processing unit 19, which are connected by a system bus 20. Controls 21 are connected to the detection circuit 14, a display 22 to the display circuit 15, and a sound system 23 to the signal processing unit 19.

The CPU 11 is a control unit that performs overall control of the electronic musical instrument 10. By executing the required control program stored in the ROM 12, it performs control operations such as detecting operations on the controls 21 via the detection circuit 14, controlling the display 22 via the display circuit 15, accepting audio signal input via the audio signal I/F 16, controlling communication via the communication I/F 17, controlling waveform data generation in the sound source unit 18, and controlling signal processing in the signal processing unit 19.

The ROM 12 is storage means that stores the control program executed by the CPU 11, data that does not need to be changed, and the like. The ROM 12 may also be implemented with rewritable nonvolatile storage such as flash memory so that this data can be updated.
The RAM 13 is storage means used as the work memory of the CPU 11 and for storing values of temporarily used parameters and the like.

The detection circuit 14 detects operations performed on the controls 21 and transmits corresponding signals to the CPU 11. The controls 21 consist of keys, buttons, dials, sliders, and the like, and are operating means for accepting user operations on the electronic musical instrument 10. The display 22 and the controls 21 may be formed integrally, for example by laminating a touch panel on an LCD. Depending on the type of the electronic musical instrument 10, the controls 21 also include controls for accepting performance operations, such as a keyboard, strings, pads, pedals, and a breath controller.

The display circuit 15 controls the display on the display 22 in accordance with instructions from the CPU 11. The display 22 consists of a liquid crystal display (LCD), light-emitting diode (LED) lamps, or the like, and is display means for showing the operating state and settings of the electronic musical instrument 10, messages to the user, a graphical user interface (GUI) for accepting instructions from the user, and so on.

The audio signal I/F 16 is an interface to which a microphone or other audio equipment is connected and which accepts audio signal input. The audio signal input here is passed to the signal processing unit 19 for signal processing. The signal may be buffered at this point, for example in the RAM 13. When analog signal input is accepted, A/D conversion is performed to convert it into a digital signal.

The communication I/F 17 is an interface for communicating with an external device such as a PC (personal computer), for example by connecting to a network such as a LAN (local area network). It can be implemented with, for example, an Ethernet (registered trademark) interface.
The communication I/F 17 may also include an interface for exchanging MIDI data with external devices that handle MIDI data, such as other electronic musical instruments or sound source devices. Such an interface can be implemented with an interface conforming to, for example, the USB standard, the IEEE 1394 (Institute of Electrical and Electronic Engineers 1394) standard, or the RS232C (Recommended Standard 232 version C) standard. MIDI data and other data may also be sent and received through a common interface.

The sound source unit 18 is sound source means that generates waveform data, which is a digital audio signal, on a plurality of sound generation channels based on MIDI-format performance data that the CPU 11 generates in response to operation of the performance controls or that is received from an external device via the communication I/F 17. The generated waveform data is input to the signal processing unit 19 for signal processing.

The signal processing unit 19 functions as an effector, a mixer, and the like. It is signal processing means that applies signal processing such as effects and mixing, in accordance with processing parameters set by the CPU 11, to waveform data generated by the sound source unit 18 or input via the audio signal I/F 16. The processed signal is input to the sound system 23, which produces sound based on it.
The sound source unit 18 and the signal processing unit 19 may each be implemented in software or in hardware.

The signal processing unit 19 of the electronic musical instrument 10 includes a doubling effector that applies a doubling effect to an input signal. Here, doubling refers to processing that layers a processed version of a sound on top of the original sound to give the sound more thickness. Doubling itself is widely used as an effect that imitates double-track recording or ADT (Artificial Double Tracking), and is often applied not only to vocals but also to instrument sounds such as electric guitar.

Conventionally, the "processing" mentioned above has been, for example, delay, panning, pitch shifting, or a combination of these with modulation or detuning.
The "processing" performed in the electronic musical instrument 10 can use these same operations, but it is performed with attention to the relationship between the pitch of the input sound and the pitch of a scale note. This is the characteristic feature of this embodiment.

Next, FIG. 2 shows the functional configuration of the doubling effector provided in the signal processing unit 19.
As shown in FIG. 2, the doubling effector 30 includes a voice signal generation unit 40, a delay processing unit 50, and a mixing unit 60.
Of these, the voice signal generation unit 40 is processed-signal generating means that generates a voice signal by performing pitch conversion processing on the input signal. Its configuration is shown in FIG. 3.
As shown in FIG. 3, the voice signal generation unit 40 includes a pitch detection unit 41, a pitch buffer 42, a scale note estimation unit 43, a pitch table 44, an estimated value buffer 45, a pitch processing unit 46, and a pitch conversion unit 47.

The pitch detection unit 41 is pitch detecting means that detects the pitch of the input signal (the duration of one period, or the corresponding frequency) and obtains pitch information. The pitch detection processing used to obtain the pitch information is described in detail later.
The pitch buffer 42 temporarily stores samples of the input signal pitch detected by the pitch detection unit 41. The pitch buffer is preferably a ring buffer that discards the oldest data first when it becomes full.

The scale note estimation unit 43 is estimating means that estimates the scale note corresponding to the input signal. It compares the pitch of the input signal stored in the pitch buffer 42 with the pitches of the scale notes stored in the pitch table 44, and when the difference between the pitch of the input signal and the pitch of a scale note is within a predetermined error range, it estimates that this scale note is the one corresponding to the input signal.

More specifically, the pitch table 44 may store, as the scale notes of equal temperament, the pitch Scale(n) = 440 × 2^{(n−69)/12} Hz (hertz) for note numbers n = 0 to 127, and an allowable error range Δh(n) may be set to a suitable value for each note number. Then, if there exists an n for which the pitch P_I of the input signal (here a frequency) satisfies Formula 1, the note number of the scale note corresponding to the input signal is estimated to be n.
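The pitch table described here can be sketched in a few lines. This is only an illustration of the stated formula; the names `scale` and `PITCH_TABLE` are hypothetical and do not come from the patent.

```python
# Equal-tempered pitch table for note numbers 0..127, following the text:
# Scale(n) = 440 * 2 ** ((n - 69) / 12) Hz.
# The names scale() and PITCH_TABLE are illustrative assumptions.

def scale(n: int) -> float:
    """Frequency in Hz of equal-tempered note number n (n = 69 gives 440 Hz)."""
    return 440.0 * 2.0 ** ((n - 69) / 12)

# One entry per note number, as pitch table 44 is described as holding.
PITCH_TABLE = [scale(n) for n in range(128)]
```

With this table, note number 69 maps to 440 Hz and each step of 12 note numbers doubles or halves the frequency.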

In this case, the value of Δh(n) may differ depending on n, but it is chosen so as to satisfy Formula 2, that is, so that the allowable error ranges of the individual scale notes do not overlap. If P_I does not satisfy Formula 1 for any n, the input signal is estimated not to correspond to any scale note, and an invalid value is output as the estimate.
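The formula images referred to as Formula 1 and Formula 2 do not appear in this text. Going only by the surrounding definitions (a difference within the allowable error range, and non-overlapping ranges for adjacent notes), one plausible reading is sketched below. The exact form, including whether the inequalities are strict and whether the range is symmetric, is an assumption.

```latex
% Assumed reading of Formula 1: the input pitch lies within the
% allowable error range of scale note n.
\[
  \left| P_I - \mathrm{Scale}(n) \right| \le \Delta h(n)
\]

% Assumed reading of Formula 2: the ranges of adjacent scale notes
% do not overlap.
\[
  \mathrm{Scale}(n) + \Delta h(n) < \mathrm{Scale}(n+1) - \Delta h(n+1)
\]
```

Under this reading, at most one n can satisfy the first condition for any given P_I.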

Alternatively, at a time t, the note number of the scale note corresponding to the input signal may be estimated to be n only if P_I has satisfied Formula 1 for the same n continuously from a point Δw(n) in the past (that is, from time t−Δw(n)) up to the present. While the input signal is moving from one scale note to another, it may satisfy Formula 1 for an intermediate scale note for a very short time, and in such a case it is more appropriate not to regard the input signal as corresponding to that scale note.

This approach is adopted here. The scale note estimation unit 43 stores the pitch of the scale note estimated by Formula 1 in the estimated value buffer 45. When the scale note estimated by Formula 1 has had the same note number for the duration Δw(n), the unit outputs the pitch stored in the estimated value buffer 45 to the pitch processing unit 46 as the estimated scale note pitch P_E, which is the pitch of the scale note estimated to correspond to the input signal. When no scale note is estimated by Formula 1, or when the note number has not stayed the same for Δw(n), data indicating this is output to the pitch processing unit 46 as the estimated scale note pitch P_E. The value of Δw(n) may also differ depending on n.
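A minimal sketch of this real-time estimation follows. It assumes the tolerance test is a symmetric absolute difference, sets Δh(n) to 2% of Scale(n) (which keeps adjacent ranges from overlapping), and expresses Δw as a count of consecutive pitch frames that is the same for every n. All names and values are hypothetical.

```python
from typing import Optional

def scale(n: int) -> float:
    """Equal-tempered pitch of note number n in Hz."""
    return 440.0 * 2.0 ** ((n - 69) / 12)

def delta_h(n: int) -> float:
    """Assumed tolerance: 2% of the note's pitch, so neighbours cannot overlap."""
    return 0.02 * scale(n)

def match_note(p_in: float) -> Optional[int]:
    """Return the note number whose tolerance range contains p_in, else None."""
    for n in range(128):
        if abs(p_in - scale(n)) <= delta_h(n):
            return n
    return None

class ScaleNoteEstimator:
    """Reports a scale note pitch only after the same note has matched
    for hold_frames consecutive frames (a stand-in for the duration dw)."""

    def __init__(self, hold_frames: int) -> None:
        self.hold_frames = hold_frames
        self.candidate: Optional[int] = None  # note currently being tracked
        self.run = 0                          # consecutive frames it has matched

    def update(self, p_in: float) -> Optional[float]:
        n = match_note(p_in)
        if n is not None and n == self.candidate:
            self.run += 1
        else:
            self.candidate = n
            self.run = 1 if n is not None else 0
        if n is not None and self.run >= self.hold_frames:
            return scale(n)  # estimated scale note pitch
        return None          # no valid estimate for this frame
```

Returning `None` plays the role of the "invalid value" output; a brief pass through an intermediate note resets the run counter and never reaches the hold length.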

FIG. 4 shows an example of the result of estimating the scale note corresponding to the input signal by the above method.
As shown in the figure, when the pitch of the input signal changes over time as indicated by the solid line 71, estimating the note by the above method gives note number k for the scale note corresponding to the input signal from time t2 to t3, and note number k−1 from time t5 to t6. Broken lines 72 to 75 indicate the allowable error range at the pitch of each scale note, and thick lines 76 and 77 indicate the scale note corresponding to the input signal pitch P_I and its pitch. P_I also lies within the allowable error range of Scale(k−1) before time t1, but for a time shorter than Δw(k−1), so no scale note is estimated to correspond to P_I there.

When effect processing is performed in real time, the pitch P_I of the input signal at future points in time is unknown, but when effect processing is performed on an audio signal prepared in advance, the future pitch P_I can also be used. In that case, with respect to Δw(n), when P_I satisfies Formula 1 for the same n continuously for a time of Δw(n) or longer, the note number of the scale note corresponding to the input signal may be estimated to be n over that entire continuous time range. In the example shown in FIG. 5, this gives note number k for the scale note corresponding to the input signal from time t1 to t3, and note number k−1 from time t4 to t6.
Alternatively, when P_I satisfies Formula 1 for the same n continuously from a given point in time until Δw(n) in the future, the note number of the scale note corresponding to the input signal at that point may be estimated to be n.
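For a prepared signal, the first offline variant (labelling the whole run once it is long enough) can be sketched as a run-length pass over per-frame matches. The input is assumed to be a list of note numbers, with `None` where no note matched; `label_runs` and `min_run` are hypothetical names, and `min_run` stands in for Δw expressed in frames.

```python
from itertools import groupby
from typing import List, Optional

def label_runs(matches: List[Optional[int]], min_run: int) -> List[Optional[int]]:
    """Keep a note label over an entire run only if the run lasts at least
    min_run frames; shorter runs and unmatched frames become None."""
    out: List[Optional[int]] = []
    for note, group in groupby(matches):
        length = len(list(group))
        keep = note is not None and length >= min_run
        out.extend([note if keep else None] * length)
    return out
```

Unlike the real-time version, the first frames of a sufficiently long run are labelled too, which corresponds to the estimate starting at t1 and t4 rather than t2 and t5 in the example.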

Next, the pitch processing unit 46 obtains a pitch shift amount for the input signal, adds it to the pitch of the input signal obtained by the pitch detection unit 41 to generate pitch information indicating the pitch of the voice signal, and supplies this to the pitch conversion processing in the pitch conversion unit 47. The pitch shift amount is preferably chosen so that the pitch of the voice signal is closer than the pitch of the input signal to the pitch of the scale note that the scale note estimation unit 43 has estimated to correspond to the input signal. It is also preferable to obtain the pitch shift amount, and have the pitch conversion unit 47 generate the voice signal, only when the scale note estimation unit 43 has been able to estimate a scale note corresponding to the input signal. This pitch processing is described in detail later.
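One simple way to satisfy the stated condition is to move the input pitch a fixed fraction of the way toward the estimated scale note pitch. The sketch below assumes that rule; the `strength` parameter is hypothetical, and the patent's own shift computation is described later in the document.

```python
from typing import Optional

def voice_pitch(p_in: float, p_est: Optional[float],
                strength: float = 0.5) -> Optional[float]:
    """Return the target pitch of the voice signal, or None when no scale
    note was estimated (in which case no voice signal is generated).

    strength is an assumed parameter in (0, 1]: 1.0 lands exactly on the
    scale note, smaller values only move part of the way toward it.
    """
    if p_est is None:
        return None
    shift = strength * (p_est - p_in)  # pitch shift amount
    return p_in + shift                # pitch information for the voice signal
```

For any `strength` in (0, 1] the result is strictly closer to the scale note than the input pitch, unless the input is already exactly on it.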

The pitch conversion unit 47 is pitch converting means that generates the voice signal by performing pitch conversion processing on the input signal, using the pitch information of the input signal obtained by the pitch detection unit 41 and the pitch information of the voice signal generated by the pitch processing unit 46. The processing should preferably convert only the pitch while changing the timbre as little as possible. The pitch conversion processing adopted in this embodiment is described in detail later. The pitch conversion unit 47 supplies the generated voice signal to the mixing unit 60.

Returning to FIG. 2, the delay processing unit 50 consists of a buffer memory or the like and is delay means that delays the input signal fed to the mixing unit 60 by the time required for the voice signal generation processing in the voice signal generation unit 40. The delay may be, for example, about 20 milliseconds (ms).
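A buffer-based delay of this kind can be sketched as a fixed-length FIFO. The class name is hypothetical, and the 44.1 kHz rate used to convert 20 ms into samples is an assumption for the example.

```python
from collections import deque

class DelayLine:
    """Delays a sample stream by a fixed number of samples using a FIFO."""

    def __init__(self, delay_samples: int) -> None:
        # Pre-fill with silence so the first outputs are zeros.
        self.buf = deque([0.0] * delay_samples)

    def process(self, x: float) -> float:
        self.buf.append(x)
        return self.buf.popleft()

# About 20 ms at an assumed 44.1 kHz sampling rate.
DELAY_SAMPLES = round(0.020 * 44100)
```

Feeding the dry signal through `DelayLine(DELAY_SAMPLES)` keeps it time-aligned with a voice signal that takes roughly that long to produce.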

The mixing unit 60 is mixing means that mixes the input signal and the voice signal and outputs the result. It includes gain adjustment units 61 and 64, pan adjustment units 62 and 65, and adders 63 and 66. The input signal delayed by the delay processing unit 50 and the voice signal generated by the voice signal generation unit 40 are each gain-adjusted by the gain adjustment units 61 and 64, then distributed to an L-side signal and an R-side signal by the pan adjustment units 62 and 65. These are summed by the adders 63 and 66 and output as an L/R stereo signal.
Gain adjustment and pan adjustment are not essential; the input signal and the voice signal may simply be added and output.
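The gain, pan, and add stages can be sketched as follows. The linear pan law (0.0 = fully left, 1.0 = fully right) and all parameter names are assumptions; the text does not specify how the pan adjustment distributes the signal.

```python
from typing import List, Sequence, Tuple

def mix(dry: Sequence[float], voice: Sequence[float],
        dry_gain: float = 1.0, voice_gain: float = 1.0,
        dry_pan: float = 0.5, voice_pan: float = 0.5
        ) -> Tuple[List[float], List[float]]:
    """Gain-adjust each signal, split it to L and R with an assumed linear
    pan law, and sum the two L parts and the two R parts."""
    left: List[float] = []
    right: List[float] = []
    for d, v in zip(dry, voice):
        d *= dry_gain
        v *= voice_gain
        left.append(d * (1.0 - dry_pan) + v * (1.0 - voice_pan))
        right.append(d * dry_pan + v * voice_pan)
    return left, right
```

Panning the dry signal to one side and the voice signal to the other (for example `dry_pan=0.0, voice_pan=1.0`) gives the wide doubling layout mentioned in the background section.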

The signal processing unit 19 may of course also provide other effector and mixer functions. The output of the doubling effector 30 can be input to those effectors to apply further effects, or input to a mixer for mixing. Conversely, a signal already processed by another effector or mixer may be input to the doubling effector 30.
When the signal processing unit 19 performs signal processing on a plurality of channels, an effector may of course be provided for each channel so that each can operate independently.

Next, the pitch detection processing in the pitch detection unit 41 is described with reference to FIG. 5.
In the pitch detection unit 41, pitch is basically detected by detecting the timing at which the input signal waveform 101 crosses a + side envelope 102 and a − side envelope 103, that is, the timing at which the magnitude relationship between their sample values reverses. The envelopes 102 and 103 are obtained by multiplying the + side and − side envelopes of the input signal waveform 101 by a predetermined value (or a predetermined function value).

More specifically, a detection flag IRQ is provided. It is raised from 0 to 1 at the times T_1 and T_3 at which the input signal waveform 101 crosses the + side envelope 102, and lowered from 1 to 0 at the times T_2 and T_4 at which the input signal waveform 101 crosses the − side envelope 103. The time from one rising edge of the IRQ flag to the next is measured by counting samples, and this measured time (number of samples) is taken as the detected pitch.
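The flag logic can be sketched as below. This is a simplified illustration, not the patent's circuit: the envelopes are modelled as peak followers scaled by an assumed `ratio` with an assumed exponential `decay` per sample, and the edge-triggered reset of the decay is not modelled separately.

```python
import math
from typing import List

def detect_periods(samples: List[float], ratio: float = 0.8,
                   decay: float = 0.999) -> List[int]:
    """Return the sample counts between successive rising edges of IRQ."""
    env_pos = 0.0   # scaled, decaying + side envelope
    env_neg = 0.0   # scaled, decaying - side envelope
    irq = 0
    last_rise = None
    periods: List[int] = []
    for i, x in enumerate(samples):
        env_pos *= decay
        env_neg *= decay
        if irq == 0 and x > 0 and x >= env_pos:
            irq = 1                      # waveform crossed the + side envelope
            if last_rise is not None:
                periods.append(i - last_rise)
            last_rise = i
        elif irq == 1 and x < 0 and x <= env_neg:
            irq = 0                      # waveform crossed the - side envelope
        # Peak-follow the waveform on each side, scaled by ratio.
        if x > 0:
            env_pos = max(env_pos, ratio * x)
        elif x < 0:
            env_neg = min(env_neg, ratio * x)
    return periods

# Example input: a sine wave with a period of exactly 100 samples.
sine = [math.sin(2 * math.pi * i / 100) for i in range(500)]
periods = detect_periods(sine)
```

Because a rising edge is only possible while IRQ is 0, repeated crossings of the + side envelope within the same half cycle are ignored. The first measured interval is affected by the start-up state of the envelopes, so only later intervals reflect the true period.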

The input signal waveform 101 also crosses the + side envelope 102 immediately after T_1, but IRQ is already 1 at that point, so no rising edge occurs. The next rising edge occurs at T_3, where the input signal waveform 101 crosses the + side envelope 102 after the IRQ flag has fallen at T_2. The same applies to the − side envelope 103.
The envelopes are used in this way to prevent pitch detection errors for signals that contain many harmonic components and cross zero many times within one period, signals whose waveform is distorted and has several peaks, and the like. This yields a far more accurate pitch than detecting zero crossings alone.

Here, the sampling frequency of the audio signal to be processed is 44.1 kilohertz (kHz), so the sampling period is about 0.02 ms. If the pitch is to be obtained with a resolution finer than the sampling period, interpolation may be used to determine the crossing timing more precisely.
In the example shown in FIG. 5, the + side and − side envelopes 102 and 103 decay over time; the decay of the former is reset by a rising edge of the IRQ flag, and that of the latter by a falling edge.
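As a small worked illustration of the sampling arithmetic and of one possible interpolation, the sketch below converts a period in samples to a frequency and locates a crossing between two samples by linear interpolation. Linear interpolation is an assumption; the text only says that interpolation may be used. The function names are hypothetical.

```python
SAMPLE_RATE = 44100  # Hz, as stated in the text

def period_to_hz(period_samples: float) -> float:
    """Convert a detected period in samples to a frequency in Hz."""
    return SAMPLE_RATE / period_samples

def crossing_fraction(prev_diff: float, curr_diff: float) -> float:
    """Fractional position of a crossing between two adjacent samples.

    prev_diff and curr_diff are (waveform - envelope) at the previous and
    current sample, with prev_diff < 0 <= curr_diff. The result is the
    offset from the previous sample, in samples, where the straight line
    through the two points reaches zero.
    """
    return prev_diff / (prev_diff - curr_diff)
```

One sample at 44.1 kHz lasts 1/44100 s, about 0.0227 ms, which is the "about 0.02 ms" mentioned above; a period of 100 samples therefore corresponds to 441 Hz.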

The pitch detection unit 41 records the input signal pitches detected by the above operation in the pitch buffer 42 in order. At each predetermined timing, here every 6 ms, it outputs the average of a predetermined number of the pitches recorded in the pitch buffer 42 as pitch information indicating the pitch of the input signal waveform 101 at that time. The predetermined number may be, for example, 16; if fewer values than this have been recorded in the pitch buffer 42, the average of the values already recorded may be used.
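The buffering and averaging step can be sketched with a bounded deque acting as the ring buffer. The class name and the capacity of 64 are assumptions; the count of 16 follows the example in the text.

```python
from collections import deque
from typing import Optional

class PitchBuffer:
    """Ring buffer of detected pitches; the oldest entries drop out when full."""

    def __init__(self, capacity: int = 64) -> None:
        self.values = deque(maxlen=capacity)

    def record(self, period: float) -> None:
        self.values.append(period)

    def average(self, count: int = 16) -> Optional[float]:
        """Average of the most recent count values, or of all recorded
        values if fewer are available; None if nothing is recorded yet."""
        recent = list(self.values)[-count:]
        if not recent:
            return None
        return sum(recent) / len(recent)
```

A caller would invoke `average()` once per output interval (every 6 ms in the text) to obtain the current pitch information.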

When performing the above detection, it is advisable to evaluate the detection conditions and detection results as follows, both to remove noise and improve accuracy and to distinguish the portions for which a voice signal should be generated from those for which it should not.

First, pitch detection is preferably performed only when the level of the input signal is equal to or higher than a predetermined value, because a signal whose level is too low is likely to be noise mixed into a silent signal.
Also, the number of zero crossings, at which the input signal waveform 101 crosses the zero level, is preferably counted, and pitch detection is not performed when the number of zero crossings per unit time is equal to or greater than a predetermined value. Here, the threshold is 30 or more per 6 ms. In such portions the input signal corresponds to a consonant of the human voice, and it is known empirically that a preferable output sound is obtained when no voice signal is added there. Pitch detection is therefore stopped, and generation of the voice signal is stopped in conjunction with it.
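The zero-crossing check can be illustrated with the sketch below. The sample rate of 44.1 kHz, the function names, and the choice to treat a sample of exactly zero as non-negative are assumptions made for the example; the threshold of 30 crossings per 6 ms is the value given in the text.

```python
import math

SAMPLE_RATE = 44100                  # assumed; the text does not give a sample rate
WINDOW = round(0.006 * SAMPLE_RATE)  # number of samples in one 6 ms interval

def count_zero_crossings(samples):
    """Count sign changes between consecutive samples (zero counts as non-negative)."""
    return sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))

def is_consonant_like(samples, threshold=30):
    """True when the interval contains `threshold` or more zero crossings."""
    return count_zero_crossings(samples) >= threshold

# A 200 Hz tone crosses zero only a few times in 6 ms; a signal that flips
# sign on every sample crosses far more often than the threshold.
voiced = [math.sin(2 * math.pi * 200 * n / SAMPLE_RATE) for n in range(WINDOW)]
noisy = [(-1) ** n * 0.1 for n in range(WINDOW)]
print(is_consonant_like(voiced), is_consonant_like(noisy))
```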

Even when pitch detection has started for an input signal that satisfies the above criteria, it is preferable to shift to the continuous detection mode only once the variation among consecutively detected pitches falls within a predetermined range, for example within 12.5%, and not to record detected pitch values in the pitch buffer 42 until this condition is satisfied. This is because the detection result cannot be trusted when the error is large.

Further, the length of the period from the rise to the fall of the IRQ flag may be measured as PCNT1 and the length of the period from the fall to the rise as PCNT0, and the following values (a) to (c) may be obtained and recorded in a buffer. The detected value may then be newly recorded in the pitch buffer 42 only when the error between the latest detected value and the value recorded in the pitch buffer 42 immediately before it is within a predetermined range, for example within 12.5%. Recording may also be performed when the error is simultaneously within the predetermined range for any number of (a) to (c).
(a) PCNT1 + PCNT0 (from a rise of the IRQ flag to the next rise)
(b) PCNT0 + PCNT1 (from a fall of the IRQ flag to the next fall)
(c) PCNT1 + PCNT0 for two cycles
Value (a) is the detected pitch value shown in FIG. 5 itself.

Instead of or in addition to (a) to (c) above, PCNT1 + PCNT0 for two cycles may be detected as (d) and compared with twice the value of (a) to check for period errors.
Further, when the error in (a) to (d) above is not within the predetermined range, this may be counted as a detection miss, and when the count reaches a predetermined number or more, detection may be aborted and restarted from the beginning.
For example, when there are 3 or fewer misses, pitch detection simply continues without recording to the pitch buffer 42; when there are 4 to 7 misses, the detected value is recorded in the pitch buffer 42, the comparison value is updated, and pitch detection continues; and when there are 8 or more misses, all data recorded in the pitch buffer 42 so far is discarded and detection is restarted from the beginning.
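The miss-count policy in the example above might be sketched as follows. The class name `PitchValidator`, the returned status strings, and the choice to reset the miss counter after a value passes the 12.5% check are assumptions; the text does not say when the counter is cleared. Only the thresholds of 3, 7, and 8 misses and the 12.5% tolerance come from the description.

```python
def within_tolerance(new, ref, tol=0.125):
    """True when `new` deviates from `ref` by at most 12.5%."""
    return abs(new - ref) <= tol * ref

class PitchValidator:
    """Hypothetical gatekeeper in front of pitch buffer 42."""

    def __init__(self):
        self.buffer = []       # stands in for pitch buffer 42
        self.reference = None  # value the next detection is compared against
        self.misses = 0

    def submit(self, value):
        if self.reference is None or within_tolerance(value, self.reference):
            self.buffer.append(value)
            self.reference = value
            self.misses = 0    # assumption: a good value clears the miss count
            return "recorded"
        self.misses += 1
        if self.misses <= 3:   # 3 or fewer misses: do not record, keep detecting
            return "skipped"
        if self.misses <= 7:   # 4 to 7 misses: record and update the comparison value
            self.buffer.append(value)
            self.reference = value
            return "recorded-after-miss"
        self.buffer.clear()    # 8 or more misses: discard everything and restart
        self.reference = None
        self.misses = 0
        return "restarted"
```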

Next, the pitch conversion process in the pitch conversion unit 47 will be described with reference to FIGS. 6 and 7.
As the pitch conversion process, the pitch conversion unit 47 cuts out portions of the input signal 111 using a window function, arranges the cut-out portions as elements, and determines the pitch of the converted waveform by the period at which they are arranged. Here, as shown in FIGS. 6 and 7, the input signal 111 is cut out in two systems, OUT0 and OUT1, at staggered timings, and the sum of the two is output as the pitch-converted voice signal. With this processing, pitch conversion can be performed while preserving the formant information of the input signal 111.
This technique is called the Lent method and is an application of the method described in the following paper.
Keith Lent (1989) “An efficient method for pitch shifting digitally sampled sounds.” Computer Music Journal Vol. 13 No. 4, pp. 65-71

FIG. 6 shows a processing example for pitch down (frequency decrease), and FIG. 7 shows a processing example for pitch up (frequency increase).
In these figures, P_I is the pitch value of the input signal 111 that the pitch detection unit 41 outputs as its detection result, and P_V is the pitch value of the voice signal output by the pitch processing unit 46. SB and RB denote the lengths of the reference section and the output section, respectively, and these symbols are also used to denote the sections themselves. Since each of these values changes with the content of the signal, values at different points in time are distinguished by appending “′” or “″”.

In the pitch conversion process, first, independently of the output of the voice signal, the input signal 111 is written to a buffer, and reference sections SB, each with a length of twice the pitch P_I, are set successively on the input signal 111. Cutting out with the window function for output is then performed in units of these reference sections.
The length of the reference section naturally changes when the pitch P_I changes, but since the pitch detection unit 41 outputs pitch information every 6 ms as described above, the value of the pitch P_I does not change until the next output.
Note that the above buffer may be shared with the buffer of the delay processing unit 50.

To generate the output signal, signal generation for the OUT0 system is started first. In this case, an output section RB with a length of twice the pitch P_V is set. In the output section RB, the input signal 112 within the latest reference section SB as of the start of the output section RB is read from the buffer in order from its beginning and output. The read signal is multiplied by the window function 113; here, a Hanning window whose length equals that of the reference section SB being read is used as this window function. Writing of the input signal 111 to the buffer and reading for output are performed in parallel.

In the case of pitch down, RB > SB, so the output section RB continues even after all of the input signal 112 of the corresponding reference section SB has been read; data of “0” is output for this remaining portion.
When the output section RB ends, a new output section RB″ is set according to the pitch P_V″ of the voice signal at that time, the input signal of the latest reference section SB′ as of its start is read, and this processing is repeated thereafter.

Signal generation for the OUT1 system is the same as that for the OUT0 system, except that its start is shifted by P_V. However, since the reference section to be read and the length of the output section are determined according to the information at the time each output section is set, the generated signal is not necessarily identical to the output signal of the OUT0 system.
Then, as described above, the outputs of the OUT0 and OUT1 systems are added and output as the voice signal. Through this processing, a voice signal of pitch P_V having formants similar to those of the input signal 111 can be output.

In the case of pitch up, on the other hand, RB < SB as shown in FIG. 7, so the output section RB ends before all of the input signal 112 in the reference section SB has been read. In this case, reading is stopped at the end of the output section RB. The waveform drawn with phantom lines in FIG. 7 is the subsequent portion that is not read. Correspondingly, a Hanning window with a width equal to RB is used as the window function 114.
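The handling of a single output section in the two cases can be sketched as below. This is an illustration under assumptions rather than the embodiment itself: the function names are hypothetical, the reference section is passed in as a plain list instead of being read from a ring buffer, and the symmetric Hann formula with a denominator of length − 1 is one common definition that the text does not specify.

```python
import math

def hann(n, length):
    """Hann (Hanning) window value at index n for a window of `length` samples."""
    if length < 2:
        return 1.0
    return 0.5 * (1.0 - math.cos(2.0 * math.pi * n / (length - 1)))

def render_output_section(reference, rb):
    """Render one output section of `rb` samples from one reference section.

    Pitch down (rb > len(reference)): window of length SB, then zeros.
    Pitch up   (rb < len(reference)): window of length RB, reading stops early.
    """
    sb = len(reference)
    window_length = min(sb, rb)
    out = []
    for n in range(rb):
        if n < window_length:
            out.append(reference[n] * hann(n, window_length))
        else:
            out.append(0.0)  # reference section exhausted: output "0" data
    return out

reference = [1.0] * 8                         # toy reference section, SB = 8
down = render_output_section(reference, 12)   # RB = 12 > SB
up = render_output_section(reference, 4)      # RB = 4 < SB
print(len(down), len(up))
```

A full voice signal would additionally repeat this per output section for the OUT0 system, run a second OUT1 system started P_V later, and add the two outputs.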

However, as in the case of FIG. 6, the next output section RB″ is set at the end of the output section RB, and reading of the input signal of the latest reference section SB as of its start is begun. Note that if, as in the example of FIG. 7, the reference section SB has not yet ended in the input signal 111 when the output section RB″ is set, the input signal 112 of the same reference section SB is read again in the output section RB″.
As in the case of FIG. 6, the start of signal generation for the OUT1 system is shifted by P_V.

Through this processing, a voice signal of pitch P_V having formants similar to those of the input signal 111 can also be output in the case of pitch up.
If the pitch of the voice signal is to be made equal to that of the input signal, either processing can be applied.

In the pitch conversion process described above, the speeds at which the input signal 111 is written to and read from the buffer (the number of samples processed per unit time) are preferably equal. However, by making the read speed differ, it is also conceivable to obtain a gender effect that converts the voice quality of the input voice from male to female or vice versa.
Furthermore, when a signal value at a timing between samples is required, for example when the pitch is not an integer number of samples, interpolation is preferably performed as appropriate.

Next, the processing executed by the CPU 11 for the pitch detection, scale sound estimation, pitch processing, and pitch conversion described above will be described with reference to FIGS. 8 to 16. Here, the CPU 11 is assumed to perform this processing by acquiring the necessary information from the signal processing unit 19. In this case, when a setting is made to enable the function of the doubling effector 30, the CPU 11 starts the processes shown in the flowcharts of FIGS. 8 to 10, 15, and 16, each independently. However, these processes may instead be performed by the signal processing unit 19.

First, FIG. 8 shows a flowchart of the pitch detection process.
In this process, the input signal subject to doubling is first recorded, one sample at a time, in the input signal buffer and the output signal buffer (S11). The input signal buffer is used by the pitch conversion unit 47 to generate the voice signal, and may be a ring buffer with a capacity for about 100 ms of data. The output signal buffer is used for delay processing by the delay processing unit 50, and may be a ring buffer with a capacity for about one second of data.
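A ring buffer of the kind mentioned for step S11 might look like the sketch below. The class `RingBuffer`, its absolute-index read interface, and the sample rate of 44.1 kHz are assumptions for illustration; the text specifies only the approximate capacities of 100 ms and one second.

```python
class RingBuffer:
    """Fixed-capacity buffer that overwrites its oldest samples."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = [0.0] * capacity
        self.write_pos = 0  # total number of samples written so far

    def write(self, sample):
        self.data[self.write_pos % self.capacity] = sample
        self.write_pos += 1

    def read(self, position):
        """Return the sample that was written at absolute index `position`."""
        oldest = max(0, self.write_pos - self.capacity)
        if not oldest <= position < self.write_pos:
            raise IndexError("sample is not (or no longer) in the buffer")
        return self.data[position % self.capacity]

SAMPLE_RATE = 44100                            # assumed; not stated in the text
input_buffer = RingBuffer(SAMPLE_RATE // 10)   # about 100 ms of input signal
output_buffer = RingBuffer(SAMPLE_RATE)        # about 1 s for the delay processing

def record_sample(sample):
    """Step S11: record one sample in both buffers."""
    input_buffer.write(sample)
    output_buffer.write(sample)
```

Addressing samples by absolute index keeps reference-section start positions valid while the write position wraps around.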

Thereafter, zero crossings of the input signal are counted (S12), and the sample counter is incremented (S13).
Then, if pitch detection is in progress (S14), it is determined whether the input signal is at the start position of a cycle (S15). Whether pitch detection is in progress can be determined from the pitch flag set in the process shown in FIG. 9. The determination in step S15 can be made, as described with reference to FIG. 5, by changing the IRQ flag in response to detection of a crossing between the input signal and an envelope and detecting whether the flag has risen.

If the determination shows the start position of a cycle (S16), the current value of the sample counter is recorded in the pitch buffer 42 as pitch data (S17), and the sample counter is reset (S18). The process then waits until the next sample timing of the input signal (S19), returns to step S11 at that timing, and repeats.
If, on the other hand, step S16 finds that it is not the start position of a cycle, the process proceeds directly to step S19 and waits until the next sample timing. If pitch detection is not in progress in step S14, the sample counter is reset (S18) and the process waits until the next sample timing (S19).

As noted in the description of FIG. 5, it is advisable to apply various checks to the detection conditions and detection results when recording pitch data in step S17 of the process shown in FIG. 8; for simplicity, however, the processing for such checks is not shown here. Whether to perform pitch detection is determined from the number of zero crossings of the input signal by the process shown in FIG. 9.

Next, FIG. 9 shows a flowchart of the pitch detection control process.
In this process, first, if the number of zero crossings counted in step S12 of FIG. 8 is equal to or less than a predetermined value (here, 30 as described above) (S21), the pitch flag is set to “1” to indicate that pitch detection is to be executed (S22). The zero-crossing count is then reset (S25), the process waits until a predetermined time (here, 6 ms as described above) has elapsed since the processing of step S21 (S26), and then returns to step S21 and repeats.
If the number of zero crossings is equal to or greater than the predetermined value in step S21, the pitch flag is set to “0” to indicate that pitch detection is stopped (S23), the pitch data recorded in the pitch buffer 42 is cleared (S24), and the process proceeds to step S25 and the subsequent steps.
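One pass through steps S21 to S25 could be sketched as follows. The dictionary-based state and the function name `control_step` are hypothetical. The description is not consistent about a count of exactly 30; this sketch assumes that a count equal to the threshold still allows detection.

```python
def control_step(state, threshold=30):
    """One pass of S21-S25, intended to run once per 6 ms interval."""
    if state["zero_crossings"] <= threshold:  # S21 (boundary handling is an assumption)
        state["pitch_flag"] = 1               # S22: pitch detection is executed
    else:
        state["pitch_flag"] = 0               # S23: pitch detection is stopped
        state["pitch_buffer"].clear()         # S24: discard recorded pitch data
    state["zero_crossings"] = 0               # S25: restart the count
    return state

state = {"zero_crossings": 12, "pitch_flag": 0, "pitch_buffer": [100, 101]}
control_step(state)   # few crossings: detection enabled, buffer kept
state["zero_crossings"] = 45
control_step(state)   # many crossings: detection disabled, buffer cleared
print(state)
```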

Accordingly, in the process of the flowchart of FIG. 9, step S21 is performed at predetermined intervals; pitch detection is set to execute when the number of zero crossings during the interval is equal to or less than the predetermined value, and is set to stop when it is greater than the predetermined value.
The processes shown in FIGS. 8 and 9 above implement pitch detection in the pitch detection unit 41 and its control. However, the pitch value finally output from the pitch detection unit 41 as the detection result is the value obtained by the process of FIG. 10 described next.

Next, FIG. 10 shows a flowchart of the scale sound estimation and pitch setting process.
In this process, first, the average of a predetermined number (for example, 16) of the pitch data recorded in the pitch buffer 42 is obtained and used as the value of the pitch P_I of the input signal (S31). Here, this value is the pitch of the input signal output from the pitch detection unit 41 as the detection result, and it is also used in the pitch conversion process in the pitch conversion unit 47.

Then, based on this value of the pitch P_I, the pitch of the scale sound closest to the pitch P_I among those stored in the pitch table 44 is acquired (S32). If the difference between the pitch P_I and the pitch of that scale sound is not within the predetermined error Δh(n)/2 (S33), it is determined that there is no scale sound corresponding to the current input signal; the estimated value buffer is cleared (S34), and the timer for measuring the duration of the pitch is stopped (S35). The pitch P_V of the voice signal is then cleared (S39), which stops the output of the voice signal, as explained later in the description of the pitch conversion process. The process then waits until a predetermined time (here, 6 ms as described above) has elapsed since the processing of step S31 (S44), and returns to step S31 and repeats. If an appropriate P_I could not be obtained in step S31, the pitch is preferably not acquired in step S32, so that the determination in step S33 becomes NO.

If the determination in step S33 is YES, on the other hand, it can be tentatively concluded that there is a scale sound corresponding to the current input signal. If the pitch of the scale sound acquired in step S32 differs from the estimated scale sound pitch P_E stored in the estimated value buffer 45 (S36), the pitch acquired in step S32 is registered in the estimated value buffer 45 (S37), the timer is reset and timing is started (S38), and the process proceeds to step S39 and the subsequent steps.
If the determination in step S36 is YES and the time measured by the timer has reached the predetermined time Δw(n) (S40), the pitch P_I of the input signal is known to have remained within the predetermined error range of the estimated scale sound pitch P_E continuously for the predetermined time, so the scale sound corresponding to the input signal at that time is estimated to be the scale sound whose pitch was acquired in step S32.

The pitch registered in the estimated value buffer 45 is then taken as the estimated scale sound pitch P_E, that is, the pitch of the scale sound estimated to correspond to the input signal (S41). To generate the voice signal, a pitch shift amount ΔP is determined from the pitch P_I of the input signal and the estimated scale sound pitch P_E (S42), and the value obtained by adding ΔP to the pitch P_I of the input signal is set as the pitch P_V of the voice signal (S43). This P_V is used in the pitch conversion process in the pitch conversion unit 47.

The process then waits until the predetermined time has elapsed since the processing of step S31 (S44), and returns to step S31 and repeats.
If the determination in step S40 is NO, it cannot be determined whether the input signal corresponds to a scale sound or merely happens to have a pitch close to one, so the process proceeds to step S39, keeps the output of the voice signal stopped, and continues with the subsequent processing.

Through the process shown in FIG. 10 above, the pitch of the input signal is obtained, the scale sound corresponding to the input signal is estimated from that pitch, and the pitch of the voice signal used in the pitch conversion process in the pitch conversion unit 47 is generated and set using the pitch of the estimated scale sound. These pitches are updated at the predetermined intervals described above.
Of the processing shown in FIG. 10, step S31 corresponds to the function of the pitch detection unit 41, steps S32 to S41 to the function of the scale sound estimation unit 43, and steps S42 and S43 to the function of the pitch processing unit 46.
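The estimation part of FIG. 10 (steps S32 to S41) behaves like a small state machine, which the following sketch tries to capture. Several simplifications are assumptions: the class name `ScaleEstimator` is hypothetical, the tolerance Δh(n)/2 and hold time Δw(n) are treated as constants rather than functions of the note n, the hold time is counted in 6 ms updates instead of with a timer, and the pitch table is a plain list.

```python
class ScaleEstimator:
    """Hypothetical sketch of steps S32-S41, called once per 6 ms update."""

    def __init__(self, pitch_table, tolerance, hold_ticks):
        self.pitch_table = pitch_table  # stands in for pitch table 44
        self.tolerance = tolerance      # stands in for Δh(n)/2 (constant here)
        self.hold_ticks = hold_ticks    # stands in for Δw(n), in 6 ms updates
        self.candidate = None           # stands in for estimated value buffer 45
        self.ticks = 0                  # stands in for the timer

    def _reset(self):
        self.candidate = None           # S34
        self.ticks = 0                  # S35
        return None                     # S39: no voice pitch is set

    def update(self, p_i):
        """Return the estimated scale sound pitch P_E, or None if there is none yet."""
        if p_i is None:                 # no usable P_I from S31
            return self._reset()
        nearest = min(self.pitch_table, key=lambda p: abs(p - p_i))  # S32
        if abs(p_i - nearest) > self.tolerance:                      # S33: NO
            return self._reset()
        if nearest != self.candidate:   # S36: differs from the stored estimate
            self.candidate = nearest    # S37
            self.ticks = 0              # S38
            return None                 # S39
        self.ticks += 1
        if self.ticks >= self.hold_ticks:  # S40: held long enough
            return self.candidate          # S41
        return None                        # S39

estimator = ScaleEstimator([100.0, 106.0, 112.0], tolerance=2.0, hold_ticks=3)
print([estimator.update(p) for p in (101.0, 100.5, 99.5, 100.0, 103.5, 100.0)])
```

In this toy run the estimate appears only after the input has stayed near 100.0 for three further updates, and it is dropped as soon as the input leaves the tolerance band.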

Next, FIG. 11 shows a flowchart of a first example of the pitch shift amount determination process shown in step S42 of FIG. 10.
In this process, first, |P_E − P_I| × 2 is obtained, and the smaller of this value and a positive constant C is taken as ΔP″ (S51). This value is then multiplied by a random number R in the range of 0 to 1 and passed through a low-pass filter (LPF), and the result is taken as ΔP′ (S52). The LPF preferably has a cutoff frequency of 0.1 Hz to 10 Hz and a maximum gain of 1.

Then ΔP is obtained as ΔP = ΔP′ if P_E > P_I and as ΔP = −ΔP′ otherwise (S53 to S55), and the process returns to the original process.
Since ΔP′ ≥ 0, the pitch shift amount ΔP obtained in this way is a shift amount that brings the pitch P_I of the input signal closer to the estimated scale sound pitch P_E.
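Steps S51 to S55 can be written out as the sketch below. The one-pole low-pass filter, its coefficient, and all names are assumptions chosen for brevity; the text only requires an LPF with a cutoff of 0.1 Hz to 10 Hz and a maximum gain of 1, and does not specify the filter type.

```python
import random

class OnePoleLowPass:
    """Simple one-pole low-pass filter with unity DC gain (an assumed filter type)."""

    def __init__(self, coeff):
        self.coeff = coeff  # 0 < coeff <= 1; smaller values smooth more
        self.state = 0.0

    def process(self, x):
        self.state += self.coeff * (x - self.state)
        return self.state

def shift_amount_first(p_i, p_e, c, lpf, rng=random):
    """First example of the pitch shift amount ΔP (FIG. 11)."""
    dp2 = min(abs(p_e - p_i) * 2, c)       # S51: ΔP″
    dp1 = lpf.process(dp2 * rng.random())  # S52: ΔP′ = LPF(ΔP″ × R), R in [0, 1)
    return dp1 if p_e > p_i else -dp1      # S53-S55: sign points toward P_E

lpf = OnePoleLowPass(0.1)
p_i, p_e = 100.0, 103.0
for _ in range(5):
    p_v = p_i + shift_amount_first(p_i, p_e, c=4.0, lpf=lpf)  # S43: P_V = P_I + ΔP
    print(round(p_v, 3))
```

With P_I and P_E held constant as here, ΔP′ stays between 0 and 2|P_E − P_I|, so P_V is never farther from P_E than P_I is.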

FIG. 12 shows an example of the relationship between P_I and P_V when ΔP is obtained by the process shown in FIG. 11.
This figure shows an example in which the pitch of the input signal changes as indicated by the solid line 81 and, during the period indicated by the thick line 82, the input signal is estimated to correspond to the scale sound whose pitch is indicated by the thick line 82. In this example, C > |P_E − P_I| × 2 in the sections labeled a and c, and C < |P_E − P_I| × 2 in the section labeled b.

In this case, P_V = P_I + ΔP at each point in time lies somewhere in the range between the solid line 81 and the broken line 83 indicated by the row of arrows, depending on the value of the random number R and the result of the LPF processing. Therefore, by determining ΔP through the process of FIG. 11, P_V can be set so that |P_I − P_E| ≥ |P_V − P_E| always holds. In other words, the pitch of the voice signal generated by the pitch processing unit 46 can be brought closer to the pitch of the scale sound than the pitch of the corresponding input signal is.

FIG. 13 shows a flowchart of a second example of the pitch shift amount determination process shown in step S42 of FIG. 10.
In this process, for a positive constant C and a random number R in the range of 0 to 1, the value obtained by filtering C × (R − 0.5) with an LPF or a band-pass filter (BPF) is taken as ΔP′ (S61), ΔP is obtained as ΔP = P_E − P_I + ΔP′ (S62), and the process returns to the original process. The LPF or BPF may have, for example, a cutoff frequency or center frequency of 0.1 to 10 Hz and a maximum gain of 1.
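The second example can be sketched in the same way. As before, the one-pole low-pass filter, its coefficient, and the names are assumptions; a band-pass filter, which the text also allows, is not shown.

```python
import random

class OnePoleLowPass:
    """Simple one-pole low-pass filter with unity DC gain (an assumed filter type)."""

    def __init__(self, coeff):
        self.coeff = coeff
        self.state = 0.0

    def process(self, x):
        self.state += self.coeff * (x - self.state)
        return self.state

def shift_amount_second(p_i, p_e, c, filt, rng=random):
    """Second example of the pitch shift amount ΔP (FIG. 13)."""
    dp1 = filt.process(c * (rng.random() - 0.5))  # S61: ΔP′ from filtered C × (R − 0.5)
    return p_e - p_i + dp1                        # S62: ΔP = P_E − P_I + ΔP′

filt = OnePoleLowPass(0.2)
p_i, p_e = 100.0, 103.0
for _ in range(5):
    p_v = p_i + shift_amount_second(p_i, p_e, c=1.0, filt=filt)  # P_V = P_E + ΔP′
    print(round(p_v, 3))
```

Because P_V = P_I + ΔP reduces to P_E + ΔP′, the voice pitch wanders in a band of width C around the estimated scale sound pitch regardless of how far P_I is from it.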

FIG. 14 shows an example of the relationship between P_I and P_V when ΔP is obtained by the process shown in FIG. 13.
The input signal and scale sound shown in this figure are the same as those shown in FIG. 12. In this example, however, P_V = P_I + ΔP at each point in time lies somewhere in the range between the broken line 84 and the broken line 85 indicated by the row of arrows of width C, depending on the value of the random number R and the result of the filtering.

As can be seen from this figure, when P_I is close to P_E, P_V may in places be farther from P_E than P_I is. Even so, when ΔP is determined by the process of FIG. 13, the pitch of the voice signal generated by the pitch processing unit 46 can, on the whole, be brought closer to the pitch of the scale sound than the pitch of the corresponding input signal is.

Next, FIG. 15 shows a flowchart of the reference section setting process.
In this process, it is first determined whether either of the following holds: no reference section is set at the reference position, or the reference position has reached the end of the reference section (S71).
If neither holds, the reference position is advanced by one sample on the input signal recorded in the input signal buffer (S74), the process waits until the next sample timing (S75), and then returns to step S71 and repeats.

If either condition holds in step S71, and the pitch P_I of the input signal has been set at that time by the processing of step S31 of FIG. 10 (S72), the next reference section, which starts at the current reference position and has a length SB of twice the pitch P_I, is set on the input signal recorded in the input signal buffer (S73); the process then proceeds to step S74 and continues. If the pitch has not been set in step S72, the process proceeds directly to step S74 and continues.
Through the process shown in FIG. 15 above, reference sections such as those described with reference to FIGS. 6 and 7 can be set on the input signal to be processed by the pitch conversion unit 47. Since the “reference position” is used merely to detect the end of a reference section, any parameter that can measure the progress of processing may be used for it.
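The reference section setting of FIG. 15 can be simulated offline as in the sketch below. The function name, the callback `pitch_at` (which stands in for "the P_I currently set by step S31", expressed in samples), and the offline loop over sample positions are assumptions; in the embodiment the steps run once per sample timing.

```python
def set_reference_sections(pitch_at, total_samples):
    """Return the (start, length) reference sections set over `total_samples` samples.

    `pitch_at(pos)` returns the pitch P_I in samples at position `pos`,
    or None when no pitch is currently set.
    """
    sections = []
    end = None                         # end of the current section; None if not set
    for pos in range(total_samples):   # S74/S75: advance one sample per iteration
        if end is None or pos >= end:  # S71: no section set, or its end was reached
            p_i = pitch_at(pos)        # S72: is P_I currently set?
            if p_i is not None:
                length = 2 * p_i       # S73: SB is twice the pitch P_I
                sections.append((pos, length))
                end = pos + length
            else:
                end = None             # try again at the next sample
    return sections

# The pitch changes from 50 to 40 samples at position 250; the section already
# in progress keeps its length, and the new pitch takes effect from the next one.
print(set_reference_sections(lambda pos: 50 if pos < 250 else 40, 400))
```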

Next, FIG. 16 shows a flowchart of the pitch conversion process.
In this process, it is first determined whether either of the following holds: no output section is set, or the read position has advanced far enough in the current output section for that output section to end (S81).
If neither holds, it is determined whether the read position has passed the end of the reference section (S85). If it has not, one sample of data at the read position is read from the input signal buffer, multiplied by the value of the window function for that read position, and output as voice signal data (S86). If it has, 0 is output (S87). In either case, the read position is advanced by one sample (S88). The window function is as described with reference to FIGS. 6 and 7.

Then, if pitch detection is in progress (the pitch flag is “1”) (S89), the process waits until the next sample timing (S90) and then returns to step S81 and repeats. If pitch detection is not in progress, the set output section is cleared (S91), the process waits until the pitch P_V of the voice signal is set (S92), and, once the pitch P_V is set, returns to step S81 and repeats. That is, output of the voice signal is suspended until the pitch P_V is set again.

If the result of step S81 is YES, it is determined whether the pitch P_V of the voice signal has been set by the processing of step S42 in FIG. 10 (S82). If it has been set, the length RB of the next output section is set to twice the pitch P_V (S83), the read position is moved to the start of the latest reference section at the time of processing (S84), and the process proceeds to step S85 and the subsequent steps. If P_V has not been set in step S82, the process proceeds directly to step S91 and the subsequent steps. In this case too, output of the voice signal is suspended until P_V is set again.

With the processing shown in FIG. 16, the pitch conversion unit 47 can generate the OUT0-system output signal described with reference to FIGS. 6 and 7 from the input signal to be processed. As described above, the voice signal is then generated by adding this output signal to the OUT1-system output signal. The OUT1-system output signal is generated by the same processing as shown in FIG. 16, except that its start time is shifted by the pitch P_V. So that the start time is also shifted when output resumes after P_V has ceased to be set or pitch detection has been stopped, a wait of P_V should be added after step S92.
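The OUT0/OUT1 scheme of FIG. 16 can be sketched roughly as follows. This is only an illustration under several assumptions, not the implementation of the embodiment: the pitch P_V is treated as a period in samples, the reference section is fixed rather than updated as the input advances, and a Hann window stands in for the window function of FIGS. 6 and 7, which is not reproduced here. The names `hann`, `render_stream`, and `make_voice` are hypothetical.

```python
import math


def hann(n, length):
    """Periodic Hann window value at index n (an assumed window shape)."""
    return 0.5 - 0.5 * math.cos(2.0 * math.pi * n / length)


def render_stream(ref, period_v, n_out, start_delay=0):
    """Render one output stream (OUT0 or OUT1) from a fixed reference section.

    ref         -- samples of the reference section
    period_v    -- voice period P_V in samples (assumption: pitch given as a period)
    n_out       -- number of output samples to produce
    start_delay -- samples to wait before the first output section starts
    """
    out = [0.0] * n_out
    section_len = 2 * period_v                       # S83: RB = 2 x P_V
    for start in range(start_delay, n_out, section_len):
        for k in range(section_len):                 # k plays the role of the read position
            t = start + k
            if t >= n_out:
                break
            if k < len(ref):                         # S85: still inside the reference section
                out[t] = ref[k] * hann(k, len(ref))  # S86: windowed sample
            # otherwise the output stays 0 (S87)
    return out


def make_voice(ref, period_v, n_out):
    """Sum OUT0 and OUT1, where OUT1 starts P_V samples later than OUT0."""
    out0 = render_stream(ref, period_v, n_out)
    out1 = render_stream(ref, period_v, n_out, start_delay=period_v)
    return [a + b for a, b in zip(out0, out1)]
```

With a constant reference section of 8 samples and `period_v = 4`, the two windowed streams overlap by half a section, so their sum is flat once OUT1 has started. When the reference section is shorter than the output section, the remainder of each output section is filled with zeros, as in step S87.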

In the electronic musical instrument 10, the CPU 11 and the signal processing unit 19 execute the processing described above to estimate the scale sound corresponding to the input signal. When that scale sound can be estimated, pitch conversion is performed so as to bring the pitch of the input signal closer to the pitch of the scale sound, thereby generating a voice signal, and the input signal and the voice signal are mixed and output.
This prevents the pitch of the voice signal from deviating greatly from the pitch of the scale sound, and so prevents the superimposed output from sounding unnatural.

In addition, a scale sound is estimated to be the scale sound corresponding to the input signal when the difference between the pitch of the input signal and the pitch of that scale sound is within a predetermined error range, and the voice signal is generated only when such a scale sound can be estimated. The input signal is therefore emphasized only in portions whose pitch is close to that of a scale sound, which gives the output a natural sound.
Combining these features produces a synergistic effect, but each feature is also effective on its own.

That is, for example, in step S42 of FIG. 10, ΔP may be determined without any particular intent to bring the pitch P_V of the voice signal closer to the estimated scale sound pitch P_E: for a positive constant C and a random number R in the range 0 to 1, the value obtained by filtering C × (R − 0.5) with an LPF or a band-pass filter (BPF) may be used as ΔP. In this case as well, the LPF or BPF may be the same as that used in the processing of FIG. 13.
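As a rough illustration of this variation, the sketch below filters C × (R − 0.5) with a one-pole low-pass filter. The filter type and the `smoothing` coefficient are assumptions made for the example; the filter actually used in the processing of FIG. 13 is not specified in this passage. The function name `random_pitch_shifts` is hypothetical.

```python
import random


def random_pitch_shifts(c, n, smoothing=0.1, seed=None):
    """Return n values of delta-P obtained by low-pass filtering C * (R - 0.5).

    c         -- positive constant C (width of the random variation)
    n         -- number of processing timings
    smoothing -- one-pole filter coefficient in (0, 1] (an assumed filter design)
    seed      -- optional seed so that a run can be reproduced
    """
    rng = random.Random(seed)
    delta_p = 0.0
    shifts = []
    for _ in range(n):
        raw = c * (rng.random() - 0.5)           # C * (R - 0.5), R in [0, 1)
        delta_p += smoothing * (raw - delta_p)   # one-pole low-pass filter
        shifts.append(delta_p)
    return shifts
```

Because each filtered value is a weighted average of the previous value and a new raw value, the result stays within ±C/2 and changes by at most `smoothing × C` from one timing to the next.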

Alternatively, instead of clearing the pitch P_V of the voice signal in step S39 of FIG. 10, the value obtained by filtering C × (R − 0.5) with an LPF or a band-pass filter (BPF) may be set as ΔP before proceeding to step S42. That is, even when the scale sound corresponding to the input signal cannot be estimated, a signal obtained by randomly varying the pitch of the input signal within the width of the constant C may be superimposed as the voice signal.
In this case, if the discontinuity in ΔP is noticeable at the boundary between a section where ΔP is obtained in step S42 and a section where it is obtained in step S39, a cross-fade of ΔP or P_V may be added to smooth it.
Needless to say, the method of determining the pitch P_V of the voice signal is not limited to these.
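One possible form of such a cross-fade is sketched below. It assumes that, over a short transition of n processing timings, ΔP values from both methods are available, and it blends them with linearly increasing weight. The linear ramp and the function name `crossfade` are choices made for this example only.

```python
def crossfade(old_values, new_values):
    """Blend two equally long sequences of delta-P values with a linear ramp.

    The weight of new_values rises from 1/(n+1) to n/(n+1), so the result
    moves gradually from the old method's values to the new method's values.
    """
    n = len(old_values)
    if n != len(new_values):
        raise ValueError("both sequences must have the same length")
    blended = []
    for i in range(n):
        w = (i + 1) / (n + 1)  # weight of the new value at timing i
        blended.append((1.0 - w) * old_values[i] + w * new_values[i])
    return blended
```

For example, fading from a constant 0 to a constant 4 over three timings gives 1, 2, 3, which removes the step that a hard switch would produce.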

[Second Embodiment: FIGS. 17 to 20]
Next, an electronic musical instrument that is a second embodiment of the acoustic signal processing device of the present invention will be described. This electronic musical instrument is the same as that of the first embodiment except that the configuration of the voice signal generation unit differs slightly, so description of the other points is omitted. The same reference numerals are used for components corresponding to those of the first embodiment. This embodiment is suited to applying the effect not to an input signal that arrives in real time but to an input signal prepared in advance as data, that is, an input signal whose content at points later than the one being processed can also be referenced.

First, FIG. 17 shows the configuration of the voice signal generation unit in the electronic musical instrument of this embodiment.
As shown in the figure, the voice signal generation unit 40′ of this embodiment has a phrase detection unit 48 in addition to the configuration of the first embodiment. The pitch processing performed by the pitch processing unit 46′ also differs from that of the first embodiment.
The phrase detection unit 48 is detection means that detects the positions of phrases in the input signal. A phrase here is a stretch of sound that continues almost without interruption; for a human voice, for example, it corresponds to the portion uttered in one breath. The phrase detection unit 48 passes its detection results to the pitch processing unit 46′. Based on this information, the pitch processing unit 46′ obtains a pitch shift amount for the input signal phrase by phrase and adds it to the pitch of the input signal acquired by the pitch detection unit 41 to generate pitch information indicating the pitch of the voice signal.

Next, the processing executed by the CPU 11 in the electronic musical instrument of this embodiment will be described.
In this electronic musical instrument too, the processing executed by the CPU 11 (or the signal processing unit 19) is largely the same as that described with reference to FIGS. 8 to 16 for the first embodiment. However, the processing is applied to an input signal prepared in advance as data, and timing is managed in sample counts by converting each time into a number of samples based on the sampling period of the input signal. Accordingly, a step that waits for a predetermined time need only wait until the corresponding number of samples has been processed. The individual processes also need not run in synchronization: if an event occurring at a given timing is stored together with the corresponding sample position, it can be processed later by referring to that information.
Below, only the processing that differs from the first embodiment in other respects is described using flowcharts.
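The sample-count bookkeeping described here might look like the following sketch. The sampling rates in the usage note and the names `ms_to_samples` and `record_event` are assumptions for illustration; the passage does not state a sampling rate or a data layout.

```python
def ms_to_samples(duration_ms, sample_rate_hz):
    """Convert a duration in milliseconds to the nearest whole number of samples."""
    return round(duration_ms * sample_rate_hz / 1000)


def record_event(events, sample_pos, name):
    """Store an event under the sample position at which it occurred.

    events is a plain dict mapping sample position -> list of event names,
    so that a later processing stage can look events up by position.
    """
    events.setdefault(sample_pos, []).append(name)
    return events
```

For instance, at an assumed 44.1 kHz sampling rate a 6 ms wait corresponds to `ms_to_samples(6, 44100)`, that is 265 samples, and at 48 kHz to 288 samples. Events logged with `record_event` can then be consumed by a later stage without the stages running in lockstep.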

First, this embodiment differs from the first embodiment in that a phrase break detection process is executed between steps S11 and S12 of the pitch detection process shown in FIG. 8.
FIG. 18 shows a flowchart of this phrase break detection process.
After the processing of step S11 in FIG. 8, the CPU 11 determines whether the sample value of the input signal is equal to or less than a predetermined value (S101). If it is, and the silent section counter is not counting (S102), the CPU 11 starts the count (S106), records that the timing of the sample being processed is within a phrase (S109), and proceeds to step S12 in FIG. 8. The silent section counter counts how long the volume level of the input signal has remained at or below the predetermined value.

If the silent section counter is already counting in step S102, the counter is incremented (S103). If the count is then equal to or greater than a predetermined threshold (S104), the CPU 11 records that the timing of the sample being processed is a phrase break (S105) and returns to the original process. If the count is below the threshold in step S104, the CPU 11 records that the timing is within a phrase (S109) and proceeds to step S12 in FIG. 8.
If the result of step S101 is NO, the counter is reset and stopped if it is counting, and left as it is otherwise; in both cases the CPU 11 records that the timing is within a phrase (S107 to S109) and proceeds to step S12 in FIG. 8.

This process corresponds to the function of the phrase detection unit 48. It detects a phrase break in the input signal when the sample value of the input signal remains at or below the predetermined value for at least a predetermined time.
The phrase-break and in-phrase information recorded in step S105 or S109 may be passed to the pitch processing unit 46′ sample by sample, but it is preferable to accumulate a certain amount of it and then pass it on as the start and end timings of each phrase.
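The flow of FIG. 18 can be sketched as below. This is a simplified illustration, not the embodiment's code: it compares the absolute sample value with the level (the text says only "sample value"), processes a whole list at once instead of being called between steps S11 and S12, and uses the hypothetical names `detect_phrase_breaks` and `phrase_spans`.

```python
def detect_phrase_breaks(samples, level, min_silence):
    """Return one flag per sample: True = phrase break, False = within a phrase.

    level       -- samples with |value| <= level count as silent (S101)
    min_silence -- counter threshold at which silence becomes a break (S104)
    """
    flags = []
    counting = False
    count = 0
    for s in samples:
        if abs(s) <= level:                      # S101: at or below the level
            if not counting:                     # S102: counter not running
                counting, count = True, 0        # S106: start counting
                flags.append(False)              # S109: still within a phrase
            else:
                count += 1                       # S103: count up
                flags.append(count >= min_silence)  # S104 -> S105 or S109
        else:
            counting, count = False, 0           # S107, S108: reset and stop
            flags.append(False)                  # S109: within a phrase
    return flags


def phrase_spans(flags):
    """Convert per-sample flags into (start, end) index pairs, end exclusive."""
    spans, start = [], None
    for i, is_break in enumerate(flags):
        if not is_break and start is None:
            start = i
        elif is_break and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(flags)))
    return spans
```

`phrase_spans` corresponds to the suggestion of accumulating the per-sample results and passing them on as the start and end timings of each phrase.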

In step S101, a volume envelope may be obtained instead of the sample value of the input signal, and the determination may be whether the volume it indicates is equal to or less than the predetermined value. In this case, the input signal may be passed through some kind of filter before the volume envelope is obtained, to make detection more accurate.
The threshold used in step S104 should represent a length of time that an ordinary listener would recognize as a break between phrases.

In this embodiment, the scale sound estimation and pitch setting process also differs from that of the first embodiment. Because the pitch shift amount ΔP is determined for each phrase, the process of estimating the scale sound and setting the pitch P_I of the input signal is performed separately from the process of setting the pitch shift amount ΔP and the pitch P_V of the voice signal.

FIG. 19 shows a flowchart of the scale sound estimation and pitch setting process in this embodiment. Steps in common with the process of the first embodiment shown in FIG. 10 carry the same step numbers.
This process differs from that of FIG. 10 first in step S31′, where the pitch P_I of the input signal is stored as the value at the processing timing, in association with timing information. As the timing information, it suffices to store how many samples of the input signal had been through the pitch detection process when the data used to obtain P_I was produced.
In addition, timing by the timer in step S38 and elsewhere measures not real time but the playback time of the input data based on the number of input signal samples, as with timing management in general.

If the result of step S40 is YES, the pitch registered in the estimated value buffer 45 is stored as the estimated scale sound pitch P_E at the processing timing, in association with timing information so that it can be referenced later in the pitch processing (SX). The process then waits until the next processing timing (SY) and returns to step S31′ to repeat.
The timing information here is the same as that used in step S31′. The next processing timing can be, for example, the point at which pitch detection has been completed for 6 ms of input signal, in playback time, since step S31′ was performed.

If the result of step S40 is NO, and after step S35 or S38, information indicating that there is no estimated scale sound pitch at the processing timing is stored in association with timing information (SZ), and the process proceeds to step SY. The timing information here is also the same as that used in step S31′.
Through this processing, as with the process shown in FIG. 10, the scale sound corresponding to the input signal at each timing can be estimated at suitable intervals, and the pitch of that scale sound can be stored for use in the pitch processing.

Next, FIG. 20 shows a flowchart of the pitch processing in this embodiment.
In this embodiment, once the pitch P_I of the input signal and the estimated scale sound pitch P_E at each processing timing are available for the range of one phrase, the process shown in FIG. 20 obtains the pitch P_V of the voice signal at each processing timing in that range.

In this process, the average value of (P_E − P_I) is first obtained over the range of the phrase in which a valid estimated scale sound pitch P_E exists, and this average is taken as ΔP′ (S111). Then, for a positive constant C and a random number R in the range 0 to 1, the pitch shift amount is set to ΔP = ΔP′ + C × (R − 0.5) (S112). ΔP may be common to the whole phrase, or it may be obtained separately for each processing timing by drawing a new random number. Using this ΔP, the pitch P_V of the voice signal corresponding to P_I at each processing timing is obtained as P_V = P_I + ΔP (S113), and the process ends.
That is, P_V = P_I + [average value of (P_E − P_I)] + C × (R − 0.5).
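Steps S111 to S113 can be illustrated with the short sketch below. It assumes that the pitches for one phrase are held in two equal-length lists, that a missing estimated scale sound pitch is represented by `None`, and that all pitches share one additive unit. Returning `None` for every timing when the phrase has no valid P_E is a choice made for this example; the text does not cover that case. The function name `phrase_voice_pitches` is hypothetical.

```python
import random


def phrase_voice_pitches(p_in, p_est, c, per_timing=False, seed=None):
    """Compute P_V for every processing timing of one phrase.

    p_in       -- input pitches P_I, one per processing timing
    p_est      -- estimated scale sound pitches P_E, or None where there is none
    c          -- positive constant C (0 gives a pure parallel shift)
    per_timing -- draw a new random number R at each timing instead of once
    """
    rng = random.Random(seed)
    diffs = [e - i for i, e in zip(p_in, p_est) if e is not None]
    if not diffs:
        return [None] * len(p_in)        # assumption: no valid P_E, no voice pitch
    base = sum(diffs) / len(diffs)       # S111: delta-P' = mean of (P_E - P_I)

    def draw():
        return base + c * (rng.random() - 0.5)   # S112: delta-P

    shared = draw()                      # one delta-P shared by the whole phrase
    return [p + (draw() if per_timing else shared) for p in p_in]  # S113
```

With `c = 0` the result is the input pitch curve shifted by the average difference. With a nonzero `c`, every value stays within C/2 of that shifted curve.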

The relationship between P_I and P_V when P_V is obtained by the process shown in FIG. 20 will now be described with reference to FIG. 21.
In this figure, the pitch of the input signal is shown by the solid line 91, and where a valid estimated scale sound pitch P_E exists, it is shown by the thick lines 92 and 93. The average value obtained in step S111 of FIG. 20 can be calculated by summing the values of (P_E − P_I) only over the periods within the phrase indicated by the thick lines 92 and 93 and dividing by the number of data points in those periods. It can also be calculated by obtaining the average separately for the period of the thick line 92 and the period of the thick line 93, and then taking a further average of these two values weighted by the length of each period.

When C = 0, P_V is simply P_I plus that average value, and takes the values shown by the dash-dot line 94, which is the solid line 91 translated in parallel. When C is not 0, P_V takes values within the region of width C centered on the dash-dot line 94 and bounded by the broken lines 95 and 96.
As the figure shows, even when P_V is obtained by the process of FIG. 20, a P_V that is closer to P_E than P_I is can be obtained overall, although this does not necessarily hold at each individual processing timing. That is, a voice signal whose pitch is closer to the scale sound than the input signal can be generated. The electronic musical instrument of this embodiment therefore also prevents the pitch of the voice signal from deviating greatly from the pitch of the scale sound, and so prevents the superimposed output from sounding unnatural.

In this embodiment too, as in the first embodiment, the voice signal may be generated only when P_E has a valid value. This emphasizes the input signal only in portions whose pitch is close to that of a scale sound and gives the output a natural sound.
Conversely, the voice signal may also be generated during phrase breaks, for example by continuing to use the ΔP obtained within a phrase until the next phrase begins.
Phrase detection may also treat the input signal as being at a phrase break when the pitch detection unit 41 has been unable to detect the pitch properly for at least a predetermined time. In that case, the pitch detection result for the input signal should be supplied from the pitch detection unit 41 to the phrase detection unit 48.

This concludes the description of the embodiments, but the configuration of the device, the specific processing, and so on are of course not limited to those described in the embodiments above.
For example, when estimating the scale sound corresponding to the input signal, the two-stage process of searching for a scale sound of nearby pitch and then checking whether the pitch difference is within the predetermined error need not be used. Instead, a pitch range that takes the predetermined error into account may be defined in advance for each scale sound, and when the pitch of the input sound falls within one of these ranges, the scale sound having that range may be estimated to be the one corresponding to the input sound. The scale sounds also need not belong to an equal-tempered scale.
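The range-table variation might be sketched as follows. For concreteness the example assumes pitches in Hz, an equal-tempered scale indexed by MIDI note number with A4 = 440 Hz, and an error range given as a symmetric tolerance in cents; none of these are specified in the text, and as noted above the scale need not be equal-tempered. The names `build_ranges` and `estimate_scale_sound` are hypothetical.

```python
def build_ranges(note_lo, note_hi, tol_cents):
    """Precompute a (low_hz, high_hz, note, pitch_hz) range for each scale sound.

    Keeping tol_cents below 50 guarantees that the ranges of adjacent
    semitones, which are 100 cents apart, never overlap.
    """
    if not 0 < tol_cents < 50:
        raise ValueError("tolerance must be between 0 and 50 cents")
    ranges = []
    for note in range(note_lo, note_hi + 1):
        pitch_hz = 440.0 * 2.0 ** ((note - 69) / 12)   # assumed equal temperament
        low = pitch_hz * 2.0 ** (-tol_cents / 1200)
        high = pitch_hz * 2.0 ** (tol_cents / 1200)
        ranges.append((low, high, note, pitch_hz))
    return ranges


def estimate_scale_sound(ranges, input_hz):
    """Return (note, pitch_hz) of the range containing input_hz, or None."""
    for low, high, note, pitch_hz in ranges:
        if low <= input_hz <= high:
            return note, pitch_hz
    return None
```

Because the ranges do not overlap, at most one scale sound can match a given input pitch, and an input that falls between ranges yields no estimate, in which case no voice signal would be generated.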

In the embodiments described above, the Lent method is used for pitch conversion, but pitch conversion may be performed by other methods. The processing target may also be an analog acoustic signal, with the pitch detection, pitch processing, pitch conversion, mixing, and other processing performed by analog circuits.
These modifications and the content described in the first and second embodiments may also be combined as appropriate to the extent that no contradiction arises.

The present invention can of course be applied to acoustic signal processing devices other than electronic musical instruments. It can be applied to any device that handles acoustic signals representing waveforms, such as a karaoke device, a mixer, a sound source device, a MIDI sequencer, or a PC capable of running software that processes acoustic signals. The invention can also be implemented as a standalone effector, or as a program that gives a device an effector function.

The program of the present invention is a program for causing a computer to control hardware so that it functions as the acoustic signal processing device described above. Besides being stored in advance in a ROM, an HDD, or the like, the program may be provided on a non-volatile recording medium (memory) such as a CD-ROM or a flexible disk, then read from that memory into RAM and executed by the CPU. It may also be downloaded for execution from an external device that has a recording medium on which the program is recorded, or from an external device that stores the program in storage means such as an HDD. The same effects are obtained in each case.

As is clear from the above description, the acoustic signal processing device or program of the present invention can realize an effect that always yields an output signal that sounds natural.
Therefore, the present invention can provide an acoustic signal processing device capable of generating natural and novel sounds.

FIG. 1 is a block diagram showing the configuration of an electronic musical instrument that is a first embodiment of the acoustic signal processing device of the present invention.
FIG. 2 is a diagram showing the functional configuration of the doubling effector provided in the signal processing unit shown in FIG. 1.
FIG. 3 is a diagram showing the functional configuration of the voice signal generation unit shown in FIG. 2.
FIG. 4 is a diagram showing an example of the result obtained when the scale sound estimation unit shown in FIG. 3 estimates the scale sound corresponding to the input signal.
FIG. 5 is a diagram for explaining the pitch detection process in the pitch detection unit shown in FIG. 3.
FIG. 6 is a diagram for explaining the pitch conversion process in the pitch conversion unit shown in FIG. 3.
FIG. 7 is another diagram for explaining the same process.
FIG. 8 is a flowchart of the pitch detection process executed by the CPU of the electronic musical instrument shown in FIG. 1.
FIG. 9 is a flowchart of the pitch detection control process executed by the same CPU.
FIG. 10 is a flowchart of the scale sound estimation and pitch setting process executed by the same CPU.
FIG. 11 is a flowchart of a first example of the pitch shift amount determination process executed in step S42 of FIG. 10.
FIG. 12 is a diagram showing an example of the relationship between P_I and P_V when ΔP is obtained by the process shown in FIG. 11.
FIG. 13 is a flowchart of a second example of the pitch shift amount determination process.
FIG. 14 is a diagram showing an example of the relationship between P_I and P_V when ΔP is obtained by the process shown in FIG. 13.
FIG. 15 is a flowchart of the reference section setting process executed by the CPU of the electronic musical instrument shown in FIG. 1.
FIG. 16 is a flowchart of the pitch conversion process executed by the same CPU.
FIG. 17 is a diagram showing the functional configuration of the voice signal generation unit in an electronic musical instrument that is a second embodiment of the acoustic signal processing device of the present invention.
FIG. 18 is a flowchart of the phrase detection process executed by the CPU of the electronic musical instrument in the second embodiment.
FIG. 19 is a flowchart of the scale sound estimation and pitch setting process in the second embodiment.
FIG. 20 is a flowchart of the pitch processing in the second embodiment.
FIG. 21 is a diagram for explaining the relationship between P_I and P_V when P_V is obtained by the process shown in FIG. 20.

Explanation of symbols

10 … electronic musical instrument, 11 … CPU, 12 … ROM, 13 … RAM, 14 … detection circuit, 15 … display circuit, 16 … audio signal I/F, 17 … communication I/F, 18 … sound source unit, 19 … signal processing unit, 20 … system bus, 21 … operator, 22 … display, 23 … sound system, 30 … doubling effector, 40, 40′ … voice signal generation unit, 41 … pitch detection unit, 42 … pitch buffer, 43 … scale sound estimation unit, 44 … pitch table, 45 … estimated value buffer, 46, 46′ … pitch processing unit, 47 … pitch conversion unit, 48 … phrase detection unit, 50 … delay processing unit, 60 … mix unit, 61, 64 … gain adjustment unit, 62, 65 … pan adjustment unit, 63, 66 … addition unit

Claims

An acoustic signal processing device comprising:
processed signal generating means for generating a processed signal by performing pitch conversion processing on an input signal;
mixing means for mixing and outputting the input signal and the processed signal;
pitch storage means for storing the pitches of all scale sounds within a predetermined range of a certain scale; and
estimating means for estimating a scale sound corresponding to the input signal,
wherein the processed signal generating means is means for generating the processed signal, only when the estimating means has been able to estimate the scale sound corresponding to the input signal, by performing the pitch conversion processing so as to bring the pitch of the input signal closer to the pitch of the estimated scale sound,
the estimating means is means for estimating, when the difference between the pitch of the input signal and the pitch of a scale sound stored in the pitch storage means is within a predetermined error range, that this scale sound is the scale sound corresponding to the input signal, and
the predetermined error range is defined such that, for all the scale sounds whose pitches are stored in the pitch storage means, the error range for any one scale sound does not overlap the error range for any other scale sound.

A program for causing a computer to function as:
processed signal generating means for generating a processed signal by performing pitch conversion processing on an input signal;
mixing means for mixing and outputting the input signal and the processed signal;
pitch storage means for storing the pitches of all scale sounds within a predetermined range of a certain scale; and
estimating means for estimating a scale sound corresponding to the input signal,
wherein the processed signal generating means is means for generating the processed signal, only when the estimating means has been able to estimate the scale sound corresponding to the input signal, by performing the pitch conversion processing so as to bring the pitch of the input signal closer to the pitch of the estimated scale sound,
the estimating means is means for estimating, when the difference between the pitch of the input signal and the pitch of a scale sound stored in the pitch storage means is within a predetermined error range, that this scale sound is the scale sound corresponding to the input signal, and
the predetermined error range is defined such that, for all the scale sounds whose pitches are stored in the pitch storage means, the error range for any one scale sound does not overlap the error range for any other scale sound.