JP7618983B2

JP7618983B2 - Electronic musical instrument, electronic musical instrument control method, and program

Info

Publication number: JP7618983B2
Application number: JP2020143617A
Authority: JP
Inventors: 克瀬戸口
Original assignee: Casio Computer Co Ltd
Current assignee: Casio Computer Co Ltd
Priority date: 2020-08-27
Filing date: 2020-08-27
Publication date: 2025-01-22
Anticipated expiration: 2040-08-27
Also published as: JP2022038903A

Description

The present invention relates to an electronic musical instrument that reproduces a singing voice in response to the operation of operators such as the keys of a keyboard, a method of controlling the electronic musical instrument, and a program.

For keyboard instruments, a music training system is known in which a user practices on an electronic musical instrument based on music data representing the pitches to be sounded and their sounding timings, and in which signals from the electronic musical instrument are input in order to evaluate and score the performance on various items (for example, the system described in Patent Document 1).

In recent years, application software for smart devices (hereinafter referred to as "apps") has become known that is connected to, for example, an electronic keyboard instrument via MIDI (Musical Instrument Digital Interface). On the display screen, bars (a piano roll) fall from the top of the screen as the song progresses, and points are added when the user presses the corresponding key on the electronic keyboard instrument at the timing at which a bar reaches the keyboard drawn at the bottom of the screen. Some such apps also display the interim score during the song and the final score after the song has ended, so that the user can practice playing without getting bored (for example, the app described in Non-Patent Document 1).

Patent Document 1: Japanese Unexamined Patent Application Publication No. H10-187021

Non-Patent Document 1: "Chordana Play for Piano: MIDI Player", Casio Computer Co., Ltd., [retrieved July 20, 2020], Internet <URL: https://web.casio.com/app/ja/piano/>

The conventional technology described above, however, presupposes that the progress of the user's performance, the scoring results, and the like are shown on a display. Such a presentation is possible only when a display device is available, such as a high-definition display mounted on the electronic musical instrument or the display of a smart device.

In typical electronic musical instruments, however, and particularly in the low-cost instruments for beginners at which such performance practice functions are aimed, the installed display device is often small and of low resolution, or no display device is provided at all. Therefore, even if the performance practice function itself can be implemented in such an electronic musical instrument alone, it has been difficult to offer it to the user as a function that is easy to understand and does not become boring.

Furthermore, even if the progress of the performance is shown on a display device, beginners in particular may be so absorbed in their own playing that they cannot spare the attention to look at anything other than the keyboard.

Accordingly, an object of the present invention is to enable the user to practice playing in an easy-to-understand manner, without losing motivation or becoming bored, even on an electronic musical instrument that has only a low-quality display device or no display device, and to enable information about the performance practice to be conveyed to the user even without a display device.

An electronic musical instrument according to one example aspect includes: performance information acquisition means for acquiring performance information of a performer; performance evaluation means for evaluating the performer's performance a plurality of times at different timings during the progress of a piece of music, based on the performance information and on performance guide data that includes at least lyric information, pitch information, and timing information; singing voice production means for vocalizing the lyrics as a singing voice based on the performance information and the lyric information; and voice quality change means for changing, when the evaluation by the performance evaluation means has changed from the previous evaluation, the voice quality of the singing voice in accordance with the changed evaluation.

According to the present invention, the user can practice playing in an easy-to-understand manner, without losing motivation or becoming bored, even on an electronic musical instrument that has only a low-quality display device or no display device, and information about the performance practice can be conveyed to the user even without a display device.

FIG. 1 is a diagram showing an example of the appearance of an embodiment of an electronic keyboard instrument.
FIG. 2 is a block diagram showing an example of the hardware configuration of an embodiment of a control system for the electronic keyboard instrument.
FIG. 3 is a block diagram showing a configuration example of the voice synthesis LSI.
FIG. 4 is a diagram explaining the operation of the embodiment.
FIG. 5 is a diagram showing an example of the data configuration of the embodiment.
FIG. 6 is a flowchart showing an example of the overall processing of a performance lesson.
FIG. 7 is a flowchart showing a detailed example of the lesson process.
FIG. 8 is a flowchart showing a detailed example of the automatic playback process.
FIG. 9 is a flowchart showing a detailed example of the performance guide process.
FIG. 10 is a flowchart showing a detailed example of the key pressing/releasing process.
FIG. 11 is a flowchart showing a detailed example of the scoring process.
FIG. 12 is a flowchart showing a detailed example of the voice quality updating process.
FIG. 13 is a flowchart showing an example of the voice quality change process.
FIG. 14 is a flowchart showing a detailed example of the noise mixing ratio interpolation process and the formant interpolation process.
FIG. 15 is a block diagram showing another configuration example of the vocalization model unit 303 in the voice synthesis unit 300 in the voice synthesis LSI.

Hereinafter, an embodiment of the present invention will be described in detail with reference to the drawings. FIG. 1 is a diagram showing an example of the appearance of an embodiment 100 of an electronic keyboard instrument. The electronic keyboard instrument 100 includes a keyboard 101 consisting of a plurality of keys serving as operators, a first switch panel 102 for instructing various settings such as specifying the volume, setting the tempo of automatic lyric playback, and starting automatic lyric playback, and a second switch panel 103 for selecting songs, selecting instrument tones, and the like. Each key of the keyboard 101 includes an LED (Light Emitting Diode) 104. The LED 104 lights at maximum brightness when the key containing it is the key to be specified next during automatic lyric playback, and lights at half the maximum brightness when that key is the key to be specified after the next one. Furthermore, although not specifically shown, the electronic keyboard instrument 100 includes a speaker on its underside, side, rear, or the like, which emits the musical tones generated by the performance.
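The two-level LED guidance described above can be sketched as follows. This is only an illustrative sketch: the function name `guide_led_levels`, the 8-bit brightness scale, and the 61-key default are assumptions that do not appear in the source, which states only "maximum brightness" and "half the maximum brightness".

```python
MAX_BRIGHTNESS = 255  # assumed 8-bit scale; the source only says "maximum brightness"


def guide_led_levels(upcoming_keys, num_keys=61):
    """Return a per-key LED brightness list for the next two guided notes.

    upcoming_keys: key indices in the order in which they are to be pressed.
    num_keys: hypothetical keyboard size (not given in the source).
    """
    levels = [0] * num_keys
    # Key after the next one: half of the maximum brightness.
    if len(upcoming_keys) > 1:
        levels[upcoming_keys[1]] = MAX_BRIGHTNESS // 2
    # Next key: maximum brightness (written last so it wins if both notes share a key).
    if upcoming_keys:
        levels[upcoming_keys[0]] = MAX_BRIGHTNESS
    return levels
```

Writing the full-brightness level last is a design choice of this sketch, so that a repeated note still shows the key as the one to press next.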

FIG. 2 is a diagram showing an example of the hardware configuration of an embodiment of the control system 200 of the electronic keyboard instrument 100 of FIG. 1. In FIG. 2, the control system 200 includes a CPU (Central Processing Unit) 201, a ROM (Read Only Memory) 202, a RAM (Random Access Memory) 203, a sound source LSI (Large Scale Integrated circuit) 204, a voice synthesis LSI 205, a key scanner 206 to which the keyboard 101, the first switch panel 102, and the second switch panel 103 of FIG. 1 are connected, an LED controller 207 to which the LEDs 104 provided on the keys of the keyboard 101 of FIG. 1 are connected, and a network interface 208 for exchanging MIDI data and the like with an external network, each of which is connected to a system bus 209. A timer 210 for controlling the sequence of automatic playback of singing voice data is connected to the CPU 201. Furthermore, the musical tone output data 218 and the singing voice output data 217 output from the sound source LSI 204 and the voice synthesis LSI 205, respectively, are converted by D/A converters 211 and 212 into an analog musical tone output signal and an analog singing voice output signal, respectively. The analog musical tone output signal and the analog singing voice output signal are mixed in a mixer 213, and the mixed signal is amplified by an amplifier 214 and then output from a speaker or an output terminal (not specifically shown).

The CPU 201 executes the control program stored in the ROM 202 while using the RAM 203 as a work memory, thereby carrying out the control operations of the electronic keyboard instrument 100 of FIG. 1. In addition to the control program and various control data, the ROM 202 stores performance guide data, described later, which includes lyric data.

The CPU 201 is provided with the timer 210 used in this embodiment, which counts, for example, the progress of automatic playback of the performance guide data in the electronic keyboard instrument 100.

In accordance with sound generation control instructions from the CPU 201, the sound source LSI 204 reads musical tone waveform data from, for example, a waveform ROM (not specifically shown) and outputs it to the D/A converter 211. The sound source LSI 204 is capable of sounding up to 256 voices simultaneously.

When the voice synthesis LSI 205 is given, as singing voice data 215 from the CPU 201, lyric information, which is the text data of the lyrics, and pitch information regarding the pitch, it synthesizes singing voice output data 217, which is the audio data of the corresponding singing voice, and outputs it to the D/A converter 212.

The key scanner 206 constantly scans the key press/release states of the keyboard 101 of FIG. 1 and the switch operation states of the first switch panel 102 and the second switch panel 103, and notifies the CPU 201 of any state change by issuing an interrupt.

The LED controller 207 is an IC (integrated circuit) that controls the display state of each LED 104 provided on each key of the keyboard 101 of FIG. 1.

FIG. 3 is a block diagram showing a configuration example of the voice synthesis unit 300 in this embodiment. The voice synthesis unit 300 is built into the electronic keyboard instrument 100 as one function executed by the voice synthesis LSI 205 of FIG. 2.

The voice synthesis unit 300 receives the singing voice data 215, which includes the lyric information and pitch information specified by the CPU 201 of FIG. 2, and synthesizes and outputs the singing voice output data 217. In doing so, the processor of the voice synthesis unit 300 executes a vocalization process: the singing voice data 215 input by the CPU 201 is applied to the acoustic model set in the acoustic model unit 301, and, based on the spectrum information 309 and the sound source information 310 that the acoustic model unit 301 outputs in response, singing voice output data 217 that infers the singing voice of a singer is output. The voice synthesis unit 300 is implemented based on, for example, the technology described in the following patent document.

(Patent Document): Japanese Patent No. 6610714
Details of the operation of the voice synthesis unit 300 are disclosed in the above patent document; an outline of the operation is described below.

The voice synthesis unit 300 includes a text analysis unit 302, an acoustic model unit 301, a vocalization model unit 303, a formant interpolation processing unit 306, and a noise superimposition unit 307.

The voice synthesis unit 300 executes a statistical voice synthesis process in which singing voice output data 321 corresponding to the singing voice data 215, which includes lyric information (the text of the lyrics) and sound source information 310, is synthesized by prediction using a statistical model, namely the acoustic model set in the acoustic model unit 301.

The text analysis unit 302 receives the singing voice data 215, which includes lyric information on the phonemes, pitch, and the like of the lyrics specified by the CPU 201 of FIG. 2, and analyzes that data. As a result of the analysis, the text analysis unit 302 outputs a linguistic feature sequence 308 that represents the phonemes, parts of speech, words, and the like corresponding to the singing voice data 215.

The acoustic model unit 301 receives the linguistic feature sequence 308 and the pitch information in the singing voice data 215, and estimates and outputs the corresponding spectrum information 309 and sound source information 310. That is, based on the linguistic feature sequence 308 input from the text analysis unit 302 and the pitch information in the singing voice data 215, the acoustic model unit 301 uses an acoustic model set as a learning result by, for example, machine learning, and outputs the estimates of the spectrum information 309 and the sound source information 310 that maximize the generation probability.

The vocalization model unit 303 receives the spectrum information 309 and the sound source information 310 and generates the singing voice output data 321 corresponding to the singing voice data 215, which includes the lyric information and pitch information specified by the CPU 201. The singing voice output data 321 is output from the D/A converter 212 of FIG. 2 through the mixer 213 and the amplifier 214, and is emitted from a speaker (not specifically shown).

The acoustic features output by the acoustic model unit 301 include the spectrum information 309, which models the human vocal tract, and the sound source information 310, which models the human vocal cords. As parameters of the spectrum information 309, it is possible to adopt, for example, line spectral pairs (LSP) or line spectral frequencies (LSF), which can efficiently model the plurality of formant frequencies that characterize the human vocal tract. As the sound source information 310, a fundamental frequency (F0), which indicates the pitch frequency of the human voice, and a power value can be adopted.

The vocalization model unit 303 includes a sound source generation unit 304 and a synthesis filter unit 305. The sound source generation unit 304 is the part that models the human vocal cords. By sequentially receiving the series of sound source information 310 from the acoustic model unit 301, it generates a sound source signal consisting of, for example, a pulse train that repeats periodically at the fundamental frequency (F0) and with the power value contained in the sound source information 310 (for voiced phonemes), white noise having the power value contained in the sound source information 310 (for unvoiced phonemes), or a mixture of the two. The synthesis filter unit 305 is the part that models the human vocal tract, and forms a digital filter that models the vocal tract based on the series of spectrum information 309 sequentially input from the acoustic model unit 301. When this digital filter is excited with the sound source signal from the sound source generation unit 304 as the excitation source signal, filter output data 313, a digital signal from which the singing voice output data 321 is derived, is output.
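The source-filter arrangement described above can be illustrated with a short sketch. It is not the implementation of the voice synthesis LSI: the function names are hypothetical, and a plain all-pole filter with directly supplied coefficients stands in for the vocal tract filter that the source derives from LSP/LSF parameters (the LSP-to-filter conversion is omitted here).

```python
import random


def excitation(f0_hz, power, n_samples, sample_rate=16000, voiced=True, seed=0):
    """Excitation signal: a pulse train at F0 (voiced) or white noise (unvoiced)."""
    if voiced:
        period = max(1, round(sample_rate / f0_hz))  # samples between pulses
        return [power if i % period == 0 else 0.0 for i in range(n_samples)]
    rng = random.Random(seed)
    return [power * rng.uniform(-1.0, 1.0) for _ in range(n_samples)]


def all_pole_filter(source, a):
    """Apply y[n] = x[n] - sum_k a[k] * y[n-1-k].

    'a' is an assumed stand-in for the vocal-tract filter coefficients.
    """
    out = []
    for n, x in enumerate(source):
        y = x
        for k, ak in enumerate(a):
            if n - 1 - k >= 0:
                y -= ak * out[n - 1 - k]
        out.append(y)
    return out
```

For example, `all_pole_filter(excitation(200.0, 1.0, 160), [-0.5])` excites a one-pole filter with two pulses spaced 80 samples apart, each followed by a decaying tail.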

The sampling frequency for the singing voice output data 217 is, for example, 16 kHz (kilohertz). When LSF parameters obtained by, for example, LSP analysis processing are adopted as the parameters of the spectrum information 309, the update frame period is, for example, 5 milliseconds. Furthermore, in the case of LSF analysis processing, the analysis window length is, for example, 25 milliseconds, the window function is, for example, a Blackman window, and the analysis order is, for example, 10.
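As a worked example of these figures, the sketch below converts the example frame period and window length into sample counts at 16 kHz and builds a Blackman window of the corresponding length. The variable and function names are illustrative only, and the window uses the standard textbook Blackman coefficients (0.42, 0.5, 0.08), which the source does not spell out.

```python
import math

SAMPLE_RATE_HZ = 16_000
FRAME_PERIOD_MS = 5
WINDOW_LENGTH_MS = 25

# Samples between successive parameter updates and samples per analysis window.
frame_shift = SAMPLE_RATE_HZ * FRAME_PERIOD_MS // 1000     # 80 samples
window_length = SAMPLE_RATE_HZ * WINDOW_LENGTH_MS // 1000  # 400 samples


def blackman(n_points):
    """Standard Blackman window: 0.42 - 0.5*cos(2*pi*n/(N-1)) + 0.08*cos(4*pi*n/(N-1))."""
    if n_points == 1:
        return [1.0]
    return [
        0.42
        - 0.5 * math.cos(2.0 * math.pi * n / (n_points - 1))
        + 0.08 * math.cos(4.0 * math.pi * n / (n_points - 1))
        for n in range(n_points)
    ]


window = blackman(window_length)
```

With these values, consecutive 400-sample analysis windows overlap by 320 samples, since the window advances by only 80 samples per frame.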

An outline of the operation of this embodiment under the configurations of FIGS. 2 and 3 will now be described. First, the CPU 201 operates as performance guide means that guides the performer in playing a piece of music based on performance guide data including at least lyric information, pitch information, and timing information. Specifically, in FIG. 2, the CPU 201 executes an automatic playback process: it sequentially reads out a series of sets of performance guide data for automatic playback stored in the ROM 202 (a memory), each set including at least lyric information, pitch information, and timing information, and automatically plays back the lyric information and pitch information of each set at the timing corresponding to the timing information of that set. Details of this automatic playback process will be described later with reference to the flowchart of FIG. 8.

At this time, the CPU 201 executes a performance guide process that, by indicating the key on the keyboard 101 corresponding to the pitch information to be automatically played back, guides the user in a performance lesson (performance practice) in which the user presses keys in synchronization with the automatic playback. More specifically, in this performance guide process, the CPU 201, in synchronization with the timing of the automatic playback, lights the LED 104 of the key (operator) corresponding to the pitch information to be played back next at a high brightness, for example the maximum brightness, and lights the LED 104 of the key corresponding to the pitch information to be played back after the next at a low brightness, for example half the maximum brightness, as illustrated by the two keys with lit LEDs 104 in FIG. 1. Details of this performance guide process will be described later with reference to the flowchart of FIG. 9.

Next, the CPU 201 operates as performance information acquisition means for acquiring the performance information of the performer. Specifically, the CPU 201 acquires the performance operations in which the performer presses or releases keys on the keyboard 101 of FIG. 1 in accordance with the performance guide.

Furthermore, the CPU 201 operates as performance evaluation means that evaluates the performer's performance as needed during the progress of the piece of music, based on the performance guide data and the performance information. Specifically, the CPU 201 executes a scoring process that scores the performance lesson by comparing the key press timing (operation timing) and the pressed pitch (operation pitch) in the performance lesson with the automatically played back timing information and pitch information. Details of this scoring process will be described later with reference to the flowchart of FIG. 11.
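The comparison of key presses against the guide timing and pitch can be sketched as below. The actual scoring rules belong to the flowchart of FIG. 11 and are not reproduced in this passage, so everything specific here is an assumption made for illustration: the 100 ms tolerance, the one-press-per-guide-note pairing, and the use of a running hit ratio as the interim score.

```python
def judge_key_press(guide_note, key_press, tolerance_ms=100):
    """True if the pressed pitch matches and the timing is within the tolerance.

    guide_note and key_press are (time_ms, midi_pitch) tuples.
    tolerance_ms is a hypothetical value, not taken from the source.
    """
    guide_time, guide_pitch = guide_note
    press_time, press_pitch = key_press
    return press_pitch == guide_pitch and abs(press_time - guide_time) <= tolerance_ms


def interim_scores(guide_notes, key_presses, tolerance_ms=100):
    """Evaluate after every guide note and return the running hit ratio each time.

    key_presses[i] is the press attempted for guide_notes[i], or None if missed.
    """
    hits = 0
    scores = []
    for count, (guide, press) in enumerate(zip(guide_notes, key_presses), start=1):
        if press is not None and judge_key_press(guide, press, tolerance_ms):
            hits += 1
        scores.append(hits / count)
    return scores
```

Returning one value per guide note mirrors the idea that the performance is evaluated repeatedly while the piece is still in progress, rather than once at the end.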

The CPU 201 then operates as singing voice production means that vocalizes the lyrics as a singing voice based on the performance information and the lyric information. Specifically, when the key press timing (operation timing) and the pressed pitch (operation pitch) of a key on the keyboard 101 in the performance lesson correctly correspond to the automatically played back timing information and pitch information, the CPU 201 inputs, at that key press timing, the automatically played back lyric information and pitch information as the singing voice data 215 to the acoustic model unit 301 via the text analysis unit 302 of FIG. 3. The sound source information 310 output from the acoustic model unit 301 is set in the sound source generation unit 304, and the sound source signal that the sound source generation unit 304 outputs excites the digital filter of the synthesis filter unit 305, which is formed based on the spectrum information 309 output from the acoustic model unit 301. The filter output data 313 obtained in this way is output as the singing voice output data 217 of FIG. 2.

At this time, the CPU 201 operates as voice quality change means that changes the voice quality of the singing voice in accordance with the performance evaluation. Specifically, the CPU 201 of FIG. 2 and the formant interpolation processing unit 306 and the noise superimposition unit 307 in the voice synthesis unit 300 of FIG. 3 execute a voice quality change process that changes the voice quality of the singing voice output data 217 output in the vocalization process according to the interim scoring result of the scoring process described above.

Here, the CPU 201 operating as the voice quality change means interpolates between a plurality of voice qualities, each corresponding to one of a plurality of specific performance evaluations, at a ratio according to the performance evaluation during the progress of the piece of music. The CPU 201 changes the voice quality by, for example, changing the formant components of the human voice and the proportion of the noise component mixed into the human voice.

More specifically, in the voice quality change process described above, the CPU 201 first calculates practice progress data 311, which indicates the degree of progress of the performance lesson, according to the interim scoring result of the scoring process described above. The formant interpolation processing unit 306 of FIG. 3 then interpolates between two groups of spectrum information 309 that the acoustic model unit 301 outputs for the lyric information in the automatically played back singing voice data 215 of FIG. 3: one or more sets having a pleasant voice quality, including a female voice, which correspond to a good interim scoring result, and one or more sets having a harsh voice quality, including a male voice, which correspond to a poor interim scoring result. The interpolation is performed at a ratio according to the practice progress data 311 supplied by the CPU 201, and the resulting target spectrum information 312 is input to the synthesis filter unit 305 in the vocalization model unit 303.
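A minimal sketch of this interpolation is given below, under stated assumptions: the practice progress data 311 is taken to be normalized to the range 0 to 1 (1 meaning the best interim result), the interpolation is assumed to be linear, and each set of spectrum information is represented as a plain list of parameters such as LSF values. None of these details is fixed by the passage above, and the function name is hypothetical.

```python
def interpolate_spectrum(harsh_params, pleasant_params, progress):
    """Linearly blend two spectral parameter sets into target parameters.

    progress: assumed normalized practice progress, clamped to [0.0, 1.0];
              0.0 selects the harsh voice, 1.0 the pleasant voice.
    """
    if len(harsh_params) != len(pleasant_params):
        raise ValueError("parameter sets must have the same order")
    ratio = min(1.0, max(0.0, progress))
    return [
        (1.0 - ratio) * harsh + ratio * pleasant
        for harsh, pleasant in zip(harsh_params, pleasant_params)
    ]
```

Interpolating in the LSP/LSF domain is a common choice for this kind of blending because intermediate values remain ordered when both endpoints are ordered, but whether the embodiment relies on that property is not stated here.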

In the voice quality change process described above, in addition to the operation of the formant interpolation processing unit 306, the noise mixing ratio interpolation processing unit 316 of FIG. 3 interpolates between two groups of noise mixing ratios that the acoustic model unit 301 outputs for the lyric information in the automatically played back singing voice data 215 of FIG. 3: one or more sets for the pleasant voice quality, including a female voice, which correspond to a good interim scoring result, and one or more sets for the harsh voice quality, including a male voice, which correspond to a poor interim scoring result. The interpolation is performed at a ratio according to the practice progress data 311 supplied by the CPU 201, and the resulting target noise mixing ratio 317 is input to the noise superimposition unit 307. The noise superimposition unit 307 then generates noise data 315 whose amplitude is the maximum amplitude value of the singing voice output data 217 multiplied by the target noise mixing ratio 317 calculated by the noise mixing ratio interpolation processing unit 316, mixes it with the filter output data 313 output from the synthesis filter unit 305, and outputs the result as the singing voice output data 217.
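The noise superimposition step can be sketched as follows. The sketch makes two assumptions that are not in the source: the peak of the filter output is used as the "maximum amplitude value", and the noise is drawn from a uniform distribution scaled to that amplitude. The function name is hypothetical.

```python
import random


def superimpose_noise(filter_output, target_noise_ratio, seed=0):
    """Mix white noise, scaled by peak amplitude times the target ratio, into the signal."""
    # Assumption: the peak of the filter output stands in for the maximum amplitude value.
    peak = max((abs(sample) for sample in filter_output), default=0.0)
    noise_amplitude = peak * target_noise_ratio
    rng = random.Random(seed)
    return [
        sample + noise_amplitude * rng.uniform(-1.0, 1.0)
        for sample in filter_output
    ]
```

A ratio of 0 leaves the filter output unchanged, while larger ratios add proportionally more breathiness to the synthesized voice.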

The two voice quality change processes described above realize a function (hereinafter referred to as the "morphing function") that, when the user has the electronic keyboard instrument 100 sing during a performance lesson, gradually changes the voice quality of the singing from one character (for example, an adult male) to another character (for example, an adult female), as shown for example in FIG. 4(a). In this embodiment, as shown in FIG. 4(a), the voice quality of the singing function at the start of the performance lesson is set to, for example, an adult male. After the lesson begins, the voice quality changes gradually toward that of an adult female each time a task is cleared and the interim score rises; conversely, if the score falls, it changes toward a somewhat harsh voice quality, the so-called gravelly voice.
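One way to picture the morphing function is as a piecewise mapping from the interim score to a pair of neighboring voice characters and a blend ratio between them. The sketch below is purely illustrative: the 0 to 100 score scale, the anchor scores, and the voice labels are hypothetical choices, and only the ordering (gravelly voice for falling scores, adult male at the start, adult female for rising scores) follows the description above.

```python
# Hypothetical anchor points: (interim score, voice character).
ANCHORS = [(0.0, "gravelly"), (50.0, "adult_male"), (100.0, "adult_female")]


def morph_position(score):
    """Return (lower_voice, upper_voice, ratio) for an interim score.

    ratio is 0.0 at the lower anchor and 1.0 at the upper anchor.
    """
    # Clamp the score to the range covered by the anchors.
    score = min(max(score, ANCHORS[0][0]), ANCHORS[-1][0])
    for (lo_score, lo_voice), (hi_score, hi_voice) in zip(ANCHORS, ANCHORS[1:]):
        if score <= hi_score:
            ratio = (score - lo_score) / (hi_score - lo_score)
            return lo_voice, hi_voice, ratio
```

The returned ratio could then drive both the spectral interpolation and the noise mixing ratio interpolation, so that the two processes move the voice in the same direction.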

上述したように、本実施形態における音声合成部３００は、人の声帯の振動に相当する励振源を人の声道の特性に相当するフィルタを通過させることで音声を発声する。図４（ｂ）に示されるように、声道特性に相当するフィルタの特性はいわゆる人声のフォルマントに該当し、人の声のキャラクタはこの特性に大きく依存する。そこで、本実施形態では、図３のフォルマント補間処理部３０６において、採点途中結果に基づいてＣＰＵ２０１から出力される練習進行度データ３１１に基づいて音響モデル部３０１が出力する複数の特性のスペクトル情報３０９を補間して得られる目標スペクトル情報３１２によって、合成フィルタ部３０５における特性を徐々に変えていくことにより、或る人物の声のキャラクタを別の人の声のキャラクタに滑らかに変化させることができる。 As described above, the voice synthesis unit 300 in this embodiment produces a voice by passing an excitation source corresponding to the vibration of the human vocal cords through a filter corresponding to the characteristics of the human vocal tract. As shown in FIG. 4(b), the characteristics of the filter corresponding to the vocal tract characteristics correspond to the so-called formants of the human voice, and the character of the human voice depends heavily on these characteristics. Therefore, in this embodiment, in the formant interpolation processing unit 306 in FIG. 3, the characteristics in the synthesis filter unit 305 are gradually changed by the target spectrum information 312 obtained by interpolating the spectrum information 309 of multiple characteristics output by the acoustic model unit 301 based on the practice progress data 311 output by the CPU 201 based on the intermediate scoring results, thereby smoothly changing the character of one person's voice to the character of another person's voice.

また、合成フィルタ部３０５での特性の他に、白色ノイズ成分を音声に加えることでより本物に近い音声となる。そこで、本実施形態では更に、図３のノイズ混合比補間処理部３１６が採点途中結果に基づいてＣＰＵ２０１から出力される練習進行度データ３１１に基づく補間処理により得られる目標ノイズ混合比３１７を算出し、ノイズ重畳部３０７がその目標ノイズ混合比３１７に基づいて白色ノイズの加算量を増減させて得られるノイズデータ３１５を算出し、そのノイズデータ３１５を合成フィルタ部３０５が出力するフィルタ出力データ３１３に混合して歌声音声出力データを生成する。これにより、いわゆるハスキーボイスの特性などの表現豊かな特性を有する採点途中結果が反映された歌唱を行わせることが可能となる。 In addition to the characteristics of the synthesis filter unit 305, adding a white noise component to the voice makes it sound closer to a real voice. Therefore, in this embodiment, the noise mixing ratio interpolation processing unit 316 in FIG. 3 further calculates a target noise mixing ratio 317 by interpolation based on the practice progress data 311, which the CPU 201 outputs based on the intermediate scoring results. The noise superimposition unit 307 increases or decreases the amount of white noise according to the target noise mixing ratio 317 to obtain noise data 315, and mixes the noise data 315 with the filter output data 313 output by the synthesis filter unit 305 to generate the singing voice output data. This makes it possible to have the instrument sing with expressive characteristics, such as those of a so-called husky voice, that reflect the intermediate scoring results.
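The two voice quality changes described above (interpolating the spectrum information and interpolating the noise mixing ratio) can be illustrated with a minimal sketch. It rests on several assumptions that are not stated in the text: the interpolation is linear, the practice progress data 311 is normalized to the range 0 to 1 (1 corresponding to a good intermediate score), the spectrum information is a plain list of numbers, the white noise is uniform, and the maximum amplitude is passed in as a parameter. All function and parameter names are hypothetical.

```python
import random

def lerp(a, b, t):
    """Linear blend between a (t = 0) and b (t = 1); linearity is an assumption."""
    return (1.0 - t) * a + t * b

def morph_parameters(spec_poor, spec_good, noise_poor, noise_good, progress):
    """Return (target spectrum, target noise mixing ratio) for a progress value.

    spec_poor / noise_poor stand for the harsh voice, spec_good / noise_good for
    the pleasant voice. progress is assumed to lie in [0, 1] and is clamped.
    """
    if len(spec_poor) != len(spec_good):
        raise ValueError("spectrum vectors must have the same length")
    t = min(max(progress, 0.0), 1.0)
    target_spectrum = [lerp(a, b, t) for a, b in zip(spec_poor, spec_good)]
    target_noise_ratio = lerp(noise_poor, noise_good, t)
    return target_spectrum, target_noise_ratio

def superimpose_noise(filter_output, target_noise_ratio, max_amplitude, rng=None):
    """Mix white noise of amplitude max_amplitude * ratio into the filter output."""
    if rng is None:
        rng = random.Random(0)  # fixed seed only to keep the sketch reproducible
    noise_amplitude = max_amplitude * target_noise_ratio
    return [s + noise_amplitude * rng.uniform(-1.0, 1.0) for s in filter_output]
```

Calling `morph_parameters` once per update with a rising progress value gives the gradual change of character that the text calls the morphing function; how the real device represents the spectrum information is not specified in this section.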

図１、図２、及び図３の構成を有する本実施形態の電子鍵盤楽器１００の動作について、以下に詳細に説明する。 The operation of the electronic keyboard instrument 100 of this embodiment having the configuration shown in Figures 1, 2, and 3 will be described in detail below.

図５（ａ）は、本実施形態において、図２のＲＯＭ２０２からＲＡＭ２０３に読み込まれる演奏ガイドデータのデータ構成例を示す図である。このデータ構成例は、ＭＩＤＩ用ファイルフォーマットの一つであるスタンダードＭＩＤＩファイルのフォーマットに準拠している。この曲データは、チャンクと呼ばれるデータブロックから構成される。具体的には、曲データは、ファイルの先頭にあるヘッダチャンクと、それに続く歌詞パート用の歌詞データが格納されるトラックチャンクとから構成される。なお、伴奏パート用の自動演奏データが格納されるトラックチャンクを別に備えてもよい。 Figure 5 (a) is a diagram showing an example of the data structure of performance guide data loaded from ROM 202 in Figure 2 to RAM 203 in this embodiment. This data structure example conforms to the standard MIDI file format, which is one of the MIDI file formats. This song data is made up of data blocks called chunks. Specifically, the song data is made up of a header chunk at the beginning of the file, followed by track chunks in which lyric data for the lyric parts is stored. Note that there may also be a separate track chunk in which automatic performance data for the accompaniment parts is stored.

ヘッダチャンクは、ＣｈｕｎｋＩＤ、ＣｈｕｎｋＳｉｚｅ、ＦｏｒｍａｔＴｙｐｅ、ＮｕｍｂｅｒＯｆＴｒａｃｋ、及びＴｉｍｅＤｉｖｉｓｉｏｎの５つの値からなる。ＣｈｕｎｋＩＤは、ヘッダチャンクであることを示す"MThd"という半角４文字に対応する４バイトのアスキーコード「4D 54 68 64」（数字は１６進数）である。ＣｈｕｎｋＳｉｚｅは、ヘッダチャンクにおいて、ＣｈｕｎｋＩＤとＣｈｕｎｋＳｉｚｅを除く、ＦｏｒｍａｔＴｙｐｅ、ＮｕｍｂｅｒＯｆＴｒａｃｋ、及びＴｉｍｅＤｉｖｉｓｉｏｎの部分のデータ長を示す４バイトデータであり、データ長は６バイト：「00 00 00 06」（数字は１６進数）に固定されている。ＦｏｒｍａｔＴｙｐｅは、本実施形態の場合、単一トラックを使用するフォーマット０を意味する２バイトのデータ「00 00」（数字は１６進数）である。ＮｕｍｂｅｒＯｆＴｒａｃｋは、本実施形態の場合、歌詞パートに対応する１トラックを使用することを示す２バイトのデータ「00 01」（数字は１６進数）である。ＴｉｍｅＤｉｖｉｓｉｏｎは、４分音符あたりの分解能を示すタイムベース値を示すデータであり、本実施形態の場合、１０進法で４８０を示す２バイトのデータ「01 E0」（数字は１６進数）である。 A header chunk consists of five values: ChunkID, ChunkSize, FormatType, NumberOfTrack, and TimeDivision. ChunkID is the 4-byte ASCII code "4D 54 68 64" (numbers are hexadecimal), which corresponds to the four half-width characters "MThd" that indicate a header chunk. ChunkSize is 4-byte data that indicates the data length of the FormatType, NumberOfTrack, and TimeDivision parts of the header chunk, that is, the chunk excluding ChunkID and ChunkSize; this length is fixed at 6 bytes: "00 00 00 06" (numbers are hexadecimal). In this embodiment, FormatType is the 2-byte data "00 00" (numbers are hexadecimal), which means format 0, which uses a single track. In this embodiment, NumberOfTrack is the 2-byte data "00 01" (numbers are hexadecimal), which indicates that one track corresponding to the lyrics part is used. TimeDivision is data indicating the time base value, that is, the resolution per quarter note; in this embodiment, it is the 2-byte data "01 E0" (numbers are hexadecimal), which represents 480 in decimal.
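As an illustration of the header chunk layout described above, the following sketch packs and unpacks the 14 bytes with Python's standard `struct` module. The big-endian byte order follows from the hexadecimal values quoted in the text; the function names and the dictionary keys are hypothetical and chosen only for this example.

```python
import struct

# ">4sIHHH": big-endian, 4-byte ChunkID, 4-byte ChunkSize, three 2-byte fields.
HEADER_FORMAT = ">4sIHHH"

def build_header_chunk(format_type=0, number_of_track=1, time_division=480):
    """Build the 14-byte header chunk with the values used in this embodiment."""
    return struct.pack(HEADER_FORMAT, b"MThd", 6,
                       format_type, number_of_track, time_division)

def parse_header_chunk(data):
    """Parse the first 14 bytes of a file and return the header fields."""
    chunk_id, chunk_size, format_type, number_of_track, time_division = \
        struct.unpack(HEADER_FORMAT, data[:14])
    if chunk_id != b"MThd" or chunk_size != 6:
        raise ValueError("not a standard MIDI file header chunk")
    return {
        "FormatType": format_type,
        "NumberOfTrack": number_of_track,
        "TimeDivision": time_division,
    }
```

With the default arguments, `build_header_chunk().hex()` yields `4d546864000000060000000101e0`, which matches the byte values listed in the paragraph above.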

トラックチャンクは、ＣｈｕｎｋＩＤ、ＣｈｕｎｋＳｉｚｅと、ＤｅｌｔａＴｉｍｅ［ｉ］及びＥｖｅｎｔ［ｉ］からなる演奏データ組（０≦ｉ≦Ｌ－１）とからなる。ＣｈｕｎｋＩＤは、トラックチャンクであることを示す"MTrk"という半角４文字に対応する４バイトのアスキーコード「4D 54 72 6B」（数字は１６進数）である。ＣｈｕｎｋＳｉｚｅは、トラックチャンクにおいて、ＣｈｕｎｋＩＤとＣｈｕｎｋＳｉｚｅを除く部分のデータ長を示す４バイトデータである。 A track chunk consists of a performance data set (0≦i≦L-1) consisting of ChunkID, ChunkSize, DeltaTime[i] and Event[i]. ChunkID is a 4-byte ASCII code "4D 54 72 6B" (numbers are hexadecimal) that corresponds to the four half-width characters "MTrk" that indicate a track chunk. ChunkSize is 4-byte data that indicates the data length of the portion of the track chunk excluding ChunkID and ChunkSize.

ＤｅｌｔａＴｉｍｅ［ｉ］は、その直前のＥｖｅｎｔ［ｉ－１］（ｉ＝０の場合は先頭）の実行時刻からの待ち時間（相対時間）を示すタイミング情報であり、１～４バイトの可変長データである。Ｅｖｅｎｔ［ｉ］は、歌詞のテキストデータである歌詞情報と音高を指示する音高情報を含むメタイベントである。各演奏ガイドデータ組ＤｅｌｔａＴｉｍｅ［ｉ］及びＥｖｅｎｔ［ｉ］において、その直前のＥｖｅｎｔ［ｉ－１］の実行時刻からＤｅｌｔａＴｉｍｅ［ｉ］だけ待った上でＥｖｅｎｔ［ｉ］が実行されることにより、歌詞の自動再生（発声）の進行が実現される。 DeltaTime[i] is timing information indicating the wait time (relative time) from the execution time of the immediately preceding Event[i-1] (or from the start of the track when i = 0), and is variable-length data of 1 to 4 bytes. Event[i] is a meta event that contains lyric information, which is the text data of the lyrics, and pitch information that specifies the pitch. For each performance guide data set DeltaTime[i] and Event[i], Event[i] is executed after waiting DeltaTime[i] from the execution time of the immediately preceding Event[i-1], which realizes the progress of the automatic playback (vocalization) of the lyrics.
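The text states only that DeltaTime[i] is variable-length data of 1 to 4 bytes. Since the data is said to conform to the standard MIDI file format, the sketch below assumes that format's variable-length quantity encoding: seven value bits per byte, with the top bit set on every byte except the last. The function names are hypothetical.

```python
def read_delta_time(data, pos=0):
    """Decode a variable-length delta time starting at data[pos].

    Returns (value in ticks, position of the byte after the delta time).
    """
    value = 0
    for n in range(4):                      # at most 4 bytes
        byte = data[pos + n]
        value = (value << 7) | (byte & 0x7F)
        if byte & 0x80 == 0:                # top bit clear marks the last byte
            return value, pos + n + 1
    raise ValueError("delta time is longer than 4 bytes")

def write_delta_time(value):
    """Encode a tick count (0 to 0x0FFFFFFF) as a variable-length delta time."""
    if not 0 <= value <= 0x0FFFFFFF:
        raise ValueError("value does not fit in 4 bytes")
    out = [value & 0x7F]                    # last byte, top bit clear
    value >>= 7
    while value:
        out.append((value & 0x7F) | 0x80)   # earlier bytes, top bit set
        value >>= 7
    return bytes(reversed(out))
```

For example, a wait of 480 ticks (one quarter note at the time base used in this embodiment) is encoded as the two bytes `83 60`.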

図５（ｂ）及び図５（ｃ）のデータ構成については後述する。 The data structure of Figures 5(b) and 5(c) will be described later.

図６は、演奏レッスンの全体処理の例を示すフローチャートである。この処理は、図２において、ＣＰＵ２０１がＲＯＭ２０２に記憶された演奏レッスンの全体処理プログラムをＲＡＭ２０３にロードして実行する処理として実現され、電子鍵盤楽器１００を制御する特には図示しないメイン処理プログラムから呼び出される。 Figure 6 is a flow chart showing an example of the overall processing of a performance lesson. This processing is realized as a process in which the CPU 201 loads the overall processing program of the performance lesson stored in the ROM 202 into the RAM 203 and executes it in Figure 2, and is called from a main processing program (not shown) that controls the electronic keyboard instrument 100.

まず、ＣＰＵ２０１は、ユーザに、図１の第２のスイッチパネル１０３を操作させて、ユーザが演奏レッスンを実施したい歌声曲を選択させる（ステップＳ６０１）。 First, the CPU 201 has the user operate the second switch panel 103 in FIG. 1 to select the vocal song for which the user wishes to practice playing lessons (step S601).

次に、ＣＰＵ２０１は、レッスン処理を実行する（ステップＳ６０２）。この処理の詳細については、後述する。 Next, the CPU 201 executes lesson processing (step S602). Details of this processing will be described later.

レッスン処理によりユーザの演奏レッスンが終了すると、ＣＰＵ２０１は、ユーザ演奏の評価得点をフィードバックする（ステップＳ６０３）。ここでは、ＣＰＵ２０１は、図２の音声合成ＬＳＩ２０５に歌声データ２１５を与えることにより、ユーザが最終的に到達した評価得点に対応した声質にて音声で評価得点を発声する。後述するように、点数は０点から１０点までの１１段階あり、満点の１０点であれば女声で「じゅってん」、０点であればダミ声で「れいてん」と発声される。 When the user's performance lesson is completed by the lesson process, the CPU 201 feeds back the evaluation score of the user's performance (step S603). Here, the CPU 201 provides the singing voice data 215 to the voice synthesis LSI 205 in FIG. 2, and vocalizes the evaluation score in a voice with a voice quality corresponding to the evaluation score finally achieved by the user. As described below, there are 11 scores ranging from 0 to 10, and a perfect score of 10 will result in the pronunciation of "jutten" in a female voice, and a score of 0 will result in the pronunciation of "reiten" in a hoarse voice.

図７は、図６のステップＳ６０２のレッスン処理の詳細例を示すフローチャートである。ＣＰＵ２０１はまず、初期化処理を実行する（ステップＳ７０１）。この処理では、レッスン処理の実行に必要なパラメータの初期化や、最初に押鍵すべき図１の鍵盤１０１上の鍵のＬＥＤ１０４を最大輝度の半分の輝度での点灯状態にする等の処理が実行される。 Figure 7 is a flow chart showing a detailed example of the lesson process in step S602 in Figure 6. First, the CPU 201 executes an initialization process (step S701). In this process, the parameters required for executing the lesson process are initialized, and the LED 104 of the key on the keyboard 101 in Figure 1 that is to be pressed first is turned on at half the maximum brightness.

次に、ＣＰＵ２０１は、図６のステップＳ６０１においてユーザが選択した歌声曲の演奏ガイドデータの再生を開始するための、歌声曲開始処理を実行する（ステップＳ７０２）。この歌声曲開始処理において、ＣＰＵ２０１は、ＴｉｃｋＴｉｍｅの初期化処理を実行する。本実施形態において、歌詞の進行は、ＴｉｃｋＴｉｍｅという時間を単位として進行する。図５の曲データのヘッダチャンク内のＴｉｍｅＤｉｖｉｓｉｏｎ値として指定されるタイムベース値は４分音符の分解能を示しており、この値が例えば４８０ならば、４分音符は４８０ＴｉｃｋＴｉｍｅの時間長を有する。また、図５の曲データのトラックチャンク内の待ち時間ＤｅｌｔａＴｉｍｅ［ｉ］値も、ＴｉｃｋＴｉｍｅの時間単位によりカウントされる。ここで、１ＴｉｃｋＴｉｍｅが実際に何秒になるかは、曲データに対して指定されるテンポによって異なる。今、テンポ値をＴｅｍｐｏ［ビート／分］、上記タイムベース値をＴｉｍｅＤｉｖｉｓｉｏｎとすれば、ＣＰＵ２０１は、下記（１）式に対応する演算処理により、ＴｉｃｋＴｉｍｅ［秒］を算出する。 Next, the CPU 201 executes vocal song start processing to start playback of the performance guide data for the vocal song selected by the user in step S601 of FIG. 6 (step S702). In this vocal song start processing, the CPU 201 executes TickTime initialization processing. In this embodiment, the progression of lyrics progresses in units of time called TickTime. The time base value specified as the TimeDivision value in the header chunk of the song data in FIG. 5 indicates a quarter note resolution, and if this value is, for example, 480, a quarter note has a time length of 480 TickTime. The waiting time DeltaTime[i] value in the track chunk of the song data in FIG. 5 is also counted in units of TickTime. Here, how many seconds 1 TickTime actually is depends on the tempo specified for the song data. Now, assuming that the tempo value is Tempo [beats/minute] and the time base value is TimeDivision, the CPU 201 calculates TickTime [seconds] by performing an arithmetic process corresponding to the following formula (1).
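Formula (1) itself is not reproduced in this text. From the definitions above (TimeDivision ticks per quarter note, Tempo in beats per minute), one tick lasts 60 / Tempo / TimeDivision seconds, and the sketch below assumes that this is the relation meant, with one beat taken to be one quarter note. The function name is hypothetical.

```python
def tick_time_seconds(tempo_bpm, time_division):
    """Length of one TickTime in seconds.

    Assumed relation: TickTime [s] = 60 / Tempo / TimeDivision,
    where Tempo is in quarter-note beats per minute.
    """
    if tempo_bpm <= 0 or time_division <= 0:
        raise ValueError("tempo and time division must be positive")
    return 60.0 / tempo_bpm / time_division

# Duration of a quarter note (480 ticks) at two tempos.
quarter_at_60 = 480 * tick_time_seconds(60, 480)    # about 1.0 second
quarter_at_120 = 480 * tick_time_seconds(120, 480)  # about 0.5 second
```

Under this relation a quarter note at tempo 60 lasts one second, which agrees with the example given later in the description of the key-press period.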

次に、ＣＰＵ２０１は、ステップＳ７０２の歌声曲開始処理において、図２のタイマ２１０に対して、上記算出したＴｉｃｋＴｉｍｅ［秒］によるタイマ割込みを設定する。この結果、タイマ２１０において上記ＴｉｃｋＴｉｍｅ［秒］が経過する毎に、ＣＰＵ２０１に対して歌声曲進行のための割込み（以下「自動再生割込み」と記載）が発生する。従って、この自動再生割込みに基づいてＣＰＵ２０１で実行される自動再生処理（後述する図８）では、１ＴｉｃｋＴｉｍｅ毎に演奏ガイドデータの組を進行させる制御処理が実行されることになる。 Next, in the vocal song start process of step S702, the CPU 201 sets a timer interrupt for the timer 210 in Figure 2 with the calculated TickTime [seconds]. As a result, every time the TickTime [seconds] elapses in the timer 210, an interrupt for vocal song progression (hereinafter referred to as an "automatic playback interrupt") is generated for the CPU 201. Therefore, in the automatic playback process (Figure 8 described later) executed by the CPU 201 based on this automatic playback interrupt, a control process is executed to progress the set of performance guide data every TickTime.

なお、テンポ値Ｔｅｍｐｏは、初期状態では図２のＲＯＭ２０２に所定の値、例えば６０［ビート／分］が記憶されているとする。楽曲のテンポ値が演奏ガイドデータのイベントとして演奏ガイドデータに含まれている場合には、そのテンポ値を使用してもよい。或いは、不揮発性メモリに、前回終了時のテンポ値が記憶されていてもよい。また、ユーザは、図１の第１のスイッチパネル１０２等を操作して、テンポ値を変更することができ、特には図示しないが、その度に上記ＴｉｃｋＴｉｍｅ［秒］の算出とタイマ２１０へのタイマ割込み設定とが実行される。 In the initial state, a predetermined tempo value Tempo, for example 60 [beats/minute], is assumed to be stored in the ROM 202 in FIG. 2. If the tempo value of the song is included in the performance guide data as an event, that tempo value may be used. Alternatively, the tempo value at the time of the previous shutdown may be stored in non-volatile memory. The user can also change the tempo value by operating the first switch panel 102 in FIG. 1 or the like; although not specifically illustrated, each time this is done, TickTime [seconds] is recalculated and the timer interrupt of the timer 210 is set again.

続いて、ＣＰＵ２０１は、ステップＳ７０２の歌声曲開始処理において、自動再生処理の進行において、ＴｉｃｋＴｉｍｅを単位として、直前のイベントの発生時刻からの相対時間をカウントするためのＲＡＭ２０３上の変数ＤｅｌｔａＴの値を０に初期設定する。次に、ＣＰＵ２０１は、図５に例示される曲データのトラックチャンク内の演奏データ組ＤｅｌｔａＴｉｍｅ［ｉ］及びＥｖｅｎｔ［ｉ］（０≦ｉ≦Ｌ－１）の夫々ｉの値を指定するためのＲＡＭ２０３上の変数ＳｏｎｇＩｎｄｅｘの値を０に初期設定する。これにより、図５の例では、初期状態としてまず、トラックチャンク内の先頭の演奏データ組ＤｅｌｔａＴｉｍｅ［０］とＥｖｅｎｔ［０］が参照される。更に、ＣＰＵ２０１は、歌詞の進行をするか（＝１）しないか（＝０）を示すＲＡＭ２０３上の変数ＳｏｎｇＳｔａｒｔの値を１（進行する）に初期設定する。 Next, in the vocal song start process of step S702, the CPU 201 initializes to 0 the value of the variable DeltaT in the RAM 203, which counts, in units of TickTime, the relative time from the occurrence time of the previous event during the automatic playback process. The CPU 201 then initializes to 0 the value of the variable SongIndex in the RAM 203, which specifies the value of i for the performance data sets DeltaTime[i] and Event[i] (0≦i≦L-1) in the track chunk of the song data illustrated in FIG. 5. As a result, in the example of FIG. 5, the first performance data set in the track chunk, DeltaTime[0] and Event[0], is referenced in the initial state. Furthermore, the CPU 201 initializes to 1 (progress) the value of the variable SongStart in the RAM 203, which indicates whether the lyrics are to progress (=1) or not (=0).

更に、ＣＰＵ２０１は、ステップＳ７０２の歌声曲開始処理において、ＲＡＭ２０３上の変数Ｍａｒｇｉｎの値を、上記ＤｅｌｔａＴｉｍｅ［０］の値の３０％に最も近い整数値とする。Ｍａｒｇｉｎ変数値については後述する。また、ＣＰＵ２０１は、ステップＳ７０２の歌声曲開始処理において、ＲＡＭ２０３上の変数である後述する正解フラグの値と、後述する変数ＮｏｔｅＯｎ＿ｉｎ、ＮｏｔｅＯｎ＿ｏｕｔ、及びＮｏｔｅＯｎを、それぞれ０にリセットする。 Furthermore, in the vocal song start processing of step S702, the CPU 201 sets the value of the variable Margin in the RAM 203 to an integer value that is closest to 30% of the value of the above DeltaTime[0]. The Margin variable value will be described later. In addition, in the vocal song start processing of step S702, the CPU 201 resets the value of the correct answer flag, which is a variable in the RAM 203 and the variables NoteOn_in, NoteOn_out, and NoteOn, which will be described later, to 0.

ステップＳ７０２の処理の後、ＣＰＵ２０１は、ステップＳ７０３からＳ７０９までの一連の処理を繰り返し実行することにより、歌詞の自動再生処理とユーザによる演奏レッスンの処理を進行させる。 After processing in step S702, the CPU 201 repeatedly executes the series of processes from steps S703 to S709 to proceed with automatic lyric playback processing and user performance lesson processing.

図８は、上記図７のステップＳ７０３からＳ７０９の繰返し処理の期間中に、上記ＴｉｃｋＴｉｍｅ［秒］毎にタイマ２１０で発生する自動再生割込みに基づいて実行される自動再生処理の例を示すフローチャートである。この処理は、タイマ２１０から自動再生割込みが発生した場合に、ＣＰＵ２０１が、図７のレッスン処理において実行中の処理（ステップＳ７０３からＳ７０９の何れかの処理）を中断し、ＲＯＭ２０２からＲＡＭ２０３に予めロードされている自動再生処理プログラムを実行する機能として実現される。 Figure 8 is a flow chart showing an example of automatic playback processing executed based on an automatic playback interrupt generated by timer 210 every TickTime [seconds] during the period of repeated processing of steps S703 to S709 in Figure 7 above. This processing is realized as a function in which, when an automatic playback interrupt occurs from timer 210, CPU 201 interrupts the processing being executed in the lesson processing of Figure 7 (any of steps S703 to S709) and executes an automatic playback processing program preloaded from ROM 202 to RAM 203.

まず、ＣＰＵ２０１は、ＲＡＭ２０３の変数ＳｏｎｇＳｔａｒｔ値が１であるか否か、即ち歌詞の自動再生の進行が指示されているか否かを判定する（ステップＳ８０１）。 First, the CPU 201 determines whether the value of the variable SongStart in the RAM 203 is 1, i.e., whether automatic playback of lyrics has been instructed (step S801).

ＣＰＵ２０１は、歌詞の自動再生の進行が指示されていないと判定した（ステップＳ８０１の判定がＮＯである）場合には、ＣＰＵ２０１は、歌詞の進行は行わずに図８のフローチャートで例示される自動再生処理をそのまま終了する。 If the CPU 201 determines that the automatic playback of lyrics has not been instructed (the determination in step S801 is NO), the CPU 201 does not progress the lyrics and ends the automatic playback process illustrated in the flowchart of FIG. 8.

ＣＰＵ２０１は、歌詞の自動再生の進行が指示されていると判定した（ステップＳ８０１の判定がＹＥＳである）場合には、ＲＡＭ２０３にロードされている図５（ａ）のデータ構成を有する演奏ガイドデータのトラックチャンクに関する前回のイベントの発生時刻からの相対時刻を示すＤｅｌｔａＴ値が、ＳｏｎｇＩｎｄｅｘ値が示すこれから実行しようとする演奏ガイドデータ組の待ち時間ＤｅｌｔａＴｉｍｅ［ＳｏｎｇＩｎｄｅｘ］からＭａｒｇｉｎ値を減算した値に到達したか否かを判定する（ステップＳ８０２）。ステップＳ８０２の判定がＮＯならば、更に、ＤｅｌｔａＴ値が、ＤｅｌｔａＴｉｍｅ［ＳｏｎｇＩｎｄｅｘ］にＭａｒｇｉｎ値を加算した値に到達したか否かを判定する（ステップＳ８０３）。 If the CPU 201 determines that automatic playback of lyrics has been instructed (YES in step S801), it determines whether the DeltaT value, which indicates the relative time from the occurrence time of the previous event for the track chunk of the performance guide data having the data structure of FIG. 5(a) loaded in the RAM 203, has reached a value obtained by subtracting the Margin value from the waiting time DeltaTime[SongIndex] of the performance guide data set to be executed, which is indicated by the SongIndex value (step S802). If the determination in step S802 is NO, it further determines whether the DeltaT value has reached a value obtained by adding the Margin value to DeltaTime[SongIndex] (step S803).

本実施形態においては、各演奏ガイドデータの組の歌詞が発声されるべきタイミングでユーザが図１の鍵盤１０１上で上記組に設定されている音高情報が示す音高に一致する正しい音高の鍵を押鍵したか否かが判定される。この場合、ユーザの演奏レッスンに余裕を持たせるために、図４（ｃ）に示されるように、ユーザの押鍵を正解とするタイミングは、各演奏ガイドデータの組に設定されているタイミング情報＝ＤｅｌｔａＴｉｍｅ［ＳｏｎｇＩｎｄｅｘ］にぴったり一致するタイミングの前後に幅を持たせるようにすることができる。この時間幅を押鍵期間と呼ぶこととする。押鍵期間はジャストタイミングの前後１００ミリ秒のように絶対時間とすることもできるが、本実施例では音長の３０％をジャストタイミングの前後の押鍵期間とする。即ち、テンポ６０のときの四分音符であれば音長は１秒であるからジャストタイミングの前後３００ミリ秒の区間、即ち６００ミリ秒が押鍵期間となる。ジャストタイミングの前後を同じ時間だけ押鍵期間とするのではなく、前を２０％、後ろを４０％のように設定することもできる。本実施形態では、ステップＳ８０９でジャストタイミングＤｅｌｔａＴｉｍｅ［ＳｏｎｇＩｎｄｅｘ］の３０％の値が計算されてＲＡＭ２０３上の変数Ｍａｒｇｉｎにセットされる。そして、ステップＳ８０２で、前回のイベントの発生時刻からの相対時刻を示す変数値ＤｅｌｔａＴが押鍵期間の開始時刻に対応する“ＤｅｌｔａＴｉｍｅ［ＳｏｎｇＩｎｄｅｘ］－Ｍａｒｇｉｎ”に到達したか否かが判定され、続くステップＳ８０３で、ＤｅｌｔａＴが押鍵期間の終了時刻に対応する“ＤｅｌｔａＴｉｍｅ［ＳｏｎｇＩｎｄｅｘ］＋Ｍａｒｇｉｎ”に到達したか否かが判定される。 In this embodiment, it is determined whether the user pressed the correct key pitch on the keyboard 101 in FIG. 1 at the timing when the lyrics of each set of performance guide data should be spoken, which matches the pitch indicated by the pitch information set for the set. In this case, in order to allow the user some leeway in their performance lessons, as shown in FIG. 4(c), the timing at which the user's key press is considered correct can be set to have a width before and after the timing that exactly matches the timing information = DeltaTime [SongIndex] set for each set of performance guide data. This time width is called the key press period. The key press period can be an absolute time, such as 100 milliseconds before and after the just timing, but in this embodiment, 30% of the note length is set as the key press period before and after the just timing. In other words, for a quarter note at a tempo of 60, the note length is 1 second, so the key press period is a section of 300 milliseconds before and after the just timing, that is, 600 milliseconds. Instead of setting the key-press period for the same amount of time before and after Just Timing, it is also possible to set it to 20% before and 40% after. 
In this embodiment, in step S809, a value of 30% of Just Timing DeltaTime[SongIndex] is calculated and set to the variable Margin in RAM 203. Then, in step S802, it is determined whether the variable value DeltaT indicating the relative time from the occurrence time of the previous event has reached "DeltaTime[SongIndex]-Margin" corresponding to the start time of the key-press period, and in the following step S803, it is determined whether DeltaT has reached "DeltaTime[SongIndex]+Margin" corresponding to the end time of the key-press period.
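The key-press period arithmetic can be checked with a short sketch. It works in ticks and uses integer percentages so that the truncation matches the INT() operation mentioned for step S809; the function names, the percentage parameters, and the use of DeltaTime as the note length are assumptions made for illustration.

```python
def key_press_window(delta_time_ticks, percent_before=30, percent_after=30):
    """Return (start tick, end tick) of the key-press period around the exact timing.

    The margins are truncated to whole ticks, as INT() does in step S809.
    """
    margin_before = delta_time_ticks * percent_before // 100
    margin_after = delta_time_ticks * percent_after // 100
    return delta_time_ticks - margin_before, delta_time_ticks + margin_after

def window_length_seconds(window, tempo_bpm, time_division):
    """Length of a (start, end) window in seconds, assuming 60/Tempo/TimeDivision per tick."""
    start, end = window
    return (end - start) * 60.0 / tempo_bpm / time_division

# Quarter note at tempo 60 with a time base of 480: 144 ticks on either side.
symmetric = key_press_window(480)            # (336, 624)
asymmetric = key_press_window(480, 20, 40)   # (384, 672)
```

For the symmetric case the window spans 288 ticks, which is 600 milliseconds at tempo 60, in line with the example in the text.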

ステップＳ８０２及びＳ８０３の何れの判定もＮＯの場合、ＣＰＵ２０１は、前回のイベントの発生時刻からの相対時刻を示すＲＡＭ２０３上の変数ＤｅｌｔａＴの値を＋１インクリメントさせて、今回の割込みに対応する１ＴｉｃｋＴｉｍｅ単位分だけ時刻を進行させる（ステップＳ８０４）。その後、ＣＰＵ２０１は、図８のフローチャートで示される自動再生処理を終了し、図７のレッスン処理の中断していた処理の実行に戻る。 If the determinations in both steps S802 and S803 are NO, the CPU 201 increments the value of the variable DeltaT in the RAM 203, which indicates the relative time from the occurrence time of the previous event, by +1, and advances the time by 1 TickTime unit corresponding to the current interrupt (step S804). After that, the CPU 201 ends the automatic playback process shown in the flowchart of FIG. 8, and returns to the execution of the interrupted lesson process in FIG. 7.

ステップＳ８０２において、前回のイベントの発生時刻からの相対時刻を示す変数値ＤｅｌｔａＴが押鍵期間の開始時刻に対応する“ＤｅｌｔａＴｉｍｅ［ＳｏｎｇＩｎｄｅｘ］－Ｍａｒｇｉｎ”に到達したと判定された場合、ＣＰＵ２０１は、押鍵期間に突入したことを示すＲＡＭ２０３上の変数ＮｏｔｅＯｎ＿ｉｎの値を１にセットし、更に、押鍵期間であることを示すＲＡＭ２０３上の変数ＮｏｔｅＯｎの値を１にセットする（ステップＳ８０５）。続いて、ＣＰＵ２０１は、ステップＳ８０４の処理に進んで、前回のイベントの発生時刻からの相対時刻を示すＲＡＭ２０３上の変数ＤｅｌｔａＴの値を＋１インクリメントさせて、今回の割込みに対応する１ＴｉｃｋＴｉｍｅ単位分だけ時刻を進行させ、その後、図８のフローチャートで示される自動再生処理を終了し、図７のレッスン処理において中断していた処理の実行に戻る。 If it is determined in step S802 that the variable value DeltaT, which indicates the relative time from the occurrence time of the previous event, has reached "DeltaTime[SongIndex]-Margin" corresponding to the start time of the key-press period, the CPU 201 sets the value of the variable NoteOn_in in the RAM 203, which indicates that the key-press period has begun, to 1, and further sets the value of the variable NoteOn in the RAM 203, which indicates that the key-press period is in progress, to 1 (step S805). Next, the CPU 201 proceeds to the process of step S804, increments the value of the variable DeltaT in the RAM 203, which indicates the relative time from the occurrence time of the previous event, by +1, and advances the time by 1 TickTime unit corresponding to the current interrupt, and then ends the automatic playback process shown in the flowchart of FIG. 8, and returns to the execution of the process that was interrupted in the lesson process of FIG. 7.

ステップＳ８０２の判定がＮＯとなった後、ステップＳ８０３において、前回のイベントの発生時刻からの相対時刻を示す変数値ＤｅｌｔａＴが押鍵期間の終了時刻に対応する“ＤｅｌｔａＴｉｍｅ［ＳｏｎｇＩｎｄｅｘ］＋Ｍａｒｇｉｎ”に到達したと判定された場合、ＣＰＵ２０１は、押鍵期間からちょうど出るところであることを示すＲＡＭ２０３上の変数ＮｏｔｅＯｎ＿ｏｕｔの値を１にセットし、更に、押鍵期間であることを示すＲＡＭ２０３上の変数ＮｏｔｅＯｎの値を押鍵期間でなくなったことを示す値０にセットする（ステップＳ８０６）。 After the determination in step S802 is NO, if it is determined in step S803 that the variable value DeltaT indicating the relative time from the occurrence time of the previous event has reached "DeltaTime[SongIndex]+Margin" corresponding to the end time of the key press period, the CPU 201 sets the value of the variable NoteOn_out in RAM 203, indicating that the key press period is just about to end, to 1, and further sets the value of the variable NoteOn in RAM 203, indicating that the key press period is in progress, to the value 0 indicating that the key press period is no longer in progress (step S806).

次に、ＣＰＵ２０１は、変数ＤｅｌｔａＴの値がＤｅｌｔａＴｉｍｅ［ＳｏｎｇＩｎｄｅｘ］から既に進んでいる分に１をプラスした時点「ＤｅｌｔａＴ－ＤｅｌｔａＴｉｍｅ［ＳｏｎｇＩｎｄｅｘ］＋１」を新たな変数ＤｅｌｔａＴの値とする（ステップＳ８０７）。 Next, the CPU 201 sets the new value of the variable DeltaT to "DeltaT - DeltaTime[SongIndex] + 1", that is, the amount by which DeltaT has already advanced past DeltaTime[SongIndex], plus 1 (step S807).

更に、ＣＰＵ２０１は、到達判定に用いる演奏ガイドデータの組を１つ進めるために、ＲＡＭ２０３上の変数ＳｏｎｇＩｎｄｅｘの値を＋１インクリメントする（ステップＳ８０８）。 Furthermore, the CPU 201 increments the value of the variable SongIndex in the RAM 203 by +1 to advance the set of performance guide data used in the arrival determination by one (step S808).

最後に、ＣＰＵ２０１は、次の演奏ガイドデータの組への到達判定に用いる図４（ｃ）のＭａｒｇｉｎ値を、新たに更新されたＳｏｎｇＩｎｄｅｘ値によって参照される新たなＤｅｌｔａＴｉｍｅ［ＳｏｎｇＩｎｄｅｘ］値に０．３（３０％）を乗じた値「ＩＮＴ（ＤｅｌｔａＴｉｍｅ［ＳｏｎｇＩｎｄｅｘ］×０．３）」（「ＩＮＴ（）」は括弧内の値の整数値を算出する演算を示す）に設定する（ステップＳ８０９）。その後、ＣＰＵ２０１は、図８のフローチャートで示される自動再生処理を終了し、図７のレッスン処理の中断していた処理の実行に戻る。 Finally, the CPU 201 sets the Margin value in FIG. 4(c) used to determine whether the next set of performance guide data has been reached to "INT(DeltaTime[SongIndex]×0.3)" (where "INT()" indicates an operation to calculate the integer value of the value in parentheses) obtained by multiplying the new DeltaTime[SongIndex] value referenced by the newly updated SongIndex value by 0.3 (30%) (step S809). After that, the CPU 201 ends the automatic playback process shown in the flowchart in FIG. 8, and returns to the execution of the interrupted lesson process in FIG. 7.
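Steps S801 to S809 can be summarized as a small state machine that is called once per TickTime. The sketch below is an interpretation, not the actual firmware: "reached" is treated as equality, Margin is assumed to be greater than 0, a guard for the end of the data is added that the text does not describe, and the NoteOn_in and NoteOn_out flags are left set because the processing that reads and clears them is described elsewhere. Class and attribute names are hypothetical.

```python
class AutoPlayback:
    """Tick-driven sketch of the automatic playback interrupt (steps S801 to S809)."""

    def __init__(self, delta_times):
        self.delta_times = list(delta_times)  # DeltaTime[i] in ticks
        self.delta_t = 0                      # DeltaT
        self.song_index = 0                   # SongIndex
        self.song_start = 1                   # SongStart (1 = progress)
        self.note_on_in = 0                   # NoteOn_in
        self.note_on_out = 0                  # NoteOn_out
        self.note_on = 0                      # NoteOn
        self.margin = self._margin()          # Margin

    def _margin(self):
        """INT(DeltaTime[SongIndex] x 0.3), or 0 once the data is exhausted."""
        if self.song_index >= len(self.delta_times):
            return 0
        return self.delta_times[self.song_index] * 30 // 100

    def on_tick(self):
        if self.song_start != 1:                          # S801
            return
        if self.song_index >= len(self.delta_times):      # guard (assumption)
            return
        target = self.delta_times[self.song_index]
        if self.delta_t == target - self.margin:          # S802: period starts
            self.note_on_in = 1                           # S805
            self.note_on = 1
            self.delta_t += 1                             # S804
        elif self.delta_t == target + self.margin:        # S803: period ends
            self.note_on_out = 1                          # S806
            self.note_on = 0
            self.delta_t = self.delta_t - target + 1      # S807
            self.song_index += 1                          # S808
            self.margin = self._margin()                  # S809
        else:
            self.delta_t += 1                             # S804
```

With two events of 480 ticks each, the first key-press period opens on the tick where DeltaT is 336 and closes where DeltaT is 624; DeltaT is then rebased to 145, so the count for the next event continues from the exact timing of the previous one rather than from the end of its key-press period.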

図７のフローチャートの説明に戻り、ステップＳ７０３からＳ７０９の繰り返し処理において、ＣＰＵ２０１はまず、演奏ガイド処理を実行する（ステップＳ７０３）。ＣＰＵ２０１は、この演奏ガイド処理において、自動再生のタイミングに同期させて、例えば図１の２つのＬＥＤ１０４が光っている鍵として示されるように、次に自動再生される音高情報に対応する鍵（操作子）が備えるＬＥＤ１０４を強い輝度例えば最大輝度で光らせると共に、次の次に自動再生される音高情報に対応する鍵が備えるＬＥＤ１０４を弱い輝度例えば最大輝度の半分の輝度で光らせる。この演奏ガイド処理の詳細については、図９のフローチャートを用いて後述する。 Returning to the flowchart in FIG. 7, in the repeated processing of steps S703 to S709, the CPU 201 first executes a performance guide process (step S703). In this process, in synchronization with the timing of automatic playback, the CPU 201 lights the LED 104 of the key (operator) corresponding to the pitch information to be automatically played next at a high brightness, for example maximum brightness, and lights the LED 104 of the key corresponding to the pitch information to be automatically played after that at a low brightness, for example half the maximum brightness, as illustrated by the two keys with lit LEDs 104 in FIG. 1. The details of this performance guide process will be described later using the flowchart in FIG. 9.
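The two-level LED guidance can be pictured as a mapping from key pitch to brightness. The sketch below is only illustrative: the brightness scale of 0 to 255, the function name, and the rule that the brighter level wins when both notes fall on the same key are assumptions that the text does not state.

```python
MAX_BRIGHTNESS = 255  # hypothetical full-scale value for the LED controller

def guide_led_levels(pitches, song_index, max_brightness=MAX_BRIGHTNESS):
    """Return {pitch: brightness} for the next note and the note after it.

    pitches[i] is the pitch carried by Event[i]; song_index selects the note
    that is to be played next.
    """
    levels = {}
    if song_index + 1 < len(pitches):
        # The note after the next one is lit at half brightness.
        levels[pitches[song_index + 1]] = max_brightness // 2
    if song_index < len(pitches):
        # The next note is lit at full brightness; written last so that it
        # takes precedence when both notes share a key (assumption).
        levels[pitches[song_index]] = max_brightness
    return levels
```

For a melody with pitches 60, 62, 64 and `song_index` 0, the key for pitch 60 is lit at 255 and the key for pitch 62 at 127.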

次に、ＣＰＵ２０１は、押鍵・離鍵処理を実行する（ステップＳ７０４）。この押鍵・離鍵処理において、ＣＰＵ２０１は、ユーザにより図１の鍵盤１０１上の何れかの鍵が新たに押鍵された場合において、演奏レッスンにおける鍵盤１０１上の鍵（操作子）の押鍵タイミング（操作タイミング）が自動再生されるタイミング情報に正しく対応しており（図４（ｃ）の押鍵期間に入っており）、かつ鍵の押鍵音高（操作音高）が演奏ガイドデータの組の音高情報に正しく対応している（一致している）と判定した場合には、その押鍵タイミングで図２の音声合成ＬＳＩ２０５から歌声音声出力データ２１７を出力させるための発声イベントを生成する。 Next, the CPU 201 executes key pressing/release processing (step S704). In this key pressing/release processing, when the user presses a new key on the keyboard 101 in FIG. 1, if the CPU 201 determines that the key pressing timing (operation timing) of the key (operator) on the keyboard 101 in the performance lesson correctly corresponds to the timing information to be automatically played (entering the key pressing period in FIG. 4(c)), and that the key pressing pitch (operation pitch) correctly corresponds to (matches) the pitch information of the set of performance guide data, it generates a vocalization event for outputting the singing voice output data 217 from the voice synthesis LSI 205 in FIG. 2 at that key pressing timing.

また、ステップＳ７０４の押鍵・離鍵処理において、ＣＰＵ２０１は、ユーザにより図１の鍵盤１０１上の何れかの鍵が新たに押鍵された場合において、演奏レッスンにおける押鍵タイミングが自動再生されるタイミング情報に正しく対応している（図４（ｃ）の押鍵期間に入っている）が、鍵の押鍵音高（操作音高）が演奏ガイドデータの組の音高情報に正しく対応していない（一致していない）と判定した場合には、その押鍵タイミングで図２の音源ＬＳＩ２０４から所定の（例えばユーザが図１の第２のスイッチパネル１０３上で予め選択している楽器音と演奏ガイドデータの音高による）楽音出力データ２１４を出力させるための発音イベントを生成する。 In addition, in the key pressing/release processing of step S704, when the user presses a new key on the keyboard 101 in FIG. 1, if the CPU 201 determines that the key pressing timing in the performance lesson correctly corresponds to the timing information to be automatically played back (entering the key pressing period in FIG. 4(c)), but the key pressing pitch (operation pitch) does not correctly correspond (does not match) to the pitch information of the performance guide data set, it generates a sound generation event to output a predetermined musical sound output data 214 (for example, based on the instrument sound and the pitch of the performance guide data previously selected by the user on the second switch panel 103 in FIG. 1) from the sound source LSI 204 in FIG. 2 at the key pressing timing.

更に、ステップＳ７０４の押鍵・離鍵処理において、ＣＰＵ２０１は、ユーザにより図１の鍵盤１０１上の何れかの鍵が新たに押鍵された場合において、演奏レッスンにおける押鍵タイミングが自動再生されるタイミング情報に正しく対応していない（図４（ｃ）の押鍵期間に入っていない）と判定した場合には、音声合成ＬＳＩ２０５に歌声音声出力データ２１７を発声させるためのイベント、及び音源ＬＳＩ２０４に楽音出力データ２１４を発音させるためのイベントの何れも生成しない。 Furthermore, in the key pressing/release processing of step S704, if the user presses a new key on the keyboard 101 in FIG. 1 and the CPU 201 determines that the key pressing timing in the performance lesson does not correctly correspond to the timing information to be automatically played back (not within the key pressing period in FIG. 4(c)), it does not generate either an event for causing the voice synthesis LSI 205 to produce the singing voice output data 217 or an event for causing the sound source LSI 204 to produce the musical sound output data 214.

一方、ステップＳ７０４の押鍵・離鍵処理において、ＣＰＵ２０１は、ユーザにより図１の鍵盤１０１上の何れかの鍵が離鍵された場合には、音声合成ＬＳＩ２０５における対応する歌声音声出力データ２１７の発声又は音源ＬＳＩ２０４における対応する楽音出力データ２１４の発音を終了させるための離鍵イベントを生成する。 On the other hand, in the key press/release process of step S704, when the user releases any key on the keyboard 101 of FIG. 1, the CPU 201 generates a key release event to end the production of the corresponding singing voice output data 217 in the voice synthesis LSI 205 or the production of the corresponding musical tone output data 214 in the sound source LSI 204.
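The three outcomes of a new key press described above reduce to a small decision function. The sketch below assumes that the key-press period test has already been made (for example from the NoteOn variable) and that pitches are compared as plain integers; the enum and function names are hypothetical.

```python
from enum import Enum

class KeyPressResult(Enum):
    SING = "vocalization event for the voice synthesis LSI 205"
    TONE = "sound generation event for the sound source LSI 204"
    IGNORE = "no event"

def classify_key_press(in_key_press_period, pressed_pitch, guide_pitch):
    """Decide which event a newly pressed key produces.

    in_key_press_period: True while the key-press period is active.
    pressed_pitch: pitch of the key the user pressed.
    guide_pitch: pitch information of the current performance guide data set.
    """
    if not in_key_press_period:
        return KeyPressResult.IGNORE      # wrong timing: nothing is sounded
    if pressed_pitch == guide_pitch:
        return KeyPressResult.SING        # correct timing and pitch: sing the lyric
    return KeyPressResult.TONE            # correct timing, wrong pitch: instrument tone
```

Key releases are handled separately: whichever output was started for the key is stopped by the key release event.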

以上のステップＳ７０４の押鍵・離鍵処理の詳細は、図１０のフローチャートを用いて後述する。 Details of the key press/release process in step S704 will be described later using the flowchart in Figure 10.

次に、ＣＰＵ２０１は、採点処理を実行する（ステップＳ７０５）。この採点処理において、ＣＰＵ２０１は、演奏レッスンにおける鍵の押鍵タイミング（操作タイミング）及び押鍵音高（操作音高）を自動再生されるタイミング情報及び音高情報と比較して、演奏レッスンを採点する。この採点処理の詳細については、図１１のフローチャートを用いて後述する。 Next, the CPU 201 executes a scoring process (step S705). In this scoring process, the CPU 201 compares the key press timing (operation timing) and key press pitch (operation pitch) in the performance lesson with the automatically reproduced timing information and pitch information to score the performance lesson. Details of this scoring process will be described later using the flowchart in FIG. 11.

次に、ＣＰＵ２０１は、声質更新処理を実行する（ステップＳ７０６）。この声質更新処理において、ＣＰＵ２０１は、ステップＳ７０５の採点処理の採点途中結果に応じて、後述するステップＳ７０８の発声・発音処理において出力される歌声音声出力データ２１７の声質を示す値を設定する処理を実行する。 Next, the CPU 201 executes a voice quality update process (step S706). In this voice quality update process, the CPU 201 executes a process of setting a value indicating the voice quality of the singing voice output data 217 output in the vocalization/pronunciation process in step S708, which will be described later, according to the intermediate scoring results of the scoring process in step S705.

続いて、ＣＰＵ２０１は、練習進行度算出処理を実行する（ステップＳ７０７）。この練習進行度算出処理において、図２のＣＰＵ２０１及び図３の音声合成部３００内のフォルマント補間処理部３０６とノイズ重畳部３０７が、後述するステップＳ７０８の発声・発音処理において出力される歌声音声出力データ２１７の声質をステップＳ７０５の採点処理の採点途中結果に応じて変化させるための、練習進行度データ３１１を算出する。前述の声質更新処理及び上述の練習進行度算出処理の詳細については、図１２のフローチャートを用いて後述する。 Next, the CPU 201 executes a practice progress calculation process (step S707). In this practice progress calculation process, the CPU 201 in FIG. 2 and the formant interpolation processing unit 306 and the noise superimposition unit 307 in the voice synthesis unit 300 in FIG. 3 calculate practice progress data 311 for changing the voice quality of the singing voice output data 217 output in the vocalization/pronunciation process in step S708, which will be described later, according to the intermediate scoring results in the scoring process in step S705. Details of the voice quality update process and the practice progress calculation process described above will be described later using the flowchart in FIG. 12.

更に、ＣＰＵ２０１は、発声・発音処理を実行する（ステップＳ７０８）。この発声・発音処理において、ＣＰＵ２０１は、ステップＳ７０４の押鍵・離鍵処理で歌声音声出力データ２１７の発声イベントを生成した場合には、ＲＡＭ２０３上の発行イベント領域に保持されているその発声イベントを図２の音声合成ＬＳＩ２０５に対して発行することにより、音声合成ＬＳＩ２０５から歌声音声出力データ２１７を出力させる。また、この発声・発音処理において、ＣＰＵ２０１は、ステップＳ７０４の押鍵・離鍵処理で楽音出力データ２１４の発音イベントを生成した場合には、ＲＡＭ２０３上の発行イベント領域に保持されているその発音イベントを図２の音源ＬＳＩ２０４に対して発行することにより、音源ＬＳＩ２０４から楽音出力データ２１４を出力させる。更に、この発声・発音処理において、ＣＰＵ２０１は、ステップＳ７０４の押鍵・離鍵処理で離鍵イベントを生成した場合には、ＲＡＭ２０３上の発行イベント領域に保持されているその離鍵イベントを図２の音声合成ＬＳＩ２０５又は音源ＬＳＩ２０４に対して発行することにより、対応する歌声音声出力データ２１７又は楽音出力データ２１４の出力を停止させる。 Furthermore, the CPU 201 executes vocalization/pronunciation processing (step S708). In this vocalization/pronunciation processing, when the CPU 201 generates a vocalization event of the singing voice output data 217 in the key press/release processing of step S704, the CPU 201 issues the vocalization event stored in the issued event area on the RAM 203 to the voice synthesis LSI 205 of Fig. 2, thereby causing the voice synthesis LSI 205 to output the singing voice output data 217. In addition, in this vocalization/pronunciation processing, when the CPU 201 generates a pronunciation event of the musical tone output data 214 in the key press/release processing of step S704, the CPU 201 issues the pronunciation event stored in the issued event area on the RAM 203 to the sound source LSI 204 of Fig. 2, thereby causing the sound source LSI 204 to output the musical tone output data 214. Furthermore, in this vocalization/sound generation process, if the CPU 201 generates a key release event in the key press/key release process of step S704, it issues the key release event stored in the issued event area on the RAM 203 to the voice synthesis LSI 205 or sound source LSI 204 in FIG. 2, thereby stopping the output of the corresponding singing voice output data 217 or musical sound output data 214.

最後に、ＣＰＵ２０１は、ＲＡＭ２０３から読み出されるべき演奏ガイドデータの組がなくなって歌声曲が終了したか否かを判定する（ステップＳ７０９）。ステップＳ７０９の判定がＮＯならば、ステップＳ７０３の処理に戻って、ステップＳ７０３からＳ７０９の一連の処理を繰り返し実行する。ステップＳ７０９の判定がＹＥＳになったら、ＣＰＵ２０１は、図７のフローチャートで示される図６のステップＳ６０２のレッスン処理を終了する。 Finally, the CPU 201 determines whether or not there are no more sets of performance guide data to be read from the RAM 203 and the vocal piece has ended (step S709). If the determination in step S709 is NO, the process returns to step S703 and repeats the series of steps S703 to S709. If the determination in step S709 is YES, the CPU 201 ends the lesson process in step S602 of FIG. 6, which is shown in the flowchart of FIG. 7.

図９は、図７のステップＳ７０３の演奏ガイド処理の詳細例を示すフローチャートである。 Figure 9 is a flowchart showing a detailed example of the performance guide process in step S703 of Figure 7.

ＣＰＵ２０１はまず、ＲＡＭ２０３上の変数ＮｏｔｅＯｎ＿ｉｎの値が１であるか否か、即ち現在の自動再生のタイミングが押鍵期間（図４（ｃ）参照）に突入したか否かを判定する（ステップＳ９０１）。前述したように、この変数ＮｏｔｅＯｎ＿ｉｎの値は、図８の自動再生処理のステップＳ８０２において、前回のイベントの発生時刻からの相対時刻を示す変数値ＤｅｌｔａＴが押鍵期間の開始時刻に対応する“ＤｅｌｔａＴｉｍｅ［ＳｏｎｇＩｎｄｅｘ］－Ｍａｒｇｉｎ”に到達したと判定された場合に、押鍵期間に突入したことを示す値「１」にセットされる。 First, the CPU 201 determines whether the value of the variable NoteOn_in in the RAM 203 is 1, i.e., whether the current timing of automatic playback has entered the key-press period (see FIG. 4C) (step S901). As described above, the value of this variable NoteOn_in is set to the value "1", indicating that the key-press period has begun, when it is determined in step S802 of the automatic playback process in FIG. 8 that the variable value DeltaT, which indicates the relative time from the occurrence time of the previous event, has reached "DeltaTime[SongIndex]-Margin", which corresponds to the start time of the key-press period.

ステップＳ９０１の判定がＹＥＳになると、ＣＰＵ２０１は、ＲＡＭ２０３上の現在の変数値ＳｏｎｇＩｎｄｅｘ値によって参照されるＲＡＭ２０３上の演奏ガイドデータ組Ｅｖｅｎｔ［ＳｏｎｇＩｎｄｅｘ］に含まれる音高情報に対応する図１の鍵盤１０１上の鍵のＬＥＤ１０４を、図２のＬＥＤコントローラ２０７を介して最大輝度で点灯させる（ステップＳ９０２）。これにより、ユーザは、この最大輝度でＬＥＤ１０４が点灯した鍵を、次に押鍵すべき鍵であると認識することができる。 When the determination in step S901 is YES, the CPU 201 lights up, via the LED controller 207 in Fig. 2, the LED 104 of the key on the keyboard 101 in Fig. 1 that corresponds to the pitch information contained in the performance guide data set Event[SongIndex] in the RAM 203 referenced by the current variable value SongIndex value in the RAM 203, at maximum brightness (step S902). This allows the user to recognize that the key whose LED 104 is lit at maximum brightness is the next key to be pressed.

次に、ＣＰＵ２０１は、ＲＡＭ２０３上の現在の変数値ＳｏｎｇＩｎｄｅｘ値に＋１した値によって参照される演奏ガイドデータ組Ｅｖｅｎｔ［ＳｏｎｇＩｎｄｅｘ＋１］がＲＡＭ２０３上に存在するか否かを判定する（ステップＳ９０３）。 Next, the CPU 201 determines whether the performance guide data set Event[SongIndex+1] referenced by the current variable value SongIndex+1 on the RAM 203 exists on the RAM 203 (step S903).

If the determination in step S903 is YES, the CPU 201 lights the LED 104 of the key on the keyboard 101 in FIG. 1 that corresponds to the pitch information contained in the performance guide data set Event[SongIndex+1] on the RAM 203, at half the maximum brightness, via the LED controller 207 in FIG. 2 (step S904). This allows the user to recognize the key whose LED 104 is lit at half the maximum brightness as the key to be pressed after the next one.

ステップＳ９０３の判定がＮＯならば、ＣＰＵ２０１は、ステップＳ９０４の処理は実行しない、この結果、ユーザは、最大輝度の半分の輝度でＬＥＤ１０４が点灯する鍵がないことにより、最大輝度でＬＥＤ１０４が点灯している鍵がレッスンの最後の鍵であると認識することができる。 If the determination in step S903 is NO, the CPU 201 does not execute the process of step S904. As a result, the user can recognize that the key with the LED 104 lit at maximum brightness is the last key of the lesson because there is no key with the LED 104 lit at half the maximum brightness.

ステップＳ９０４の処理の後又はステップＳ９０３の判定がＮＯとなった後、ＣＰＵ２０１は、ＲＡＭ２０３上の変数ＮｏｔｅＯｎ＿ｉｎの値を０にすることにより、現在の自動再生のタイミングが押鍵期間に突入した状態が終了する。その後、ＣＰＵ２０１は、図９のフローチャートで示される図７のステップＳ７０３の演奏ガイド処理を終了する。 After the process of step S904 or after the determination of step S903 becomes NO, the CPU 201 sets the value of the variable NoteOn_in in the RAM 203 to 0, thereby ending the state in which the current timing of automatic playback has entered the key-press period. The CPU 201 then ends the performance guide process of step S703 in FIG. 7, which is shown in the flowchart of FIG. 9.

前述したステップＳ９０１の判定がＮＯの場合、ＣＰＵ２０１は次に、ＲＡＭ２０３上の変数ＮｏｔｅＯｎ＿ｏｕｔの値が１であるか否か、即ち現在の自動再生のタイミングが押鍵期間（図４（ｃ）参照）から出るタイミングであるか否かを判定する（ステップＳ９０６）。前述したように、この変数ＮｏｔｅＯｎ＿ｏｕｔの値は、図８の自動再生処理のステップＳ８０３において、前回のイベントの発生時刻からの相対時刻を示す変数値ＤｅｌｔａＴが押鍵期間の終了時刻に対応する“ＤｅｌｔａＴｉｍｅ［ＳｏｎｇＩｎｄｅｘ］＋Ｍａｒｇｉｎ”に到達したと判定された場合に、押鍵期間を出ることを示す値「１」にセットされる。 If the determination in step S901 is NO, the CPU 201 next determines whether the value of the variable NoteOn_out in the RAM 203 is 1, i.e., whether the current timing of automatic playback is the timing of exiting the key-press period (see FIG. 4C) (step S906). As described above, the value of this variable NoteOn_out is set to the value "1", indicating the exit of the key-press period, when it is determined in step S803 of the automatic playback process in FIG. 8 that the variable value DeltaT, which indicates the relative time from the occurrence time of the previous event, has reached "DeltaTime[SongIndex]+Margin", which corresponds to the end time of the key-press period.

ステップＳ９０６の判定がＹＥＳになると、ＣＰＵ２０１は、ＲＡＭ２０３上の現在の変数値ＳｏｎｇＩｎｄｅｘ値から－１した値によって参照されるＲＡＭ２０３上の演奏ガイドデータ組Ｅｖｅｎｔ［ＳｏｎｇＩｎｄｅｘ－１］に含まれる音高情報に対応する図１の鍵盤１０１上の鍵のＬＥＤ１０４を、図２のＬＥＤコントローラ２０７を介して消灯させる（ステップＳ９０７）。これにより、ユーザは、この鍵について、押鍵期間が終了したことを認識することができる。なお、ＳｏｎｇＩｎｄｅｘでなくＳｏｎｇＩｎｄｅｘ－１を参照するのは、図８のステップＳ８０６でＮｏｔｅＯｎ＿ｏｕｔ＝１になった場合には、続くステップＳ８０８でＳｏｎｇＩｎｄｅｘの値が＋１インクリメントされるため、Ｅｖｅｎｔ［ＳｏｎｇＩｎｄｅｘ－１］を参照することにより直前の押鍵期間の鍵のＬＥＤ１０４を消灯できるようにするためである。 When the determination in step S906 is YES, the CPU 201 turns off the LED 104 of the key on the keyboard 101 in FIG. 1 that corresponds to the pitch information contained in the performance guide data set Event[SongIndex-1] on the RAM 203, which is referenced by the current variable value SongIndex value on the RAM 203 minus 1, via the LED controller 207 in FIG. 2 (step S907). This allows the user to recognize that the key-press period for this key has ended. Note that the reason SongIndex-1 is referenced instead of SongIndex is that when NoteOn_out=1 in step S806 in FIG. 8, the value of SongIndex is incremented by +1 in the following step S808, so that the LED 104 of the key in the previous key-press period can be turned off by referencing Event[SongIndex-1].

ステップＳ９０７の処理の後又はステップＳ９０６の判定がＮＯとなった後、ＣＰＵ２０１は、図９のフローチャートで示される図７のステップＳ７０３の演奏ガイド処理を終了する。 After the process of step S907 or the determination of step S906 is NO, the CPU 201 ends the performance guide process of step S703 in FIG. 7 shown in the flowchart of FIG. 9.
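The performance guide process of FIG. 9 can be illustrated with a minimal Python sketch. The `GuideState` class, its field names, and the brightness constants are hypothetical stand-ins for the variables on the RAM 203 and for the LED controller 207; they are assumptions for illustration, not the implementation described in the patent.

```python
from dataclasses import dataclass, field

# Relative LED brightness levels (assumed representation).
FULL, HALF, OFF = 1.0, 0.5, 0.0


@dataclass
class GuideState:
    pitches: list                 # pitch of each performance guide data set Event[i]
    song_index: int = 0           # SongIndex
    note_on_in: int = 0           # 1 when the key-press period has just begun
    note_on_out: int = 0          # 1 when the key-press period has just ended
    leds: dict = field(default_factory=dict)  # pitch -> brightness


def performance_guide(s: GuideState) -> None:
    if s.note_on_in == 1:                                   # S901
        s.leds[s.pitches[s.song_index]] = FULL              # S902: key to press next
        if s.song_index + 1 < len(s.pitches):               # S903
            s.leds[s.pitches[s.song_index + 1]] = HALF      # S904: key after next
        s.note_on_in = 0                                    # leave the "just entered" state
    elif s.note_on_out == 1:                                # S906
        # SongIndex was already advanced, so the finished key is at index - 1.
        s.leds[s.pitches[s.song_index - 1]] = OFF           # S907
```

Note that the sketch does not reset `note_on_out`; in the text that reset belongs to the scoring process (step S1106).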

図１０は、図７のステップＳ７０４の押鍵・離鍵処理の詳細例を示すフローチャートである。 Figure 10 is a flowchart showing a detailed example of the key press/release process in step S704 of Figure 7.

ＣＰＵ２０１はまず、図２のキースキャナ２０６を介してユーザによって図１の鍵盤１０１上で新規押鍵がなされたか否かを判定する（ステップＳ１００１）。 First, the CPU 201 determines whether or not a new key has been pressed on the keyboard 101 in FIG. 1 by the user via the key scanner 206 in FIG. 2 (step S1001).

ステップＳ１００１の判定がＮＯならば、ＣＰＵ２０１は、ステップＳ１００７の離鍵の判定処理に進む。 If the determination in step S1001 is NO, the CPU 201 proceeds to the key release determination process in step S1007.

ステップＳ１００１の判定がＹＥＳならば、ＣＰＵ２０１は、ＲＡＭ２０３上の変数ＮｏｔｅＯｎの値が１であるか否か、即ち現在の自動再生のタイミングが押鍵期間（図４（ｃ）参照）に入っているか否かを判定する（ステップＳ１００２）。 If the determination in step S1001 is YES, the CPU 201 determines whether the value of the variable NoteOn in the RAM 203 is 1, i.e., whether the current timing of automatic playback is within the key press period (see FIG. 4(c)) (step S1002).

ステップＳ１００２の判定がＮＯならば、ＣＰＵ２０１は、ステップＳ１００７の離鍵の判定処理に進む。このように、ユーザにより図１の鍵盤１０１上の何れかの鍵が新たに押鍵された場合において、演奏レッスンにおける押鍵タイミングが自動再生されるタイミング情報に正しく対応していない（図４（ｃ）の押鍵期間に入っていない）と判定された場合には、音声合成ＬＳＩ２０５に歌声音声出力データ２１７を発声させるためのイベント、及び音源ＬＳＩ２０４に楽音出力データ２１４を発音させるためのイベントの何れも生成されないで、発声及び発音の何れも行われない。 If the determination in step S1002 is NO, the CPU 201 proceeds to the key release determination process in step S1007. In this way, when the user presses a new key on the keyboard 101 in FIG. 1, if it is determined that the key pressing timing in the performance lesson does not correctly correspond to the timing information for automatic playback (not within the key pressing period in FIG. 4(c)), neither an event for causing the voice synthesis LSI 205 to vocalize the singing voice output data 217 nor an event for causing the sound source LSI 204 to sound the musical tone output data 214 is generated, and neither vocalization nor pronunciation is performed.

このようにしてユーザは、押鍵を行ったにもかかわらず歌声音声出力データ２１７の発声も楽音出力データ２１４の発音もないことにより、自分の押鍵が間違ったタイミングであったことを認識することができる。 In this way, the user can recognize that he or she pressed the key at the wrong time because neither the singing voice output data 217 nor the musical sound output data 214 is produced despite the key being pressed.

ステップＳ１００２の判定がＹＥＳならば、ＣＰＵ２０１は、キースキャナ２０６を介して通知された新規押鍵の音高が、ＲＡＭ２０３上の現在のＳｏｎｇＩｎｄｅｘ値によって参照されるＲＡＭ２０３上のＥｖｅｎｔ［ＳｏｎｇＩｎｄｅｘ］に含まれる音高情報に一致するか否かを判定する（ステップＳ１００３）。 If the determination in step S1002 is YES, the CPU 201 determines whether the pitch of the newly pressed key notified via the key scanner 206 matches the pitch information contained in Event[SongIndex] on RAM 203 referenced by the current SongIndex value on RAM 203 (step S1003).

ステップＳ１００３の判定がＹＥＳならば、ＣＰＵ２０１は、ＲＡＭ２０３上の現在のＳｏｎｇＩｎｄｅｘ値によって参照されるＲＡＭ２０３上のＥｖｅｎｔ［ＳｏｎｇＩｎｄｅｘ］に含まれる歌詞情報と音高情報を歌声データ２１５として有する発声イベントを生成し、ＲＡＭ２０３の発行イベント領域にセットする（ステップＳ１００４）。続いて、ＣＰＵ２０１は、ＲＡＭ２０３上の正解フラグ変数の値を１にセットする（ステップＳ１００５）。 If the determination in step S1003 is YES, the CPU 201 generates a vocal event having the lyrics information and pitch information contained in Event[SongIndex] on the RAM 203 referenced by the current SongIndex value on the RAM 203 as singing voice data 215, and sets it in the issued event area of the RAM 203 (step S1004). Next, the CPU 201 sets the value of the correct flag variable on the RAM 203 to 1 (step S1005).

一方、ステップＳ１００３の判定がＮＯならば、ＣＰＵ２０１は、ＲＡＭ２０３上の現在のＳｏｎｇＩｎｄｅｘ値によって参照されるＲＡＭ２０３上のＥｖｅｎｔ［ＳｏｎｇＩｎｄｅｘ］に含まれる音高情報と共に所定の音色（例えばピアノ音）の音色情報を発音制御データ２１６（図２参照）として有する発音イベントを生成し、ＲＡＭ２０３の発行イベント領域にセットする（ステップＳ１００６）。 On the other hand, if the determination in step S1003 is NO, the CPU 201 generates a sound generation event having tone color information of a predetermined tone color (e.g., piano tone) as sound generation control data 216 (see FIG. 2) together with pitch information contained in Event[SongIndex] on RAM 203 referenced by the current SongIndex value on RAM 203, and sets it in the issued event area of RAM 203 (step S1006).

As in steps S1004 and S1006 above, when the user newly presses any key on the keyboard 101 in FIG. 1 and the key-press timing correctly corresponds to the automatically played timing information, the result depends on the pressed pitch. If the pressed pitch matches the pitch information of the performance guide data set, the press is treated as correct and the voice synthesis LSI 205 outputs singing voice output data 217 corresponding to the lyrics and pitch being automatically played. If the pressed pitch is wrong, the press is treated as incorrect and the sound source LSI 204 outputs musical tone output data 214 of a predetermined tone color at the pitch being automatically played. The user can therefore easily tell whether a press in the performance lesson was correct, without having to look at a display each time, simply by whether singing voice output data 217 is vocalized or musical tone output data 214 is sounded.

ステップＳ１００１の判定がＮＯの場合或いはステップＳ１００５又はＳ１００６の処理の後、ＣＰＵ２０１は、図２のキースキャナ２０６を介して図１の鍵盤１０１上で新たな離鍵がユーザによってなされたか否かを判定する（ステップＳ１００７）。 If the determination in step S1001 is NO or after processing in step S1005 or S1006, the CPU 201 determines whether or not a new key has been released by the user on the keyboard 101 in FIG. 1 via the key scanner 206 in FIG. 2 (step S1007).

ステップＳ１００７の判定がＹＥＳならば、図２の音声合成ＬＳＩ２０５又は音源ＬＳＩ２０４に対して、現在発音中の歌声音声出力データ２１７又は楽音出力データ２１４の出力を停止させるための離鍵イベントを生成し、ＲＡＭ２０３の発行イベント領域にセットする（ステップＳ１００８）。 If the determination in step S1007 is YES, a key release event is generated to stop the output of the currently sounding singing voice output data 217 or musical tone output data 214 from the voice synthesis LSI 205 or the sound source LSI 204 in FIG. 2, and is set in the issued event area of the RAM 203 (step S1008).

After the process of step S1008, or if the determination in step S1007 is NO, the CPU 201 ends the key press/release process of step S704 in FIG. 7, which is shown in the flowchart of FIG. 10.
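The branching of the key press/release process of FIG. 10 can be sketched as follows. The `GuideEvent` and `LessonState` classes, the tuple-shaped events, and the `"piano"` tone label are hypothetical; the `issued` list merely stands in for the issued event area on the RAM 203.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class GuideEvent:
    lyric: str   # lyrics information of Event[SongIndex]
    pitch: int   # pitch information of Event[SongIndex]


@dataclass
class LessonState:
    events: list
    song_index: int = 0
    note_on: int = 0        # 1 while inside the key-press period
    correct_flag: int = 0
    issued: list = field(default_factory=list)  # stands in for the issued event area


def key_process(s: LessonState, pressed: Optional[int], released: bool) -> None:
    # S1001 / S1002: act on a new key press only inside the key-press period.
    if pressed is not None and s.note_on == 1:
        ev = s.events[s.song_index]
        if pressed == ev.pitch:                                  # S1003
            s.issued.append(("vocalize", ev.lyric, ev.pitch))    # S1004
            s.correct_flag = 1                                   # S1005
        else:
            s.issued.append(("tone", "piano", ev.pitch))         # S1006
    if released:                                                 # S1007
        s.issued.append(("key_release",))                        # S1008
```

A press outside the key-press period produces no event at all, which is how the user hears that the timing was wrong.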

図１１は、図７のステップＳ７０５の採点処理の詳細例を示すフローチャートである。まずＣＰＵ２０１は、ＲＡＭ２０３上の変数ＮｏｔｅＯｎ＿ｏｕｔの値が１になっているか否か、即ち現在の自動再生のタイミングが押鍵期間（図４（ｃ）参照）から出るタイミングであるか否かを判定する（ステップＳ１１０１）。 Figure 11 is a flowchart showing a detailed example of the scoring process in step S705 in Figure 7. First, the CPU 201 determines whether the value of the variable NoteOn_out in the RAM 203 is 1, i.e., whether the current timing of automatic playback is the timing to exit the key-press period (see Figure 4 (c)) (step S1101).

ステップＳ１１０１の判定がＹＥＳの場合に、以下の採点処理が実行される。即ち、ＣＰＵ２０１はまず、ＲＡＭ２０３の正解フラグの変数値が１であるか否かを判定する（ステップＳ１１０２）。この正解フラグの変数値は、前述したように、ユーザにより図１の鍵盤１０１上の何れかの鍵が新たに押鍵されその押鍵タイミングが自動再生されるタイミング情報に正しく対応しており、かつ押鍵音高が演奏ガイドデータの組の音高情報に正しく対応している場合に、図１０のステップＳ１００５において値「１」にセットされる。 If the determination in step S1101 is YES, the following scoring process is executed. That is, the CPU 201 first determines whether the variable value of the correct flag in the RAM 203 is 1 (step S1102). As described above, the variable value of this correct flag is set to the value "1" in step S1005 of FIG. 10 when the user newly presses any key on the keyboard 101 in FIG. 1 and the key pressing timing correctly corresponds to the timing information to be automatically played back, and the key pressing pitch correctly corresponds to the pitch information in the set of performance guide data.

ステップＳ１１０２の判定がＹＥＳならば、ＣＰＵ２０１は、ＲＡＭ２０３上の採点途中結果を示す変数値に加点処理を行う（例えば＋１する）（ステップＳ１１０３）。続いて、ＣＰＵ２０１は、ＲＡＭ２０３上の正解フラグの変数値を０にリセットする（ステップＳ１１０４）。 If the determination in step S1102 is YES, the CPU 201 performs a point increment process (e.g., +1) on the variable value indicating the intermediate scoring result stored in the RAM 203 (step S1103). Next, the CPU 201 resets the variable value of the correct answer flag stored in the RAM 203 to 0 (step S1104).

一方、ステップＳ１１０２の判定がＮＯならば、ＣＰＵ２０１は、ＲＡＭ２０３上の採点途中結果を示す変数値に減点処理を行う（例えば－１する）（ステップＳ１１０５）。 On the other hand, if the determination in step S1102 is NO, the CPU 201 performs a subtraction process (e.g., subtracting 1) on the variable value indicating the intermediate scoring result stored in the RAM 203 (step S1105).

ステップＳ１１０４又はＳ１１０５の処理の後、ＣＰＵ２０１は、ＲＡＭ２０３上のＮｏｔｅＯｎ＿ｏｕｔ変数の値を０にリセットする（ステップＳ１１０６）。 After processing step S1104 or S1105, the CPU 201 resets the value of the NoteOn_out variable on the RAM 203 to 0 (step S1106).

ステップＳ１１０６の処理の後又はステップＳ１１０１の判定がＮＯの場合に、ＣＰＵ２０１は、図１１のフローチャートで示される図７のステップＳ７０５の採点処理を終了する。 After processing in step S1106 or if the determination in step S1101 is NO, the CPU 201 ends the scoring process in step S705 of FIG. 7, which is shown in the flowchart of FIG. 11.

以上のように、本実施形態では、押鍵期間が経過した際に押鍵期間中の正解フラグの状態を見るので、押鍵期間以外にユーザにより押鍵されても採点には影響を与えない。押鍵期間中に押鍵しない場合は減点となる。 As described above, in this embodiment, the state of the correct flag during the key pressing period is checked when the key pressing period has elapsed, so even if the user presses a key outside of the key pressing period, it does not affect the score. Points are deducted if a key is not pressed during the key pressing period.
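The scoring rule above is small enough to sketch directly. The dictionary keys below are hypothetical names for the variables on the RAM 203, and the step size of 1 follows the "for example" values in the text.

```python
def score_step(state: dict) -> None:
    """One pass of the scoring process of FIG. 11 (illustrative sketch)."""
    if state["note_on_out"] != 1:       # S1101: score only when the period ends
        return
    if state["correct_flag"] == 1:      # S1102
        state["score"] += 1             # S1103: add a point
        state["correct_flag"] = 0       # S1104
    else:
        state["score"] -= 1             # S1105: deduct a point (wrong or missing press)
    state["note_on_out"] = 0            # S1106
```

Because the flag is inspected only when `note_on_out` is 1, presses outside the key-press period never reach the score.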

FIG. 12 is a flowchart showing a detailed example of the voice quality update process in step S706 of FIG. 7. As described above, the voice quality in this embodiment starts as a male voice; as the intermediate score rises it shifts toward a female voice, and as the score falls it shifts toward a hoarse male voice. Voice quality is determined by the frequency characteristics of the vocal tract, called formants as described above, and also by the frequency of the sound emitted by the vocal cords (the pitch of the voice). As shown in FIG. 4(b), in the voices of women and children the first and higher formants generally lie in a higher frequency range, whereas in male voices the formants lie in a lower frequency range. Women and children also generally have a higher voice pitch (vocal cord vibration frequency).

上記の傾向を踏まえ、本実施例では、図５（ｂ）に示されるように、初期状態の男声の声質５を中心として上方向に５段階、下方向に５段階の計１１段階で声質が変化するものとする。フォルマントについては女声、男声及びダミ声間を補間するように変化させる。また、声質変更の３段階目（声質８と声質２）では声の高さを１オクターブ上下させる処理を実行する。 In light of the above trends, in this embodiment, as shown in FIG. 5(b), the voice quality is changed in 11 stages in total, five stages upward and five stages downward, centered on the initial male voice quality 5. The formants are changed to interpolate between female voices, male voices, and hoarse voices. In addition, in the third stage of voice quality change (voice quality 8 and voice quality 2), a process is performed to raise or lower the voice pitch by one octave.

図１２のフローチャートで示される処理において、ＣＰＵ２０１は、ＲＡＭ２０３に記憶されている前回の採点途中結果の点数と今回の採点途中結果の点数を比較し（ステップＳ１２０１）、点数がアップしたか否かを判定する（ステップＳ１２０２）。 In the process shown in the flowchart of FIG. 12, the CPU 201 compares the score of the previous intermediate scoring result stored in the RAM 203 with the score of the current intermediate scoring result (step S1201), and determines whether the score has increased (step S1202).

そして、ステップＳ１２０２の判定がＹＥＳの場合（点数がアップした場合）には、ＣＰＵ２０１は、声質が最大値の１０に到達していなければ（ステップＳ１２０４の判定がＮＯならば）、声質を１段階加算する（ステップＳ１２０５）。声質が最大値の１０に到達していれば（ステップＳ１２０４の判定がＹＥＳならば）、ＣＰＵ２０１は、ステップＳ１２０５の加算処理は実行しない。 If the determination in step S1202 is YES (if the score has increased), the CPU 201 adds one level to the voice quality (step S1205) if the voice quality has not reached the maximum value of 10 (if the determination in step S1204 is NO). If the voice quality has reached the maximum value of 10 (if the determination in step S1204 is YES), the CPU 201 does not execute the addition process in step S1205.

また、声質がオクターブ切替え段階である８又は２であった場合（ステップＳ１２０６の判定がＹＥＳならば）、ＣＰＵ２０１は、声の高さを１オクターブ上げる（ステップＳ１２０７）。ステップＳ１２０６の判定がＮＯならば、ＣＰＵ２０１は、オクターブは維持する。その後、ＣＰＵ２０１は、図１２のフローチャートで示される図７のステップＳ７０６の声質更新処理を終了する。 Also, if the voice quality is at 8 or 2, which is the octave switching stage (if the determination in step S1206 is YES), the CPU 201 raises the pitch of the voice by one octave (step S1207). If the determination in step S1206 is NO, the CPU 201 maintains the octave. After that, the CPU 201 ends the voice quality update process in step S706 in FIG. 7, which is shown in the flowchart in FIG. 12.

一方、ステップＳ１２０２の判定がＮＯの場合には、ＣＰＵ２０１は更に、ステップＳ１２０１での比較処理の結果、点数がダウンしたか否かを判定する（ステップＳ１２０３）。ステップＳ１２０３の判定もＮＯで、点数が維持されている場合には、ＣＰＵ２０１は、何もせずに、図１２のフローチャートで示される図７のステップＳ７０６の声質更新処理を終了する。 On the other hand, if the determination in step S1202 is NO, the CPU 201 further determines whether the score has decreased as a result of the comparison process in step S1201 (step S1203). If the determination in step S1203 is also NO and the score has been maintained, the CPU 201 does nothing and ends the voice quality update process in step S706 in FIG. 7 shown in the flowchart in FIG. 12.

ＣＰＵ２０１は、ステップＳ１２０３の判定がＹＥＳで点数がダウンしたと判定した場合には、声質が最小値の０に到達していなければ（ステップＳ１２０８の判定がＮＯならば）、声質を１段階減算する（ステップＳ１２０９）。声質が最小値の０に到達していれば（ステップＳ１２０８の判定がＹＥＳならば）、ＣＰＵ２０１は、ステップＳ１２０９の減算処理は実行しない。 If the CPU 201 determines in step S1203 that the score has decreased (YES), and the voice quality has not reached the minimum value of 0 (NO in step S1208), it subtracts one level from the voice quality (step S1209). If the voice quality has reached the minimum value of 0 (YES in step S1208), the CPU 201 does not execute the subtraction process in step S1209.

また、声質がオクターブ切替え段階である８又は２であった場合（ステップＳ１２１０の判定がＹＥＳならば）、ＣＰＵ２０１は、声の高さを１オクターブ下げる（ステップＳ１２１１）。ステップＳ１２１０の判定がＮＯならば、ＣＰＵ２０１は、オクターブは維持する。 Also, if the voice quality is at 8 or 2, which is the octave switching stage (if the determination in step S1210 is YES), the CPU 201 lowers the voice pitch by one octave (step S1211). If the determination in step S1210 is NO, the CPU 201 maintains the octave.
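The level update of FIG. 12 can be sketched as below. The clamping to the range 0 to 10 follows the text. The `octave_shift` helper is a simplifying assumption: instead of raising or lowering the octave incrementally as in steps S1206 to S1207 and S1210 to S1211, it derives the shift from the current level, using levels 8 and 2 as the switching points named in the text. All function and constant names are hypothetical.

```python
MIN_LEVEL, MALE_LEVEL, MAX_LEVEL = 0, 5, 10  # 11 voice quality levels, male voice at 5


def update_voice_quality(level: int, prev_score: int, score: int) -> int:
    """Move the voice quality one level up or down according to the score change."""
    if score > prev_score:                    # S1202: score went up
        return min(level + 1, MAX_LEVEL)      # S1204, S1205
    if score < prev_score:                    # S1203: score went down
        return max(level - 1, MIN_LEVEL)      # S1208, S1209
    return level                              # score unchanged: do nothing


def octave_shift(level: int) -> int:
    """Assumed mapping: +1 octave from level 8 upward, -1 octave from level 2 downward."""
    if level >= 8:
        return 1
    if level <= 2:
        return -1
    return 0
```

Deriving the octave from the level keeps the two values consistent regardless of the direction from which a level is reached; a step-by-step implementation would need to track the same invariant explicitly.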

After calculating the voice quality as described above, the CPU 201 stores it in the variable curNum on the RAM 203 and then ends the voice quality update process of step S706 in FIG. 7, which is shown in the flowchart of FIG. 12. Next, in the practice progress calculation process of step S707 in FIG. 7, the CPU 201 uses the voice quality value held in the variable curNum from step S706 to calculate the value x of the practice progress data 311 by the arithmetic process shown in equation (2).

$$x = \frac{\mathit{curNum} - \mathit{maleNum}}{\mathit{femaleNum} - \mathit{maleNum}} \tag{2}$$

Here, curNum is the current voice quality value held in the variable curNum on the RAM 203 by the voice quality update process of step S706 in FIG. 7. maleNum is the voice quality value of the male voice, for example 5 as described above, and femaleNum is the voice quality value of the female voice, for example the maximum value 10. The value x of the practice progress data 311 calculated by equation (2) therefore indicates the ratio of the difference between the current voice quality value curNum (which reflects the intermediate scoring result) and the male value maleNum to the difference between the female value femaleNum and the male value maleNum. For example, if curNum = 10 (the maximum, equal to the female value), equation (2) gives "x = (10 - 5) ÷ (10 - 5) = 1". If curNum = 5 (equal to the male value), it gives "x = (5 - 5) ÷ (10 - 5) = 0". If curNum = 0 (the minimum), it gives "x = (0 - 5) ÷ (10 - 5) = -1". In other words, when the intermediate scoring result from the scoring process of step S705 in FIG. 7 reaches its highest value and the current voice quality value curNum calculated by the voice quality update process of step S706 in FIG. 7 reaches the maximum value 10, the same as the female voice, the value of the practice progress data 311 is x = 1. When the intermediate scoring result is at its average value and curNum is 5, the same as the male voice, x = 0. When the intermediate scoring result reaches its lowest value and curNum reaches the minimum value 0, x = -1.
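The three worked values above are consistent with a ratio of the form x = (curNum - maleNum) / (femaleNum - maleNum). A minimal sketch that reproduces them, with the function name and default arguments as assumptions:

```python
def practice_progress(cur_num: int, male_num: int = 5, female_num: int = 10) -> float:
    """Ratio of (current - male) to (female - male) voice quality values."""
    return (cur_num - male_num) / (female_num - male_num)
```

With the default values, levels 10, 5, and 0 map to 1, 0, and -1, so the sign of x indicates whether the voice should move toward the female voice or toward the hoarse voice.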

図５（ｃ）は、本実施例において音声合成ＬＳＩ２０５内の音響モデル部３０１に保持されている声質パラメータのデータ構成例（以下、「声質パラメータ構造体」と呼ぶ）を示す図である。「母音フラグ」は当該音素が母音であるか、子音であるかを示すフラグである。母音のときｔｒｕｅ、子音のときｆａｌｓｅとなる。「ノイズ混合比」は、音声に混合されるホワイトノイズの振幅比である。この値は、歌声音声出力データ２１７の最大振幅（１６ビットなら３２，７６８）を１としたときの比率になる。フォルマントパラメータである「ＬＳＦ１配列ポインタ」及び「ＬＳＦ２配列ポインタ」は、ＬＳＰ分析の結果得られるＬＳＰパラメータ値を周波数パラメータ値に変換して得られるＬＳＦ値への配列ポインタである。ＬＳＦは2つで１対のデータとなるため、「ＬＳＦ１配列ポインタ」と「ＬＳＦ２配列ポインタ」のペアがそれぞれＬＳＰ分析の次数分だけ、上記データ構造体に保持されることになる。全音素×全ノート番号分の数の上記声質パラメータ構造体のデータ群が、人間の声の１キャラクタ分の全声質パラメータ群となる。この声質パラメータ群が、男声、女声、ダミ声のキャラクタ毎に、音響モデル部３０１内の特には図示しない書込み可能ＲＯＭに保持されている。 Figure 5 (c) is a diagram showing an example of the data structure of the voice quality parameters (hereinafter referred to as the "voice quality parameter structure") stored in the acoustic model unit 301 in the voice synthesis LSI 205 in this embodiment. The "vowel flag" is a flag indicating whether the phoneme is a vowel or a consonant. If it is a vowel, it is true, and if it is a consonant, it is false. The "noise mixing ratio" is the amplitude ratio of the white noise mixed into the voice. This value is the ratio when the maximum amplitude of the singing voice output data 217 (32,768 for 16 bits) is set to 1. The "LSF1 array pointer" and "LSF2 array pointer", which are formant parameters, are array pointers to the LSF values obtained by converting the LSP parameter values obtained as a result of the LSP analysis into frequency parameter values. Since two LSFs make a pair of data, the above data structure stores the pairs of "LSF1 array pointer" and "LSF2 array pointer" for the number of orders of the LSP analysis. The data set of the above voice quality parameter structures, the number of which is equal to all phonemes x all note numbers, constitutes the entire voice quality parameter set for one human voice character. This voice quality parameter set is stored in a writable ROM (not shown) in the acoustic model unit 301 for each character, male voice, female voice, and hoarse voice.
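The voice quality parameter structure of FIG. 5(c) might be modeled as in the following sketch. The class and field names are hypothetical, the two LSF arrays are represented as plain lists rather than pointers, and the numeric values in the sample table are placeholders rather than data from the patent.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

FULL_SCALE_16BIT = 32768  # maximum amplitude for 16-bit output, as stated in the text


@dataclass
class VoiceQualityParams:
    is_vowel: bool          # "vowel flag": True for a vowel, False for a consonant
    noise_mix_ratio: float  # white-noise amplitude relative to full scale (1.0)
    lsf1: List[float]       # stands in for the "LSF1 array pointer"
    lsf2: List[float]       # stands in for the "LSF2 array pointer"

    def noise_amplitude(self) -> float:
        """Noise amplitude in 16-bit sample units."""
        return self.noise_mix_ratio * FULL_SCALE_16BIT


# One voice character: (phoneme number, note number) -> parameters.
CharacterTable = Dict[Tuple[int, int], VoiceQualityParams]

# Placeholder entry for illustration only.
male: CharacterTable = {
    (0, 60): VoiceQualityParams(True, 0.25, [300.0, 900.0], [500.0, 1200.0]),
}
```

One such table per character (male, female, hoarse) corresponds to the "all phonemes x all note numbers" parameter group described above.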

図１３は、音声合成ＬＳＩ２０５での発声処理時に、図３の音声合成ＬＳＩ２０５のプロセッサによって、音声合成部３００のフォルマント補間処理部３０６及びノイズ混合比補間処理部３１６の各機能として実行される声質変更処理の例を示すフローチャートである。ここでは、男声、女声間の声質変更について説明する。男声、ダミ声間の声質変更も同様に実施できる。 Figure 13 is a flowchart showing an example of voice quality change processing executed by the processor of the voice synthesis LSI 205 in Figure 3 as the functions of the formant interpolation processing unit 306 and the noise mixing ratio interpolation processing unit 316 of the voice synthesis unit 300 during vocalization processing in the voice synthesis LSI 205. Here, a voice quality change between a male voice and a female voice is explained. A voice quality change between a male voice and a hoarse voice can also be implemented in the same way.

音声合成ＬＳＩ２０５のプロセッサは、図７のステップＳ７０８の発声・発音処理によりＣＰＵ２０１から発行された歌声データ２１５に基づいて図３のテキスト解析部３０２を介して設定された発声すべき音素番号を取得し（ステップＳ１３０１）、同じく歌声データ２１５に含まれる形で指定された音高情報であるノート番号を取得する（ステップＳ１３０２）。また、音声合成部３００のプロセッサは、図７のステップＳ７０７でＣＰＵ２０１が前述した（２）式で示される演算処理によって算出した練習進行度データ３１１の値ｘを取得する（ステップＳ１３０３）。 The processor of the voice synthesis LSI 205 obtains the phoneme number to be vocalized, which is set via the text analysis unit 302 of FIG. 3, based on the singing voice data 215 issued by the CPU 201 in the vocalization/pronunciation process of step S708 of FIG. 7 (step S1301), and also obtains the note number, which is pitch information specified in the form included in the singing voice data 215 (step S1302). The processor of the voice synthesis unit 300 also obtains the value x of the practice progress data 311 calculated by the CPU 201 in step S707 of FIG. 7 through the arithmetic process shown in the above-mentioned equation (2) (step S1303).

次に、音声合成ＬＳＩ２０５のプロセッサは、ステップＳ１３０１で取得した音素番号及びステップＳ１３０２で取得したノート番号に基づいて、音響モデル部３０１から、図５（ｃ）に示されるデータ形式の男声および女声の各声質パラメータ構造体を取得する（ステップＳ１３０４、Ｓ１３０５）。 Next, the processor of the speech synthesis LSI 205 acquires the voice quality parameter structures for the male and female voices in the data format shown in FIG. 5(c) from the acoustic model unit 301 based on the phoneme number acquired in step S1301 and the note number acquired in step S1302 (steps S1304, S1305).

次に、音声合成ＬＳＩ２０５のプロセッサは、ステップＳ１３０４又はＳ１３０５で取得した声質パラメータ構造体において、母音フラグが設定されているか否かを判定する（ステップＳ１３０６）。 Next, the processor of the speech synthesis LSI 205 determines whether or not a vowel flag is set in the voice quality parameter structure acquired in step S1304 or S1305 (step S1306).

ステップＳ１３０１で取得された現在の音素番号の音素が母音でなくステップＳ１３０６の判定がＮＯの場合、即ち、子音である場合には、当該音素はピッチを持たないノイズ音声である。この場合には、音声合成ＬＳＩ２０５のプロセッサは、ステップＳ１３０４で取得した男性の声質パラメータ構造体から、「ノイズ混合比」（図５（ｃ）参照）を取り出して目標ノイズ混合比３１７として図３のノイズ重畳部３０７にセットし、「ＬＳＦ１配列ポインタ」及び「ＬＳＦ２配列ポインタ」（図５（ｃ）参照）を取り出して図３の目標スペクトル情報３１２として図３の合成フィルタ部３０５にセットする（ステップＳ１３１０）。なお、男声の代わりに女声の声質パラメータ構造体からのデータを、目標ノイズ混合比３１７及び目標スペクトル情報３１２としてセットしてよいことはもちろんである。 If the phoneme of the current phoneme number acquired in step S1301 is not a vowel and the judgment in step S1306 is NO, i.e., if it is a consonant, the phoneme is a noise voice without pitch. In this case, the processor of the speech synthesis LSI 205 extracts the "noise mixture ratio" (see FIG. 5(c)) from the male voice quality parameter structure acquired in step S1304 and sets it as the target noise mixture ratio 317 in the noise superimposition unit 307 of FIG. 3, and extracts the "LSF1 array pointer" and "LSF2 array pointer" (see FIG. 5(c)) and sets them as the target spectrum information 312 in FIG. 3 in the synthesis filter unit 305 of FIG. 3 (step S1310). It is of course possible to set data from a female voice quality parameter structure instead of a male voice as the target noise mixture ratio 317 and the target spectrum information 312.

If the phoneme with the current phoneme number acquired in step S1301 is a vowel and the determination in step S1306 is YES, the processor of the speech synthesis LSI 205 first executes the noise mixing ratio interpolation process as the function of the noise mixing ratio interpolation processing unit 316 in FIG. 3 (step S1307), and then executes the formant interpolation process as the function of the formant interpolation processing unit 306 in FIG. 3 (step S1308). The processor then sets the target noise mixing ratio 317 obtained by the noise mixing ratio interpolation process of step S1307 in the noise superimposition unit 307 in FIG. 3 as a target parameter, and sets the target spectrum information 312 obtained by the formant interpolation process of step S1308 in the synthesis filter unit 305 in FIG. 3 as a target parameter (step S1309).

ステップＳ１３０９又はＳ１３１０の処理の後、音声合成ＬＳＩ２０５のプロセッサは、図１３のフローチャートで示される声質変更処理を終了する。 After processing of step S1309 or S1310, the processor of the voice synthesis LSI 205 ends the voice quality change processing shown in the flowchart of FIG. 13.
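The vowel/consonant branch of FIG. 13 can be sketched as follows. The `Params` tuple and the function signature are hypothetical, and the two interpolation steps are passed in as callables so that the sketch makes no claim about how the formant interpolation itself is computed.

```python
from typing import Callable, List, NamedTuple, Tuple


class Params(NamedTuple):
    is_vowel: bool
    noise_mix_ratio: float
    lsf: List[float]


def change_voice_quality(
    male: Params,
    female: Params,
    x: float,
    interp_noise: Callable[[float, float, float], float],
    interp_formant: Callable[[List[float], List[float], float], List[float]],
) -> Tuple[float, List[float]]:
    """Return (target noise mixing ratio, target spectrum information)."""
    if not male.is_vowel:                                    # S1306 is NO: consonant
        # S1310: use the male parameters unchanged, without interpolation.
        return male.noise_mix_ratio, list(male.lsf)
    target_rate = interp_noise(male.noise_mix_ratio, female.noise_mix_ratio, x)  # S1307
    target_lsf = interp_formant(male.lsf, female.lsf, x)     # S1308
    return target_rate, target_lsf                           # set as target parameters
```

Only vowels are interpolated; consonants are pitchless noise, so the sketch passes one character's parameters through as they are.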

図１４（ａ）は、図１３のステップＳ１３０７のノイズ混合比補間処理の詳細例を示すフローチャートであり、音声合成ＬＳＩ２０５のプロセッサによって図３の音声合成部３００のノイズ重畳部３０７の機能として実行される。 Figure 14 (a) is a flowchart showing a detailed example of the noise mixing ratio interpolation process of step S1307 in Figure 13, which is executed by the processor of the speech synthesis LSI 205 as a function of the noise superimposition unit 307 of the speech synthesis unit 300 in Figure 3.

音声合成ＬＳＩ２０５のプロセッサは、図１３のステップＳ１３０４で音響モデル部３０１から取得した男性の声質パラメータ構造体から男性のノイズ混合比の値をｍａｌｅＲａｔｅとして取り出し、また、図１３のステップＳ１３０５で音響モデル部３０１から取得した女性の声質パラメータ構造体から女性のノイズ混合比の値をｆｅｍａｌｅＲａｔｅとして取り出し、更に図１３のステップＳ１３０３で取得した練習進行度データ３１１の値ｘを用いて、下記（３）式で示される演算処理を実行することにより、目標ノイズ混合比３１７の値ｔａｒｇｅｔＲａｔｅを算出する（ステップＳ１４０１）。

The processor of the speech synthesis LSI 205 extracts the male noise mixing ratio as maleRate from the male voice quality parameter structure acquired from the acoustic model unit 301 in step S1304 of FIG. 13, and extracts the female noise mixing ratio as femaleRate from the female voice quality parameter structure acquired from the acoustic model unit 301 in step S1305 of FIG. 13. Then, using the value x of the practice progress data 311 acquired in step S1303 of FIG. 13, the processor calculates the value targetRate of the target noise mixing ratio 317 by performing the arithmetic processing shown in equation (3) below (step S1401).

$$\mathrm{targetRate} = \mathrm{maleRate} + (\mathrm{femaleRate} - \mathrm{maleRate}) \times x \tag{3}$$

前述したように、練習進行度データ３１１の値ｘは、男性の声質の値ｍａｌｅＮｕｍに対する採点途中結果に対応する現在の声質の値ｃｕｒＮｕｍの差分値が、男性の声質の値ｍａｌｅＮｕｍに対する女性の声質の値ｆｅｍａｌｅＮｕｍの差分値に対して、どの程度の割合であるかを示している。従って、上記（３）式に示されるように、男性のノイズ混合比に対する女性のノイズ混合比の差分値（ｆｅｍａｌｅＲａｔｅ－ｍａｌｅＲａｔｅ）に採点結果に対応する練習進行度データ３１１の値ｘを乗算し、その乗算結果を男性のノイズ混合比ｍａｌｅＲａｔｅに加算することにより、採点結果に対応して補間された声質に対応する目標ノイズ混合比３１７の値ｔａｒｇｅｔＲａｔｅを算出することができる。 As described above, the value x of the practice progress data 311 indicates the ratio of the difference value of the current voice quality value curNum corresponding to the intermediate scoring result for the male voice quality value maleNum to the difference value of the female voice quality value femaleNum for the male voice quality value maleNum. Therefore, as shown in the above formula (3), the difference value of the female noise mixture ratio for the male noise mixture ratio (femaleRate-maleRate) is multiplied by the value x of the practice progress data 311 corresponding to the scoring result, and the multiplication result is added to the male noise mixture ratio maleRate, thereby calculating the value targetRate of the target noise mixture ratio 317 corresponding to the voice quality interpolated in response to the scoring result.
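
The relation between the practice progress value x and the interpolated noise mixing ratio can be sketched as below. This is a minimal sketch under stated assumptions: the function names `practice_progress` and `target_noise_ratio` are hypothetical, and the formula for x is written directly from the verbal definition in the preceding paragraph.

```python
def practice_progress(cur_num: float, male_num: float, female_num: float) -> float:
    """Fraction x: how far curNum lies along the way from maleNum to femaleNum."""
    if female_num == male_num:
        raise ValueError("male and female voice quality values must differ")
    return (cur_num - male_num) / (female_num - male_num)


def target_noise_ratio(male_rate: float, female_rate: float, x: float) -> float:
    """Step S1401: add x times the female-minus-male difference to the male ratio."""
    return male_rate + (female_rate - male_rate) * x
```

With x = 0 the result equals maleRate and with x = 1 it equals femaleRate, so intermediate scores yield a ratio between the two voices.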

図３の音声合成ＬＳＩ２０５のプロセッサは、ノイズ混合比補間処理部３１６の機能として、上述の図１４（ａ）のフローチャートで示される図１３のステップＳ１３０７で算出した目標ノイズ混合比値ｔａｒｇｅｔＲａｔｅを、前述した図１３のステップＳ１３０９で図３の音声合成ＬＳＩ２０５内のノイズ重畳部３０７に、目標パラメータである目標ノイズ混合比３１７としてセットする。そして、音声合成ＬＳＩ２０５のプロセッサは、このノイズ重畳部３０７の機能として、信号の最大振幅値に対して上記目標ノイズ混合比３１７の値ｔａｒｇｅｔＲａｔｅを乗じて得られる振幅値を有するノイズデータ３１５を生成し、フィルタ出力データ３１３に混合させる。 As a function of the noise mixing ratio interpolation processing unit 316, the processor of the speech synthesis LSI 205 in FIG. 3 sets the target noise mixing ratio value targetRate, calculated in step S1307 of FIG. 13 (detailed in the flowchart of FIG. 14(a)), in the noise superimposition unit 307 of the speech synthesis LSI 205 in FIG. 3 as the target noise mixing ratio 317, which is a target parameter, in step S1309 of FIG. 13 described above. Then, as a function of the noise superimposition unit 307, the processor of the speech synthesis LSI 205 generates noise data 315 whose amplitude is the maximum amplitude value of the signal multiplied by the value targetRate of the target noise mixing ratio 317, and mixes it with the filter output data 313.
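
The noise superimposition step can be illustrated roughly as follows. The text only states that the noise amplitude is the maximum signal amplitude multiplied by targetRate; everything else here is an assumption made for illustration: uniformly distributed white noise, additive mixing, a 16-bit maximum amplitude of 32767, and the function name `superimpose_noise`.

```python
import random
from typing import List, Optional


def superimpose_noise(
    filter_output: List[float],
    target_rate: float,
    max_amplitude: float = 32767.0,  # assumed 16-bit full scale
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Mix noise of amplitude max_amplitude * target_rate into the filter output."""
    rng = rng or random.Random()
    noise_amplitude = max_amplitude * target_rate
    # Assumption: uniform white noise added sample by sample.
    return [
        sample + rng.uniform(-noise_amplitude, noise_amplitude)
        for sample in filter_output
    ]
```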

図１４（ｂ）は、図１３のステップＳ１３０８のフォルマント補間処理の詳細例を示すフローチャートであり、音声合成ＬＳＩ２０５のプロセッサによって図３の音声合成部３００のフォルマント補間処理部３０６の機能として実行される。 Figure 14 (b) is a flowchart showing a detailed example of the formant interpolation process in step S1308 in Figure 13, which is executed by the processor of the speech synthesis LSI 205 as a function of the formant interpolation processing unit 306 of the speech synthesis unit 300 in Figure 3.

音声合成ＬＳＩ２０５のプロセッサは、例えば特には図示しないレジスタとして有する変数ｉの値を０（ステップＳ１４１１）から、＋１ずつインクリメントさせながら（ステップＳ１４１４）、その値がパラメータ次数Ｎに達したと判定するまで（ステップＳ１４１５）、ＬＳＦ１とＬＳＦ２のパラメータセット毎に、ステップＳ１４１２とＳ１４１３の処理を繰り返し実行する。 The processor of the speech synthesis LSI 205 repeatedly executes the processes of steps S1412 and S1413 for each parameter set of LSF1 and LSF2, incrementing the value of a variable i held, for example, as a register not specifically shown, from 0 (step S1411) by +1 (step S1414) until it is determined that the value has reached the parameter order N (step S1415).

まず、音声合成ＬＳＩ２０５のプロセッサは、図１３のステップＳ１３０４で音響モデル部３０１から取得した男性の声質パラメータ構造体から男性の第ｉ次のＬＳＦ１配列ポインタを取り出し、そのポインタが参照する音響モデル部３０１内の特には図示しないメモリからＬＳＦ１パラメータの値をｍａｌｅＬＳＦ１［ｉ］として取得し、また、図１３のステップＳ１３０５で音響モデル部３０１から取得した女性の声質パラメータ構造体から女性の第ｉ次のＬＳＦ１配列ポインタを取り出し、そのポインタが参照する音響モデル部３０１内のメモリからＬＳＦ１パラメータの値をｆｅｍａｌｅＬＳＦ１［ｉ］として取得し、更に図１３のステップＳ１３０３で取得した練習進行度データ３１１の値ｘを用いて、下記（４）式で示される演算処理を実行することにより、目標スペクトル情報３１２の一部である第ｉ次の目標ＬＳＦ１の値ｔａｒｇｅｔＬＳＦ１［ｉ］を算出する（ステップＳ１４１２）。

First, the processor of the speech synthesis LSI 205 extracts the i-th male LSF1 array pointer from the male voice quality parameter structure acquired from the acoustic model unit 301 in step S1304 of FIG. 13, and reads the LSF1 parameter value as maleLSF1[i] from a memory (not shown) in the acoustic model unit 301 that the pointer references. Likewise, it extracts the i-th female LSF1 array pointer from the female voice quality parameter structure acquired from the acoustic model unit 301 in step S1305 of FIG. 13, and reads the LSF1 parameter value as femaleLSF1[i] from the memory in the acoustic model unit 301 that the pointer references. Then, using the value x of the practice progress data 311 acquired in step S1303 of FIG. 13, the processor executes the arithmetic processing shown in equation (4) below to calculate the i-th target LSF1 value targetLSF1[i], which is part of the target spectrum information 312 (step S1412).

$$\mathrm{targetLSF1}[i] = \mathrm{maleLSF1}[i] + (\mathrm{femaleLSF1}[i] - \mathrm{maleLSF1}[i]) \times x \tag{4}$$

前述したように、練習進行度データ３１１の値ｘは、男性の声質の値ｍａｌｅＮｕｍに対する採点途中結果に対応する現在の声質の値ｃｕｒＮｕｍの周波数差分値が、男性の声質の値ｍａｌｅＮｕｍに対する女性の声質の値ｆｅｍａｌｅＮｕｍの周波数差分値に対して、どの程度の割合であるかを示している。従って、上記（４）式に示されるように、男性の第ｉ次のＬＳＦ１パラメータ値に対する女性の第ｉ次のＬＳＦ１パラメータ値の周波数差分値（ｆｅｍａｌｅＬＳＦ１［ｉ］－ｍａｌｅＬＳＦ１［ｉ］）に採点結果に対応する練習進行度データ３１１の値ｘを乗算し、その乗算結果を男性の第ｉ次のＬＳＦ１パラメータ値ｍａｌｅＬＳＦ１［ｉ］に加算することにより、採点結果に対応して補間された声質に対応する第ｉ次の目標ＬＳＦ１パラメータ値ｔａｒｇｅｔＬＳＦ１［ｉ］を算出することができる。 As described above, the value x of the practice progress data 311 indicates the ratio of the frequency difference value of the current voice quality value curNum corresponding to the intermediate scoring result for the male voice quality value maleNum to the frequency difference value of the female voice quality value femaleNum for the male voice quality value maleNum. Therefore, as shown in the above formula (4), the frequency difference value of the female's i-th LSF1 parameter value for the male's i-th LSF1 parameter value (femaleLSF1[i] - maleLSF1[i]) is multiplied by the value x of the practice progress data 311 corresponding to the scoring result, and the multiplication result is added to the male's i-th LSF1 parameter value maleLSF1[i], thereby calculating the i-th target LSF1 parameter value targetLSF1[i] corresponding to the voice quality interpolated in response to the scoring result.

次に、音声合成ＬＳＩ２０５のプロセッサは、図１３のステップＳ１３０４で音響モデル部３０１から取得した男性の声質パラメータ構造体から男性の第ｉ次のＬＳＦ２配列ポインタを取り出し、そのポインタが参照する音響モデル部３０１内のメモリからＬＳＦ２パラメータの値をｍａｌｅＬＳＦ２［ｉ］として取得し、また、図１３のステップＳ１３０５で音響モデル部３０１から取得した女性の声質パラメータ構造体から女性の第ｉ次のＬＳＦ２配列ポインタを取り出し、そのポインタが参照する音響モデル部３０１内のメモリからＬＳＦ２パラメータの値をｆｅｍａｌｅＬＳＦ２［ｉ］として取得し、更に図１３のステップＳ１３０３で取得した練習進行度データ３１１の値ｘを用いて、下記（５）式で示される演算処理を実行することにより、目標スペクトル情報３１２の一部である第ｉ次の目標ＬＳＦ２の値ｔａｒｇｅｔＬＳＦ２［ｉ］を算出する（ステップＳ１４１３）。

Next, the processor of the speech synthesis LSI 205 extracts the i-th male LSF2 array pointer from the male voice quality parameter structure acquired from the acoustic model unit 301 in step S1304 of FIG. 13, and reads the LSF2 parameter value as maleLSF2[i] from the memory in the acoustic model unit 301 that the pointer references. Likewise, it extracts the i-th female LSF2 array pointer from the female voice quality parameter structure acquired from the acoustic model unit 301 in step S1305 of FIG. 13, and reads the LSF2 parameter value as femaleLSF2[i] from the memory in the acoustic model unit 301 that the pointer references. Then, using the value x of the practice progress data 311 acquired in step S1303 of FIG. 13, the processor executes the arithmetic processing shown in equation (5) below to calculate the i-th target LSF2 value targetLSF2[i], which is part of the target spectrum information 312 (step S1413).

$$\mathrm{targetLSF2}[i] = \mathrm{maleLSF2}[i] + (\mathrm{femaleLSF2}[i] - \mathrm{maleLSF2}[i]) \times x \tag{5}$$

前述した（４）式のＬＳＦ１パラメータ値の場合と同様に、上記（５）式に示されるように、男性の第ｉ次のＬＳＦ２パラメータ値に対する女性の第ｉ次のＬＳＦ２パラメータ値の周波数差分値（ｆｅｍａｌｅＬＳＦ２［ｉ］－ｍａｌｅＬＳＦ２［ｉ］）に採点結果に対応する練習進行度データ３１１の値ｘを乗算し、その乗算結果を男性の第ｉ次のＬＳＦ２パラメータ値ｍａｌｅＬＳＦ２［ｉ］に加算することにより、採点結果に対応して補間された声質に対応する第ｉ次の目標ＬＳＦ２パラメータ値ｔａｒｇｅｔＬＳＦ２［ｉ］を算出することができる。 As with the LSF1 parameter value in equation (4), equation (5) multiplies the frequency difference between the female and male i-th LSF2 parameter values (femaleLSF2[i] - maleLSF2[i]) by the value x of the practice progress data 311 corresponding to the scoring result, and adds the product to the male i-th LSF2 parameter value maleLSF2[i]. This yields the i-th target LSF2 parameter value targetLSF2[i] for the voice quality interpolated in accordance with the scoring result.
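
The loop of steps S1411 to S1415 can be sketched as follows. This is an illustrative sketch only: plain Python lists stand in for the arrays that the LSF1 and LSF2 array pointers reference, and the function name `interpolate_lsf` is hypothetical.

```python
from typing import List, Sequence, Tuple


def interpolate_lsf(
    male_lsf1: Sequence[float],
    female_lsf1: Sequence[float],
    male_lsf2: Sequence[float],
    female_lsf2: Sequence[float],
    x: float,
) -> Tuple[List[float], List[float]]:
    """Interpolate each LSF1/LSF2 pair between the male and female voices."""
    n = len(male_lsf1)  # parameter order N
    if not (len(female_lsf1) == len(male_lsf2) == len(female_lsf2) == n):
        raise ValueError("all LSF arrays must have the same order N")
    target_lsf1: List[float] = []
    target_lsf2: List[float] = []
    i = 0                                   # step S1411
    while i < n:                            # step S1415 (loop until i reaches N)
        # Step S1412: i-th target LSF1 value.
        target_lsf1.append(male_lsf1[i] + (female_lsf1[i] - male_lsf1[i]) * x)
        # Step S1413: i-th target LSF2 value.
        target_lsf2.append(male_lsf2[i] + (female_lsf2[i] - male_lsf2[i]) * x)
        i += 1                              # step S1414
    return target_lsf1, target_lsf2
```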

図３の音声合成ＬＳＩ２０５のプロセッサは、フォルマント補間処理部３０６の機能として、上述の図１４（ｂ）のフローチャートで示される図１３のステップＳ１３０８のフォルマント補間処理で算出したＬＳＰ分析次数分の目標ＬＳＦ１パラメータ値ｔａｒｇｅｔＬＳＦ１［ｉ］と目標ＬＳＦ２パラメータ値ｔａｒｇｅｔＬＳＦ２［ｉ］のペア（０≦ｉ≦Ｎ－１）を、前述した図１３のステップＳ１３０９で図３の音声合成ＬＳＩ２０５内の合成フィルタ部３０５に、目標パラメータである目標スペクトル情報３１２としてセットする。この結果、合成フィルタ部３０５は、上記目標スペクトル情報３１２を用いてデジタルフィルタを形成し、このデジタルフィルタに図３の発声モデル部３０３内の音源生成部３０４の機能により音源信号を入力させることにより、フィルタ出力データ３１３を出力する。最終的に、このフィルタ出力データ３１３はノイズデータ３１５と混合され、歌声音声出力データ２１７として出力される。 As a function of the formant interpolation processing unit 306, the processor of the speech synthesis LSI 205 in FIG. 3 sets the pairs of target LSF1 parameter values targetLSF1[i] and target LSF2 parameter values targetLSF2[i] (0 ≤ i ≤ N-1), one pair per LSP analysis order, calculated by the formant interpolation process of step S1308 in FIG. 13 (detailed in the flowchart of FIG. 14(b)), in the synthesis filter unit 305 of the speech synthesis LSI 205 in FIG. 3 as the target spectrum information 312, which is a target parameter, in step S1309 of FIG. 13 described above. As a result, the synthesis filter unit 305 forms a digital filter using the target spectrum information 312, and outputs filter output data 313 when the sound source generation unit 304 in the vocalization model unit 303 of FIG. 3 inputs a sound source signal to this digital filter. Finally, the filter output data 313 is mixed with the noise data 315 and output as the singing voice output data 217.

上記図１３及び図１４のフローチャートによって説明した声質変更処理により、演奏レッスン開始時の歌唱機能の声質が例えば男性大人に設定され、演奏レッスン開始後採点途中結果が上がっていくごとに、ユーザの押鍵操作に基づいて電子鍵盤楽器１００のスピーカから発声される歌声音声の声質が女性大人の声質に徐々に変化し、逆に点数が下がっていった場合は上記歌声音声の声質が男性大人の声質から少し耳障りないわゆるダミ声と言われている声質に変化していく。更には、採点途中結果に応じてハスキーボイスのような声質の有り／無しの変化を加えることもできる。これにより、ユーザは、いちいちディスプレイを確認する必要なく、演奏レッスンの経過と共に自分の演奏操作の技量がどの程度になっているかを、発声される歌声音声の声質により簡単に確認することが可能となる。 Through the voice quality change process described with the flowcharts of FIGS. 13 and 14, the voice quality of the singing function is set to, for example, an adult male voice at the start of a performance lesson. As the intermediate score rises after the lesson starts, the voice quality of the singing voice emitted from the speaker of the electronic keyboard instrument 100 in response to the user's key presses gradually changes toward an adult female voice; conversely, as the score falls, the voice quality changes from the adult male voice toward a slightly harsh, so-called hoarse voice. A husky-voice quality can also be switched on or off according to the intermediate score. The user can thus easily gauge, from the voice quality of the singing voice alone and without checking the display each time, how far their playing skill has progressed over the course of the lesson.

以上説明した実施形態では、押鍵すべき音符ごとに弾けた又は弾けないを判断し採点途中結果の点数を上下させていたが、いくつかの音符ごとのまとまり（フレーズ）や数小節の採点の平均を取って点数を上下させたり、連続して何回か押鍵できた場合又は押鍵できなかった場合に点数を上下させるようにしてもよい。 In the embodiment described above, the score for the intermediate scoring results was increased or decreased based on whether each note to be pressed was played or not, but the score could also be increased or decreased by taking the average score for a group of several notes (phrases) or several measures, or the score could be increased or decreased if a key was pressed or not several times in succession.

更に、上記のように音符のまとまりで採点する場合には難易度の高い部分で通常より加点するようなボーナスステージや、逆に簡単な場所で間違えた場合には減点を増やすようなペナルティステージを設けてもよい。 Furthermore, when scoring groups of notes as described above, bonus stages may be provided in which more points are added in more difficult sections, or penalty stages may be provided in which more points are deducted if a mistake is made in an easy section.
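
One way to picture the phrase-based scoring with bonus and penalty stages is sketched below. The text does not specify any formula, so every detail here is an assumption made for illustration: the 0.5 pass threshold, the unit step, the multiplicative bonus and penalty weights, and the function name `phrase_score_delta`.

```python
from typing import Sequence


def phrase_score_delta(
    hits: Sequence[bool],
    step: int = 1,
    bonus: int = 1,         # > 1 for a difficult "bonus stage" phrase (assumed)
    penalty: int = 1,       # > 1 for an easy "penalty stage" phrase (assumed)
    threshold: float = 0.5  # assumed pass mark for the phrase average
) -> int:
    """Return the score change for one phrase from its per-note hit results."""
    if not hits:
        return 0
    average = sum(hits) / len(hits)  # fraction of notes played correctly
    if average >= threshold:
        return step * bonus
    return -step * penalty
```

For example, a difficult phrase could be scored with `bonus=2` so that playing it well raises the intermediate score by two steps instead of one.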

本実施例では、フォルマントの移動、ホワイトノイズの混合度及び音高を変化させているが、リバーブのような残響効果や声のピッチが揺らぐトレモロのような効果を入れたり、点数が下がるにつれて音高が不安定になっていくような演出を加えてもよい。 In this embodiment, the formant positions, the mixing level of white noise, and the pitch are varied, but it is also possible to add a reverberation effect such as reverb, a tremolo-like effect in which the pitch of the voice wavers, or an effect in which the pitch becomes increasingly unstable as the score decreases.

本実施例では、次に押鍵すべき鍵に対応するＬＥＤを最大輝度で点灯し、次の次に押鍵すべき鍵に対応するＬＥＤを最大輝度の半分の輝度で点灯させるようにしたが、ＬＥＤの輝度を一定とし、同じ輝度において、次に押鍵すべき鍵に対応するＬＥＤを点灯させ、次の次に押鍵すべき鍵に対応するＬＥＤを点滅させることで、識別できるようにしてもよい。 In this embodiment, the LED corresponding to the next key to be pressed is lit at maximum brightness, and the LED corresponding to the key after that is lit at half the maximum brightness, but it is also possible to distinguish between the keys by keeping the LED brightness constant and lighting the LED corresponding to the next key to be pressed at the same brightness, and blinking the LED corresponding to the key after that.
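
The brightness-based key guidance described above can be sketched as follows. The 0 to 255 brightness scale, the use of key numbers as dictionary keys, and the function name `led_levels` are assumptions for illustration; the text only specifies maximum brightness for the next key and half of it for the key after that.

```python
from typing import Dict, Sequence


def led_levels(upcoming_keys: Sequence[int], max_brightness: int = 255) -> Dict[int, int]:
    """Map key numbers to LED brightness for the next two keys to be pressed."""
    levels: Dict[int, int] = {}
    if len(upcoming_keys) >= 2:
        # Key after next: half of the maximum brightness.
        levels[upcoming_keys[1]] = max_brightness // 2
    if upcoming_keys:
        # Next key: maximum brightness (wins if both entries are the same key).
        levels[upcoming_keys[0]] = max_brightness
    return levels
```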

本実施例では、電子鍵盤楽器１００として実施したが、その他の楽器形態、例えばギター型や管楽器型の電子楽器で本発明が実施されてもよい。 In this embodiment, the electronic keyboard instrument 100 is used, but the present invention may also be implemented in other types of musical instruments, such as guitar-type or wind-type electronic musical instruments.

本実施例では最高点を女声、最低点をダミ声としているがこれら声質の選択は他にも様々な組み合わせが考えられることは言うまでもない。 In this embodiment, the highest score is a female voice and the lowest score is a hoarse voice, but it goes without saying that there are many other possible combinations for selecting these voice qualities.

本実施例では、目標とする声を一般的な女性の声としているが、特定の人物の声をモデル化した音響モデルを使用してもよい。例えば著名な歌手の歌唱を学習した音響モデルを使用すれば、演奏が上達するにつれ、あこがれの歌手の声質に近づいていくような演出が可能となり、更に効果的な演奏レッスンを行えるようになる。 In this embodiment, the target voice is a typical female voice, but an acoustic model of a specific person's voice may also be used. For example, with an acoustic model trained on the singing of a famous singer, the singing voice can be made to approach the voice quality of the singer the user admires as the user's playing improves, making performance lessons even more effective.

本実施例では、声質パラメータとしてＬＳＦを採用したが、図３の合成フィルタ部３０５をフィルタバンクにより実現した場合には、フィルタバンクを構成する各フィルタの増幅率をフォルマント形状と見做し、各フィルタバンクの利得について声質の補間処理を実施することも可能である。 In this embodiment, LSF is used as the voice quality parameter, but if the synthesis filter unit 305 in FIG. 3 is realized by a filter bank, it is also possible to regard the amplification factor of each filter constituting the filter bank as a formant shape and perform voice quality interpolation processing for the gain of each filter bank.

更に、下記特許文献に記載の方法による音声の周波数振幅成分に対し移動平均フィルタをかけることにより生成される周波数振幅概形を声質パラメータと見做して、周波数領域において補間処理を実施することも可能である。その他声質の変更に関して実施例の記載に関わらず種々の方法を採用することができる。
（特許文献）：特開２００５－０８４６６１号公報

Furthermore, the frequency amplitude envelope generated by applying a moving average filter to the frequency amplitude components of the voice, according to the method described in the patent document below, may be regarded as a voice quality parameter, and the interpolation may be performed in the frequency domain. Various other methods for changing the voice quality can be adopted regardless of the description in the embodiments.
(Patent Document): JP 2005-084661 A
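
The idea of smoothing an amplitude spectrum with a moving average and interpolating the resulting envelopes can be sketched roughly as below. This does not reproduce the method of the cited patent document, whose details are not given here; the centered window, the edge handling, and the function names `moving_average` and `interpolate_envelope` are all assumptions for illustration.

```python
from typing import List, Sequence


def moving_average(values: Sequence[float], window: int = 3) -> List[float]:
    """Smooth values with a centered moving average, shrinking the window at the edges."""
    half = window // 2
    smoothed: List[float] = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def interpolate_envelope(
    male_env: Sequence[float], female_env: Sequence[float], x: float
) -> List[float]:
    """Interpolate two amplitude envelopes bin by bin in the frequency domain."""
    if len(male_env) != len(female_env):
        raise ValueError("envelopes must have the same number of bins")
    return [m + (f - m) * x for m, f in zip(male_env, female_env)]
```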

本実施例では、図３の合成フィルタ部３０５への励振源信号としての音源信号は、図３の音声合成ＬＳＩ２０５内部において、音源生成部３０４が、音響モデル部３０１から出力される音高情報３１０に基づいて生成しているが、他の実施形態として、合成フィルタ部３０５に入力する音源信号を、図２の音源ＬＳＩ２０４から供給するようにしてもよい。 In this embodiment, the sound source signal as the excitation source signal to the synthesis filter unit 305 in FIG. 3 is generated by the sound source generation unit 304 inside the speech synthesis LSI 205 in FIG. 3 based on the pitch information 310 output from the acoustic model unit 301. However, in another embodiment, the sound source signal input to the synthesis filter unit 305 may be supplied from the sound source LSI 204 in FIG. 2.

図１５は、上記構成を実現するための、音声合成ＬＳＩ内の音声合成部３００内の発声モデル部３０３の他の構成例を示すブロック図である。前述した図７のステップＳ７０４の押鍵・離鍵処理における図１０のステップＳ１００４において、ＣＰＵ２０１は、ＲＡＭ２０３上の現在のＳｏｎｇＩｎｄｅｘ値によって参照されるＲＡＭ２０３上のＥｖｅｎｔ［ＳｏｎｇＩｎｄｅｘ］に含まれる歌詞情報と音高情報を歌声データ２１５として有する発声イベントを生成し、ＲＡＭ２０３の発行イベント領域にセットする。これと共に、ＣＰＵ２０１は、同じくＥｖｅｎｔ［ＳｏｎｇＩｎｄｅｘ］に含まれる音高情報と共に所定の発音チャネル指定を発音制御データ２１６（図２参照）として有する発声音源指定用発音イベントを生成し、ＲＡＭ２０３の発行イベント領域にセットする。 FIG. 15 is a block diagram showing another configuration example of the vocalization model unit 303 in the speech synthesis unit 300 of the speech synthesis LSI for realizing the above configuration. In step S1004 of FIG. 10, within the key press/release processing of step S704 in FIG. 7 described above, the CPU 201 generates a vocalization event that carries, as singing voice data 215, the lyric information and pitch information contained in Event[SongIndex] on the RAM 203 referenced by the current SongIndex value on the RAM 203, and sets it in the issued event area of the RAM 203. At the same time, the CPU 201 generates a pronunciation event for specifying the vocal sound source that carries, as sound generation control data 216 (see FIG. 2), the pitch information likewise contained in Event[SongIndex] together with a predetermined sound generation channel designation, and sets it in the issued event area of the RAM 203.

上記処理を受けて、ＣＰＵ２０１は、ステップＳ７０８の発声・発音処理において、ＲＡＭ２０３上の発行イベント領域に保持されている上記発声イベントを図２の音声合成ＬＳＩ２０５に対して発行すると共に、上記発声音源指定用発音イベントを図２の音源ＬＳＩ２０４に対して発行する。 Following the above processing, in the vocalization/sound generation processing of step S708, the CPU 201 issues the vocalization event held in the issued event area on the RAM 203 to the speech synthesis LSI 205 of FIG. 2, and issues the pronunciation event for specifying the vocal sound source to the sound source LSI 204 of FIG. 2.

この結果、音源ＬＳＩ２０４は、上記発声音源指定用発音イベントによって指定されている特定の音源チャネル（複数チャネルでもよい）を使って、上記発声音源指定用発音イベントに含まれる音高情報に対応する音高を有する発声音源用楽音出力データ１５０１を生成し、図２では特には図示しない信号経路を介して図２の音声合成ＬＳＩ２０５に入力させる。 As a result, the sound source LSI 204 uses a specific sound source channel (or multiple channels) specified by the pronunciation event for specifying the vocal sound source to generate musical sound output data 1501 for the vocal sound source having a pitch corresponding to the pitch information included in the pronunciation event for specifying the vocal sound source, and inputs the data to the voice synthesis LSI 205 in FIG. 2 via a signal path not specifically shown in FIG. 2.
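
The pairing of the two events built from Event[SongIndex] can be sketched as follows. The class names, field names, and the function `build_events` are hypothetical stand-ins, and the actual event formats exchanged with the speech synthesis LSI 205 and the sound source LSI 204 are not described in this section.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class SongEvent:
    """Hypothetical model of Event[SongIndex]: one lyric and one pitch."""
    lyric: str
    pitch: int


@dataclass(frozen=True)
class VocalizationEvent:
    """Carries the singing voice data 215 (lyric and pitch) for the LSI 205."""
    lyric: str
    pitch: int


@dataclass(frozen=True)
class VocalSourceEvent:
    """Carries the sound generation control data 216 (pitch and channel) for the LSI 204."""
    pitch: int
    channel: int


def build_events(
    events: Sequence[SongEvent], song_index: int, channel: int = 0
) -> Tuple[VocalizationEvent, VocalSourceEvent]:
    """Create both events from the same song event so that their pitches agree."""
    current = events[song_index]
    return (
        VocalizationEvent(current.lyric, current.pitch),
        VocalSourceEvent(current.pitch, channel),
    )
```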

演奏者による演奏レッスンに基づいて音源ＬＳＩ２０４が生成、出力する上記発声音源用楽音出力データ１５０１が、図３の音声合成ＬＳＩ２０５において、音響モデル部３０１からフォルマント補間処理部３０６を介して入力する目標スペクトル情報３１２に基づいて合成フィルタ部３０５にて形成されるデジタルフィルタに入力することにより、合成フィルタ部３０５から歌声音声出力データ２１７が出力される。 The vocal sound source musical sound output data 1501 generated and output by the sound source LSI 204 based on the performer's performance lessons is input to a digital filter formed by the synthesis filter unit 305 based on the target spectrum information 312 input from the acoustic model unit 301 via the formant interpolation processing unit 306 in the voice synthesis LSI 205 of Figure 3, and singing voice output data 217 is output from the synthesis filter unit 305.

このようにして生成、出力される歌声音声出力データ２１７は、音源ＬＳＩ２０４で生成された楽器音を音源信号としている。このため、歌い手の歌声と比べると、忠実性は若干失われるが、音源ＬＳＩ２０４で設定された楽器音の雰囲気が良く残ると共に、歌い手の歌声の声質も良く残った歌声となり、効果的な歌声音声出力データ２１７を出力させることが可能となる。 The singing voice output data 217 thus generated and output uses the instrument sound generated by the sound source LSI 204 as the sound source signal. Therefore, compared to the singer's singing voice, the fidelity is slightly lost, but the atmosphere of the instrument sound set by the sound source LSI 204 is well preserved, and the vocal quality of the singer's singing voice is also well preserved, making it possible to output effective singing voice output data 217.

更に、発声音源用楽音出力データ１５０１としては、複数チャネルを用いたポリフォニック動作も可能であるため、その場合には複数の歌声がハモるような効果を奏することも可能となる。 Furthermore, the vocal sound source musical sound output data 1501 can also operate polyphonically using multiple channels, making it possible to produce the effect of multiple singing voices harmonizing together.

なお、発声音源用楽音出力データ１５０１としては、どのような波形信号でもよいが、音源信号としての性質上、倍音成分を多く含み、かつ長く持続する、例えばブラス音、ストリング音、オルガン音のような楽器音が好ましい。勿論、大きな効果を狙って、このような基準に全く従わないような楽器音、例えば動物の鳴き声のような楽器音が使用されても、非常におもしろい効果が得られる。具体的な実施例として、例えば愛犬の鳴き声をサンプリングして得られた波形データを用いた発声音源用楽音出力データ１５０１が合成フィルタ部３０５に入力されてもよい。そうすると、まるで愛犬が歌詞を歌っているように聞こえるという非常におもしろい効果が得られる。 Note that the vocal sound source musical sound output data 1501 may be any waveform signal, but given its role as a sound source signal, instrument sounds that are rich in harmonic components and sustain for a long time, such as brass, string, and organ sounds, are preferable. Of course, a very interesting effect can also be obtained by deliberately using a sound that does not follow these criteria at all, such as an animal's cry. As a specific example, vocal sound source musical sound output data 1501 based on waveform data obtained by sampling the barking of a pet dog may be input to the synthesis filter unit 305. This produces the very interesting effect of sounding as if the pet dog were singing the lyrics.

以上、開示の実施形態とその利点について詳しく説明したが、当業者は、特許請求の範囲に明確に記載した本発明の範囲から逸脱することなく、様々な変更、追加、省略をすることができる。 Although the disclosed embodiments and their advantages have been described in detail above, those skilled in the art may make various modifications, additions, and omissions without departing from the scope of the present invention as clearly set forth in the claims.

その他、本発明は上述した実施形態に限定されるものではなく、実施段階ではその要旨を逸脱しない範囲で種々に変形することが可能である。また、上述した実施形態で実行される機能は可能な限り適宜組み合わせて実施しても良い。上述した実施形態には種々の段階が含まれており、開示される複数の構成要件による適宜の組み合せにより種々の発明が抽出され得る。例えば、実施形態に示される全構成要件からいくつかの構成要件が削除されても、効果が得られるのであれば、この構成要件が削除された構成が発明として抽出され得る。 In addition, the present invention is not limited to the above-described embodiment, and various modifications can be made in the implementation stage without departing from the gist of the invention. Furthermore, the functions performed in the above-described embodiment may be implemented in appropriate combinations as much as possible. The above-described embodiment includes various steps, and various inventions can be extracted by appropriate combinations of the multiple components disclosed. For example, if the effect can be obtained even if some components are deleted from all the components shown in the embodiment, then the configuration from which these components are deleted can be extracted as an invention.

以上の実施形態に関して、更に以下の付記を開示する。
（付記１）
演奏者の演奏情報を取得する演奏情報取得手段と、
歌詞情報と音高情報とタイミング情報を少なくとも含む演奏ガイドデータと前記演奏情報から、楽曲の進行中に演奏者の演奏を異なるタイミングで複数回評価する演奏評価手段と、
前記演奏情報と前記歌詞情報に基づき、歌詞を歌声で発声する歌声発声手段と、
前記演奏評価手段の評価が前回の評価から変化した場合に、変化した評価に対応して前記歌声の声質を変更する声質変更手段と、
を備える電子楽器。
（付記２）
前記演奏ガイドデータに基づき、演奏者に楽曲の演奏をガイドする演奏ガイド手段を更に備える付記１に記載の電子楽器。
（付記３）
更に複数の発光素子を備え、
前記演奏ガイド手段は、前記演奏ガイドデータに含まれるタイミング情報に対応したタイミングで、前記演奏ガイドデータに含まれる音高情報に対応する発光素子を発光する、
付記２に記載の電子楽器。
（付記４）
前記声質変更手段は、前記演奏評価手段による複数の特定評価に対応する複数の声質の間を、前記演奏評価手段による楽曲進行中の評価に応じた割合で、補間する、付記１乃至３の何れかに記載の電子楽器。
（付記５）
前記声質変更手段は、人声のフォルマント成分と人声に混合するノイズ成分の割合を変更する、付記１乃至４のいずれかに記載の電子楽器。
（付記６）
電子楽器のプロセッサに、
演奏者の演奏情報を取得する演奏情報取得処理と、
歌詞情報と音高情報とタイミング情報を少なくとも含む演奏ガイドデータと前記演奏情報から、楽曲の進行中に演奏者の演奏を異なるタイミングで複数回評価する演奏評価処理と、
前記演奏情報と前記歌詞情報に基づき、歌詞を歌声で発声する歌声発声処理と、
前記演奏評価手段の評価が前回の評価から変化した場合に、変化した評価に対応して前記歌声の声質を変更する声質変更処理と、
を実行させるための電子楽器の制御方法。
（付記７）
電子楽器のプロセッサに、
演奏者の演奏情報を取得する演奏情報取得処理と、
歌詞情報と音高情報とタイミング情報を少なくとも含む演奏ガイドデータと前記演奏情報から、楽曲の進行中に演奏者の演奏を異なるタイミングで複数回評価する演奏評価処理と、
前記演奏情報と前記歌詞情報に基づき、歌詞を歌声で発声する歌声発声処理と、
前記演奏評価手段の評価が前回の評価から変化した場合に、変化した評価に対応して前記歌声の声質を変更する声質変更処理と、
を実行させるためのプログラム。

The following supplementary notes are further disclosed regarding the above-described embodiment.
(Appendix 1)
An electronic musical instrument comprising:
a performance information acquisition means for acquiring performance information of a performer;
a performance evaluation means for evaluating the performance of the performer multiple times at different timings during the progression of a piece of music, based on the performance information and on performance guide data including at least lyric information, pitch information, and timing information;
a singing voice producing means for vocalizing lyrics as a singing voice based on the performance information and the lyric information; and
a voice quality changing means for changing the voice quality of the singing voice in accordance with the changed evaluation when the evaluation by the performance evaluation means has changed from the previous evaluation.
(Appendix 2)
The electronic musical instrument according to Appendix 1, further comprising a performance guide means for guiding the performer in playing the piece of music based on the performance guide data.
(Appendix 3)
The electronic musical instrument according to Appendix 2, further comprising a plurality of light emitting elements,
wherein the performance guide means causes the light emitting element corresponding to the pitch information included in the performance guide data to emit light at a timing corresponding to the timing information included in the performance guide data.
(Appendix 4)
The electronic musical instrument according to any one of Appendices 1 to 3, wherein the voice quality changing means interpolates between a plurality of voice qualities corresponding to a plurality of specific evaluations by the performance evaluation means, at a ratio according to the evaluation made by the performance evaluation means during the progression of the piece of music.
(Appendix 5)
The electronic musical instrument according to any one of Appendices 1 to 4, wherein the voice quality changing means changes the ratio between a formant component of a human voice and a noise component mixed with the human voice.
(Appendix 6)
A method for controlling an electronic musical instrument, the method causing a processor of the electronic musical instrument to execute:
a performance information acquisition process for acquiring performance information of a performer;
a performance evaluation process for evaluating the performance of the performer multiple times at different timings during the progression of a piece of music, based on the performance information and on performance guide data including at least lyric information, pitch information, and timing information;
a singing voice production process for vocalizing lyrics as a singing voice based on the performance information and the lyric information; and
a voice quality change process for changing the voice quality of the singing voice in accordance with the changed evaluation when the evaluation by the performance evaluation means has changed from the previous evaluation.
(Appendix 7)
A program for causing a processor of an electronic musical instrument to execute:
a performance information acquisition process for acquiring performance information of a performer;
a performance evaluation process for evaluating the performance of the performer multiple times at different timings during the progression of a piece of music, based on the performance information and on performance guide data including at least lyric information, pitch information, and timing information;
a singing voice production process for vocalizing lyrics as a singing voice based on the performance information and the lyric information; and
a voice quality change process for changing the voice quality of the singing voice in accordance with the changed evaluation when the evaluation by the performance evaluation means has changed from the previous evaluation.

１００電子鍵盤楽器
１０１鍵盤
１０２第１のスイッチパネル
１０３第２のスイッチパネル
１０４ＬＥＤ
２００制御システム
２０１ＣＰＵ
２０２ＲＯＭ
２０３ＲＡＭ
２０４音源ＬＳＩ
２０５音声合成ＬＳＩ
２０６キースキャナ
２０７ＬＥＤコントローラ
２０８ネットワークインタフェース
２０９システムバス
２１０タイマ
２１１、２１２Ｄ／Ａコンバータ
２１３ミキサ
２１４アンプ
２１５歌声データ
２１６発音制御データ
２１７歌声音声出力データ
２１８楽音出力データ
３００音声合成部
３０１音響モデル部
３０２テキスト解析部
３０３発声モデル部
３０４音源生成部
３０５合成フィルタ部
３０６フォルマント補間処理部
３０７ノイズ重畳部
３０８言語特徴量系列
３０９スペクトル情報
３１０音源情報
３１１練習進行度データ
３１２目標スペクトル情報
３１３フィルタ出力データ
３１４ノイズ混合比
３１５ノイズデータ
３１６ノイズ混合比補間処理部
３１７目標ノイズ混合比
１５０１発声音源用楽音出力データ

100 Electronic keyboard instrument
101 Keyboard
102 First switch panel
103 Second switch panel
104 LED
200 Control system
201 CPU
202 ROM
203 RAM
204 Sound source LSI
205 Speech synthesis LSI
206 Key scanner
207 LED controller
208 Network interface
209 System bus
210 Timer
211, 212 D/A converter
213 Mixer
214 Amplifier
215 Singing voice data
216 Sound generation control data
217 Singing voice output data
218 Musical sound output data
300 Speech synthesis unit
301 Acoustic model unit
302 Text analysis unit
303 Vocalization model unit
304 Sound source generation unit
305 Synthesis filter unit
306 Formant interpolation processing unit
307 Noise superimposition unit
308 Language feature sequence
309 Spectrum information
310 Sound source information
311 Practice progress data
312 Target spectrum information
313 Filter output data
314 Noise mixing ratio
315 Noise data
316 Noise mixing ratio interpolation processing unit
317 Target noise mixing ratio
1501 Vocal sound source musical sound output data

Claims

A performance information acquisition means for acquiring performance information of a performer;
a performance evaluation means for evaluating the performance of a performer multiple times at different timings during the progress of a piece of music based on performance guide data including at least lyric information, pitch information, and timing information and the performance information;
a singing voice producing means for producing lyrics by singing voice based on the performance information and the lyrics information;
a voice quality changing means for changing the voice quality of the singing voice in response to a change in the evaluation of the performance evaluation means when the evaluation of the performance evaluation means has changed from a previous evaluation;
An electronic musical instrument comprising:

The electronic musical instrument according to claim 1, further comprising a performance guide means for guiding the performer in playing a piece of music based on the performance guide data.

Further comprising a plurality of light emitting elements,
the performance guide means causes a light emitting element corresponding to pitch information included in the performance guide data to emit light at a timing corresponding to timing information included in the performance guide data;
The electronic musical instrument according to claim 2.

The electronic musical instrument according to any one of claims 1 to 3, wherein the voice quality change means interpolates between a plurality of voice qualities corresponding to a plurality of specific evaluations by the performance evaluation means at a ratio according to the evaluation by the performance evaluation means during the progression of the piece of music.

The electronic musical instrument according to any one of claims 1 to 4, wherein the voice quality change means changes the ratio of formant components of the human voice and noise components mixed with the human voice.

A method for controlling an electronic musical instrument, the method causing a processor of the electronic musical instrument to execute:
a performance information acquisition process for acquiring performance information of a performer;
a performance evaluation process for evaluating the performance of the performer multiple times at different timings during the progression of a piece of music, based on the performance information and on performance guide data including at least lyric information, pitch information, and timing information;
a singing voice production process for vocalizing lyrics as a singing voice based on the performance information and the lyric information; and
a voice quality change process for changing the voice quality of the singing voice in accordance with the changed evaluation when the evaluation by the performance evaluation process has changed from the previous evaluation.

A program for causing a processor of an electronic musical instrument to execute:
a performance information acquisition process for acquiring performance information of a performer;
a performance evaluation process for evaluating the performance of the performer multiple times at different timings during the progression of a piece of music, based on the performance information and on performance guide data including at least lyric information, pitch information, and timing information;
a singing voice production process for vocalizing lyrics as a singing voice based on the performance information and the lyric information; and
a voice quality change process for changing the voice quality of the singing voice in accordance with the changed evaluation when the evaluation by the performance evaluation process has changed from the previous evaluation.