JP5494083B2

JP5494083B2 - Karaoke equipment

Info

Publication number: JP5494083B2
Application number: JP2010066155A
Authority: JP
Inventors: 健太郎長谷見
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2010-03-23
Filing date: 2010-03-23
Publication date: 2014-05-14
Anticipated expiration: 2030-03-23
Also published as: JP2011197533A

Description

The present invention relates to a technique for applying effect processing to each part of a karaoke song.

With the spread of karaoke apparatuses and the improvement of users' singing skills, there is a growing need to sing parts other than the main melody part (harmony melody parts). However, because the melody of a harmony melody part is often more difficult than that of the main melody part, a technique for assisting a user who sings the harmony melody part has been proposed (see Patent Document 1). According to this technique, it is determined from which of a plurality of microphones a singing voice signal for the harmony melody part is being output, and pitch correction and a chorus effect are applied to the singing voice signal so determined. As a result, the singing of the harmony melody part sounds more accurate and more like a backing chorus.

Patent Document 1: Japanese Patent Laid-Open No. H10-161672

However, when the above technique is used, the singing of the main melody part may be buried if the singer of the harmony melody part pronounces consonants strongly or sings at a high volume. In particular, because the above technique adds a chorus effect, the singing sound of the harmony melody part gains thickness and spread, which makes it even more likely that the singing of the main melody part will be buried.
The present invention has been made in view of these circumstances, and its object is to prevent the singing of the main melody part from being buried in the singing of the harmony melody part.

In order to solve the above problem, the present invention provides a karaoke apparatus comprising: storage means for storing main melody data of a karaoke song; determination means for comparing first and second audio signals, output respectively by first and second sound collecting means that output collected sound as audio signals, with the main melody data stored in the storage means, and determining a degree of coincidence; identification means for identifying, based on the determination result of the determination means, the one of the first and second audio signals having the higher degree of coincidence as the main melody audio signal and the other audio signal as the harmony melody audio signal; first correction means for performing, on the audio signal identified as the main melody audio signal by the identification means, processing that adds overtone components of that audio signal in a frequency band at or above a first threshold; second correction means for performing, on the audio signal identified as the harmony melody audio signal by the identification means, processing that reduces the volume in a frequency band at or above a second threshold; and output means for amplifying and outputting the main melody audio signal processed by the first correction means and the harmony melody audio signal processed by the second correction means.

In a preferred aspect, the karaoke apparatus may further comprise discrimination means for discriminating a characteristic of the audio signal identified as the main melody audio signal by the identification means, and at least one of the processing of the first correction means and the processing of the second correction means may be performed based on the characteristic discriminated by the discrimination means.

In another preferred aspect, the karaoke apparatus may further comprise discrimination means for discriminating a characteristic of the audio signal identified as the harmony melody audio signal by the identification means, and at least one of the processing of the first correction means and the processing of the second correction means may be performed based on the characteristic discriminated by the discrimination means.

According to the present invention, effect processing is applied separately to the singing voice signal of the main melody part and the singing voice signal of the harmony melody part, which prevents the singing of the main melody part from being buried in the singing of the harmony melody part.

FIG. 1 is a diagram showing the hardware configuration of a karaoke system according to an embodiment of the present invention.
FIG. 2 is a diagram showing the structure of song data according to an embodiment of the present invention.
FIG. 3 is a functional configuration diagram according to an embodiment of the present invention.
FIG. 4 is a functional configuration diagram according to a modification of the present invention.

<Embodiment>
FIG. 1 is a diagram showing the hardware configuration of a karaoke system according to an embodiment of the present invention. In the figure, the karaoke apparatus 1 is connected to two microphones 2A and 2B, a speaker 3, and a display 4. The microphones 2A and 2B pick up a singer's singing voice and output it as an analog signal to the A/D converter 11A or 11B described later. The speaker 3 receives a stereo signal from the mixer 14 described later and outputs sound based on that signal. The display 4 is a display device such as a liquid crystal display, and displays background images and lyric captions under the control of the VDP (Video Display Processor) 17 described later.

As shown in FIG. 1, the karaoke apparatus 1 includes A/D converters 11A and 11B, effectors 12A and 12B, D/A converters 13A and 13B, a mixer 14, a control unit 15, a storage unit 16, a VDP 17, an operation unit 18, and a sound source 19. The A/D converters 11A and 11B receive the analog singing voice signal from the microphone 2A or 2B, convert it into a digital signal, and output the digital signal to the effector 12A or 12B. The A/D converters 11A and 11B also output the digital signal to the control unit 15. The effectors 12A and 12B are DSPs (Digital Signal Processors); they apply the effect processing described later to the digital signal received from the A/D converter 11A or 11B and output the result to the D/A converter 13A or 13B. The D/A converters 13A and 13B receive the digital signal from the effector 12A or 12B, convert it into an analog signal, and output the analog signal to the mixer 14. The mixer 14 mixes and amplifies the analog signals of the effector 12A and 12B outputs, received via the D/A converters 13A and 13B, with the musical tone signal acquired from the sound source 19, and outputs the result to the speaker 3 as a stereo signal.

The control unit 15 includes a CPU, a ROM, a RAM, and the like. The CPU controls each unit of the karaoke apparatus 1 by loading a program stored in the ROM into the RAM and executing it. The storage unit 16 is a storage device such as an HDD (Hard Disk Drive) and stores song data for a plurality of karaoke songs. Each item of song data is, for example, data in MIDI format and, as shown in FIG. 2, has a header, a musical tone track, a main melody track, a harmony melody track, a lyrics track, an image track, and so on. Attribute data of the song, such as its title, genre, and performance time, is written in the header. Musical tone data, main melody data, harmony melody data, lyrics data, image data, and so on are written in the respective tracks. Parameters for identifying each track are also stored in the ROM, and the CPU can identify the data written in each track by referring to these parameters. For example, it can identify which data represents the main melody. The VDP 17 acquires the lyrics data and image data read from the storage unit 16 by the control unit 15 and causes the display 4 to display lyric captions and background images. The operation unit 18 has a plurality of buttons and outputs an operation signal corresponding to the pressed button to the control unit 15. The sound source 19 is, for example, a MIDI sound source; it acquires the musical tone data and the like read from the storage unit 16 by the control unit 15, converts the data into a musical tone signal, and outputs the signal to the mixer 14.

FIG. 3 is a configuration diagram of the functions realized when the CPU of the control unit 15 executes the program stored in the ROM. The functions shown in the figure are those related specifically to deciding the effect processing.
The difference calculation units 151A and 151B acquire the singing voice signal output from the A/D converter 11A or 11B, acquire the main melody data read from the storage unit 16, and calculate the difference between the two. Here, the difference is a quantification of, for example, the differences in pitch and volume between the two and the difference in vocalization (sounding) timing. Data representing this difference (difference data) is output from the difference calculation units 151A and 151B to the scoring units 152A and 152B, respectively.
The scoring units 152A and 152B accumulate the difference data output from the difference calculation unit 151A or 151B and, at a predetermined timing, score how closely the singing voice signal matches the main melody data. They then output the scoring result to the comparison unit 153 as point value data. The scoring timing is, for example, the point at which the first phrase has been sung after the karaoke song starts, or the point at which the first verse (the "A melody") ends.
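The difference calculation and scoring described above could be sketched as follows. This is only an illustration under stated assumptions: the `NoteEvent` representation, the weights in `note_difference`, and the mapping from mean difference to points in `score` are all hypothetical, since the source does not specify how the difference is quantified or how points are computed.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class NoteEvent:
    """One sung or reference note (hypothetical representation)."""
    pitch: float   # MIDI note number
    volume: float  # normalized, 0.0 to 1.0
    onset: float   # start time in seconds


def note_difference(sung: NoteEvent, ref: NoteEvent) -> float:
    """Quantify pitch, volume, and timing deviation as a single number.

    The weights are illustrative assumptions, not values from the source.
    """
    return (1.0 * abs(sung.pitch - ref.pitch)
            + 0.5 * abs(sung.volume - ref.volume)
            + 2.0 * abs(sung.onset - ref.onset))


def score(sung: Sequence[NoteEvent], ref: Sequence[NoteEvent]) -> float:
    """Accumulate differences over a phrase and map them to points (0 to 100)."""
    diffs = [note_difference(s, r) for s, r in zip(sung, ref)]
    if not diffs:
        return 0.0
    mean_diff = sum(diffs) / len(diffs)
    # Smaller mean difference gives a higher point value (assumed mapping).
    return 100.0 / (1.0 + mean_diff)
```

A singer who follows the main melody data exactly would receive the maximum point value, while a singer on a harmony line a third above would receive a clearly lower one.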

The comparison unit 153 acquires the point value data output from each of the scoring units 152A and 152B and, by comparing the two, determines which singing voice signal more closely approximates the main melody data. It then outputs data indicating the determination result (determination result data) to the effect process determination unit 154.
The effect process determination unit 154 decides the effect processing to be executed by the effectors 12A and 12B based on the determination result data output from the comparison unit 153. Specifically, for example, when the determination result data indicates that the singing voice signal output by the A/D converter 11A approximates the main melody data more closely than the singing voice signal output by the A/D converter 11B, the unit outputs data instructing execution of the enhancer process to the effector 12A and data instructing execution of the de-esser process to the effector 12B. Conversely, when the determination result data indicates that the singing voice signal output by the A/D converter 11B approximates the main melody data more closely than the singing voice signal output by the A/D converter 11A, the unit outputs data instructing execution of the de-esser process to the effector 12A and data instructing execution of the enhancer process to the effector 12B.
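The comparison and effect decision described above amount to a small branch, sketched here. The function name `decide_effects`, the string labels, and the handling of a tie (treated as channel A leading) are assumptions made for illustration; the source does not say what happens when both point values are equal.

```python
def decide_effects(points_a: float, points_b: float) -> dict:
    """Return the effect process to instruct for effectors 12A and 12B.

    points_a and points_b are the point values from scoring units
    152A and 152B. A tie is resolved in favor of channel A (assumption).
    """
    if points_a >= points_b:
        # Channel A is closer to the main melody data.
        return {"12A": "enhancer", "12B": "de-esser"}
    # Channel B is closer to the main melody data.
    return {"12A": "de-esser", "12B": "enhancer"}
```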

Here, the enhancer process is a process that distorts the original sound (applies distortion processing) in a frequency band at or above a predetermined threshold, extracts only the overtone components, and adds them to the original sound (harmonic enhancer process). Alternatively, it is a process that adds to the original sound a phase-shifted copy of itself in a frequency band at or above a predetermined threshold, extracts the overtone components produced by the resulting comb-filter effect, and adds them to the original sound (phase enhancer process). The enhancer process is used mainly in the high frequency range, and either variant sharpens the outline of the sound.
The parameters that define the enhancer process (effect parameters) include the threshold frequency, the level, and the mix balance. The threshold frequency specifies the frequency band in which overtone components are added: overtone components are added in the band at or above the specified value. The threshold frequency is set, for example, within a range of 1 to 10 kHz. The level specifies the degree to which overtone components are added. The mix balance specifies the ratio in which the original sound and the overtone components are mixed. These values can be set in advance by operating the operation unit 18 or a remote controller (not shown). In the above example, when the effect process determination unit 154 instructs the effector 12A or 12B to execute the enhancer process, it also outputs the preset effect parameters.
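To make the roles of the three enhancer parameters concrete, the following is a minimal sketch of a harmonic-enhancer-style process in plain Python. It is not the DSP implementation of the effectors 12A and 12B. The first-order high-pass filter, the `tanh` soft clipper, the use of `level` as the drive into the clipper, and the subtraction of the linear part to roughly isolate the generated components are all simplifying assumptions.

```python
import math


def one_pole_highpass(samples, cutoff_hz, sample_rate):
    """First-order high-pass filter isolating the band above cutoff_hz."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out = []
    prev_x = 0.0
    prev_y = 0.0
    for x in samples:
        y = alpha * (prev_y + x - prev_x)
        out.append(y)
        prev_x, prev_y = x, y
    return out


def harmonic_enhancer(samples, threshold_hz, level, mix, sample_rate=44100):
    """Distort the band above threshold_hz and mix the result with the input.

    level: drive into the soft clipper (more drive, more overtones).
    mix:   0.0 returns the original sound; 1.0 returns only the generated part.
    """
    high = one_pole_highpass(samples, threshold_hz, sample_rate)
    out = []
    for x, h in zip(samples, high):
        driven = level * h
        # Nonlinear residue: what soft clipping adds beyond the linear signal.
        generated = math.tanh(driven) - driven
        out.append((1.0 - mix) * x + mix * generated)
    return out
```

With `mix` set to 0.0 the input passes through unchanged, which gives a simple way to check the wiring before tuning `threshold_hz` and `level` by ear.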

The de-esser process, on the other hand, is a process that suppresses the volume in a frequency band at or above a predetermined threshold. It is used mainly in the frequency band of sibilants and has the effect of making consonants less conspicuous.
The parameters that define the de-esser process include the threshold, the ratio, and the cutoff frequency. The threshold specifies the volume level to be suppressed: volume at or above the specified value is suppressed. The threshold is set, for example, within a range of 2 to 10 kHz. The ratio specifies the degree to which the volume is suppressed. The cutoff frequency specifies the frequency band in which the volume is suppressed: the volume is suppressed in the band at or above the specified value. Like the enhancer parameters, these values can be set in advance. In the above example, when the effect process determination unit 154 instructs the effector 12A or 12B to execute the de-esser process, it also outputs the preset effect parameters.
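The interaction of the threshold, ratio, and cutoff frequency can be illustrated with a small split-band compressor sketch. This is a hedged illustration, not the effector's actual algorithm: the one-pole band split, the peak envelope follower, and its `release` constant are assumptions, and `threshold` is treated here as a linear amplitude level.

```python
import math


def de_esser(samples, cutoff_hz, threshold, ratio,
             sample_rate=44100, release=0.999):
    """Reduce the level of the band above cutoff_hz when it exceeds threshold.

    threshold: linear amplitude above which the high band is attenuated.
    ratio:     compression ratio applied to the excess (4.0 means 4:1).
    release:   per-sample decay of the envelope follower (assumed value).
    """
    dt = 1.0 / sample_rate
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    a = dt / (rc + dt)
    low = 0.0
    env = 0.0
    out = []
    for x in samples:
        low += a * (x - low)      # one-pole low-pass: band below the cutoff
        high = x - low            # remainder: band above the cutoff
        env = max(abs(high), env * release)
        if env > threshold:
            target = threshold + (env - threshold) / ratio
            gain = target / env
        else:
            gain = 1.0
        out.append(low + gain * high)
    return out
```

Signals whose high band stays below the threshold pass through essentially unchanged, so only strong sibilant energy is reduced.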

The effectors 12A and 12B, having acquired the data instructing execution of an effect process together with the parameters of that process, execute the instructed effect process in accordance with the acquired parameters. Specifically, for example, when the effector 12A is instructed to execute the enhancer process, the effector 12A distorts the original sound of the singing voice signal output from the A/D converter 11A in the frequency band at or above the specified threshold, extracts only the overtone components, and adds them to the original sound. When the effector 12B is instructed to execute the de-esser process, the effector 12B suppresses the volume of the singing voice signal output from the A/D converter 11B in the frequency band at or above the specified threshold.

According to the embodiment described above, the comparison unit 153 determines which singing voice is singing the main melody part, and the enhancer process is applied to the signal representing that singing voice. The de-esser process, in turn, is applied to the singing voice determined not to be singing the main melody part (that is, the singing voice singing the harmony melody part). As a result, the outline of the sound becomes clearer for the singing voice of the main melody part, and consonants become less conspicuous for the singing voice of the harmony melody part. This prevents the singing of the main melody part from being buried in the singing of the harmony melody part.

<Modification>
(1) In the above embodiment, the effector 12A or 12B may apply a low-pass filter process instead of the de-esser process. Here, the low-pass filter process is a process that removes components at frequencies at or above a predetermined threshold. Because the low-pass filter process removes high-frequency components, consonants become less conspicuous in the singing voice of the harmony melody part, and the singing of the main melody part can be prevented from being buried. The parameters that define the low-pass filter process include the cutoff frequency and the resonance. The cutoff frequency specifies the frequency band in which components are removed: components are removed in the band at or above the specified value. The resonance specifies the degree to which overtones around the cutoff frequency are emphasized. Like the enhancer parameters, these values can be set in advance.
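One common way to realize a low-pass filter with both a cutoff frequency and a resonance control is a second-order (biquad) section. The sketch below uses the widely published Audio EQ Cookbook low-pass coefficients. Mapping the "resonance" parameter directly to the filter's Q factor is an assumption, and the source does not state which filter structure the effectors use.

```python
import math


def resonant_lowpass(samples, cutoff_hz, resonance=0.707, sample_rate=44100):
    """Second-order low-pass filter; resonance is used as the Q factor.

    Higher resonance emphasizes frequencies around cutoff_hz;
    0.707 gives a flat (Butterworth) response.
    """
    w0 = 2.0 * math.pi * cutoff_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * resonance)
    cos_w0 = math.cos(w0)
    a0 = 1.0 + alpha
    # Coefficients normalized by a0.
    b0 = (1.0 - cos_w0) / 2.0 / a0
    b1 = (1.0 - cos_w0) / a0
    b2 = b0
    a1 = -2.0 * cos_w0 / a0
    a2 = (1.0 - alpha) / a0
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out
```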

(2) In the above embodiment, only one set of effect parameters is set for each effect process, but a plurality of sets may be prepared and selected according to the attributes of the song. For example, the storage unit 16 of the karaoke apparatus 1 may store, for each effect process, a set of effect parameters for each song genre. The effect process determination unit 154 may then refer to the header of the data of the song to be played to identify its genre, read the set of effect parameters corresponding to that genre from the storage unit 16, and output that set when instructing the effector 12A or 12B to execute the effect process. This modification allows the attributes of the song to be taken into account in preventing the singing of the main melody part from being buried.
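The genre-based selection in modification (2) is essentially a table lookup keyed by the genre in the song header. A minimal sketch follows. The genre names, every numeric value, and the fallback to a default set for unknown genres are hypothetical and serve only to show the shape of the data.

```python
# Default parameter sets, used when no genre-specific set exists (assumption).
DEFAULT_PARAMETERS = {
    "enhancer": {"threshold_hz": 4000.0, "level": 1.0, "mix": 0.3},
    "de-esser": {"threshold": 0.1, "ratio": 4.0, "cutoff_hz": 5000.0},
}

# Hypothetical per-genre sets as they might be held in storage unit 16.
GENRE_PARAMETERS = {
    "ballad": {
        "enhancer": {"threshold_hz": 5000.0, "level": 0.8, "mix": 0.2},
        "de-esser": {"threshold": 0.08, "ratio": 3.0, "cutoff_hz": 6000.0},
    },
    "rock": {
        "enhancer": {"threshold_hz": 3000.0, "level": 1.5, "mix": 0.4},
        "de-esser": {"threshold": 0.12, "ratio": 6.0, "cutoff_hz": 4500.0},
    },
}


def select_parameters(header: dict, effect: str) -> dict:
    """Pick the effect parameter set matching the genre in the song header."""
    genre = header.get("genre")
    return GENRE_PARAMETERS.get(genre, DEFAULT_PARAMETERS)[effect]
```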

(3) In the above embodiment, as described above, only one set of effect parameters is set for each effect process, but a plurality of sets may be prepared and selected according to the characteristics of the input singing voice signal. Specifically, the storage unit 16 of the karaoke apparatus 1 may store, for each effect process, a set of effect parameters for each singing voice characteristic; the characteristics of the input singing voice signal may be discriminated; and the set of effect parameters to be output to the effector 12A or 12B may be decided based on the discriminated characteristics. This is described in detail below.
FIG. 4 is a configuration diagram, according to this modification, of the functions realized when the CPU of the control unit 15 executes the program stored in the ROM. Compared with the functional configuration diagram of the above embodiment, analysis units 155A and 155B have been added.

In the figure, the analysis units 155A and 155B acquire the singing voice signal output from the A/D converter 11A or 11B and analyze the characteristics of that signal. For example, the analysis units 155A and 155B determine whether the fundamental frequency of the acquired singing voice signal is 180 Hz or higher. This is because the frequency of a male voice is normally about 80 to 120 Hz and that of a female voice about 240 to 500 Hz, so determining whether the fundamental frequency of the singing voice signal is 180 Hz or higher makes it possible to determine whether the voice represented by the signal is male or female. The analysis units 155A and 155B output data indicating the analysis result (analysis result data) to the effect process determination unit 154. The effect process determination unit 154 reads the set of effect parameters corresponding to the characteristic represented by the analysis result data from the storage unit 16 and outputs that set when instructing the effector 12A or 12B to execute the effect process. In this case, the effect process determination unit 154 specifies the set of effect parameters to be output to the effector 12A based on the analysis result data output from the analysis unit 155A, and specifies the set to be output to the effector 12B based on the analysis result data output from the analysis unit 155B. This modification allows the characteristics of the singing voice to be taken into account in preventing the singing of the main melody part from being buried.
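The 180 Hz decision made by the analysis units could be sketched with a basic autocorrelation pitch estimate. The source does not say how the fundamental frequency is obtained, so the autocorrelation method, the search range of 60 to 600 Hz, and the function names `estimate_f0` and `classify_voice` are assumptions made for illustration.

```python
import math


def estimate_f0(samples, sample_rate, f_min=60.0, f_max=600.0):
    """Estimate the fundamental frequency by picking the autocorrelation peak.

    Returns 0.0 when no positive correlation is found (for example, silence).
    """
    n = len(samples)
    min_lag = int(sample_rate / f_max)
    max_lag = min(int(sample_rate / f_min), n - 1)
    best_lag = 0
    best_corr = 0.0
    for lag in range(min_lag, max_lag + 1):
        corr = sum(samples[i] * samples[i + lag] for i in range(n - lag))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return sample_rate / best_lag if best_lag else 0.0


def classify_voice(f0_hz, boundary_hz=180.0):
    """Apply the 180 Hz boundary described in the text."""
    return "female" if f0_hz >= boundary_hz else "male"
```

A real singing voice is far less regular than a test tone, so a practical analysis unit would likely average the estimate over many frames before committing to a parameter set.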

In the above description of this modification, the set of effect parameters may be specified based on the analysis result data output from an analysis unit for only one of the effectors 12A and 12B. For example, the set of effect parameters output to the effector 12A may be specified based on the analysis result data output from the analysis unit 155A, while a predetermined set is used for the effector 12B. Here, the predetermined set is a preset set of effect parameters stored in the storage unit 16.
Also, in the above description of this modification, the set of effect parameters output to the effector 12B may instead be specified based on the analysis result data output from the analysis unit 155A, and the set output to the effector 12A based on the analysis result data output from the analysis unit 155B. In this case, the effect parameter set is decided from the characteristics of the other singer's voice. For example, when a male singer sings the main melody part and the other singer is female and sings at a high pitch, the level can be set higher. In this case as well, the set of effect parameters may be specified based on the analysis result data for only one of the effectors 12A and 12B.

(4) In the above embodiment, two microphones are provided, but three or more may be provided. In this case, for each additional microphone, an A/D converter, an effector, and a D/A converter are provided, as for the microphones 2A and 2B, together with the functions of a difference calculation unit and a scoring unit. The effect process determination unit 154 instructs the effector that processes the singing voice signal closest to the main melody data to execute the enhancer process, and instructs the effectors that process the other singing voice signals to execute the de-esser process.
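With three or more microphones, the decision generalizes from a two-way comparison to picking the channel with the highest point value. A minimal sketch follows, assuming the point values arrive as a mapping from a channel label to points. The labels and the function name `assign_effects` are hypothetical, and on a tie the first channel in the mapping is chosen.

```python
def assign_effects(points_by_channel: dict) -> dict:
    """Give the enhancer to the best-matching channel, the de-esser to the rest."""
    if not points_by_channel:
        return {}
    # max() returns the first key with the highest point value.
    lead = max(points_by_channel, key=points_by_channel.get)
    return {channel: ("enhancer" if channel == lead else "de-esser")
            for channel in points_by_channel}
```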

Description of reference numerals: 1: karaoke apparatus; 2A, 2B: microphones; 3: speaker; 4: display; 11A, 11B: A/D converters; 12A, 12B: effectors; 13A, 13B: D/A converters; 14: mixer; 15: control unit; 16: storage unit; 17: VDP; 18: operation unit; 19: sound source; 151A, 151B: difference calculation units; 152A, 152B: scoring units; 153: comparison unit; 154: effect process determination unit; 155A, 155B: analysis units

Claims

1. A karaoke apparatus comprising:
storage means for storing main melody data of a karaoke song;
determination means for comparing first and second audio signals, output respectively by first and second sound collecting means that output collected sound as audio signals, with the main melody data stored in the storage means, and determining a degree of coincidence;
identification means for identifying, based on the determination result of the determination means, the one of the first and second audio signals having the higher degree of coincidence as a main melody audio signal and the other audio signal as a harmony melody audio signal;
first correction means for performing, on the audio signal identified as the main melody audio signal by the identification means, processing that adds overtone components of that audio signal in a frequency band at or above a first threshold;
second correction means for performing, on the audio signal identified as the harmony melody audio signal by the identification means, processing that reduces the volume in a frequency band at or above a second threshold; and
output means for amplifying and outputting the main melody audio signal processed by the first correction means and the harmony melody audio signal processed by the second correction means.

2. The karaoke apparatus according to claim 1, further comprising discrimination means for discriminating a characteristic of the audio signal identified as the main melody audio signal by the identification means,
wherein at least one of the processing of the first correction means and the processing of the second correction means is performed based on the characteristic discriminated by the discrimination means.

3. The karaoke apparatus according to claim 1, further comprising discrimination means for discriminating a characteristic of the audio signal identified as the harmony melody audio signal by the identification means,
wherein at least one of the processing of the first correction means and the processing of the second correction means is performed based on the characteristic discriminated by the discrimination means.