JP7688554B2 - Karaoke System

Info

Publication number: JP7688554B2
Application number: JP2021161950A
Authority: JP
Inventors: 勇太岡田
Original assignee: Daiichikosho Co Ltd
Current assignee: Daiichikosho Co Ltd
Priority date: 2021-09-30
Filing date: 2021-09-30
Publication date: 2025-06-04
Anticipated expiration: 2041-09-30
Also published as: JP2023051345A

Description

The present invention relates to a karaoke system.

A karaoke system can provide a variety of content that makes use of karaoke devices. For example, when singing karaoke, a user may imitate the singing of an artist (so-called singing imitation). Karaoke systems capable of running content that evaluates such singing imitation have been proposed.

For example, Patent Document 1 discloses a technique that performs speaker-specific word speech recognition on the feature points of the voices extracted from the audio signals of a model singing and of a singer's singing of that model, compares the sound quality and the length of the uttered words obtained by this processing, and outputs and displays a scoring signal that indicates the similarity between the model singing and the singer's singing based on a distance value representing how far apart the two are in this comparison.

Patent Document 2 discloses a technique in which a singing standard reference data storage means stores, for each karaoke song, singing standard reference data that serves as the singing standard for that song; a singing imitation reference data storage means stores, for each original artist, singing imitation reference data that includes the artist's singing features, their positions of occurrence, and their sections of occurrence; and a singing scoring means, when a user sings any karaoke song, scores the skill of the singing imitation based on singing evaluation data extracted from the user's audio signal input from a microphone, the singing standard reference data for the karaoke song, and the singing imitation reference data corresponding to the original artist of the karaoke song.

Japanese Unexamined Patent Application Publication No. H11-259081
Japanese Unexamined Patent Application Publication No. 2013-231881

An object of the present invention is to provide a karaoke system capable of running novel content in which users guess the singing section in which the singing voice of a specific singer was emitted.

One invention for achieving the above object is a karaoke system in which a server device and a karaoke device are communicably connected. The server device has: an information storage unit that stores, in association with one another, song identification information of a song, singer identification information of a singer who sang the song, singing voice data corresponding to the singing voice with which the singer sang the song, and feature information extracted from the singing voice data; an identification unit that, when a certain song and a specific singer are selected by a user of the karaoke device, identifies at least one piece of singing voice data corresponding to the singing voice of another singer, based on a similarity obtained by comparing the feature information linked to the song identification information of the certain song and the singer identification information of the specific singer with the feature information linked to the song identification information of the certain song and the singer identification information of another singer different from the specific singer; and a transmission processing unit that transmits, to the karaoke device, the singing voice data corresponding to the singing voice of the specific singer and the singing voice data corresponding to the singing voice of the identified other singer. The karaoke device has: a content generation unit that generates content data by assigning, to at least one of the singing sections of the certain song, the specific singer's singing voice data corresponding to that singing section, and assigning, to each of the other singing sections, the identified other singer's singing voice data corresponding to that other singing section; and a content execution unit that plays back the content data together with the karaoke performance of the certain song so that the singing voice is emitted and, after playback ends, displays an answer screen for selecting the singing section in which the singing voice of the specific singer was emitted.
Other features of the present invention will become apparent from the description in the specification and the drawings below.

According to the present invention, content in which users guess the singing section in which the singing voice of a specific singer was emitted can be run.

FIG. 1 is a diagram showing a karaoke system according to an embodiment.
FIG. 2 is a diagram showing a server device according to the embodiment.
FIG. 3 is a diagram showing a karaoke device according to the embodiment.
FIG. 4 is a diagram showing a karaoke main unit according to the embodiment.
FIG. 5 is a flowchart showing processing of the karaoke system according to the embodiment.
FIG. 6 is a diagram showing a display screen of the remote control device according to the embodiment.
FIG. 7 is a diagram showing the similarity in each singing section of a song X in the embodiment.
FIG. 8 is a diagram showing the singing voice data assigned to each singing section of the song X in the embodiment.
FIG. 9A is a diagram showing an answer screen displayed on the display device according to the embodiment.
FIG. 9B is a diagram showing an answer screen displayed on the display device according to the embodiment.

<Embodiment>
A karaoke system according to an embodiment will be described with reference to FIGS. 1 to 9B.

==Karaoke System==
As shown in FIG. 1, the karaoke system 1 includes a karaoke device K and a server device S. The karaoke device K is communicably connected to the server device S via a network N. The network N is a transmission path such as a public telephone network or an Internet line.

The server device S is a computer that manages various kinds of information. The server device S can also communicate, via the network N, with karaoke devices other than the karaoke device K (not shown).

==Server Device==
As shown in FIG. 2, the server device S includes storage means 10, communication means 20, and control means 30. Each component is connected to a bus B via an interface (not shown).

[Storage means]
The storage means 10 is a large-capacity storage device that stores various types of data. In this embodiment, part of the storage area of the storage means 10 functions as an information storage unit 100.

(Information storage unit)
The information storage unit 100 stores, in association with one another, song identification information of a song, singer identification information of a singer who sang the song, singing voice data corresponding to the singing voice of the singer, and feature information extracted from the singing voice data.

Song identification information is information unique to each song, such as a song ID for identifying individual songs. Singer identification information is information unique to each singer, such as a singer ID for identifying individual singers. A singer is either a user who has sung karaoke on a karaoke device communicably connected to the server device S, or the artist who performs a song that is available for karaoke. Singing voice data is data corresponding to the singing voice with which a song was sung.

Feature information is information extracted from singing voice data. In other words, the singing voice data corresponding to a singing of a certain song and the feature information extracted from that singing voice data are in one-to-one correspondence.
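The association held by the information storage unit 100 could be modeled as in the following minimal sketch. The names `SingingRecord` and `InformationStore`, the field types, and the choice to key records by the (song ID, singer ID) pair are all assumptions made for illustration; the specification does not prescribe a data layout.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class SingingRecord:
    """One stored singing: IDs, voice data, and the features extracted from it."""
    song_id: str
    singer_id: str
    voice_data: bytes            # hypothetical representation of singing voice data
    features: List[float]        # hypothetical per-frame feature values (e.g. pitch)


class InformationStore:
    """Hypothetical stand-in for the information storage unit 100."""

    def __init__(self) -> None:
        # Assumption: at most one record per (song, singer) pair.
        self._records: Dict[Tuple[str, str], SingingRecord] = {}

    def put(self, record: SingingRecord) -> None:
        self._records[(record.song_id, record.singer_id)] = record

    def get(self, song_id: str, singer_id: str) -> Optional[SingingRecord]:
        return self._records.get((song_id, singer_id))

    def other_singers(self, song_id: str, specific_singer_id: str) -> List[SingingRecord]:
        """Records of the same song sung by anyone except the specific singer."""
        return [
            r for (s, who), r in self._records.items()
            if s == song_id and who != specific_singer_id
        ]
```

Because each record carries both the voice data and its features, the one-to-one correspondence described above holds by construction in this sketch.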

Specifically, the feature information is the time-series transition of the pitch and/or volume of the singing voice data. The feature information may also include information on special singing techniques. The types of special singing technique include vibrato type (rising, falling, pennant), sharp tendency (a weak or strong tendency to go sharp relative to the pitch of the reference data), flat tendency (a weak or strong tendency to go flat relative to the pitch of the reference data), shifts in the timing of vocalization relative to the accompaniment (singing later than the accompaniment, so-called "tame", or singing ahead of the accompaniment, so-called "hashiri"), fall, hiccup, shakuri (scooping up to a note), falsetto, kobushi (an ornamental pitch turn), voice quality (breathy, husky, clear, and so on), and the like.

For example, suppose that a user enters his or her own singer ID via the karaoke device K and logs in to the karaoke system 1, and then sings a song selected on the karaoke device K. In this case, the karaoke device K analyzes the singing voice data corresponding to the singing voice of the karaoke singing using a known method. Specifically, the karaoke device K extracts the pitch and/or volume from the singing voice data in time series, one sample per frame of a predetermined length (for example, 10 to 20 msec), and obtains the feature information by associating each sample with the elapsed time, in frame units, measured from the start of the performance of the song. Alternatively, the karaoke device K may use a known singing scoring function to detect special singing techniques during the karaoke singing, and include in the feature information the type of each technique, the elapsed time at which it was detected (in frame units, measured from the start of the performance of the song), the duration for which the technique was detected, and so on.
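The frame-by-frame extraction described above can be sketched as follows for the volume component only. This is an illustrative sketch, not the "known method" the specification refers to: the function name `frame_volumes`, the use of RMS amplitude as the volume measure, and the 10 msec default are assumptions. Pitch extraction would need a separate pitch estimator and is omitted.

```python
import math
from typing import List, Sequence, Tuple


def frame_volumes(
    samples: Sequence[float],
    sample_rate: int,
    frame_ms: float = 10.0,
) -> List[Tuple[int, float]]:
    """Return (frame_index, rms_volume) pairs, one per complete frame.

    frame_index counts frames from the start of the performance, so it
    doubles as the elapsed time in frame units.
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    features: List[Tuple[int, float]] = []
    # Step through the signal one non-overlapping frame at a time;
    # a trailing partial frame is dropped.
    starts = range(0, len(samples) - frame_len + 1, frame_len)
    for index, start in enumerate(starts):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / frame_len)
        features.append((index, rms))
    return features
```

For instance, 40 samples at a 1000 Hz sample rate with 10 msec frames yield four frames of 10 samples each.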

The karaoke device K links together the song ID of the song, the singer ID of the user, the singing voice data corresponding to the user's singing voice, and the feature information extracted from that singing voice data, and transmits them to the server device S. The server device S stores the received information in the information storage unit 100.

For the artist's own singing voice data, on the other hand, the singing voice data acquired when recording the guide vocal can be reused, for example. The feature information can be extracted using the same method as when extracting it from the singing voice data of karaoke singing.

[Communication means]
The communication means 20 provides an interface for communicating with the karaoke device K.

[Control means]
The control means 30 performs various kinds of control in the server device S. The control means 30 includes a CPU and a memory (neither is shown). The CPU implements various functions by executing programs stored in the memory.

In this embodiment, when the CPU executes the program for the singing section guessing content stored in the memory, the control means 30 functions as an identification unit 200 and a transmission processing unit 300. The singing section guessing content is content in which users guess the singing section in which the singing voice of a specific singer was emitted.

(Identification unit)
When a certain song and a specific singer are selected by a user of the karaoke device, the identification unit 200 identifies at least one piece of singing voice data corresponding to the singing voice of another singer, based on a similarity obtained by comparing the feature information linked to the song identification information of the certain song and the singer identification information of the specific singer with the feature information linked to the song identification information of the certain song and the singer identification information of another singer.

The singing section guessing content is normally played by multiple users. Hereinafter, the user who selects the song and the specific singer may be referred to as the "question setter", and a user who plays the game of guessing the singing section may be referred to as an "answerer". The question setter operates the remote control device 80 of the karaoke device K to select the song to be used in the singing section guessing content and the specific singer. The karaoke device K transmits the song identification information of the selected song and the singer identification information of the specific singer to the server device S.

The specific singer is one of the singers who have sung the selected song in the past (that is, singers for whom singing voice data corresponding to a singing of the selected song is stored in the information storage unit 100). Another singer is a singer who is different from the specific singer and who has also sung the selected song in the past.

The similarity indicates the degree of agreement between the feature information extracted from the singing voice data of the specific singer and the feature information extracted from the singing voice data of another singer. The similarity can be obtained by comparing the transitions of pitch and/or volume that make up the feature information.

For example, suppose that the average pitch over frames F1 to F20 of the feature information extracted from the singing voice data of the specific singer and the average pitch over frames F1 to F20 of the feature information extracted from the singing voice data of another singer are within a predetermined pitch range of each other (for example, within ±30 cents). In this case, the identification unit 200 determines that the feature information in frames F1 to F20 matches (the similarity is high). Conversely, suppose that the two averages over frames F1 to F20 are outside the predetermined pitch range. In this case, the identification unit 200 determines that the feature information in frames F1 to F20 does not match (the similarity is low).
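The average-pitch comparison in this example might be sketched as follows. The function names, the representation of per-frame pitch as a value in cents, and the 440 Hz reference used by the conversion helper are assumptions for illustration; only the ±30 cent tolerance comes from the text.

```python
import math
from typing import Sequence


def hz_to_cents(frequency_hz: float, reference_hz: float = 440.0) -> float:
    """Convert a frequency to cents relative to a reference (1200 cents per octave)."""
    return 1200.0 * math.log2(frequency_hz / reference_hz)


def pitch_frames_match(
    pitches_a_cents: Sequence[float],
    pitches_b_cents: Sequence[float],
    tolerance_cents: float = 30.0,
) -> bool:
    """True when the two frame ranges have average pitches within the tolerance."""
    if not pitches_a_cents or not pitches_b_cents:
        raise ValueError("both frame ranges must contain at least one pitch value")
    mean_a = sum(pitches_a_cents) / len(pitches_a_cents)
    mean_b = sum(pitches_b_cents) / len(pitches_b_cents)
    return abs(mean_a - mean_b) <= tolerance_cents
```

Applied to frames F1 to F20, two singers whose averages differ by 20 cents would be judged to match, while a 50 cent difference would not.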

Alternatively, suppose that rising vibrato is detected as the type of singing technique in frames F100 to F150 of the feature information extracted from the singing voice data of the specific singer, and that rising vibrato is also detected in frames F90 to F140 of the feature information extracted from the singing voice data of another singer. In this case, the identification unit 200 determines that the feature information in frames F100 to F140 matches. Conversely, suppose that rising vibrato is detected in frames F100 to F150 for the specific singer, but falling vibrato is detected in frames F90 to F140 for the other singer. In this case, the identification unit 200 determines that the feature information in frames F100 to F140 does not match.
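The technique-based comparison can be read as an interval overlap followed by a type check. The sketch below follows that reading; `TechniqueEvent`, its fields, and the string labels for technique types are hypothetical names, and treating frame ranges as inclusive is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class TechniqueEvent:
    """A detected singing technique spanning an inclusive range of frames."""
    kind: str
    start_frame: int
    end_frame: int


def overlapping_frames(a: TechniqueEvent, b: TechniqueEvent) -> Optional[Tuple[int, int]]:
    """Inclusive frame range shared by both events, or None if they do not overlap."""
    start = max(a.start_frame, b.start_frame)
    end = min(a.end_frame, b.end_frame)
    return (start, end) if start <= end else None


def techniques_match(a: TechniqueEvent, b: TechniqueEvent) -> bool:
    """True when the events overlap in time and are of the same type."""
    return overlapping_frames(a, b) is not None and a.kind == b.kind
```

With the frame numbers from the text, an event over F100 to F150 and an event over F90 to F140 share frames F100 to F140, and they match only if both are the same vibrato type.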

The identification unit 200 determines whether the feature information matches across all the singing sections (all frames) that make up the song, and thereby obtains the similarity between the feature information extracted from the singing voice data of the specific singer and the feature information extracted from the singing voice data of another singer. The more frames in which the feature information matches, the higher the similarity of the feature information as a whole. The similarity can be expressed, for example, as a percentage from 0% (the feature information matches in no frame) to 100% (the feature information matches in every frame).
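Expressed as a calculation, the percentage described above is simply the share of frames judged to match. The following sketch assumes the per-frame match decisions are already available as booleans; the function name is hypothetical.

```python
from typing import Sequence


def similarity_percent(frame_matches: Sequence[bool]) -> float:
    """Percentage of frames whose feature information matches (0.0 to 100.0)."""
    if not frame_matches:
        raise ValueError("at least one frame is required")
    matched = sum(1 for m in frame_matches if m)
    return 100.0 * matched / len(frame_matches)
```

For example, three matching frames out of four give a similarity of 75%.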

Based on the similarity, the identification unit 200 identifies singing voice data corresponding to the singing voice of another singer.

At least one piece of singing voice data needs to be identified. For example, the identification unit 200 can identify the singing voice data whose similarity satisfies a preset criterion (for example, 80% or higher).
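A threshold-based selection of this kind might look like the sketch below. The mapping from singer ID to similarity and the function name are assumptions; the 80% default mirrors the example value in the text.

```python
from typing import Dict, List


def select_by_threshold(similarities: Dict[str, float], threshold: float = 80.0) -> List[str]:
    """Singer IDs whose similarity to the specific singer meets the criterion."""
    return [singer_id for singer_id, score in similarities.items() if score >= threshold]
```

A caller would still need to handle the case where no candidate reaches the threshold, since at least one piece of singing voice data must be identified.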

Alternatively, the identification unit 200 can identify singing voice data according to the types of performance sections that make up the song, more specifically the types of singing sections. A performance section is a section in which the karaoke performance takes place. Performance sections include singing sections and non-singing sections. A singing section is a section of a song for which lyrics to be sung are set, such as the A melody, B melody, C melody, or chorus. A non-singing section is a section of a song for which no lyrics to be sung are set, such as a measure that forms part of the introduction, an interlude, or the postlude.

Suppose that a certain song consists of the following performance sections: introduction, first-verse A melody, first-verse B melody, first-verse chorus, interlude, second-verse A melody, second-verse B melody, second-verse chorus, interlude, chorus, and postlude. In this case, there are three types of singing section (A melody, B melody, and chorus). The singing voice data corresponding to the singing voice of the specific singer must be assigned to at least one of these singing sections (details are described later). The identification unit 200 can therefore identify, as the singing voice data corresponding to the singing voices of other singers, the two pieces of singing voice data with the highest similarity.
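One way to read this example is that the number of other singers to identify equals the number of singing-section types minus the one reserved for the specific singer. The sketch below implements that reading; it is an interpretation, and the function name and input shapes are hypothetical.

```python
from typing import Dict, List, Sequence


def select_top_candidates(
    similarities: Dict[str, float],
    singing_section_types: Sequence[str],
) -> List[str]:
    """Pick the most similar other singers, one fewer than the number of section types."""
    # One section type is reserved for the specific singer (assumption).
    count = max(0, len(set(singing_section_types)) - 1)
    ranked = sorted(similarities, key=lambda singer_id: similarities[singer_id], reverse=True)
    return ranked[:count]
```

With the three section types of the example (A melody, B melody, chorus), the two highest-ranked candidates are returned.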

(Transmission processing unit)
The transmission processing unit 300 transmits, to the karaoke device, the singing voice data corresponding to the singing voice of the specific singer and the singing voice data corresponding to the singing voice of the identified other singer.

In the above example, the transmission processing unit 300 reads from the information storage unit 100 the singing voice data linked to the singer identification information of the specific singer received from the karaoke device K. The transmission processing unit 300 also reads from the information storage unit 100 the singing voice data corresponding to the singing voice of the other singer identified by the identification unit 200. The transmission processing unit 300 then transmits the singing voice data it has read to the karaoke device K.

==Karaoke Device==
The karaoke device K is a device for performing the karaoke accompaniment of songs and for users to sing karaoke. As shown in FIG. 3, the karaoke device K includes a karaoke main unit 40, a speaker 50, a display device 60, a microphone 70, and a remote control device 80.

The karaoke main unit 40 performs various kinds of control related to karaoke performance and karaoke singing, such as controlling the performance of the selected song, controlling the display of lyrics and background video, and processing the audio signal input through the microphone 70. The speaker 50 emits sound based on signals from the karaoke main unit 40. The display device 60 displays video and images on a screen based on signals from the karaoke main unit 40. The microphone 70 converts the singer's karaoke singing voice into an analog audio signal and inputs it to the karaoke main unit 40. The remote control device 80 is a device for performing various operations on the karaoke main unit 40.

As shown in FIG. 4, the karaoke main unit 40 includes storage means 40a, communication means 40b, input means 40c, performance means 40d, and control means 40e. Each component is connected to a bus B via an interface (not shown).

[Storage means]
The storage means 40a is a large-capacity storage device that stores various types of data. The storage means 40a stores song data.

Song identification information is attached to the song data. The song data includes accompaniment data, reference data, and performance section information. The accompaniment data is the data from which the karaoke performance sound is produced. The reference data is data that indicates the main melody of the song. The performance section information is information that indicates the order and types of the performance sections that make up the song. The storage means 40a also stores, for each song, lyric telop data for displaying the lyrics of the song on the display device 60 or the like in time with the karaoke performance, and background video data for the background video displayed on the display device 60 or the like during the karaoke performance.

[Communication means and input means]
The communication means 40b provides an interface for communicating with the remote control device 80 and the server device S. The input means 40c allows the user to enter various instructions. The input means 40c is, for example, a set of buttons provided on the karaoke main unit 40. Alternatively, the remote control device 80 may function as the input means 40c.

[Performance means]
Under the control of the control means 40e, the performance means 40d performs the karaoke performance of a song and processes the signal based on the singing voice input through the microphone 70. The performance means 40d includes a sound source, a mixer, an amplifier, and so on (none is shown).

[Control means]
The control means 40e performs various kinds of control in the karaoke device K. The control means 40e includes a CPU and a memory (neither is shown). The CPU implements various functions by executing programs stored in the memory.

In this embodiment, when the CPU executes the program for the singing section guessing content stored in the memory, the control means 40e functions as a content generation unit 400 and a content execution unit 500.

(Content generation unit)
The content generation unit 400 generates content data by assigning, to at least one of the singing sections of a certain song, the specific singer's singing voice data corresponding to that singing section, and assigning, to each of the other singing sections, the identified other singer's singing voice data corresponding to that other singing section.

The content data is the data used when running the singing section guessing content.

The content generation unit 400 assigns, to at least one of the singing sections of the selected song, the specific singer's singing voice data corresponding to that singing section. To each of the other singing sections, the content generation unit 400 assigns the identified other singer's singing voice data corresponding to that other singing section. The content generation unit 400 completes the content data by assigning one piece of singing voice data to every singing section of the selected song.

Here, the content generation unit 400 can choose at random which singing voice data to assign to a given singing section. For example, suppose that a certain song has three types of singing section (A melody, B melody, and chorus) and that two pieces of singing voice data (singing voice data SD1 and SD2) have been identified as the singing voice data corresponding to the singing voices of other singers.

In this case, the content generation unit 400 assigns to the A melody singing section of the song the portion of the specific singer's singing voice data that corresponds to the A melody. The content generation unit 400 then assigns to the B melody singing section the portion of the singing voice data SD1 that corresponds to the B melody, and assigns to the chorus singing section the portion of the singing voice data SD2 that corresponds to the chorus.
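A random assignment along these lines could be sketched as below. This is one possible reading only: the function name and arguments are hypothetical, exactly one section type is given to the specific singer (the text requires at least one), and the other sections draw independently from the identified candidates, so the same candidate may be chosen more than once.

```python
import random
from typing import Dict, Sequence


def assign_randomly(
    section_types: Sequence[str],
    specific_id: str,
    other_ids: Sequence[str],
    rng: random.Random,
) -> Dict[str, str]:
    """Map each singing-section type to the ID of the voice data assigned to it."""
    if not section_types or not other_ids:
        raise ValueError("need at least one section type and one other singer")
    # Pick the section that will carry the specific singer's voice (the answer).
    answer_section = rng.choice(list(section_types))
    assignment: Dict[str, str] = {}
    for section in section_types:
        if section == answer_section:
            assignment[section] = specific_id
        else:
            assignment[section] = rng.choice(list(other_ids))
    return assignment
```

Passing a seeded `random.Random` instance keeps a given quiz reproducible, which may help when testing the answer screen.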

When assigning another singer's singing voice data to one of the other singing sections, the content generation unit 400 can also assign the singing voice data that has the highest similarity within that singing section.

たとえば、上記例において、コンテンツ生成部４００は、Ｂメロの歌唱区間において、他の歌唱者の歌唱音声データＳＤ１及びＳＤ２について、特定部２００で求められた特徴情報の類似度を参照する。コンテンツ生成部４００は、類似度が高い方の特徴情報に対応する他の歌唱者の歌唱音声データを、Ｂメロの歌唱区間に対して割り当てる。同様に、コンテンツ生成部４００は、類似度が高い方の特徴情報に対応する他の歌唱者の歌唱音声データを、サビの歌唱区間に対して割り当てる。 For example, in the above example, the content generation unit 400 refers to the similarity of the characteristic information determined by the identification unit 200 for the singing voice data SD1 and SD2 of the other singer in the singing section of the B melody. The content generation unit 400 assigns the singing voice data of the other singer corresponding to the characteristic information with the higher similarity to the singing section of the B melody. Similarly, the content generation unit 400 assigns the singing voice data of the other singer corresponding to the characteristic information with the higher similarity to the singing section of the chorus.

なお、上記例において、異なる歌唱区間に対して、同じ他の歌唱者の歌唱音声データが割り当てられる場合もありうる。たとえば、Ｂメロの歌唱区間とサビの歌唱区間で同じ他の歌唱者の歌唱音声データＳＤ１が割り当てられる場合もありうる。 In the above example, singing voice data of the same other singer may be assigned to different singing sections. For example, singing voice data SD1 of the same other singer may be assigned to both the singing section of the B melody and the singing section of the chorus.
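
The similarity-based choice can be sketched as below. The similarity values are hypothetical placeholders (the text gives none for SD1 and SD2), and `pick_most_similar` is an illustrative name. The values are chosen so that SD1 wins in both sections, which is the case mentioned in the preceding paragraph.

```python
def pick_most_similar(similarity_by_section, section):
    """Return the other singer's data with the highest similarity in a section."""
    scores = similarity_by_section[section]
    return max(scores, key=scores.get)


# Hypothetical per-section similarities to the specific singer.
similarity_by_section = {
    "B melody": {"SD1": 4, "SD2": 2},
    "chorus": {"SD1": 5, "SD2": 3},
}

for section in similarity_by_section:
    print(section, "<-", pick_most_similar(similarity_by_section, section))
```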

（コンテンツ実行部） (Content Execution Unit)
コンテンツ実行部５００は、ある楽曲のカラオケ演奏に併せてコンテンツデータを再生することにより歌唱音声を放音させ、且つ再生終了後、特定の歌唱者の歌唱音声が放音された歌唱区間を選択するための解答画面を表示させる。 The content execution unit 500 plays back the content data along with the karaoke performance of a certain song so that the singing voice is emitted and, after playback ends, displays an answer screen for selecting the singing section in which the singing voice of the specific singer was emitted.

コンテンツデータが完成した後、コンテンツ実行部５００は、記憶手段４０ａからある楽曲の伴奏データを読み出し、演奏手段４０ｄを制御して、ある楽曲のカラオケ演奏を開始させる。また、コンテンツ実行部５００は、演奏手段４０ｄを制御して、カラオケ演奏に併せてコンテンツデータを再生させる。スピーカ５０からは、カラオケ演奏に合わせて歌唱音声データに基づく歌唱音声が放音される。解答者は、放音される歌唱音声を聴きながら誰の歌唱であるかを予想する。 After the content data is completed, the content execution unit 500 reads out the accompaniment data of a certain song from the storage means 40a and controls the performance means 40d to start a karaoke performance of the certain song. The content execution unit 500 also controls the performance means 40d to play the content data in sync with the karaoke performance. A singing voice based on the singing voice data is emitted from the speaker 50 in sync with the karaoke performance. The contestant listens to the singing voice that is being emitted and guesses who is singing.

再生終了後、コンテンツ実行部５００は、表示装置６０の表示画面に、特定の歌唱者の歌唱音声が放音された歌唱区間を選択するための解答画面を表示させる。 After playback ends, the content execution unit 500 displays an answer screen on the display screen of the display device 60 for selecting the singing section in which the singing voice of a specific singer is emitted.

解答者は、リモコン装置８０を操作して、正解と考える歌唱区間を選択する。コンテンツ実行部５００は、選択された歌唱区間が特定の歌唱者の歌唱音声が放音された歌唱区間である場合に「正解」を報知し、選択された歌唱区間が特定の歌唱者の歌唱音声が放音された歌唱区間でない場合に「不正解」を報知する。 The contestant operates the remote control device 80 to select the singing section that he or she considers to be the correct answer. The content execution unit 500 announces "correct answer" if the selected singing section is a singing section in which the singing voice of a specific singer is emitted, and announces "incorrect answer" if the selected singing section is not a singing section in which the singing voice of a specific singer is emitted.

たとえば、コンテンツ実行部５００は、表示装置６０の表示画面に「正解」または「不正解」の文字を表示させることにより、正誤の報知を行う。或いは、コンテンツ実行部５００は、スピーカ５０から「正解」または「不正解」の音声を放音させることにより、正誤の報知を行う。 For example, the content execution unit 500 reports whether the answer is correct by displaying the text "Correct" or "Incorrect" on the display screen of the display device 60. Alternatively, the content execution unit 500 reports the result by emitting the spoken word "Correct" or "Incorrect" from the speaker 50.

なお、解答者が所有するスマートフォン等の携帯端末に、歌唱区間当てコンテンツ用のアプリケーションソフトウェアが予めインストールされている場合、解答者は、携帯端末を操作して正解と考える歌唱区間を選択することも可能である。この場合、コンテンツ実行部５００は、携帯端末から送信された選択結果に応じて、「正解」または「不正解」の判定結果を携帯端末に送信する。携帯端末は、受信した判定結果を表示画面に表示させる。 If application software for the singing section guessing content is pre-installed on a mobile device such as a smartphone owned by the contestant, the contestant can operate the mobile device to select the singing section that he or she thinks is correct. In this case, the content execution unit 500 transmits a judgment result of "correct" or "incorrect" to the mobile device according to the selection result transmitted from the mobile device. The mobile device displays the received judgment result on the display screen.
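
The correct/incorrect judgment described above amounts to comparing the selected section against the section assigned to the specific singer. A minimal sketch, assuming the content data is available as a section-to-source mapping (the mapping shape and the name `judge_answer` are illustrative assumptions):

```python
def judge_answer(assignment, target_singer, selected_section):
    """Return "correct" if the selected section carried the target singer's voice."""
    if assignment.get(selected_section) == target_singer:
        return "correct"
    return "incorrect"


# Hypothetical content data: which source was assigned to each singing section.
assignment = {"A melody": "specific singer", "B melody": "SD1", "chorus": "SD2"}

print(judge_answer(assignment, "specific singer", "A melody"))
print(judge_answer(assignment, "specific singer", "chorus"))
```

The same function could back both the remote control path and the mobile terminal path, since only the transport of the selection and of the result differs.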

＝＝カラオケシステムにおける処理について＝＝ ==Processing in the Karaoke System==
次に、図５から図９Ｂを参照して本実施形態に係るカラオケシステム１における処理について述べる。図５は、カラオケシステム１における処理を示すフローチャートである。図６は、リモコン装置８０の表示画面を示す。図７は、楽曲Ｘ（後述）の各歌唱区間における類似度を示す。図８は、楽曲Ｘ（後述）の各歌唱区間に対して割り当てた歌唱音声データを示す。図９Ａ及び図９Ｂは、表示装置６０に表示される解答画面を示す。この例では、利用者Ｕ１及び利用者Ｕ２の２名がカラオケ装置Ｋを利用するとする。また、情報記憶部１００は、楽曲の楽曲識別情報、当該楽曲を歌唱した歌唱者の歌唱者識別情報、当該歌唱者が当該楽曲を歌唱した歌唱音声に対応する歌唱音声データ、及び当該歌唱音声データから抽出された特徴情報を紐付けて記憶しているとする。 Next, the processing in the karaoke system 1 according to this embodiment will be described with reference to Fig. 5 to Fig. 9B. Fig. 5 is a flowchart showing the processing in the karaoke system 1. Fig. 6 shows a display screen of the remote control device 80. Fig. 7 shows the similarity in each singing section of a song X (described later). Fig. 8 shows the singing voice data assigned to each singing section of the song X (described later). Figs. 9A and 9B show answer screens displayed on the display device 60. In this example, it is assumed that two users, U1 and U2, use the karaoke device K. It is also assumed that the information storage unit 100 stores, in association with one another, song identification information of a song, singer identification information of a singer who sang the song, singing voice data corresponding to the singing voice of the singer singing the song, and feature information extracted from the singing voice data.

一の利用者がカラオケ装置Ｋのリモコン装置８０を操作し、歌唱区間当てコンテンツを選択した場合、カラオケ装置Ｋは、歌唱当てコンテンツを開始する（歌唱区間当てコンテンツを開始。ステップ１０）。 When a user operates the remote control device 80 of the karaoke device K and selects the singing section guessing content, the karaoke device K starts the singing section guessing content (Start singing section guessing content; step 10).

一の利用者は、リモコン装置８０を操作し、歌唱区間当てコンテンツで使用する楽曲を選曲する。カラオケ装置Ｋは、選曲された楽曲の楽曲ＩＤをサーバ装置Ｓに送信する（選曲された楽曲の楽曲ＩＤを送信。ステップ１１）。 A user operates the remote control device 80 to select a song to be used in the singing section guessing content. The karaoke device K transmits the song ID of the selected song to the server device S (transmitting the song ID of the selected song; step 11).

サーバ装置Ｓは、受信した楽曲の楽曲ＩＤに紐付けられている歌唱者ＩＤに基づいて、歌唱者のリストを作成する。サーバ装置Ｓは、作成した歌唱者のリストをカラオケ装置Ｋに送信する（歌唱者のリストを送信。ステップ１２）。 The server device S creates a list of singers based on the singer IDs linked to the song ID of the received song. The server device S transmits the created list of singers to the karaoke device K (transmitting the list of singers; step 12).

一の利用者は、リモコン装置８０を操作し、特定の歌唱者を選択する。カラオケ装置Ｋは、選択された特定の歌唱者の歌唱者ＩＤをサーバ装置Ｓに送信する（選択された特定の歌唱者の歌唱者ＩＤを送信。ステップ１３）。 A user operates the remote control device 80 to select a specific singer. The karaoke device K transmits the singer ID of the selected specific singer to the server device S (transmitting the singer ID of the selected specific singer. Step 13).

具体的に、リモコン装置８０は、楽曲の検索リストを表示する。利用者Ｕ１は、検索リストを使用して楽曲Ｘを選曲する。楽曲Ｘは、前奏、１番のＡメロ、１番のＢメロ、１番のサビ、間奏、２番のＡメロ、２番のＢメロ、２番のサビ、間奏、Ｃメロ、サビ、後奏の演奏区間で構成されているとする。 Specifically, the remote control device 80 displays a search list of songs. User U1 uses the search list to select song X. Song X is composed of the following performance sections: introduction, A melody of the first verse, B melody of the first verse, chorus of the first verse, interlude, A melody of the second verse, B melody of the second verse, chorus of the second verse, interlude, C melody, chorus, and epilogue.

カラオケ装置Ｋは、リモコン装置８０を介して利用者Ｕ１が選曲した楽曲Ｘの楽曲ＩＤをサーバ装置Ｓに送信する。サーバ装置Ｓは、情報記憶部１００を参照し、楽曲Ｘの楽曲ＩＤに紐付けられている歌唱者ＩＤに基づいて、歌唱者のリストを作成する。サーバ装置Ｓは、作成した歌唱者のリストをカラオケ装置Ｋに送信する。カラオケ装置Ｋのリモコン装置８０は、歌唱者のリストを表示させる（図６参照）。利用者Ｕ１は、歌唱者のリストの中から利用者Ｕ１自身を選択し、歌唱区間当てコンテンツの実行アイコンを選択する。カラオケ装置Ｋは、選択された利用者Ｕ１の歌唱者ＩＤをサーバ装置Ｓに送信する。この例において、利用者Ｕ１は「出題者」及び「特定の歌唱者」に相当する。 The karaoke device K transmits the song ID of the song X selected by the user U1 to the server device S via the remote control device 80. The server device S refers to the information storage unit 100 and creates a list of singers based on the singer IDs linked to the song ID of the song X. The server device S transmits the created list of singers to the karaoke device K. The remote control device 80 of the karaoke device K displays the list of singers (see FIG. 6). The user U1 selects himself/herself from the list of singers and selects the execution icon for the singing section guessing content. The karaoke device K transmits the singer ID of the selected user U1 to the server device S. In this example, the user U1 corresponds to the "question setter" and the "specific singer".
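
The linked records in the information storage unit 100 and the singer list built from them can be sketched as follows. The class and field names, the one-record-per-(song, singer) keying, and the sorted list order are assumptions made for illustration; the specification only states that the four items are stored in association with one another.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SingingRecord:
    song_id: str        # song identification information
    singer_id: str      # singer identification information
    voice_data: bytes   # singing voice data
    features: tuple     # feature information extracted from the voice data


class InformationStorage:
    """Toy stand-in for the information storage unit 100."""

    def __init__(self):
        self._records = {}

    def add(self, record):
        # Assumption: one record per (song, singer) pair.
        self._records[(record.song_id, record.singer_id)] = record

    def get(self, song_id, singer_id):
        return self._records[(song_id, singer_id)]

    def singers_for_song(self, song_id):
        """Build the singer list sent back to the karaoke device."""
        return sorted(singer for (song, singer) in self._records if song == song_id)


storage = InformationStorage()
storage.add(SingingRecord("X", "U1", b"", (0.2, 0.7)))
storage.add(SingingRecord("X", "P", b"", (0.3, 0.6)))
storage.add(SingingRecord("X", "U3", b"", (0.9, 0.1)))
storage.add(SingingRecord("Y", "U2", b"", (0.5, 0.5)))

print(storage.singers_for_song("X"))
```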

次に、サーバ装置Ｓの特定部２００は、ステップ１１で送信された楽曲の楽曲ＩＤ及びステップ１３で送信された歌唱者の歌唱者ＩＤに紐付けられている特徴情報と、楽曲の楽曲ＩＤ及び歌唱者とは異なる他の歌唱者の歌唱者ＩＤに紐付けられている特徴情報とを比較することにより求めた、類似度に基づいて、他の歌唱者の歌唱音声に対応する歌唱音声データを少なくとも一つ特定する（他の歌唱者の歌唱音声に対応する歌唱音声データを特定。ステップ１４）。 Next, the identification unit 200 of the server device S identifies at least one piece of singing voice data corresponding to the singing voice of the other singer based on the similarity obtained by comparing the feature information linked to the song ID of the song transmitted in step 11 and the singer ID of the singer transmitted in step 13 with the feature information linked to the song ID of the song and the singer ID of another singer different from the singer (identifying singing voice data corresponding to the singing voice of the other singer; step 14).

サーバ装置Ｓの送信処理部３００は、特定の歌唱者の歌唱音声に対応する歌唱音声データ、及びステップ１４で特定した他の歌唱者の歌唱音声に対応する歌唱音声データをカラオケ装置Ｋに送信する（歌唱音声データを送信。ステップ１５）。 The transmission processing unit 300 of the server device S transmits singing voice data corresponding to the singing voice of the specific singer and singing voice data corresponding to the singing voice of the other singers identified in step 14 to the karaoke device K (transmitting singing voice data; step 15).

具体的に、特定部２００は、情報記憶部１００を参照し、楽曲Ｘの楽曲ＩＤ及び利用者Ｕ１の歌唱者ＩＤに紐付けられている特徴情報ＣＵ１を読み出す。同様に、特定部２００は、楽曲Ｘの楽曲ＩＤ及び楽曲Ｘの楽曲ＩＤが紐付けられている利用者Ｕ１とは異なる他の歌唱者（アーティストＰ、及び利用者Ｕ３～利用者Ｕｎ）の歌唱者ＩＤに紐付けられている特徴情報ＣＰ、ＣＵ３～ＣＵｎを読み出す。 Specifically, the identification unit 200 refers to the information storage unit 100 and reads out characteristic information CU1 linked to the song ID of song X and the singer ID of user U1. Similarly, the identification unit 200 reads out characteristic information CP, CU3 to CUn linked to the song ID of song X and the singer IDs of other singers (artist P, and users U3 to Un) different from user U1 to whom the song ID of song X is linked.

特定部２００は、読み出した特徴情報ＣＵ１と、特徴情報ＣＰ、ＣＵ３～ＣＵｎそれぞれとを比較し、類似度を求める。この例において、特定部２００は、類似度に基づいて、他の歌唱者である利用者Ｕ７～Ｕ９の歌唱音声に対応する歌唱音声データＳＤ７～ＳＤ９を特定したとする。 The identification unit 200 compares the read characteristic information CU1 with each of the characteristic information CP and CU3 to CUn to determine the degree of similarity. In this example, the identification unit 200 identifies singing voice data SD7 to SD9 that correspond to the singing voices of other singers, users U7 to U9, based on the degree of similarity.
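
The comparison performed by the identification unit 200 can be sketched as a ranking of the other singers by similarity to the specific singer. This passage does not state how the similarity is computed or how many singers are kept, so the cosine similarity over numeric feature vectors, the `count` parameter, and the sample vectors below are all assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length numeric feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify_similar_singers(target_features, other_features, count):
    """Return the IDs of the `count` other singers most similar to the target."""
    ranked = sorted(
        other_features,
        key=lambda singer: cosine_similarity(target_features, other_features[singer]),
        reverse=True,
    )
    return ranked[:count]


# Hypothetical feature vectors for the specific singer and three other singers.
cu1 = (1.0, 0.0)
others = {"P": (0.0, 1.0), "U3": (1.0, 0.0), "U4": (1.0, 1.0)}

print(identify_similar_singers(cu1, others, count=2))
```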

送信処理部３００は、カラオケ装置Ｋから受信した利用者Ｕ１の歌唱者ＩＤに基づいて、情報記憶部１００から、当該歌唱者ＩＤに紐付けられている歌唱音声データを読み出す。また、送信処理部３００は、情報記憶部１００から、特定部２００により特定された利用者Ｕ７～Ｕ９の歌唱音声に対応する歌唱音声データＳＤ７～ＳＤ９を読み出す。送信処理部３００は、読み出した歌唱音声データをカラオケ装置Ｋに送信する。 The transmission processing unit 300 reads out the singing voice data linked to the singer ID of user U1 received from the karaoke device K from the information storage unit 100. The transmission processing unit 300 also reads out the singing voice data SD7 to SD9 corresponding to the singing voices of users U7 to U9 identified by the identification unit 200 from the information storage unit 100. The transmission processing unit 300 transmits the read singing voice data to the karaoke device K.

次に、コンテンツ生成部４００は、楽曲の歌唱区間の少なくとも１つに対し、歌唱者の当該歌唱区間に対応する歌唱音声データを割り当て、それ以外の他の歌唱区間に対し、ステップ１４で特定した他の歌唱者の当該他の歌唱区間に対応する歌唱音声データを割り当てることにより、コンテンツデータを生成する（コンテンツデータを生成。ステップ１６）。 Next, the content generation unit 400 generates content data by assigning singing voice data of the singer corresponding to at least one of the singing sections of the song, and by assigning singing voice data of the other singers identified in step 14 corresponding to the other singing sections to the other singing sections (generating content data; step 16).

具体的に、コンテンツ生成部４００は、他の歌唱区間に利用者Ｕ７～Ｕ９の歌唱音声データＳＤ７～ＳＤ９を割り当てる場合に、当該他の歌唱区間における類似度が最も高い歌唱音声データを割り当てる。 Specifically, when the content generating unit 400 assigns the singing voice data SD7 to SD9 of the users U7 to U9 to another singing section, it assigns the singing voice data with the highest similarity in the other singing section.

図７は、特定部２００が求めた、楽曲Ｘの各歌唱区間における類似度を示したものである。この例において、類似度は５段階（数字が大きいほど類似度が高い）で示している。 Figure 7 shows the similarity in each singing section of song X, as determined by the identification unit 200. In this example, the similarity is shown on a five-level scale (the higher the number, the higher the similarity).

たとえば、Ａメロの歌唱区間において、歌唱音声データＳＤ７に対応する類似度は「２」であり、歌唱音声データＳＤ８に対応する類似度は「３」であり、歌唱音声データＳＤ９に対応する類似度は「５」である。この場合、コンテンツ生成部４００は、Ａメロの歌唱区間に対して、歌唱音声データＳＤ９のうち、Ａメロの歌唱区間に対応する歌唱音声データを割り当てる。なお、図７に示したサビの歌唱区間のように、同じ類似度を示す歌唱音声データが複数存在する場合もありうる。この場合、コンテンツ生成部４００は、いずれか一つの歌唱音声データを選択して割り当てる。 For example, in the singing section of the A melody, the similarity corresponding to the singing voice data SD7 is "2", the similarity corresponding to the singing voice data SD8 is "3", and the similarity corresponding to the singing voice data SD9 is "5". In this case, the content generation unit 400 assigns the singing voice data from the singing voice data SD9 corresponding to the singing section of the A melody to the singing section of the A melody. Note that there may be multiple singing voice data showing the same similarity, such as the singing section of the chorus shown in Figure 7. In this case, the content generation unit 400 selects and assigns one of the singing voice data.

また、コンテンツ生成部４００は、いずれかの歌唱区間に対して、利用者Ｕ１の歌唱音声データを割り当てる必要がある。ここで、図７に示したように、Ｂメロの歌唱区間においては、歌唱音声データＳＤ７～ＳＤ９いずれの類似度も低くなっている。よって、コンテンツ生成部４００は、Ｂメロの歌唱区間に対して、利用者Ｕ１の歌唱音声データのうち、Ｂメロの歌唱区間に対応する歌唱音声データを割り当てる。 The content generating unit 400 also needs to assign the singing voice data of user U1 to one of the singing sections. Here, as shown in FIG. 7, in the singing section of the B melody, the similarity of every one of the singing voice data SD7 to SD9 is low. Therefore, the content generating unit 400 assigns, to the singing section of the B melody, the portion of user U1's singing voice data that corresponds to the singing section of the B melody.

図８は、楽曲Ｘの各歌唱区間に対して割り当てた歌唱音声データを示している。この例において、楽曲ＸのＡメロの歌唱区間に対しては、歌唱音声データＳＤ９が割り当てられ、楽曲ＸのＢメロの歌唱区間に対しては、利用者Ｕ１の歌唱音声データが割り当てられ、楽曲ＸのＣメロの歌唱区間に対しては、歌唱音声データＳＤ７が割り当てられ、楽曲Ｘのサビの歌唱区間に対しては、歌唱音声データＳＤ８が割り当てられている。 Figure 8 shows the singing voice data assigned to each singing section of song X. In this example, singing voice data SD9 is assigned to the singing section of the A melody of song X, singing voice data of user U1 is assigned to the singing section of the B melody of song X, singing voice data SD7 is assigned to the singing section of the C melody of song X, and singing voice data SD8 is assigned to the singing section of the chorus of song X.
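
The assignment in this example can be reproduced with a short sketch. Only the A melody values (2, 3, 5) come from the text; the values for the other sections are hypothetical placeholders chosen to be consistent with the description (all low in the B melody, a tie in the chorus). Reserving for user U1 the section whose best candidate scores lowest, and breaking ties by taking the first candidate, are assumptions about how the selection might be made.

```python
def build_assignment(similarity, target_singer):
    """Assign the target singer to one section and the best match elsewhere."""
    # Assumption: reserve the section where even the best other singer is
    # least similar to the target singer.
    reserved = min(similarity, key=lambda sec: max(similarity[sec].values()))
    assignment = {}
    for section, scores in similarity.items():
        if section == reserved:
            assignment[section] = target_singer
        else:
            # max() returns the first maximal key, which settles ties.
            assignment[section] = max(scores, key=scores.get)
    return assignment


similarity = {
    "A melody": {"SD7": 2, "SD8": 3, "SD9": 5},  # values stated in the text
    "B melody": {"SD7": 1, "SD8": 2, "SD9": 1},  # hypothetical (all low)
    "C melody": {"SD7": 4, "SD8": 2, "SD9": 3},  # hypothetical
    "chorus": {"SD7": 3, "SD8": 4, "SD9": 4},    # hypothetical (tie)
}

for section, source in build_assignment(similarity, "U1").items():
    print(section, "<-", source)
```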

次に、コンテンツ実行部５００は、楽曲のカラオケ演奏に併せてコンテンツデータを再生することにより歌唱音声を放音させる（コンテンツデータに基づく歌唱音声を放音。ステップ１７）。再生終了後（ステップ１８でＹの場合）、コンテンツ実行部５００は、特定の歌唱者の歌唱音声が放音された歌唱区間を選択するための解答画面を表示させる（解答画面を表示。ステップ１９）。解答者の選択に基づいて、コンテンツ実行部５００は、解答の正誤を表示させる（解答の正誤を表示。ステップ２０）。 Next, the content execution unit 500 plays the content data in time with the karaoke performance of the song to emit a singing voice (emit singing voice based on the content data; step 17). After playback is completed (Y in step 18), the content execution unit 500 displays an answer screen for selecting a singing section in which the singing voice of a specific singer is emitted (display answer screen; step 19). Based on the answerer's selection, the content execution unit 500 displays whether the answer is correct or incorrect (display whether the answer is correct or incorrect; step 20).

具体的に、コンテンツ実行部５００は、演奏手段４０ｄを制御して、楽曲Ｘのカラオケ演奏に併せてステップ１６で生成したコンテンツデータを再生させる。スピーカ５０からは、カラオケ演奏に合わせて歌唱音声データに基づく歌唱音声が放音される。 Specifically, the content execution unit 500 controls the performance means 40d to play the content data generated in step 16 in accordance with the karaoke performance of the song X. The speaker 50 emits a singing voice based on the singing voice data in accordance with the karaoke performance.

再生終了後、コンテンツ実行部５００は、表示装置６０の表示画面に、利用者Ｕ１の歌唱音声が放音された歌唱区間を選択するための解答画面を表示させる（図９Ａ参照）。利用者Ｕ２は、リモコン装置８０を操作して、正解と考える歌唱区間を選択する。この例において、利用者Ｕ２は「解答者」に相当する。 After playback is completed, the content execution unit 500 displays an answer screen on the display screen of the display device 60 for selecting the singing section in which the singing voice of user U1 is emitted (see FIG. 9A). User U2 operates the remote control device 80 to select the singing section that he or she considers to be the correct answer. In this example, user U2 corresponds to the "answerer."

選択された歌唱区間が特定の歌唱者の歌唱音声が放音された歌唱区間である場合、コンテンツ実行部５００は、「正解」の文字を解答画面に表示させる（図９Ｂ参照）。 If the selected singing section is a singing section in which the singing voice of a specific singer is emitted, the content execution unit 500 displays the word "Correct" on the answer screen (see Figure 9B).

以上から明らかなように、本実施形態に係るカラオケシステム１は、サーバ装置Ｓとカラオケ装置Ｋとが通信可能に接続されている。サーバ装置Ｓは、楽曲の楽曲識別情報、当該楽曲を歌唱した歌唱者の歌唱者識別情報、当該歌唱者が当該楽曲を歌唱した歌唱音声に対応する歌唱音声データ、及び当該歌唱音声データから抽出された特徴情報を紐付けて記憶する情報記憶部１００と、カラオケ装置Ｋの利用者により、ある楽曲及び特定の歌唱者が選択された場合、当該ある楽曲の楽曲識別情報及び当該特定の歌唱者の歌唱者識別情報に紐付けられている特徴情報と、当該ある楽曲の楽曲識別情報及び当該特定の歌唱者とは異なる他の歌唱者の歌唱者識別情報に紐付けられている特徴情報とを比較することにより求めた、類似度に基づいて、他の歌唱者の歌唱音声に対応する歌唱音声データを少なくとも一つ特定する特定部２００と、特定の歌唱者の歌唱音声に対応する歌唱音声データ、及び特定した他の歌唱者の歌唱音声に対応する歌唱音声データをカラオケ装置Ｋに送信する送信処理部３００と、を有する。カラオケ装置Ｋは、ある楽曲の歌唱区間の少なくとも１つに対し、特定の歌唱者の当該歌唱区間に対応する歌唱音声データを割り当て、それ以外の他の歌唱区間に対し、特定した他の歌唱者の当該他の歌唱区間に対応する歌唱音声データを割り当てることにより、コンテンツデータを生成するコンテンツ生成部４００と、ある楽曲のカラオケ演奏に併せてコンテンツデータを再生することにより歌唱音声を放音させ、且つ再生終了後、特定の歌唱者の歌唱音声が放音された歌唱区間を選択するための解答画面を表示させるコンテンツ実行部５００とを有する。 As is clear from the above, in the karaoke system 1 according to this embodiment, the server device S and the karaoke device K are communicably connected. The server device S has: an information storage unit 100 that stores, in association with one another, song identification information of a song, singer identification information of a singer who sang the song, singing voice data corresponding to the singing voice of the singer singing the song, and feature information extracted from the singing voice data; an identification unit 200 that, when a certain song and a specific singer are selected by a user of the karaoke device K, identifies at least one piece of singing voice data corresponding to the singing voice of another singer based on the similarity obtained by comparing the feature information linked to the song identification information of the certain song and the singer identification information of the specific singer with the feature information linked to the song identification information of the certain song and the singer identification information of another singer different from the specific singer; and a transmission processing unit 300 that transmits the singing voice data corresponding to the singing voice of the specific singer and the singing voice data corresponding to the singing voice of the identified other singer to the karaoke device K. The karaoke device K has: a content generation unit 400 that generates content data by assigning, to at least one of the singing sections of the certain song, the singing voice data of the specific singer corresponding to that singing section, and assigning, to the other singing sections, the singing voice data of the identified other singer corresponding to those other singing sections; and a content execution unit 500 that plays back the content data along with the karaoke performance of the certain song so that the singing voice is emitted and, after playback ends, displays an answer screen for selecting the singing section in which the singing voice of the specific singer was emitted.

このようなカラオケ装置Ｋによれば、ある楽曲の歌唱区間の少なくとも１つに対し、特定の歌唱者の歌唱音声データを割り当て、それ以外の他の歌唱区間に対し、類似度に基づいて特定した他の歌唱者の歌唱音声データを割り当てることにより生成したコンテンツデータを用いて、歌唱区間当てコンテンツを楽しむことができる。すなわち、本実施形態に係るカラオケ装置Ｋによれば、特定の歌唱者の歌唱音声が放音された歌唱区間を当てる新規なコンテンツを実施できる。 With this karaoke device K, singing section guessing content can be enjoyed using content data generated by assigning singing voice data of a specific singer to at least one singing section of a song, and assigning singing voice data of other singers identified based on similarity to the remaining singing sections. In other words, with the karaoke device K of this embodiment, new content can be implemented in which the singing section in which the singing voice of a specific singer is emitted can be guessed.

また、本実施形態に係るカラオケ装置Ｋにおけるコンテンツ生成部４００は、他の歌唱区間に他の歌唱者の歌唱音声データを割り当てる場合に、当該他の歌唱区間における類似度が最も高い歌唱音声データを割り当てることができる。このようなカラオケ装置Ｋによれば、特定の歌唱者の歌唱音声と似ている、他の歌唱者の歌唱音声に対応する歌唱音声データを用いてコンテンツデータを生成することができる。 In addition, when allocating singing voice data of another singer to another singing section, the content generation unit 400 in the karaoke device K according to this embodiment can allocate singing voice data with the highest similarity in the other singing section. With such a karaoke device K, content data can be generated using singing voice data corresponding to the singing voice of another singer that is similar to the singing voice of a specific singer.

＜変形例＞ <Modification>
上記実施形態において、特定された他の歌唱者の歌唱音声に対応する歌唱音声データの類似度がいずれも低い可能性もありうる。この場合、当該歌唱音声データを用いてコンテンツデータを作成したとしても、特定した歌唱者の歌唱音声との差が明確であるため、コンテンツを十分に楽しむことができない恐れがある。 In the above embodiment, it is possible that the similarity of all of the singing voice data corresponding to the singing voices of the identified other singers is low. In this case, even if content data is created using that singing voice data, the difference from the singing voice of the specific singer is obvious, so the content may not be fully enjoyable.

そこで、コンテンツ生成部４００は、他の歌唱区間に他の歌唱者の歌唱音声データを割り当てる場合に、当該他の歌唱区間における特定の歌唱者の特徴情報に応じて補正した歌唱音声データを割り当てることができる。補正は、公知の技術（たとえば特開平１０－１８７１８７号公報）を用いることができる。 When the content generating unit 400 assigns singing voice data of another singer to another singing section, the content generating unit 400 can assign singing voice data that has been corrected according to the characteristic information of a specific singer in the other singing section. The correction can be performed using known techniques (for example, JP-A-10-187187).

たとえば実施形態の例において、コンテンツ生成部４００は、ある楽曲のＢメロの歌唱区間に対して歌唱音声データＳＤ１を割り当てる場合に、特定の歌唱者の歌唱音声データから抽出された特徴情報を用い、歌唱音声データＳＤ１のピッチ及び／または音量を特定の歌唱者の歌唱音声データのピッチ及び／または音量に近似させる補正を行う。コンテンツ生成部４００は、補正した歌唱音声データＳＤ１をＢメロの歌唱区間に対して割り当てる。 For example, in an embodiment, when the content generating unit 400 assigns singing voice data SD1 to the singing section of the B melody of a certain song, the content generating unit 400 uses feature information extracted from the singing voice data of a specific singer and performs correction to approximate the pitch and/or volume of the singing voice data SD1 to the pitch and/or volume of the singing voice data of the specific singer. The content generating unit 400 assigns the corrected singing voice data SD1 to the singing section of the B melody.
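
As a rough illustration of the volume part of such a correction, the sketch below scales another singer's samples so that their RMS level matches that of the specific singer. It is not the technique of the cited publication, it does not touch pitch, and the function names and the representation of audio as a plain list of floats are assumptions.

```python
import math


def rms(samples):
    """Root-mean-square level of a sequence of audio samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def match_volume(other_samples, target_rms):
    """Scale samples so their RMS level approximates the target singer's level."""
    current = rms(other_samples)
    if current == 0.0:
        return list(other_samples)  # silence cannot be rescaled
    gain = target_rms / current
    return [s * gain for s in other_samples]


# Hypothetical B melody samples of SD1 and of the specific singer.
sd1_b_melody = [0.1, -0.1, 0.1, -0.1]
target_b_melody = [0.2, -0.2, 0.2, -0.2]

corrected = match_volume(sd1_b_melody, rms(target_b_melody))
print(corrected)
```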

このようなカラオケ装置Ｋによれば、特定された他の歌唱者の歌唱音声に対応する歌唱音声データの類似度がいずれも低い場合であっても、特定の歌唱者の歌唱音声に似た歌唱音声からなるコンテンツデータを生成することができる。よって、解答者は、歌唱区間当てコンテンツをより楽しむことができる。 With this karaoke device K, even if the similarity of the singing voice data corresponding to the singing voices of the other identified singers is low, it is possible to generate content data consisting of singing voices that resemble the singing voice of a specific singer. This allows contestants to enjoy the singing section guessing content even more.

なお、コンテンツ生成部４００は、類似度が予め設定されている基準値を満たすかどうかを判定し、基準値を満たさないと判定した場合にのみ、補正を行うことでもよい。 In addition, the content generating unit 400 may determine whether the similarity satisfies a preset reference value, and perform correction only if it is determined that the similarity does not satisfy the reference value.

また、カラオケ装置Ｋにおいて、コンテンツを選択する際に、難易度を設定できるようにしてもよい。高い難易度が設定された場合、コンテンツ生成部４００は、類似度が低い歌唱音声データに対しては上述の補正を行い、補正した歌唱音声データを用いてコンテンツデータを生成する。一方、低い難易度が設定された場合、コンテンツ生成部４００は、類似度が低い場合であっても歌唱音声データの補正を行わず、特定された歌唱音声データをそのまま用いてコンテンツデータを生成する。このようなカラオケ装置Ｋによれば、難易度に応じたコンテンツデータを生成することができる。よって、解答者は、歌唱区間当てコンテンツを様々な難易度で楽しむことができる。 In addition, the karaoke device K may be configured to allow the difficulty level to be set when selecting content. When a high difficulty level is set, the content generation unit 400 performs the above-mentioned correction on singing voice data with low similarity and generates content data using the corrected singing voice data. On the other hand, when a low difficulty level is set, the content generation unit 400 does not correct the singing voice data even if the similarity is low, and generates content data using the identified singing voice data as is. With such a karaoke device K, content data can be generated according to the difficulty level. Therefore, contestants can enjoy singing section guessing content with various levels of difficulty.
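
The reference-value check and the difficulty setting can be combined into a single decision, sketched below. Treating "satisfies the reference value" as "greater than or equal to", and the two difficulty labels `"high"` and `"low"`, are assumptions for illustration.

```python
def should_correct(similarity, reference_value, difficulty):
    """Decide whether another singer's voice data should be corrected."""
    if difficulty == "low":
        # Low difficulty: use the identified voice data as is.
        return False
    # High difficulty: correct only data that fails the reference value.
    return similarity < reference_value


print(should_correct(similarity=2, reference_value=3, difficulty="high"))
print(should_correct(similarity=4, reference_value=3, difficulty="high"))
print(should_correct(similarity=2, reference_value=3, difficulty="low"))
```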

＜その他＞ <Others>
上記実施形態は、例として提示したものであり、発明の範囲を限定するものではない。上記の構成は、適宜組み合わせて実施することが可能であり、発明の要旨を逸脱しない範囲で、種々の省略、置き換え、変更を行うことができる。上記実施形態やその変形は、発明の範囲や要旨に含まれると同様に、特許請求の範囲に記載された発明とその均等の範囲に含まれる。 The above embodiment is presented as an example and does not limit the scope of the invention. The above configurations can be implemented in appropriate combinations, and various omissions, substitutions, and modifications can be made without departing from the gist of the invention. The above embodiment and its modifications are included in the scope and gist of the invention, and likewise in the invention described in the claims and its equivalents.

REFERENCE SIGNS LIST
１カラオケシステム 1 karaoke system
１００情報記憶部 100 information storage unit
２００特定部 200 identification unit
３００送信処理部 300 transmission processing unit
４００コンテンツ生成部 400 content generation unit
５００コンテンツ実行部 500 content execution unit
Ｋカラオケ装置 K karaoke device
Ｓサーバ装置 S server device

Claims

1. A karaoke system in which a server device and a karaoke device are communicably connected, wherein
the server device includes:
an information storage unit that stores, in association with one another, song identification information of a song, singer identification information of a singer who sang the song, singing voice data corresponding to the singing voice of the singer singing the song, and feature information extracted from the singing voice data;
an identification unit that, when a certain song and a specific singer are selected by a user of the karaoke device, identifies at least one piece of singing voice data corresponding to the singing voice of another singer based on a degree of similarity obtained by comparing feature information linked to the song identification information of the certain song and the singer identification information of the specific singer with feature information linked to the song identification information of the certain song and the singer identification information of another singer different from the specific singer; and
a transmission processing unit that transmits singing voice data corresponding to the singing voice of the specific singer and singing voice data corresponding to the singing voice of the identified other singer to the karaoke device, and
the karaoke device includes:
a content generation unit that generates content data by assigning, to at least one of the singing sections of the certain song, the singing voice data of the specific singer corresponding to that singing section, and assigning, to the other singing sections, the singing voice data of the identified other singer corresponding to those other singing sections; and
a content execution unit that plays back the content data along with the karaoke performance of the certain song so that the singing voice is emitted and, after playback ends, displays an answer screen for selecting the singing section in which the singing voice of the specific singer was emitted.

2. The karaoke system according to claim 1, characterized in that, when the content generation unit assigns the singing voice data of the other singer to the other singing section, it assigns the singing voice data with the highest similarity in the other singing section.

3. The karaoke system according to claim 1 or 2, characterized in that, when the content generation unit assigns the singing voice data of the other singer to the other singing section, it assigns singing voice data that has been corrected according to the feature information of the specific singer in the other singing section.