JP7772056B2

JP7772056B2 - Chord estimation device, training device, chord estimation method, and training method

Info

Publication number: JP7772056B2
Application number: JP2023508892A
Authority: JP
Inventors: 正博鈴木
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2021-03-26
Filing date: 2022-03-03
Publication date: 2025-11-18
Anticipated expiration: 2042-03-03
Also published as: CN117043852A; WO2022202199A1; JP2025172909A; JPWO2022202199A1

Description

The present invention relates to a chord estimation device and method for estimating chords for playing a musical instrument, and to a training device and method for building the chord estimation device.

Some musical scores are annotated with chords. A performer can enjoy playing an instrument such as a piano or guitar by playing those chords. When producing a score with chords, the creator assigns chords based on the melody, accompaniment, and other parts indicated by the notes. Assigning chords requires musical knowledge and sense. Patent Document 1 listed below discloses a chord progression estimation and detection device that estimates chords from performance information or audio signals.

Patent Document 1: Japanese Patent No. 6151121

In Patent Document 1, chords are estimated for each specific section; for example, one chord is estimated per measure. If chords could be estimated from given notes with greater flexibility, the creation of scores with chords could be expected to be supported more appropriately.

An object of the present invention is to perform highly flexible chord estimation based on a note sequence.

A chord estimation device according to one aspect of the present invention includes a reception unit that receives time series data including a note sequence consisting of a plurality of notes, and an estimation unit that uses a trained model to estimate, based on the time series data, chord sequence information indicating a chord sequence corresponding to the note sequence.

A training device according to another aspect of the present invention includes a first acquisition unit that acquires input time series data including a reference note sequence consisting of a plurality of notes, a second acquisition unit that acquires output chord sequence information indicating a chord sequence corresponding to the reference note sequence, and a construction unit that constructs a trained model that has learned the input/output relationship between the input time series data and the output chord sequence information.

A chord estimation method according to yet another aspect of the present invention is executed by a computer and includes receiving time series data including a note sequence consisting of a plurality of notes, and using a trained model to estimate, based on the time series data, chord sequence information indicating a chord sequence corresponding to the note sequence.

A training method according to yet another aspect of the present invention is executed by a computer and includes acquiring input time series data including a reference note sequence consisting of a plurality of notes, acquiring output chord sequence information indicating a chord sequence corresponding to the reference note sequence, and constructing a trained model that has learned the input/output relationship between the input time series data and the output chord sequence information.

According to the present invention, highly flexible chord estimation can be performed based on a note sequence.

FIG. 1 is a block diagram showing the configuration of a processing system including a chord estimation device and a training device according to an embodiment of the present invention.
FIG. 2 is a diagram showing an example of input time series data included in the training data.
FIG. 3 is a diagram showing an example of output chord sequence information included in the training data.
FIG. 4 is a block diagram showing the configuration of the training device and the chord estimation device.
FIG. 5 shows an example of an arranged musical score displayed on the display unit.
FIG. 6 is a flowchart showing an example of the training process.
FIG. 7 is a flowchart showing an example of the chord estimation process.
FIG. 8 is a diagram showing a modified example of the output chord sequence information included in the training data.

(1) Configuration of the Processing System
A chord estimation device, a training device, a chord estimation method, and a training method according to embodiments of the present invention will now be described in detail with reference to the drawings. FIG. 1 is a block diagram showing the configuration of a processing system including a chord estimation device and a training device according to one embodiment of the present invention. As shown in FIG. 1, the processing system 100 includes a RAM (random access memory) 110, a ROM (read-only memory) 120, a CPU (central processing unit) 130, a storage unit 140, an operation unit 150, and a display unit 160.

The processing system 100 is realized by a computer such as a personal computer, a tablet terminal, or a smartphone. Alternatively, the processing system 100 may be realized by the cooperative operation of multiple computers connected by a communication path such as Ethernet, or by an electronic musical instrument with a performance function, such as an electronic piano.

The RAM 110, ROM 120, CPU 130, storage unit 140, operation unit 150, and display unit 160 are connected to a bus 170. The RAM 110, ROM 120, and CPU 130 constitute the training device 10 and the chord estimation device 20. In this embodiment, the training device 10 and the chord estimation device 20 are implemented on the common processing system 100, but they may be implemented on separate processing systems.

The RAM 110 is, for example, a volatile memory and is used as a working area for the CPU 130. The ROM 120 is, for example, a non-volatile memory and stores a training program and a chord estimation program. The CPU 130 performs the training process by executing the training program stored in the ROM 120 on the RAM 110. The CPU 130 also performs the chord estimation process by executing the chord estimation program stored in the ROM 120 on the RAM 110. Details of the training process and the chord estimation process are described later.

The training program or the chord estimation program may be stored in the storage unit 140 instead of the ROM 120. Alternatively, the training program or the chord estimation program may be provided in a form stored on a computer-readable storage medium and installed in the ROM 120 or the storage unit 140. Alternatively, if the processing system 100 is connected to a network such as the Internet, the training program or the chord estimation program distributed from a server (including a cloud server) on that network may be installed in the ROM 120 or the storage unit 140.

The storage unit 140 includes a storage medium such as a hard disk, an optical disk, a magnetic disk, or a memory card, and stores the trained model M and multiple pieces of training data D. The trained model M or each piece of training data D need not be stored in the storage unit 140 and may instead be stored on a computer-readable storage medium. Alternatively, if the processing system 100 is connected to a network, the trained model M or each piece of training data D may be stored on a server on that network.

(2) Training Data
The trained model M is a machine learning model trained to present the chord sequence that a user of the chord estimation device 20 (hereinafter referred to as a performer) refers to when playing a piece of music. The trained model M is constructed using multiple pieces of training data D. A user of the training device 10 can generate the training data D by operating the operation unit 150. The training data D is created based on the musical knowledge, musical sense, and the like of a reference performer. The reference performer has a relatively high level of skill in playing music. The reference performer may be the performer's instructor or teacher in playing music.

Each piece of training data D represents a pair of input time series data and output chord sequence information. The input time series data represents a reference note sequence consisting of multiple notes. For example, the input time series data is data in which multiple notes make up a melody or an accompaniment. The input time series data may also be image data representing an image of a musical score. The output chord sequence information is data in which the chords corresponding to the reference note sequence are arranged in time series. The chord sequence corresponding to the reference note sequence is assigned by the reference performer.

FIGS. 2 and 3 show an example of a piece of training data D. The example in FIG. 2 shows input time series data including a reference note sequence consisting of multiple notes. The example in FIG. 3 shows output chord sequence information indicating the chord sequence corresponding to the reference note sequence.

In this embodiment, the input time series data has a metrical structure and additional information in addition to the reference note sequence. Input time series data A shown in FIG. 2 is an excerpt of the first two bars of a piece. In input time series data A, bars are delimited by "bar" and beats are delimited by "beat". The "bar" and "beat" information thus gives input time series data A a metrical structure. Elements A1 to A37 represent the reference note sequence of the first bar; that is, they are delimited as one bar by the "bar" before element A1 and the "bar" after element A37. The "beat" after each of elements A8, A18, and A26 further divides the bar into beats.

Element A0 is additional information. Examples of additional information include key information, genre information, and difficulty information. In the example of FIG. 2, key information is added by the Key element. Key information specifies the key of the music represented by the reference note sequence, and the number following Key is a value that specifies the key. When key information is specified as additional information, chord sequences that depend on both the reference note sequence and the key are machine-learned. Genre information specifies the genre of the music represented by the reference note sequence, for example rock, pop, or jazz. When genre information is specified as additional information, chord sequences that depend on both the reference note sequence and the genre are machine-learned. Difficulty information indicates the difficulty of the musical score represented by the reference note sequence. When difficulty information is specified as additional information, chord sequences that depend on both the reference note sequence and the difficulty of the score are machine-learned. For example, for a low-difficulty score, machine learning is performed while interpolating notes from the small number of notes available. For a high-difficulty score, machine learning is performed while selecting the notes that constitute chords from among an excessive number of notes.

Among the elements of input time series data A, the elements other than element A0, "bar", and "beat" correspond to the reference note sequence. Elements A1 to A37 represent the reference note sequence of the first bar. In this example, element A0 is placed at the beginning of input time series data A, that is, before the reference note sequence (elements A1 to A37), but it may be placed at any position in input time series data A.
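The layout described above can be illustrated with a minimal Python sketch that separates additional information from the note sequence and groups the note elements by bar and beat. Only the "bar" and "beat" delimiters come from the description; the token spellings used here ("Key_5", "R77_on", "wait_11") and the prefixes "Genre_" and "Level_" are hypothetical, since the exact notation of FIG. 2 is not reproduced in this text.

```python
def split_bars_and_beats(tokens):
    """Separate additional-information tokens from the note sequence and
    group the remaining tokens by bar and beat.

    Assumed (hypothetical) spellings: additional information starts with
    "Key_", "Genre_" or "Level_"; "bar" and "beat" act as delimiters.
    """
    meta_prefixes = ("Key_", "Genre_", "Level_")
    meta, bars = [], []
    current_bar = None  # list of beats; each beat is a list of tokens
    for tok in tokens:
        if tok.startswith(meta_prefixes):
            meta.append(tok)            # may appear at any position
        elif tok == "bar":
            if current_bar is not None:
                bars.append(current_bar)
            current_bar = [[]]          # a new bar starts with its first beat
        elif tok == "beat":
            if current_bar is not None:
                current_bar.append([])  # start the next beat of this bar
        elif current_bar is not None:
            current_bar[-1].append(tok)
    # Keep a trailing bar only if it actually contains tokens.
    if current_bar is not None and any(current_bar):
        bars.append(current_bar)
    return meta, bars


tokens = ["Key_5", "bar", "R77_on", "wait_11", "beat",
          "L53_off", "wait_1", "bar", "R72_on", "bar"]
meta, bars = split_bars_and_beats(tokens)
```

Because additional-information tokens are recognized by prefix rather than by position, the sketch also handles the case in which element A0 is not placed at the beginning.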

As illustrated by elements A1 to A37, in the reference note sequence "L" denotes the left hand, "R" denotes the right hand, and the number following "L" or "R" denotes the pitch. "on" and "off" denote key press and key release, respectively. "wait" denotes waiting, and the number following "wait" denotes the duration. Accordingly, elements A1 to A5 indicate that the right hand presses the keys of pitches 77 and 74 while the left hand simultaneously presses the keys of pitches 53 and 46, and that this state is held for 11 time units. Elements A6 to A8 then indicate that the left hand releases the keys of pitches 53 and 46 simultaneously and that this state is held for 1 time unit. Elements A9 to A11 then indicate that the left hand presses the keys of pitches 53 and 46 again and waits for 5 time units.
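As a rough illustration of how such elements unfold in time, the following Python sketch converts a token list into timestamped key events. The token spellings ("R77_on", "wait_11") are assumptions made for this example; the description only fixes the meaning of "L", "R", "on", "off", and "wait".

```python
import re

# Hypothetical token spellings; the figure's exact notation is not shown here.
NOTE_RE = re.compile(r"^([LR])(\d+)_(on|off)$")
WAIT_RE = re.compile(r"^wait_(\d+)$")


def to_timed_events(tokens):
    """Convert note/wait tokens into (time, hand, pitch, action) tuples."""
    time = 0
    events = []
    for tok in tokens:
        note = NOTE_RE.match(tok)
        wait = WAIT_RE.match(tok)
        if note:
            hand, pitch, action = note.groups()
            events.append((time, hand, int(pitch), action))
        elif wait:
            time += int(wait.group(1))  # advance the clock by the wait length
        # "bar", "beat" and additional-information tokens carry no duration
    return events


# Elements A1 to A11 as described in the text, in the assumed spelling.
a1_to_a11 = ["R77_on", "R74_on", "L53_on", "L46_on", "wait_11",
             "L53_off", "L46_off", "wait_1",
             "L53_on", "L46_on", "wait_5"]
events = to_timed_events(a1_to_a11)
```

In this sketch the four key presses of elements A1 to A4 share time 0, the left-hand releases occur at time 11, and the left-hand keys are pressed again at time 12.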

Output chord sequence information B shown in FIG. 3 indicates the chord sequence corresponding to the reference note sequence contained in input time series data A. The chord sequence corresponding to elements A1 to A37 of input time series data A is represented by elements B1 to B3 and elements B4 to B6. In other words, elements B1 to B6 indicate the chord sequence corresponding to the first measure of input time series data A. In output chord sequence information B as well, measures are delimited by "bar" and beats are delimited by "beat". The range delimited by the "bar" before element B1 and the "bar" after element B6 corresponds to the first measure.

In output chord sequence information B, one chord is represented by three elements. Elements B1 to B3 specify the chord on the first beat of the first measure. Elements B4 to B6 specify the chord on the fourth beat of the first measure. Elements B7 to B9 specify the chord on the first beat of the fourth measure. Of the three elements representing a chord, the first element (B1, B4, B7) is basic chord information. Basic chord information (chord) is a value from 1 to 24 that specifies a major or minor chord on each of the 12 pitch classes (C, C#, D, D#, ..., A, A#, B). The second element (B2, B5, B8) is chord type information. Chord type information (type) is a value that specifies the type of tension chord. The third element (B3, B6, B9) is chord root information. Chord root information (root) is a value that specifies the root note of an on-chord (slash chord).
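To make the three-element encoding concrete, here is a minimal Python sketch that turns a (chord, type, root) triple into a chord symbol. The description states only that chord ranges over 1 to 24 and that type and root are numeric; everything else below is an assumption for illustration: the ordering (1 to 12 major on C through B, 13 to 24 minor), the tension-type code table, and the convention that root 0 means "no on-chord" while 1 to 12 select C through B.

```python
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F",
                 "F#", "G", "G#", "A", "A#", "B"]

# Hypothetical code table; the source gives no concrete type values.
TENSION_TYPES = {0: "", 1: "7", 2: "M7", 3: "sus4"}


def chord_name(chord, type_code=0, root=0):
    """Render a (chord, type, root) triple as a chord symbol."""
    if not 1 <= chord <= 24:
        raise ValueError("chord must be in 1..24")
    tonic = PITCH_CLASSES[(chord - 1) % 12]
    quality = "" if chord <= 12 else "m"   # assumed: 1-12 major, 13-24 minor
    name = tonic + quality + TENSION_TYPES[type_code]
    if root:                               # assumed: 0 means no on-chord
        name += "/" + PITCH_CLASSES[root - 1]
    return name


examples = [chord_name(1), chord_name(22, 1), chord_name(1, 0, 5)]
```

Under these assumptions the three examples evaluate to "C", "Am7", and "C/E". A real implementation would replace both tables with the code assignments actually used in the training data.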

(3) Training Device and Chord Estimation Device
FIG. 4 is a block diagram showing the configurations of the training device 10 and the chord estimation device 20. As shown in FIG. 4, the training device 10 includes, as functional units, a first acquisition unit 11, a second acquisition unit 12, and a construction unit 13. The functional units of the training device 10 are realized by the CPU 130 of FIG. 1 executing the training program. At least some of the functional units of the training device 10 may be realized by hardware such as electronic circuits.

The first acquisition unit 11 acquires input time series data A from each piece of training data D stored in the storage unit 140 or the like. The second acquisition unit 12 acquires output chord sequence information B from each piece of training data D. For each piece of training data D, the construction unit 13 performs machine learning with the input time series data A acquired by the first acquisition unit 11 as the input element and the output chord sequence information B acquired by the second acquisition unit 12 as the output element. By repeating machine learning over multiple pieces of training data D, the construction unit 13 constructs a trained model M that represents the input/output relationship between input time series data A and output chord sequence information B.

In this example, the construction unit 13 constructs the trained model M by training a Transformer, but the embodiment is not limited to this. The construction unit 13 may construct the trained model M by training another type of machine learning model that handles time series. The trained model M constructed by the construction unit 13 is stored, for example, in the storage unit 140. It may also be stored on a server on a network or the like.

The chord estimation device 20 includes, as functional units, a reception unit 21, an estimation unit 22, and a generation unit 23. The functional units of the chord estimation device 20 are realized by the CPU 130 of FIG. 1 executing the chord estimation program. At least some of the functional units of the chord estimation device 20 may be realized by hardware such as electronic circuits.

In this embodiment, the reception unit 21 receives time series data including a note sequence consisting of multiple notes. The performer can give image data representing an image of a musical score to the reception unit 21 as the time series data. Alternatively, the performer can generate time series data by operating the operation unit 150 and give it to the reception unit 21. In this example, the time series data has the same structure as input time series data A of FIG. 2. That is, the time series data has a metrical structure and additional information in addition to the note sequence.

The estimation unit 22 estimates chord sequence information using the trained model M stored in the storage unit 140 or the like. The chord sequence information indicates the chord sequence corresponding to the note sequence received by the reception unit 21 and is estimated based on the note sequence and the additional information. Because the time series data has the same structure as input time series data A, the chord sequence information has the same structure as output chord sequence information B. The generation unit 23 generates musical score information based on the note sequence of the time series data received by the reception unit 21 and the chord sequence information estimated by the estimation unit 22. For example, the musical score information is information on an arranged piano score, in which chord information is written above the staff. Alternatively, the musical score information is MIDI data to which the chord sequence information has been added.

The display unit 160 displays a score with chords based on the musical score information generated by the generation unit 23. FIG. 5 shows an example of a score with chords displayed on the display unit 160. As shown in FIG. 5, the score with chords shows the chord sequence information estimated by the estimation unit 22 in correspondence with the notes of the note sequence received by the reception unit 21.

(4) Training Process and Chord Estimation Process
FIG. 6 is a flowchart showing an example of the training process performed by the training device 10 of FIG. 4. The training process of FIG. 6 is performed by the CPU 130 of FIG. 1 executing the training program. First, the first acquisition unit 11 acquires input time series data A from each piece of training data D (step S1). The second acquisition unit 12 acquires output chord sequence information B from each piece of training data D (step S2). Either of steps S1 and S2 may be executed first, or they may be executed simultaneously.

Next, for each piece of training data D, the construction unit 13 performs machine learning with the input time series data A acquired in step S1 as the input element and the output chord sequence information B acquired in step S2 as the output element (step S3). The construction unit 13 then determines whether sufficient machine learning has been performed (step S4). If the machine learning is insufficient, the construction unit 13 returns to step S3. Steps S3 and S4 are repeated, with the parameters being changed, until sufficient machine learning has been performed. The number of repetitions depends on the quality conditions that the trained model M to be constructed must satisfy.

When sufficient machine learning has been performed, the construction unit 13 saves the input/output relationship between input time series data A and output chord sequence information B learned through the machine learning of step S3 as the trained model M (step S5). This completes the training process.
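The control flow of steps S3 to S5 can be sketched in Python as follows. This is only an illustration of the loop structure, not the training procedure of the embodiment: the quality condition (a loss threshold), the round limit, and the one-parameter toy model fitted by gradient descent are all assumptions introduced here in place of the Transformer and the training data D.

```python
def train_until_sufficient(train_step, evaluate, save,
                           target_loss=0.01, max_rounds=1000):
    """Repeat S3 (learn) and S4 (check) until the check passes, then S5 (save).

    target_loss stands in for the unspecified quality condition, and
    max_rounds is a safety limit added for this sketch.
    """
    rounds = 0
    for rounds in range(1, max_rounds + 1):
        train_step()                     # S3: update the parameters
        if evaluate() <= target_loss:    # S4: has learning been sufficient?
            break
    save()                               # S5: store the learned relationship
    return rounds


# Toy stand-in for a model: fit w in y = w * x by gradient descent.
state = {"w": 0.0, "saved": None}
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]


def loss():
    return sum((state["w"] * x - y) ** 2 for x, y in data) / len(data)


def step(lr=0.05):
    grad = sum(2 * (state["w"] * x - y) * x for x, y in data) / len(data)
    state["w"] -= lr * grad


def save():
    state["saved"] = state["w"]


rounds = train_until_sufficient(step, loss, save)
```

Passing the learning step, the check, and the save action as callables keeps the loop independent of the model type, which matches the statement that models other than a Transformer may be trained.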

FIG. 7 is a flowchart showing an example of the chord estimation process performed by the chord estimation device 20 of FIG. 4. The chord estimation process of FIG. 7 is performed by the CPU 130 of FIG. 1 executing the chord estimation program. First, the reception unit 21 receives time series data (step S11). Next, the estimation unit 22 uses the trained model M saved in step S5 of the training process to estimate chord sequence information from the time series data received in step S11 (step S12). At this point, chord sequence information including one or more chord sequences is estimated from the note sequence contained in the time series data, so chord estimation is performed with a high degree of flexibility. The timing of chord changes within the flow of time is also estimated, so more appropriate chord estimation is performed. That is, although the time series data contains no information marking the boundaries of chord changes, the estimation unit 22 performs chord estimation that includes the timing of chord changes.
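One way to read chord-change timing out of estimated chord sequence information is to track the position of each chord triple relative to the "bar" and "beat" delimiters. The Python sketch below does this under assumed token spellings ("chord_8", "type_0", "root_0"), which are hypothetical; only the delimiters and the three-element chord layout come from the description.

```python
def chord_change_positions(tokens):
    """Return [(bar, beat, (chord, type, root)), ...] with 1-based bar/beat.

    Assumes each chord is written as three consecutive tokens spelled
    "chord_N", "type_N", "root_N" (hypothetical spelling).
    """
    bar, beat = 0, 1
    changes = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "bar":
            bar += 1          # a "bar" token opens the next measure
            beat = 1
        elif tok == "beat":
            beat += 1         # a "beat" token moves to the next beat
        elif tok.startswith("chord_") and i + 2 < len(tokens):
            triple = tuple(int(t.split("_")[1]) for t in tokens[i:i + 3])
            changes.append((bar, beat, triple))
            i += 2            # skip the type and root tokens just consumed
        i += 1
    return changes


estimated = ["bar", "chord_8", "type_0", "root_0",
             "beat", "beat", "beat", "chord_1", "type_1", "root_0",
             "bar", "chord_6", "type_0", "root_0", "bar"]
changes = chord_change_positions(estimated)
```

In this example the first measure carries chords on beats 1 and 4, mirroring the placement described for elements B1 to B6, and the second measure carries one chord on beat 1.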

The generation unit 23 then generates musical score information based on the note sequence of the time series data received in step S11 and the chord sequence information estimated in step S12 (step S13). A score with chords may be displayed on the display unit 160 based on the generated musical score information. This completes the chord estimation process.

(5) Effects of the Embodiment
As described above, the chord estimation device 20 according to this embodiment includes the reception unit 21, which receives time series data including a note sequence consisting of a plurality of notes, and the estimation unit 22, which uses the trained model M to estimate chord sequence information indicating a chord sequence corresponding to the note sequence. With this configuration, the trained model M is used to estimate appropriate chord sequence information from the temporal flow of the notes in the time series data. This makes it possible to present a score with chords based on time series data including a note sequence. Since one or more chord sequences are estimated from the note sequence, chord estimation is performed with a high degree of flexibility.

The trained model M may be a machine learning model that has learned the input/output relationship between input time series data A, which includes a reference note sequence consisting of multiple notes, and output chord sequence information B, which indicates the chord sequence corresponding to the notes of the reference note sequence. In this case, the chord sequence information can be easily estimated from the time series data.

The estimation unit 22 may also estimate the timing of chord changes in the chord sequence. This results in more appropriate chord estimation for the note sequence.

Input time series data A may include genre information specifying the genre of the music represented by the reference note sequence. The time series data may likewise include genre information specifying the genre of the music represented by the note sequence. The estimation unit 22 may then estimate the chord sequence information based on the time series data including the genre information. This results in chord estimation suited to the genre of the music.

Input time series data A may include key information specifying the key of the music represented by the reference note sequence. The time series data may likewise include key information specifying the key of the music represented by the note sequence. The estimation unit 22 may then estimate the chord sequence information based on the time series data including the key information. This results in chord estimation suited to the key of the music.

入力時系列データＡは、参照音符列で示される楽譜の難易度を指定する難易度情報を含んでもよい。また、時系列データは、音符列で示される楽譜の難易度を指定する難易度情報を含んでもよい。そして、推定部２２は、難易度情報を含む時系列データに基づいて、コード列情報を推定してもよい。これにより、音符列で示される楽譜の難易度に応じて適切なコード推定が行われる。 The input time series data A may include difficulty level information that specifies the difficulty level of the musical score represented by the reference note sequence. The time series data may also include difficulty level information that specifies the difficulty level of the musical score represented by the note sequence. The estimation unit 22 may then estimate chord sequence information based on the time series data that includes the difficulty level information. This allows appropriate chord estimation to be performed according to the difficulty level of the musical score represented by the note sequence.
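The three kinds of additional information described above (genre, key, and difficulty level) could be carried in the time-series data in many ways. The sketch below shows one possibility, in which each piece of information is placed as a token in front of the note tokens. The token spellings (`genre_...`, `key_...`, `difficulty_...`, `note_...`) and the function `build_time_series` are assumptions made for this example and are not defined by the embodiment.

```python
from typing import List, Optional


def build_time_series(
    pitches: List[int],
    genre: Optional[str] = None,
    key: Optional[str] = None,
    difficulty: Optional[str] = None,
) -> List[str]:
    """Build a token list: optional additional-information tokens, then notes.

    All token names here are hypothetical placeholders.
    """
    tokens: List[str] = []
    if genre is not None:
        tokens.append(f"genre_{genre}")
    if key is not None:
        tokens.append(f"key_{key}")
    if difficulty is not None:
        tokens.append(f"difficulty_{difficulty}")
    # Each note is represented only by its MIDI pitch number in this sketch.
    tokens.extend(f"note_{pitch}" for pitch in pitches)
    return tokens


print(build_time_series([60, 64, 67], genre="pop", key="C", difficulty="easy"))
# ['genre_pop', 'key_C', 'difficulty_easy', 'note_60', 'note_64', 'note_67']
```

Since every additional-information argument is optional, the same function also covers the case where the time-series data contains only the note sequence.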

コード推定装置２０は、音符列の各音符に対応するようにコード列情報が付されたコード付き楽譜を示す楽譜情報を生成する生成部２３をさらに備えてもよい。 The chord estimation device 20 may further include a generation unit 23 that generates musical score information indicating a chord-attached musical score to which chord sequence information is attached to correspond to each note in the musical note sequence.
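The behaviour of such a generation unit can be pictured with the following Python sketch, which labels each note with the chord in effect at its onset. The data layout (notes as `(onset, pitch)` pairs, chord changes as `(time, chord)` pairs sorted by time) and the function name `attach_chords` are assumptions for illustration only; the embodiment does not prescribe this representation.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple


def attach_chords(
    notes: List[Tuple[float, int]],
    chord_changes: List[Tuple[float, str]],
) -> List[Tuple[float, int, Optional[str]]]:
    """Label each (onset, pitch) note with the chord in effect at its onset.

    Assumption: `chord_changes` is sorted by time in ascending order.
    """
    change_times = [time for time, _ in chord_changes]
    labelled = []
    for onset, pitch in notes:
        # Index of the last chord change at or before this note's onset.
        index = bisect_right(change_times, onset) - 1
        chord = chord_changes[index][1] if index >= 0 else None
        labelled.append((onset, pitch, chord))
    return labelled


notes = [(0.0, 60), (1.0, 64), (2.0, 67), (3.0, 71)]
chord_changes = [(0.0, "C"), (2.0, "G")]
for onset, pitch, chord in attach_chords(notes, chord_changes):
    print(onset, pitch, chord)
```

A note that sounds before the first chord change receives `None` in this sketch, which leaves that position of the score without a chord symbol.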

本実施の形態に係る訓練装置１０は、複数の音符からなる参照音符列を含む入力時系列データＡを取得する第１の取得部１１と、参照音符列に対応するコード列を示す出力コード列情報Ｂを取得する第２の取得部１２と、入力時系列データＡと出力コード列情報Ｂとの間の入出力関係を習得した訓練済モデルＭを構築する構築部１３とを備える。この構成によれば、入力時系列データＡと出力コード列情報Ｂとの間の入出力関係を習得した訓練済モデルＭを容易に構築することができる。 The training device 10 according to this embodiment comprises a first acquisition unit 11 that acquires input time series data A including a reference note sequence consisting of a plurality of notes, a second acquisition unit 12 that acquires output chord sequence information B indicating a chord sequence corresponding to the reference note sequence, and a construction unit 13 that constructs a trained model M that has learned the input-output relationship between the input time series data A and the output chord sequence information B. This configuration makes it easy to construct a trained model M that has learned the input-output relationship between the input time series data A and the output chord sequence information B.
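The division of roles among the first acquisition unit 11, the second acquisition unit 12, and the construction unit 13 can be sketched in Python as follows. This is a toy under stated assumptions: the "model" here is only a frequency table that maps a pitch class to the chord most often paired with it, chosen as a simple stand-in so that the example runs without external libraries. It is not the machine learning model of the embodiment, and all function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# One training pair: (reference note sequence, chord label per note).
Pair = Tuple[List[int], List[str]]


def acquire_inputs(dataset: List[Pair]) -> List[List[int]]:
    """Stand-in for the first acquisition unit: input time-series data A."""
    return [notes for notes, _ in dataset]


def acquire_outputs(dataset: List[Pair]) -> List[List[str]]:
    """Stand-in for the second acquisition unit: output chord sequences B."""
    return [chords for _, chords in dataset]


def construct_model(
    inputs: List[List[int]], outputs: List[List[str]]
) -> Dict[int, str]:
    """Stand-in for the construction unit: learn pitch class -> chord."""
    counts: Dict[int, Counter] = defaultdict(Counter)
    for notes, chords in zip(inputs, outputs):
        for pitch, chord in zip(notes, chords):
            counts[pitch % 12][chord] += 1
    # Keep the most frequent chord for each pitch class.
    return {pc: counter.most_common(1)[0][0] for pc, counter in counts.items()}


dataset: List[Pair] = [
    ([60, 64, 67], ["C", "C", "C"]),
    ([65, 69, 60], ["F", "F", "F"]),
    ([60, 67], ["C", "C"]),
]
model = construct_model(acquire_inputs(dataset), acquire_outputs(dataset))
print(model[0], model[5])  # C F
```

A real implementation would replace `construct_model` with the training of a sequence model, while the pairing of A and B would stay the same.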

（６）他の実施の形態 (6) Other Embodiments
上記実施の形態において、入力時系列データＡは付加情報を含み、時系列データは付加情報を含むが、実施の形態はこれに限定されない。入力時系列データＡは、参照音符列を含めばよく、付加情報を含まなくてもよい。同様に、時系列データは、音符列を含めばよく、付加情報を含まなくてもよい。 In the above embodiment, the input time series data A includes additional information, and the time series data includes additional information, but the embodiment is not limited to this. The input time series data A only needs to include a reference note sequence, and does not need to include additional information. Similarly, the time series data only needs to include a note sequence, and does not need to include additional information.

上記実施の形態において、入力時系列データＡは拍節構造として“ｂａｒ”および“ｂｅａｔ”情報を有するが、実施の形態はこれに限定されない。入力時系列データＡは拍節構造を有していなくてもよい。図８は、拍節構造を有していない入力時系列データＡに対応して準備された出力コード列情報Ｂの一例を示す図である。図８に示すように、出力コード列情報Ｂは、“ｂａｒ”および“ｂｅａｔ”情報からなる拍節構造を有していない。 In the above embodiment, the input time series data A has "bar" and "beat" information as a metrical structure, but the embodiment is not limited to this. The input time series data A does not have to have a metrical structure. Figure 8 is a diagram showing an example of output chord sequence information B prepared in response to input time series data A that does not have a metrical structure. As shown in Figure 8, the output chord sequence information B does not have a metrical structure consisting of "bar" and "beat" information.
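The difference between data with and without a metrical structure can be illustrated with a short Python sketch that removes the "bar" and "beat" tokens from a token list. Only the words "bar" and "beat" come from the description; the exact token spellings used here (`beat_1`, `note_60`, `chord_C`) and the function `strip_metrical_structure` are assumptions for this example.

```python
from typing import List


def strip_metrical_structure(tokens: List[str]) -> List[str]:
    """Drop every token that starts with "bar" or "beat".

    Assumption: metrical tokens are spelled "bar", "beat", or "beat_<n>".
    """
    return [t for t in tokens if not t.startswith(("bar", "beat"))]


with_meter = ["bar", "beat_1", "note_60", "beat_2", "note_62", "bar", "beat_1", "note_64"]
print(strip_metrical_structure(with_meter))
# ['note_60', 'note_62', 'note_64']

chords_with_meter = ["bar", "beat_1", "chord_C", "bar", "beat_1", "chord_G"]
print(strip_metrical_structure(chords_with_meter))
# ['chord_C', 'chord_G']
```

Applying the same filter to both sides keeps the input time series data A and the output chord sequence information B consistent with each other.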

上記実施の形態において、入力時系列データＡは付加情報として調情報、ジャンル情報および難易度情報を有する場合を例に説明した。構築部１３は、付加情報の種類に応じて異なる訓練済モデルＭを構築してもよいし、１つの訓練済モデルＭを構築してもよい。あるいは、入力時系列データＡは付加情報として、調情報、ジャンル情報および難易度情報のうち複数の情報を含めてもよい。 In the above embodiment, the case where the input time series data A has key information, genre information, and difficulty level information as additional information was described as an example. The construction unit 13 may construct a different trained model M for each type of additional information, or may construct a single trained model M. Alternatively, the input time series data A may include, as additional information, two or more of the key information, genre information, and difficulty level information.

また、上記実施の形態において、コード推定装置２０は生成部２３を含むが、実施の形態はこれに限定されない。演奏者は、推定部２２により推定されたコード列情報を所望の楽譜に転記することによりコード付き楽譜を作成することができる。そのため、コード推定装置２０は、生成部２３を含まなくてもよい。 In addition, in the above embodiment, the chord estimation device 20 includes the generation unit 23, but the embodiment is not limited to this. A performer can create a score with chords by transcribing the chord sequence information estimated by the estimation unit 22 into a desired musical score. Therefore, the chord estimation device 20 does not need to include the generation unit 23.

上記実施の形態において、訓練データＤはピアノにより演奏を行う際のコード列情報を推定するように訓練されるが、実施の形態はこれに限定されない。訓練データＤは、ギター、ドラム等の他の楽器により演奏を行う際のコード列情報を推定するように訓練されてもよい。 In the above embodiment, the training data D is trained to estimate chord sequence information when playing on a piano, but the embodiment is not limited to this. The training data D may also be trained to estimate chord sequence information when playing on other instruments such as a guitar or drums.

上記実施の形態において、コード推定装置２０の使用者が演奏者である場合を例に説明したが、コード推定装置２０の使用者は、例えば、楽譜の制作会社のスタッフであってもよい。また、訓練装置１０による機械学習は、楽譜の制作会社のスタッフにより事前に行われてもよい。 In the above embodiment, the user of the chord estimation device 20 is a performer, but the user of the chord estimation device 20 may also be, for example, a staff member of a music score production company. Furthermore, machine learning by the training device 10 may be performed in advance by a staff member of the music score production company.

Claims

A chord estimation device comprising:
a receiving unit that receives time-series data including a musical note sequence consisting of a plurality of musical notes; and
an estimation unit that, using a trained model, estimates, based on the time-series data, chord sequence information indicating a chord sequence corresponding to the musical note sequence, the chord sequence information including chord change timings,
wherein the estimation unit also estimates the timing of chord changes in the chord sequence.

The chord estimation device according to claim 1, wherein the trained model is a model that has learned the input/output relationship between input time-series data including a reference note sequence consisting of a plurality of notes and output chord sequence information indicating a chord sequence corresponding to the reference note sequence.

The chord estimation device according to claim 2, wherein
the input time-series data includes genre information that specifies the genre of music expressed by the reference note sequence,
the time-series data includes genre information that specifies the genre of music expressed by the musical note sequence, and
the estimation unit estimates the chord sequence information based on the time-series data including the genre information.

The chord estimation device according to claim 2, wherein
the input time-series data includes key information that specifies the key of the music expressed by the reference note sequence,
the time-series data includes key information that specifies the key of the music expressed by the musical note sequence, and
the estimation unit estimates the chord sequence information based on the time-series data including the key information.

The chord estimation device according to claim 2, wherein
the input time-series data includes difficulty level information that specifies the difficulty level of the musical score represented by the reference note sequence,
the time-series data includes difficulty level information that specifies the difficulty level of the musical score represented by the musical note sequence, and
the estimation unit estimates the chord sequence information based on the time-series data including the difficulty level information.

The chord estimation device according to any one of claims 1 to 5, further comprising a generation unit that generates musical score information indicating a chord-attached musical score to which the chord sequence information is attached so as to correspond to each note of the musical note sequence.

A training device comprising:
a first acquisition unit that acquires input time-series data including a reference note sequence consisting of a plurality of notes;
a second acquisition unit that acquires output chord sequence information, which is information indicating a chord sequence corresponding to the reference note sequence and in which chord change timings are recorded; and
a construction unit that constructs a trained model that has learned the input/output relationship between the input time-series data and the output chord sequence information, including the timing of chord changes in the chord sequence.

A computer-implemented chord estimation method comprising:
receiving time-series data including a musical note sequence consisting of a plurality of musical notes; and
estimating, using a trained model and based on the time-series data, chord sequence information indicating a chord sequence corresponding to the musical note sequence, the chord sequence information including chord change timings,
wherein the estimating also estimates the timing of chord changes in the chord sequence.

The computer-implemented chord estimation method according to claim 8, wherein the trained model is a model that has learned the input/output relationship between input time-series data including a reference note sequence consisting of a plurality of notes and output chord sequence information indicating a chord sequence corresponding to the reference note sequence.

The computer-implemented chord estimation method according to claim 9, wherein
the input time-series data includes genre information that specifies the genre of music expressed by the reference note sequence,
the time-series data includes genre information that specifies the genre of music expressed by the musical note sequence, and
the estimating comprises estimating the chord sequence information based on the time-series data including the genre information.

The computer-implemented chord estimation method according to claim 9, wherein
the input time-series data includes key information that specifies the key of the music expressed by the reference note sequence,
the time-series data includes key information that specifies the key of the music expressed by the musical note sequence, and
the estimating comprises estimating the chord sequence information based on the time-series data including the key information.

The computer-implemented chord estimation method according to claim 9, wherein
the input time-series data includes difficulty level information that specifies the difficulty level of the musical score represented by the reference note sequence,
the time-series data includes difficulty level information that specifies the difficulty level of the musical score represented by the musical note sequence, and
the estimating comprises estimating the chord sequence information based on the time-series data including the difficulty level information.

The computer-implemented chord estimation method according to any one of claims 8 to 12, further comprising generating musical score information indicating a chord-attached musical score to which the chord sequence information is attached so as to correspond to each note of the musical note sequence.

A computer-implemented training method comprising:
acquiring input time-series data including a reference note sequence consisting of a plurality of notes;
acquiring output chord sequence information indicating a chord sequence corresponding to the reference note sequence, the output chord sequence information including recorded chord change timings; and
constructing a trained model that has learned the input/output relationship between the input time-series data and the output chord sequence information, including the timing of chord changes in the chord sequence.